summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* Revert r342457 "Fixes removal of dead elements from PressureDiff (PR37252)."Hans Wennborg2018-09-181-1/+2
| | | | | | | | | | | This broke the lit tests on a bunch of buildbots, e.g. http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/36679 > Reviewed By: MatzeB > > Differential Revision: https://reviews.llvm.org/D51495 llvm-svn: 342482
* [PowerPC] Do not emit record-form rotates when record-form andi/andis sufficesNemanja Ivanovic2018-09-181-6/+28
| | | | | | | | | | | | This is a follow-up to the previous patch that eliminated some of the rotates. With this addition, we will also emit the record-form andis. This patch increases the number of record-form rotates we eliminate by more than 70%. Differential revision: https://reviews.llvm.org/D44897 llvm-svn: 342478
* [LTO] Make detection of WPD remark enablement more robustTeresa Johnson2018-09-181-9/+8
| | | | | | | | | | | | | | | | Summary: Currently only the first function in the module is checked to see if it has remarks enabled. If that first function is a declaration, remarks will be incorrectly skipped. Change to look for the first non-empty function. Reviewers: pcc Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51556 llvm-svn: 342477
* [LLVM-C][OCaml] Add UnifyFunctionExitNodes pass to C and OCaml APIswhitequark2018-09-181-0/+5
| | | | | | | | | | | | | | | | Summary: Adds LLVMAddUnifyFunctionExitNodesPass to expose createUnifyFunctionExitNodesPass to the C and OCaml APIs. Reviewers: whitequark, deadalnix Reviewed By: whitequark Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52212 llvm-svn: 342476
* [LLVM-C][OCaml] Add LowerAtomic pass to C and OCaml APIswhitequark2018-09-181-0/+4
| | | | | | | | | | | | | | | | Summary: Adds LLVMAddLowerAtomicPass to expose createLowerAtomicPass in the C and OCaml APIs. Reviewers: whitequark, deadalnix Reviewed By: whitequark Subscribers: jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D52211 llvm-svn: 342475
* [PowerPC] Optimize compares fed by ANDISoNemanja Ivanovic2018-09-181-1/+2
| | | | | | | | | | | | | | | Both ANDIo and ANDISo (and the 64-bit versions) are record-form instructions. When optimizing compares, we handle the former in order to eliminate the compare instruction but not the latter. This patch just adds the latter to the set of instructions we optimize. The reason these instructions need to be handled separately is that they are not part of the RecFormRel map (since they don't have a non-record-form). The missing "and-immediate-shifted" is just an oversight in the initial implementation. Differential revision: https://reviews.llvm.org/D51353 llvm-svn: 342472
* [TargetLowering] Android has sincos functionsJohn Brawn2018-09-181-1/+2
| | | | | | | | | Since Android API version 9 the Android libm has had the sincos functions, so they should be recognised as libcalls and sincos optimisation should be applied. Differential Revision: https://reviews.llvm.org/D52025 llvm-svn: 342471
* [X86][SSE] LowerShift - pull out repeated getTargetVShiftUniformOpcode ↵Simon Pilgrim2018-09-181-25/+19
| | | | | | calls. NFCI. llvm-svn: 342462
* Fixes removal of dead elements from PressureDiff (PR37252).Yury Gribov2018-09-181-2/+1
| | | | | | | | Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D51495 llvm-svn: 342457
* [AArch64] Attempt to parse more operands as expressionsDavid Green2018-09-181-24/+11
| | | | | | | | | | | | | | This tries to make use of evaluateAsRelocatable in AArch64AsmParser::classifySymbolRef to parse more complex expressions as relocatable operands. It is hopefully better than the existing code which only handles Symbol +- Constant. This allows us to parse more complex adr/adrp, mov, ldr/str and add operands. It also loosens the requirements on parsing addends in ld/st and mov's and adds a number of tests. Differential Revision: https://reviews.llvm.org/D51792 llvm-svn: 342455
* [IndVars] Remove unreasonable checks in rewriteLoopExitValuesMax Kazantsev2018-09-181-11/+5
| | | | | | | | | | | A piece of logic in rewriteLoopExitValues has a weird check on number of users which allowed an unprofitable transform in case if an instruction has more than 6 users. Differential Revision: https://reviews.llvm.org/D51404 Reviewed By: etherzhhb llvm-svn: 342444
* AMDGPU: Don't form fmed3 if it will require materializationMatt Arsenault2018-09-181-2/+9
| | | | | | | If there is a single use constant, it can be folded into the min/max, but not into med3. llvm-svn: 342443
* LSV: Fix adjust alloca alignment trick for AMDGPUMatt Arsenault2018-09-181-29/+31
| | | | | | | | | | This was checking the hardcoded address space 0 for the stack. Additionally, this should be checking for legality with the adjusted alignment, so defer the alignment check. Also try to split if the unaligned access isn't allowed. llvm-svn: 342442
* [PowerPC] Add Itineraries of IIC_IntMulHD for P7/P8QingShan Zhang2018-09-182-0/+8
| | | | | | | | | | | | | | | | | | When doing some instruction scheduling work, we noticed some missing itineraries. Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling, because we can still get same latency due to default values. With machine scheduler, however, itineraries will have impact to scheduling. eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class. And most of the instruction class with itineraries will have NumMicroOps default to 1. This will has impact on the count of RetiredMOps, affects the Pending/Available Queue, then causing different scheduling or suboptimal scheduling further. Patch By: jsji (Jinsong Ji) Differential Revision: https://reviews.llvm.org/D52040 llvm-svn: 342441
* AMDGPU: Expand vector canonicalizesMatt Arsenault2018-09-181-0/+1
| | | | llvm-svn: 342439
* [LLVM-C][OCaml] Add C and OCaml APIs for llvm::StructType::isLiteralwhitequark2018-09-181-0/+4
| | | | | | | | | | | | | | | | | Summary: This patch adds LLVMIsLiteralStruct to the C API to expose StructType::isLiteral. This is then used to implement the analogous addition to the OCaml API. Reviewers: whitequark, deadalnix Reviewed By: whitequark Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52209 llvm-svn: 342435
* [LLVM-C] Add support for ConstantExpr in LLVMGetNumIndices and LLVMGetIndiceswhitequark2018-09-181-0/+4
| | | | | | | | | | | | | | | | Summary: ConstantExpr supports getIndices, but prior to this patch LLVMGetNumIndices and LLVMGetIndices would error on them. Reviewers: whitequark Reviewed By: whitequark Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52206 llvm-svn: 342434
* Revert "[ARM] Cleanup ARM CGP isSupportedValue"Volodymyr Sapsai2018-09-181-19/+42
| | | | | | | | | | | | | | | This reverts r342395 as it caused error > Argument value type does not match pointer operand type! > %0 = atomicrmw volatile xchg i8* %_Value1, i32 1 monotonic, !dbg !25 > i8in function atomic_flag_test_and_set > fatal error: error in backend: Broken function found, compilation aborted! on bot http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/ More details are available at https://reviews.llvm.org/D52080 llvm-svn: 342431
* [EarlyCSEwMemorySSA] Add MSSA verification and tests to make EarlyCSE ↵Alina Sbirlea2018-09-171-0/+2
| | | | | | | | | | | | | | | | | failures easier to track. Summary: EarlyCSE can make IR changes that will leave MemorySSA with accesses claiming to be optimized, but for which a subsequent MemorySSA run will yield a different optimized result. Due to relying on AA queries, we can't fix this in general, unless we recompute MemorySSA. Adding some tests to track this and a basic verify for future potential failures. Reviewers: george.burgess.iv, gberry Subscribers: sanjoy, jlebar, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D51960 llvm-svn: 342422
* [mips] Fix MIPS N32 ABI triples supportSimon Atanasyan2018-09-174-4/+19
| | | | | | | | | | | | Add support mips64(el)-linux-gnuabin32 triples, and set them to N32. Debian architecture name mipsn32/mipsn32el are also added. Set UseIntegratedAssembler for N32 if we can detect it. Patch by YunQiang Su. Differential revision: https://reviews.llvm.org/D51408 llvm-svn: 342416
* [PDB] Make the native reader support enumerators.Zachary Turner2018-09-176-12/+254
| | | | | | | | | | | Previously we would dump the names of enum types, but not their enumerator values. This adds support for enumerator values. In doing so, we have to introduce a general purpose mechanism for caching symbol indices of field list members. Unlike global types, FieldList members do not have a TypeIndex. So instead, we identify them by the pair {TypeIndexOfFieldList, IndexInFieldList}. llvm-svn: 342415
* [PDB] Make the native reader support modified types.Zachary Turner2018-09-175-53/+150
| | | | | | | | Previously for cv-qualified types, we would just ignore them and they would never get printed. Now we can enumerate them and cache them like any other symbol type. llvm-svn: 342414
* [MC] Avoid inlining constant symbols with variants.Nirav Dave2018-09-171-1/+1
| | | | | | | | | | | | | | Summary: Defer unnecessary early inlining of constants to symbol variants. Fixes PR38945. Reviewers: nickdesaulniers, rnk Subscribers: nemanjai, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D52188 llvm-svn: 342412
* [Loopinfo] Remove one latch-case in getLoopID. NFC.Michael Kruse2018-09-171-20/+15
| | | | | | | | | | | | getLoopID has different control flow for two cases: If there is a single loop latch and for any other number of loop latches (0 and more than one). The latter case should return the same result if there is only a single latch. We can save the preceding redundant search for a latch by handling both cases with the same code. Differential Revision: https://reviews.llvm.org/D52118 llvm-svn: 342406
* [MachineOutliner][NFC] Don't map more illegal instrs than you have toJessica Paquette2018-09-171-0/+11
| | | | | | | | | | | | | | | We were mapping an instruction every time we saw something we couldn't map before this. Since each illegal mapping is unique, we only have to do this once. This makes it so that we don't map illegal instructions when the previous mapped instruction was illegal. In CTMark (AArch64), this results in 240 fewer instruction mappings on average over 619 files in total. The largest improvement is 12576 fewer mappings in one file, and the smallest is 0. The median improvement is 101 fewer mappings. llvm-svn: 342405
* [X86ISel] Implement byval lowering for Win64 calling conventionKeno Fischer2018-09-172-9/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The IR reference for the `byval` attribute states: ``` This indicates that the pointer parameter should really be passed by value to the function. The attribute implies that a hidden copy of the pointee is made between the caller and the callee, so the callee is unable to modify the value in the caller. This attribute is only valid on LLVM pointer arguments. ``` However, on Win64, this attribute is unimplemented and the raw pointer is passed to the callee instead. This is problematic, because frontend authors relying on the implicit hidden copy (as happens for every other calling convention) will see the passed value silently (if mutable memory) or loudly (by means of a crash) modified because the callee treats the location as scratch memory space it is allowed to mutate. At this point, it's worth taking a step back to understand the context. In most calling conventions, aggregates that are too large to be passed in registers, instead get *copied* to the stack at a fixed (computable from the signature) offset of the stack pointer. At the LLVM, we hide this hidden copy behind the byval attribute. The caller passes a pointer to the desired data and the callee receives a pointer, but these pointers are not the same. In particular, the pointer that the callee receives points to temporary stack memory allocated as part of the call lowering. In most calling conventions, this pointer is never realized in registers or memory. The temporary memory is simply defined by an implicit offset from the stack pointer at function entry. Win64, uniquely, works differently. The structure is still passed in memory, but instead of being stored at an implicit memory offset, the caller computes a pointer to the temporary memory and passes it to the callee as a regular pointer (taking up a register, or if all registers are taken up, an additional stack slot). Presumably, this was done to allow eliding the copy when passing aggregates through several functions on the stack. This explains why ignoring the `byval` attribute mostly works on Win64. The argument simply gets passed as a pointer and as long as we're ok with the callee trampling all over that memory, there are no ill effects. However, it does contradict the documentation of the `byval` attribute which specifies that there is to be an implicit copy. Frontends can of course work around this by never emitting the `byval` attribute for Win64 and creating `alloca`s for the requisite temporary stack slots (and that does appear to be what frontends are doing). However, the presence of the `byval` attribute is not a trap for frontend authors, since it seems to work, but silently modifies the passed memory contrary to documentation. I see two solutions: - Disallow the `byval` attribute in the verifier if using the Win64 calling convention. - Make it work by simply emitting a temporary stack copy as we would with any other calling convention (frontends can of course always not use the attribute if they want to elide the copy). This patch implements the second option (make it work), though I would be fine with the first also. Ref: https://github.com/JuliaLang/julia/issues/28338 Reviewers: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51842 llvm-svn: 342402
* [AMDGPU] Initialize instruction itinerary from GCNSubtargetStanislav Mekhanoshin2018-09-172-0/+6
| | | | | | | | I need to use it in the GCN codegen. Differential Revision: https://reviews.llvm.org/D52123 llvm-svn: 342400
* Revert "[DWARF] reposting r342048, which was reverted in r342056 due to ↵Alexander Kornienko2018-09-177-201/+221
| | | | | | | | | buildbot errors. Adjusted 2 test cases for ARM and darwin and fixed a bug with the original change in dsymutil." This reverts commit r342218. Due to a number of failures under TSAN. An isolated test case is being worked on. llvm-svn: 342399
* [CVP] Handle instructions with no user. No need to create CVPLattice state. ↵Xin Tong2018-09-171-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This handles terminator instructions and more. Summary: I tested this patch by compiling sqlite3.ll (clang -O3 -mllvm -disable-llvm-optzns sqlite3.c.) opt -called-value-propagation sqlite3.ll -time-passes -f -o out.ll I get 10+% speedup for the pass. I expect some of the gain come from skipping terminator instructions. === BEFORE THE PATCH === ===-------------------------------------------------------------------------=== ... Pass execution timing report ... ===-------------------------------------------------------------------------=== Total Execution Time: 0.5562 seconds (0.5582 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.2485 ( 46.4%) 0.0120 ( 57.7%) 0.2605 ( 46.8%) 0.2615 ( 46.8%) Bitcode Writer 0.1607 ( 30.0%) 0.0079 ( 37.7%) 0.1685 ( 30.3%) 0.1693 ( 30.3%) Called Value Propagation 0.1262 ( 23.6%) 0.0009 ( 4.5%) 0.1271 ( 22.9%) 0.1275 ( 22.8%) Module Verifier 0.5353 (100.0%) 0.0209 (100.0%) 0.5562 (100.0%) 0.5582 (100.0%) Total === AFTER THE PATCH === ===-------------------------------------------------------------------------=== ... Pass execution timing report ... ===-------------------------------------------------------------------------=== Total Execution Time: 0.5338 seconds (0.5355 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.2498 ( 48.6%) 0.0118 ( 59.3%) 0.2615 ( 49.0%) 0.2629 ( 49.1%) Bitcode Writer 0.1377 ( 26.8%) 0.0075 ( 37.8%) 0.1452 ( 27.2%) 0.1455 ( 27.2%) Called Value Propagation 0.1264 ( 24.6%) 0.0006 ( 3.0%) 0.1270 ( 23.8%) 0.1271 ( 23.7%) Module Verifier 0.5139 (100.0%) 0.0199 (100.0%) 0.5338 (100.0%) 0.5355 (100.0%) Total Reviewers: davide, mssimpso Reviewed By: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49108 llvm-svn: 342398
* Revert "Revert r342183 "[DAGCombine] Fix crash when store merging created an ↵Amara Emerson2018-09-171-1/+10
| | | | | | | | extract_subvector with invalid index."" Fixed the assertion failure. llvm-svn: 342397
* [DebugInfo] Remove redundant argument. [NFC]Jonas Devlieghere2018-09-171-18/+17
| | | | | | | Removes the redundant UnitType parameter from verifyUnitContents. I also fixed some formatting issues as I was touching the file. llvm-svn: 342396
* [ARM] Cleanup ARM CGP isSupportedValueSam Parker2018-09-171-42/+19
| | | | | | | | | | | | isSupportedValue explicitly checked and accepted many types of value, primarily for debugging reasons. Remove most of these checks and do a bit of refactoring now that the pass is more stable. This also enables ZExts to be sources, but this has very little practical benefit at the moment extend instructions will still be introduced. Differential Revision: https://reviews.llvm.org/D52080 llvm-svn: 342395
* [ARM] Disallow icmp with negative imm and overflowSam Parker2018-09-171-0/+11
| | | | | | | | | | We allow overflowing instructions if they're decreasing and only used by an unsigned compare. Add the extra condition that the icmp cannot be using a negative immediate. Differential Revision: https://reviews.llvm.org/D52102 llvm-svn: 342392
* Fix vectorization of canonicalizeMatt Arsenault2018-09-171-0/+1
| | | | llvm-svn: 342390
* [GVNHoist] Re-enable GVNHoist by defaultAlexandros Lamprineas2018-09-172-4/+4
| | | | | | | | | | | | Rebase rL341954 since https://bugs.llvm.org/show_bug.cgi?id=38912 has been fixed by rL342055. Precommit testing performed: * Overnight runs of csmith comparing the output between programs compiled with gvn-hoist enabled/disabled. * Bootstrap builds of clang with UbSan/ASan configurations. llvm-svn: 342387
* [PowerPC] Fix label address calculation for ppc64Strahinja Petrovic2018-09-171-1/+2
| | | | | | | | This patch fixes calculating address of label for non-pic ppc64. Differential Revision: https://reviews.llvm.org/D50965 llvm-svn: 342368
* [NFC] Turn unsigned counters into boolean flagsMax Kazantsev2018-09-171-8/+13
| | | | llvm-svn: 342360
* [DebugInfo] Fix build when std::vector::iterator is a pointerKristina Brooks2018-09-161-1/+1
| | | | | | | | | | | | std::vector::iterator type may be a pointer, then iterator::value_type fails to compile since iterator is not a class, namespace, or enumeration. Patch by orivej (Orivej Desh) Differential Revision: https://reviews.llvm.org/D52142 llvm-svn: 342354
* [X86][SSE] Always enable ISD::SRL -> ISD::MULHU for v8i16Simon Pilgrim2018-09-161-1/+0
| | | | | | For constant non-uniform cases we'll never introduce more and/andn/or selects than already occur in generic pre-SSE41 ISD::SRL lowering. llvm-svn: 342352
* [X86][AVX] Enable ISD::SRL -> ISD::MULHU for v16i16Simon Pilgrim2018-09-161-3/+1
| | | | | | Now that rL340913 has landed with improved v16i16 selects as shuffles. llvm-svn: 342349
* [DAGCombiner] try to convert pow(x, 1/3) to cbrt(x)Sanjay Patel2018-09-164-1/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | This is a follow-up suggested in D51630 and originally proposed as an IR transform in D49040. Copying the motivational statement by @evandro from that patch: "This transformation helps some benchmarks in SPEC CPU2000 and CPU2006, such as 188.ammp, 447.dealII, 453.povray, and especially 300.twolf, as well as some proprietary benchmarks. Otherwise, no regressions on x86-64 or A64." I'm proposing to add only the minimum support for a DAG node here. Since we don't have an LLVM IR intrinsic for cbrt, and there are no other DAG ways to create a FCBRT node yet, I don't think we need to worry about DAG builder, legalization, a strict variant, etc. We should be able to expand as needed when adding more functionality/transforms. For reference, these are transform suggestions currently listed in SimplifyLibCalls.cpp: // * cbrt(expN(X)) -> expN(x/3) // * cbrt(sqrt(x)) -> pow(x,1/6) // * cbrt(cbrt(x)) -> pow(x,1/9) Also, given that we bail out on long double for now, there should not be any logical differences between platforms (unless there's some platform out there that has pow() but not cbrt()). Differential Revision: https://reviews.llvm.org/D51753 llvm-svn: 342348
* [x86] fix uses check in broadcast transform (PR38949)Sanjay Patel2018-09-161-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://bugs.llvm.org/show_bug.cgi?id=38949 It's not clear to me that we even need a one-use check in this fold. Ie, 2 independent loads might be better than a load+dependent shuffle. Note that the existing re-use tests are not affected. We actually do form a broadcast node in those tests now because there's no extra use of the insert_subvector node in those cases. But something later in isel pattern matching decides that it is not worth using a broadcast for the full load in those tests: Legalized selection DAG: %bb.0 'test_broadcast_2f64_4f64_reuse:' t7: v2f64,ch = load<(load 16 from %ir.p0)> t0, t2, undef:i64 t4: i64,ch = CopyFromReg t0, Register:i64 %1 t10: ch = store<(store 16 into %ir.p1)> t7:1, t7, t4, undef:i64 t18: v4f64 = insert_subvector undef:v4f64, t7, Constant:i64<0> t20: v4f64 = insert_subvector t18, t7, Constant:i64<2> Becomes: t7: v2f64,ch = load<(load 16 from %ir.p0)> t0, t2, undef:i64 t4: i64,ch = CopyFromReg t0, Register:i64 %1 t10: ch = store<(store 16 into %ir.p1)> t7:1, t7, t4, undef:i64 t21: v4f64 = X86ISD::SUBV_BROADCAST t7 ISEL: Starting selection on root node: t21: v4f64 = X86ISD::SUBV_BROADCAST t7 ... Created node: t27: v4f64 = INSERT_SUBREG IMPLICIT_DEF:v4f64, t7, TargetConstant:i32<7> Morphed node: t21: v4f64 = VINSERTF128rr t27, t7, TargetConstant:i8<1> llvm-svn: 342347
* [InstCombine] Support (sub (sext x), (sext y)) --> (sext (sub x, y)) and ↵Craig Topper2018-09-152-7/+21
| | | | | | | | | | | | | | | | | | | (sub (zext x), (zext y)) --> (zext (sub x, y)) Summary: If the sub doesn't overflow in the original type we can move it above the sext/zext. This is similar to what we do for add. The overflow checking for sub is currently weaker than add, so the test cases are constructed for what is supported. Reviewers: spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52075 llvm-svn: 342335
* Give InfoStreamBuilder an opt-in method to write a hash of the PDB as GUID.Nico Weber2018-09-152-10/+34
| | | | | | | | | | | | | Naively computing the hash after the PDB data has been generated is in practice as fast as other approaches I tried. I also tried online-computing the hash as parts of the PDB were written out (https://reviews.llvm.org/D51887; that's also where all the measuring data is) and computing the hash in parallel (https://reviews.llvm.org/D51957). This approach here is simplest, without being slower. Differential Revision: https://reviews.llvm.org/D51956 llvm-svn: 342333
* Update microsoftDemangle() to work more like itaniumDemangle().Nico Weber2018-09-152-29/+32
| | | | | | | | | | | * Use same method of initializing the output stream and its buffer * Allow a nullptr Status pointer * Don't print the mangled name on demangling error * Write to N (if it is non-nullptr) Differential Revision: https://reviews.llvm.org/D52104 llvm-svn: 342330
* [X86] Remove an fp->int->fp domain crossing in LowerUINT_TO_FP_i64.Craig Topper2018-09-151-5/+3
| | | | | | | | | | | | | | Summary: This unfortunately adds a move, but isn't that better than going to the int domain and back? Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52134 llvm-svn: 342327
* [X86] Fold (movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C))Craig Topper2018-09-151-0/+26
| | | | | | | | | | | | | | | | | Summary: MOVMSK only care about the sign bit so we don't need the setcc to fill the whole element with 0s/1s. We can just shift the bit we're looking for into the sign bit. This saves a constant pool load. Inspired by PR38840. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D52121 llvm-svn: 342326
* [InstCombine][x86] try harder to convert blendv intrinsic to generic IR ↵Sanjay Patel2018-09-151-7/+15
| | | | | | | | | | | | | | | | | | (PR38814) Missing optimizations with blendv are shown in: https://bugs.llvm.org/show_bug.cgi?id=38814 If this works, it's an easier and more powerful solution than adding pattern matching for a few special cases in the backend. The potential danger with this transform in IR is that the condition value can get separated from the select, and the backend might not be able to make a blendv out of it again. I don't think that's too likely, but I've kept this patch minimal with a 'TODO', so we can test that theory in the wild before expanding the transform. Differential Revision: https://reviews.llvm.org/D52059 llvm-svn: 342324
* [InstCombine] Inefficient pattern for high-bits checking 3 (PR38708)Roman Lebedev2018-09-151-6/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: It is sometimes important to check that some newly-computed value is non-negative and only n bits wide (where n is a variable.) There are many ways to check that: https://godbolt.org/z/o4RB8D The last variant seems best? (I'm sure there are some other variations i haven't thought of..) The last (as far i know?) pattern, non-canonical due to the extra use. https://godbolt.org/z/aCMsPk https://rise4fun.com/Alive/I6f https://bugs.llvm.org/show_bug.cgi?id=38708 Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52062 llvm-svn: 342321
* [CodeGenPrepare] Preserve debug locs in OptimizeExtractBitsVedant Kumar2018-09-151-1/+6
| | | | | | | | | CodeGenPrepare has a transform that sinks {lshr, trunc} pairs to make it easier for the backend to emit fancy extract-bits instructions (e.g UBFX). Teach it to preserve debug locations and salvage debug values. llvm-svn: 342319
OpenPOWER on IntegriCloud