summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86][AVX] Use extract128BitVector helper. NFCI.Simon Pilgrim2017-12-161-4/+1
| | | | llvm-svn: 320932
* [X86][AVX] Fix failed broadcast foldSimon Pilgrim2017-12-161-3/+7
| | | | | | Strip excess BITCASTs from EXTRACT_SUBVECTOR input llvm-svn: 320930
* [Memcpy Loop Lowering] Only calculate residual size/bytes copied when needed.Sean Fertile2017-12-161-6/+10
| | | | | | | | If the loop operand type is int8 then there will be no residual loop for the unknown size expansion. Dont create the residual-size and bytes-copied values when they are not needed. llvm-svn: 320929
* [X86] Don't pass a zero input to the passthru operand of ↵Craig Topper2017-12-161-9/+6
| | | | | | | | getVectorMaskingNode/getScalarMaskingNode when its going to emit an ISD::OR/ISD::AND. NFCI In those cases, the pass thru operand of the methods isn't used. The calls to the scalar version were passing a MVT::i1 zero, which is an illegal type at the stage this code runs. llvm-svn: 320928
* [X86] Have getVectorMaskingNode return an ISD::AND for X86ISD::VPSHUFBITQMB ↵Craig Topper2017-12-161-0/+1
| | | | | | instead of creating a select with one input being 0. llvm-svn: 320927
* [X86] When using vpopcntdq for ctpop of v8i16 vectors, only promote to v8i32.Craig Topper2017-12-161-8/+7
| | | | | | Previously we promoted to v8i64, but we don't need to go all the way to 512-bits. If we have VLX we can use the 256-bit instruction. And even if we don't have VLX we can widen v8i32 to v16i32 and drop the upper half. llvm-svn: 320926
* [X86] Combine some more scheduler model entries using regular expressions.Craig Topper2017-12-164-200/+100
| | | | | | We had a lot of separate 32 and 64 instructions that had the same scheduling data. This merges them into the same regular expression. This is pretty consistent with a lot of other instructions. llvm-svn: 320924
* [X86] Use instrs instead of instregex for gather/scatter instructions in the ↵Craig Topper2017-12-164-130/+116
| | | | | | | | scheduler models. Combine into single InstrRW entries. The reduces the number of scheduler groups in subtarget info. llvm-svn: 320923
* [InstCombine] canonicalize shifty abs(): ashr+add+xor --> cmp+neg+selSanjay Patel2017-12-161-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to do this for 2 reasons: 1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766. 2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern. More detail about what happens in the backend: 1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs into the shift variant. That is the opposite of this IR canonicalization. 2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization. 3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2 into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node when that's legal/custom. 4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a variety of ways. a. For #2, the vector path is missing the case for setlt with a '1' constant. b. For #3, we are missing a match for commuted versions of the shift variants. 5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The version of cmp+sel produced by this patch will be recognized in the DAG and converted to an ABS node when possible or the shift sequence when not. 6. In the following examples with this patch applied, we may get conditional moves rather than the shift produced by the generic DAGCombiner transforms. The conditional move is created using a target-specific decision for any given target. Whether it is optimal or not for a particular subtarget may be up for debate. define i32 @abs_shifty(i32 %x) { %signbit = ashr i32 %x, 31 %add = add i32 %signbit, %x %abs = xor i32 %signbit, %add ret i32 %abs } define i32 @abs_cmpsubsel(i32 %x) { %cmp = icmp slt i32 %x, zeroinitializer %sub = sub i32 zeroinitializer, %x %abs = select i1 %cmp, i32 %sub, i32 %x ret i32 %abs } define <4 x i32> @abs_shifty_vec(<4 x i32> %x) { %signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31> %add = add <4 x i32> %signbit, %x %abs = xor <4 x i32> %signbit, %add ret <4 x i32> %abs } define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) { %cmp = icmp slt <4 x i32> %x, zeroinitializer %sub = sub <4 x i32> zeroinitializer, %x %abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x ret <4 x i32> %abs } > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx > abs_shifty: > movl %edi, %eax > negl %eax > cmovll %edi, %eax > retq > > abs_cmpsubsel: > movl %edi, %eax > negl %eax > cmovll %edi, %eax > retq > > abs_shifty_vec: > vpabsd %xmm0, %xmm0 > retq > > abs_cmpsubsel_vec: > vpabsd %xmm0, %xmm0 > retq > > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64 > abs_shifty: > cmp w0, #0 // =0 > cneg w0, w0, mi > ret > > abs_cmpsubsel: > cmp w0, #0 // =0 > cneg w0, w0, mi > ret > > abs_shifty_vec: > abs v0.4s, v0.4s > ret > > abs_cmpsubsel_vec: > abs v0.4s, v0.4s > ret > > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le > abs_shifty: > srawi 4, 3, 31 > add 3, 3, 4 > xor 3, 3, 4 > blr > > abs_cmpsubsel: > srawi 4, 3, 31 > add 3, 3, 4 > xor 3, 3, 4 > blr > > abs_shifty_vec: > vspltisw 3, -16 > vspltisw 4, 15 > vsubuwm 3, 4, 3 > vsraw 3, 2, 3 > vadduwm 2, 2, 3 > xxlxor 34, 34, 35 > blr > > abs_cmpsubsel_vec: > vspltisw 3, -16 > vspltisw 4, 15 > vsubuwm 3, 4, 3 > vsraw 3, 2, 3 > vadduwm 2, 2, 3 > xxlxor 34, 34, 35 > blr > Differential Revision: https://reviews.llvm.org/D40984 llvm-svn: 320921
* [X86] Remove unneeded code for handling the old kunpck intrinsics.Craig Topper2017-12-162-13/+1
| | | | llvm-svn: 320917
* [LV] Extend InstWidening with CM_Widen_RecursiveHal Finkel2017-12-161-5/+16
| | | | | | | | | | | | | | | | | | | | Changes to the original scalar loop during LV code gen cause the return value of Legal->isConsecutivePtr() to be inconsistent with the return value during legal/cost phases (further analysis and information of the bug is in D39346). This patch is an alternative fix to PR34965 following the CM_Widen approach proposed by Ayal and Gil in D39346. It extends InstWidening enum with CM_Widen_Reverse to properly record the widening decision for consecutive reverse memory accesses and, consequently, get rid of the Legal->isConsetuviePtr() call in LV code gen. I think this is a simpler/cleaner solution to PR34965 than the one in D39346. Fixes PR34965. Patch by Diego Caballero, thanks! Differential Revision: https://reviews.llvm.org/D40742 llvm-svn: 320913
* [PowerPC, AsmParser] Enable the mnemonic spell correctorHal Finkel2017-12-161-2/+15
| | | | | | | | | | | r307148 added an assembly mnemonic spelling correction support and enabled it on ARM. This enables that support on PowerPC as well. Patch by Dmitry Venikov, thanks! Differential Revision: https://reviews.llvm.org/D40552 llvm-svn: 320911
* [X86] Add 128 and 256-bit VPOPCNTDQ instructions. Adjust some tablegen ↵Craig Topper2017-12-161-64/+33
| | | | | | | | classes LZCNT/POPCNT. I think when this instruction was first published it was only for a Knights CPU and thus VLX version was missing. llvm-svn: 320910
* [LTO] Make processing of combined module more consistentVitaly Buka2017-12-161-24/+12
| | | | | | | | | | | | | | | | | Summary: 1. Use stream 0 only for combined module. Previously if combined module was not processes ThinLTO used the stream for own output. However small changes in input, could trigger combined module and shuffle outputs making life of llvm::LTO harder. 2. Always process combined module and write output to stream 0. Processing empty combined module is cheap and allows llvm::LTO users to avoid implementing processing which is already done in llvm::LTO. Subscribers: mehdi_amini, inglorion, eraman, hiraditya Differential Revision: https://reviews.llvm.org/D41267 llvm-svn: 320905
* [SimplifyLibCalls] Inline calls to cabs when it's safe to do soHal Finkel2017-12-162-0/+55
| | | | | | | | | | | | When unsafe algerbra is allowed calls to cabs(r) can be replaced by: sqrt(creal(r)*creal(r) + cimag(r)*cimag(r)) Patch by Paul Walker, thanks! Differential Revision: https://reviews.llvm.org/D40069 llvm-svn: 320901
* [LV] NFC patch for moving VP*Recipe class definitions from LoopVectorize.cpp ↵Hal Finkel2017-12-163-392/+415
| | | | | | | | | | | | | | | | | | | to VPlan.h This is a small step forward to move VPlan stuff to where it should belong (i.e., VPlan.*): 1. VP*Recipe classes in LoopVectorize.cpp are moved to VPlan.h. 2. Many of VP*Recipe::print() and execute() definitions are still left in LoopVectorize.cpp since they refer to things declared in LoopVectorize.cpp. To be moved to VPlan.cpp at a later time. 3. InterleaveGroup class is moved from anonymous namespace to llvm namespace. Referencing it in anonymous namespace from VPlan.h ended up in warning. Patch by Hideki Saito, thanks! Differential Revision: https://reviews.llvm.org/D41045 llvm-svn: 320900
* [X86] Add back the assert from r320830 that was reverted in r320850Craig Topper2017-12-161-0/+2
| | | | | | Hopefully r320864 has fixed the offending case that failed the assert. llvm-svn: 320898
* Fix NDEBUG build problem in r320895Teresa Johnson2017-12-161-1/+1
| | | | | | Fix incorrect placement of #endif causing NDEBUG build failures. llvm-svn: 320897
* [ThinLTO] Enable importing of aliases as copy of aliaseeTeresa Johnson2017-12-164-29/+117
| | | | | | | | | | | | | | | | | | | | | | | Summary: This implements a missing feature to allow importing of aliases, which was previously disabled because alias cannot be available_externally. We instead import an alias as a copy of its aliasee. Some additional work was required in the IndexBitcodeWriter for the distributed build case, to ensure that the aliasee has a value id in the distributed index file (i.e. even when it is not being imported directly). This is a performance win in codes that have many aliases, e.g. C++ applications that have many constructor and destructor aliases. Reviewers: pcc Subscribers: mehdi_amini, inglorion, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D40747 llvm-svn: 320895
* Fix WebAssembly backend for some LLVM API changesDavid Blaikie2017-12-156-11/+10
| | | | llvm-svn: 320893
* Revert "Recommit "[DWARFv5] Dump an MD5 checksum in the line-table header.""Paul Robinson2017-12-153-41/+14
| | | | | | | This reverts commit 0afef672f63f0e4e91938656bc73424a8c058bfc. Still failing at runtime on bots. llvm-svn: 320888
* Recommit "[DWARFv5] Dump an MD5 checksum in the line-table header."Paul Robinson2017-12-153-14/+41
| | | | | | | | | | | Adds missing support for DW_FORM_data16. Update of r320852, fixing the unittest to use a hand-coded struct instead of std::array to guarantee data layout. Differential Revision: https://reviews.llvm.org/D41090 llvm-svn: 320886
* Fix unused variable in non-assert buildsMatthias Braun2017-12-151-2/+1
| | | | llvm-svn: 320885
* MachineFunction: Return reference from getFunction(); NFCMatthias Braun2017-12-15237-830/+816
| | | | | | The Function can never be nullptr so we can return a reference. llvm-svn: 320884
* MachineFunction: Slight refactoring; NFCMatthias Braun2017-12-152-16/+17
| | | | | | Slight cleanup/refactor in preparation for upcoming commit. llvm-svn: 320882
* Fixed the gcc 'enumeral and non-enumeral type in conditional expression ↵Galina Kistanova2017-12-151-3/+3
| | | | | | [-Werror=extra]' warning introduced by r320750 llvm-svn: 320868
* [Hexagon] Remove recursion in visitUsesOf, replace with use queueKrzysztof Parzyszek2017-12-152-43/+121
| | | | | | | | | | | | | | | | This is primarily to reduce stack usage, but ordering the use queue according to the position in the code (earlier instructions visited before later ones) reduces the number of unnecessary bottoms due to visiting instructions out of order, e.g. %reg1 = copy %reg0 %reg2 = copy %reg0 %reg3 = and %reg1, %reg2 Here, reg3 should be known to be same as reg0-2, but if reg3 is evaluated after reg1 is updated, but before reg2 is updated, the two inputs to the and will appear different, causing reg3 to become bottom. llvm-svn: 320866
* [Hexagon] Handle concat_vectors of all allowed HVX typesKrzysztof Parzyszek2017-12-151-10/+17
| | | | llvm-svn: 320865
* [X86] Use AND32ri8 instead of AND64ri8 in Asan code in EmitCallAsanReport ↵Craig Topper2017-12-151-1/+1
| | | | | | | | for 32-bit mode. This seemed to work due to a quirk in the X86 MC encoder that didn't emit a REX byte that the AND64ri8 implies when in 32-bit mode. This made the encoding the same as AND32ri8. I tried to add an assert to catch the dropped REX prefix that caught this. llvm-svn: 320864
* [X86] In LowerVectorCTPOP use ISD::ZERO_EXTEND/ISD::TRUNCATE instead of the ↵Craig Topper2017-12-151-4/+4
| | | | | | | | target specific nodes. The target independent nodes will get legalized to the target specific nodes by their own legalization process. Someday I'd like to stop using a target specific for zero extends and truncates of legal types so the less places we reference the target specific opcode the better. llvm-svn: 320863
* [X86] Remove unnecessary TODO.Craig Topper2017-12-151-1/+0
| | | | | | When I wrote it I thought we were missing a potential optimization for KNL. But investigating further shows that for KNL we still do the optimal thing by widening to v4f32 and then using special isel patterns to widen again to zmm a register. llvm-svn: 320862
* Re-commit : [LICM] Allow sinking when foldable in loopJun Bum Lim2017-12-151-29/+79
| | | | | | | | | | | | | | | | | | | | | | This recommits r320823 reverted due to the test failure in sink-foldable.ll and an unused variable. Added "REQUIRES: aarch64-registered-target" in the test and removed unused variable. Original commit message: Continue trying to sink an instruction if its users in the loop is foldable. This will allow the instruction to be folded in the loop by decoupling it from the user outside of the loop. Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier Reviewed By: hfinkel Subscribers: javed.absar, bmakam, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D37076 llvm-svn: 320858
* Revert "[DWARFv5] Dump an MD5 checksum in the line-table header."Paul Robinson2017-12-153-41/+14
| | | | | | Unit test fails on some bots. llvm-svn: 320857
* [Hexagon] Fix operand-swapping PatFrag for atomic storesKrzysztof Parzyszek2017-12-151-18/+22
| | | | | | | PatFrag now has the atomicity information stored as bit fields. They need to be copied to the new PatFrag. llvm-svn: 320855
* [DWARFv5] Dump an MD5 checksum in the line-table header.Paul Robinson2017-12-153-14/+41
| | | | | | | | Adds missing support for DW_FORM_data16. Differential Revision: https://reviews.llvm.org/D41090 llvm-svn: 320852
* [X86] Remove assert in X86MCCodeEmitter.cpp that was added in r320830.Craig Topper2017-12-151-2/+0
| | | | | | It seems to be failing real code which is concerning, but we were silently getting away with it. I'll investigate further. llvm-svn: 320850
* [SelectionDAG][X86] Fix insert_vector_elt lowering for v32i1/v64i1 with ↵Craig Topper2017-12-152-7/+30
| | | | | | | | | | | | | | | | | | | | | | | non-constant index Summary: Currently we don't handle v32i1/v64i1 insert_vector_elt correctly as we fail to look at the number of elements closely and assume it can only be v16i1 or v8i1. We also can't type legalize v64i1 insert_vector_elt correctly on KNL due to the type not being byte addressable as required by the legalizing through memory accesses path requires. For the first issue, the patch now tries to pick a 512-bit register with the correct number of elements and promotes to that. For the second issue, we now extend the vector to a byte addressable type, do the stores to memory, load the two halves, and then truncate the halves back to the original type. Technically since we changed the type, we may not need two loads, but actually checking that is more work and for the v64i1 case we do need them. Reviewers: RKSimon, delena, spatel, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40942 llvm-svn: 320849
* [Memcpy Loop Lowering] Insert loop BB inbetween the split BB.Sean Fertile2017-12-151-2/+3
| | | | | | | | | | | The original memcpy expansion inserted the loop basic block inbetween the 2 new basic blocks created by splitting the original block the memcpy call was in. This commit makes the new memcpy expansion do the same to keep the layout of the IR matching between the old and new implementations. Differential Review: https://reviews.llvm.org/D41197 llvm-svn: 320848
* [X86] Add 'Requires<[In64BitMode]>' to a bunch of instructions that only ↵Craig Topper2017-12-154-67/+94
| | | | | | | | have memory and immediate operands. The asm parser wasn't preventing these from being accepted in 32-bit mode. Instructions that use a GR64 register are protected by the parser rejecting the register in 32-bit mode. llvm-svn: 320846
* [X86] Change BNDLDX to use anymem instead of i64mem for itsmemory operand.Craig Topper2017-12-151-1/+1
| | | | | | This instruction doesn't access memory. It juse use a similar looking memory encoding. Don't require Intel syntax to put "qword ptr" in front of it. llvm-svn: 320845
* [X86] Remove the 'Requires' In64BitMode/Not64BitMode from the LWP instructions.Craig Topper2017-12-151-4/+4
| | | | | | These aren't doing anything due to a top level "let Predicates =". I think the GR32/GR64 register class protects these anyway. llvm-svn: 320844
* [X86] Remove the 'Requires<[In64BitMode]>' from SHSTK instructions.Craig Topper2017-12-151-9/+5
| | | | | | This has no effect due to a top level "let Predicates =" around the instructions. But its also not required because the GR64 usage in the instruction guarantees it can never match. llvm-svn: 320843
* [TargetLibraryInfo] fix documentation comment; NFCSanjay Patel2017-12-151-3/+3
| | | | llvm-svn: 320842
* [CodeGen] fix documentation comments; NFCSanjay Patel2017-12-151-10/+7
| | | | llvm-svn: 320840
* [AArch64] Fix typo in the ASIMD instruction optimization passEvandro Menezes2017-12-151-66/+72
| | | | | | | | Fix typo in the representative instruction replacement. Also, fix formatting and reword some comments. llvm-svn: 320839
* fix typo in comment and remove inaccurate comment; NFCSanjay Patel2017-12-151-2/+0
| | | | llvm-svn: 320838
* Fix for bug PR35549 - Repeated schedule comments.Andrew V. Tischenko2017-12-153-10/+24
| | | | | | Differential Revision: https://reviews.llvm.org/D40960 llvm-svn: 320837
* Revert "Re-commit : [LICM] Allow sinking when foldable in loop"Jun Bum Lim2017-12-151-78/+29
| | | | | | This reverts commit r320833. llvm-svn: 320836
* Re-commit : [LICM] Allow sinking when foldable in loopJun Bum Lim2017-12-151-29/+78
| | | | | | | | | | | | | | | | | | | | This recommit r320823 after fixing a test failure. Original commit message: Continue trying to sink an instruction if its users in the loop is foldable. This will allow the instruction to be folded in the loop by decoupling it from the user outside of the loop. Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier Reviewed By: hfinkel Subscribers: javed.absar, bmakam, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D37076 llvm-svn: 320833
* Updated llvm-objdump to display local relocations in Mach-O binariesMichael Trent2017-12-151-1/+23
| | | | | | | | | | | | | | | | | | | | | | | Summary: llvm-objdump's Mach-O parser was updated in r306037 to display external relocations for MH_KEXT_BUNDLE file types. This change extends the Macho-O parser to display local relocations for MH_PRELOAD files. When used with the -macho option relocations will be displayed in a historical format. All tests are passing for llvm, clang, and lld. llvm-objdump builds without compiler warnings. rdar://35778019 Reviewers: enderby Reviewed By: enderby Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41199 llvm-svn: 320832
OpenPOWER on IntegriCloud