summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/LoopVectorize
Commit message (Collapse)AuthorAgeFilesLines
* NFC: Add a test to LV showing that reduction is not possible when reduction ↵Anna Thomas2018-08-131-0/+39
| | | | | | | | | | | | var is reset in the loop Added a test case to reduction showing where it's illegal to identify vectorize a loop. Resetting the reduction var during loop iterations disallows us from widening the dependency cycle to VF, thereby making it illegal to vectorize the loop. llvm-svn: 339605
* Revert "[LV][DebugInfo] Set DL to the middle block Icmp instruction"Anastasis Grammenos2018-07-271-33/+0
| | | | | | This reverts commit r338106. llvm-svn: 338109
* [LV][DebugInfo] Set DL to the middle block Icmp instructionAnastasis Grammenos2018-07-271-0/+33
| | | | | | | | Reviewers: hsaito Differential Revision: https://reviews.llvm.org/D49746 llvm-svn: 338106
* [LV] Fix for PR38110, LV encountered llvm_unreachable() Hideki Saito2018-07-241-0/+50
| | | | | | | | | | | | | | Summary: truncateToMinimalBitWidths() doesn't handle all Instructions and the worst case is compiler crash via llvm_unreachable(). Fix is to add a case to handle PHINode and changed the worst case to NO-OP (from compiler crash). Reviewers: sbaranga, mssimpso, hsaito Reviewed By: hsaito Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49461 llvm-svn: 337861
* [DebugInfo][LoopVectorize] Preserve DL in induction PHI and AddAnastasis Grammenos2018-07-101-0/+10
| | | | | | Differential Revision: https://reviews.llvm.org/D48968 llvm-svn: 336667
* Revert "[InstCombine] Delay foldICmpUsingKnownBits until simple transforms ↵Max Kazantsev2018-07-061-1/+2
| | | | | | are done" llvm-svn: 336410
* NFC - Various typo fixes in testsGabor Buella2018-07-041-1/+1
| | | | llvm-svn: 336268
* [DebugInfo][LoopVectorize] Preserve DL in generated phi instructionAnastasis Grammenos2018-07-041-0/+9
| | | | | | | | | When creating `phi` instructions to resume at the scalar part of the loop, copy the DebugLoc from the original phi over to the new one. Differential Revision: https://reviews.llvm.org/D48769 llvm-svn: 336256
* [InstCombine] Delay foldICmpUsingKnownBits until simple transforms are doneMax Kazantsev2018-07-031-2/+1
| | | | | | | | | | | | This patch changes order of transform in InstCombineCompares to avoid performing transforms based on ranges which produce complex bit arithmetics before more simple things (like folding with constants) are done. See PR37636 for the motivating example. Differential Revision: https://reviews.llvm.org/D48584 Reviewed By: spatel, lebedev.ri llvm-svn: 336172
* [AArch64] Add custom lowering for v4i8 trunc storeAdhemerval Zanella2018-06-271-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a custom trunc store lowering for v4i8 vector types. Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h) and default action for v4i8 is to extract each element and issue 4 byte stores. A better strategy would be to extended the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvectores. The construction: define void @foo(<4 x i16> %x, i8* nocapture %p) { %0 = trunc <4 x i16> %x to <4 x i8> %1 = bitcast i8* %p to <4 x i8>* store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2 ret void } Can be optimized from: umov w8, v0.h[3] umov w9, v0.h[2] umov w10, v0.h[1] umov w11, v0.h[0] strb w8, [x0, #3] strb w9, [x0, #2] strb w10, [x0, #1] strb w11, [x0] ret To: xtn v0.8b, v0.8h str s0, [x0] ret The patch also adjust the memory cost for autovectorization, so the C code: void foo (const int *src, int width, unsigned char *dst) { for (int i = 0; i < width; i++) *dst++ = *src++; } can be vectorized to: .LBB0_4: // %vector.body // =>This Inner Loop Header: Depth=1 ldr q0, [x0], #16 subs x12, x12, #4 // =4 xtn v0.4h, v0.4s xtn v0.8b, v0.8h st1 { v0.s }[0], [x2], #4 b.ne .LBB0_4 Instead of byte operations. llvm-svn: 335735
* Revert r335513: [SCEVExp] Advance found insertion pointFlorian Hahn2018-06-251-61/+0
| | | | llvm-svn: 335522
* Force vector width for scev-expander-debug.ll testFlorian Hahn2018-06-251-1/+1
| | | | llvm-svn: 335520
* [SCEVExp] Advance found insertion point until we find a non-dbg instruction.Florian Hahn2018-06-251-0/+61
| | | | | | | | | | | | | This avoids creating unnecessary casts if the IP used to be a dbg info intrinsic. Fixes PR37727. Reviewers: vsk, aprantl, sanjoy, efriedma Reviewed By: vsk, efriedma Differential Revision: https://reviews.llvm.org/D47874 llvm-svn: 335513
* [LoopVectorize] regenerate full checks; NFCSanjay Patel2018-06-211-7/+61
| | | | llvm-svn: 335257
* Move redundant-vf2-cost.ll test to X86 directoryDiego Caballero2018-06-151-0/+0
| | | | | | | | redundant-vf2-cost.ll is X86 specific. Moved from test/Transforms/LoopVectorize/redundant-vf2-cost.ll to test/Transforms/LoopVectorize/X86/redundant-vf2-cost.ll llvm-svn: 334854
* [LV] Prevent LV to run cost model twice for VF=2Diego Caballero2018-06-151-0/+34
| | | | | | | | | | | | | | This is a minor fix for LV cost model, where the cost for VF=2 was computed twice when the vectorization of the loop was forced without specifying a VF. Reviewers: xusx595, hsaito, fhahn, mkuper Reviewed By: hsaito, xusx595 Differential Revision: https://reviews.llvm.org/D48048 llvm-svn: 334840
* [LV] Fix PR36983. For a given recurrence, fix all phis in exit blockRoman Shirokiy2018-06-081-0/+24
| | | | | | | | | There could be more than one PHIs in exit block using same loop recurrence. Don't assume there is only one and fix each user. Differential Revision: https://reviews.llvm.org/D47788 llvm-svn: 334271
* [TargetLibraryInfo] add mappings from LLVM sin/cos intrinsics to SVML callsSanjay Patel2018-06-071-4/+4
| | | | | | | | | | | | | | | These weren't included in D19544 - probably just an oversight. D40044 made it more likely that we'll have LLVM math intrinsics rather than libcalls, so this bug was more easily exposed. As the tests/code show, we already have the complete mappings for pow/exp/log. I don't have any experience with SVML, so I don't know if anything else is missing. It's also not clear to me that we should be doing this transform in IR rather than DAG/isel, but that's a separate issue. Differential Revision: https://reviews.llvm.org/D47610 llvm-svn: 334211
* [ConstantFold] Disallow folding vector geps into bitcastsKarl-Johan Karlsson2018-06-011-3/+3
| | | | | | | | | | | | | | | | | | | Summary: Getelementptr returns a vector of pointers, instead of a single address, when one or more of its arguments is a vector. In such case it is not possible to simplify the expression by inserting a bitcast of operand(0) into the destination type, as it will create a bitcast between different sizes. Reviewers: majnemer, mkuper, mssimpso, spatel Reviewed By: spatel Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D46379 llvm-svn: 333783
* [LoopVectorize, x86] add tests to show missing SVML transforms; NFCSanjay Patel2018-05-311-344/+356
| | | | llvm-svn: 333707
* [LoopVectorize, x86] regenerate checks; NFCSanjay Patel2018-05-311-99/+403
| | | | | | I removed the 'fast' flag from the calls because that's not required. llvm-svn: 333695
* [VPlan] Reland r332654 and silence unused func warningDiego Caballero2018-05-211-0/+51
| | | | | | | | | | r332654 was reverted due to an unused function warning in release build. This commit includes the same code with the warning silenced. Differential Revision: https://reviews.llvm.org/D44338 llvm-svn: 332860
* Delete a test that was missed in the revert r332747.Amara Emerson2018-05-181-51/+0
| | | | | | r332747 originally reverted r332654 which added this test. llvm-svn: 332755
* [X86][CET] Changing -fcf-protection behavior to comply with gcc (LLVM part)Alexander Ivchenko2018-05-181-1/+1
| | | | | | | | | | | | | | | | | | | | This patch aims to match the changes introduced in gcc by https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html. The IBT feature definition is removed, with the IBT instructions being freely available on all X86 targets. The shadow stack instructions are also being made freely available, and the use of all these CET instructions is controlled by the module flags derived from the -fcf-protection clang option. The hasSHSTK option remains since clang uses it to determine availability of shadow stack instruction intrinsics, but it is no longer directly used. Comes with a clang patch (D46881). Patch by mike.dvoretsky Differential Revision: https://reviews.llvm.org/D46882 llvm-svn: 332705
* [LV][VPlan] Build plain CFG with simple VPInstructions for outer loops.Diego Caballero2018-05-171-0/+51
| | | | | | | | | | | | | | | | | | | Patch #3 from VPlan Outer Loop Vectorization Patch Series #1 (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). Expected to be NFC for the current inner loop vectorization path. It introduces the basic algorithm to build the VPlan plain CFG (single-level CFG, no hierarchical CFG (H-CFG), yet) in the VPlan-native vectorization path using VPInstructions. It includes: - VPlanHCFGBuilder: Main class to build the VPlan H-CFG (plain CFG without nested regions, for now). - VPlanVerifier: Main class with utilities to check the consistency of a H-CFG. - VPlanBlockUtils: Main class with utilities to manipulate VPBlockBases in VPlan. Reviewers: rengolin, fhahn, mkuper, mssimpso, a.elovikov, hfinkel, aprantl. Differential Revision: https://reviews.llvm.org/D44338 llvm-svn: 332654
* [LV] Add lit testcase for bitcast problem. NFCKarl-Johan Karlsson2018-05-091-0/+54
| | | | llvm-svn: 331878
* [DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.Shiva Chen2018-05-0918-28/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to set breakpoints on labels and list source code around labels, we need collect debug information for labels, i.e., label name, the function label belong, line number in the file, and the address label located. In order to keep these information in LLVM IR and to allow backend to generate debug information correctly. We create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3) We hope to keep debug information as much as possible even the code is optimized. So, we create a new kind of intrinsic for label metadata to avoid the metadata is eliminated with basic block. The intrinsic will keep existing if we keep it from optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1) It has only one argument, that is the DILabel metadata. The intrinsic will follow the label immediately. Backend could get the label metadata through the intrinsic's parameter. We also create DIBuilder API for labels to be used by Frontend. Frontend could use createLabel() to allocate DILabel objects, and use insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. llvm-svn: 331841
* [LV] Fix for PR37248, Broadcast codegen incorrectly assumed vector loop body ↵Hideki Saito2018-05-081-0/+42
| | | | | | | | | | | | | | | | | | | is single basic block Summary: Broadcast code generation emitted instructions in pre-header, while the instruction they are dependent on in the vector loop body. This resulted in an IL verification error ---- value used before defined. Reviewers: rengolin, fhahn, hfinkel Reviewed By: rengolin, fhahn Subscribers: dcaballe, Ka-Ka, llvm-commits Differential Revision: https://reviews.llvm.org/D46302 llvm-svn: 331799
* [LV] Move test/Transforms/LoopVectorize/pr23997.llDaniel Neilson2018-05-011-0/+0
| | | | | | | | | | | | | Summary: This fixes a build break with r331269. test/Transforms/LoopVectorize/pr23997.ll should be in: test/Transforms/LoopVectorize/X86/pr23997.ll llvm-svn: 331281
* [LV] Preserve inbounds on created GEPsDaniel Neilson2018-05-0111-251/+360
| | | | | | | | | | | | | | | | | | | Summary: This is a fix for PR23997. The loop vectorizer is not preserving the inbounds property of GEPs that it creates. This is inhibiting some optimizations. This patch preserves the inbounds property in the case where a load/store is being fed by an inbounds GEP. Reviewers: mkuper, javed.absar, hsaito Reviewed By: hsaito Subscribers: dcaballe, hsaito, llvm-commits Differential Revision: https://reviews.llvm.org/D46191 llvm-svn: 331269
* [LV][VPlan] Detect outer loops for explicit vectorization.Diego Caballero2018-04-243-0/+553
| | | | | | | | | | | | | | | | | Patch #2 from VPlan Outer Loop Vectorization Patch Series #1 (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). This patch introduces the basic infrastructure to detect, legality check and process outer loops annotated with hints for explicit vectorization. All these changes are protected under the feature flag -enable-vplan-native-path. This should make this patch NFC for the existing inner loop vectorizer. Reviewers: hfinkel, mkuper, rengolin, fhahn, aemerson, mssimpso. Differential Revision: https://reviews.llvm.org/D42447 llvm-svn: 330739
* [LV] Introduce TTI::getMinimumVFKrzysztof Parzyszek2018-04-132-0/+175
| | | | | | | | | | | The function getMinimumVF(ElemWidth) will return the minimum VF for a vector with elements of size ElemWidth bits. This value will only apply to targets for which TTI::shouldMaximizeVectorBandwidth returns true. The value of 0 indicates that there is no minimum VF. Differential Revision: https://reviews.llvm.org/D45271 llvm-svn: 330062
* [InstCombine] reassociate loop invariant GEP chains to enable LICMSebastian Pop2018-03-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | This change brings performance of zlib up by 10%. The example below is from a hot loop in longest_match() from zlib. do.body: %cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ] %idx.ext = zext i32 %cur_match.addr.0 to i64 %add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext %add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 %idx.ext1 %add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 -1 In this example %idx.ext1 is a loop invariant. It will be moved above the use of loop induction variable %idx.ext such that it can be hoisted out of the loop by LICM. The operands that have dependences carried by the loop will be sinked down in the GEP chain. This patch will produce the following output: do.body: %cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ] %idx.ext = zext i32 %cur_match.addr.0 to i64 %add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext1 %add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 -1 %add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 %idx.ext llvm-svn: 328539
* Revert r325687 (workaround for PR36032).Evgeny Stupachenko2018-03-224-5/+5
| | | | | | | | | | | | | | Summary: Revert r325687 workaround for PR36032 since a fix was committed in r326154. Reviewers: sbaranga Differential Revision: http://reviews.llvm.org/D44768 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 328257
* [LV] Let recordVectorLoopValueForInductionCast to check if IV was created ↵Andrei Elovikov2018-03-201-0/+39
| | | | | | | | | | | | | | | | | | | | | | from the cast. Summary: It turned out to be error-prone to expect the callers to handle that - better to leave the decision to this routine and make the required data to be explicitly passed to the function. This handles the case that was missed in the r322473 and fixes the assert mentioned in PR36524. Reviewers: dorit, mssimpso, Ayal, dcaballe Reviewed By: dcaballe Subscribers: Ka-Ka, hiraditya, dneilson, hsaito, llvm-commits Differential Revision: https://reviews.llvm.org/D43812 llvm-svn: 327960
* [LV] Adding test for r327109Renato Golin2018-03-091-0/+49
| | | | llvm-svn: 327155
* [SCEV] Smart range calculation for SCEVUnknown PhisMax Kazantsev2018-03-011-3/+43
| | | | | | | | | | The range of SCEVUnknown Phi which merges values `X1, X2, ..., XN` can be evaluated as `U(Range(X1), Range(X2), ..., Range(XN))`. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D43810 llvm-svn: 326418
* [ARM] add loop vectorizer test based on 482.sphinx3 from SPEC2006; NFCSanjay Patel2018-02-271-0/+165
| | | | | | | | This is a slight reduction of one of the benchmarks that suffered with D43079. Cost model changes should not cause this test to remain scalarized. llvm-svn: 326221
* Make test agnostic to cost modelAdam Nemet2018-02-271-1/+1
| | | | | | This was causing bot failures on greendragon llvm-svn: 326169
* Fix r326154 buildbots test failEvgeny Stupachenko2018-02-272-3/+2
| | | | | | | | | | Summary: Add specific mtriples to tests added in r326154. From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 326158
* Fix PR36032, PR35432Evgeny Stupachenko2018-02-272-0/+367
| | | | | | | | | | | | | | | Summary: The change fix an assert fail at ScalarEvolutionExpander.cpp: assert(ExitCount != SE.getCouldNotCompute() && "Invalid loop count"); Reviewers: sbaranga Differential Revision: http://reviews.llvm.org/D42604 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 326154
* [LV] Move isLegalMasked* functions from Legality to CostModelRenato Golin2018-02-262-2/+3
| | | | | | | | | | | | | | | | | | | | | | | All SIMD architectures can emulate masked load/store/gather/scatter through element-wise condition check, scalar load/store, and insert/extract. Therefore, bailing out of vectorization as legality failure, when they return false, is incorrect. We should proceed to cost model and determine profitability. This patch is to address the vectorizer's architectural limitation described above. As such, I tried to keep the cost model and vectorize/don't-vectorize behavior nearly unchanged. Cost model tuning should be done separately. Please see http://lists.llvm.org/pipermail/llvm-dev/2018-January/120164.html for RFC and the discussions. Closes D43208. Patch by: Hideki Saito <hideki.saito@intel.com> llvm-svn: 326079
* [LV] Fix test checks, NFCAlexey Bataev2018-02-211-76/+2363
| | | | llvm-svn: 325699
* [SCEV] Temporarily disable loop versioning for the purposeSilviu Baranga2018-02-213-4/+4
| | | | | | | | | | of turning SCEVUnknowns of PHIs into AddRecExprs. This feature is now hidden behind the -scev-version-unknown flag. Fixes PR36032 and PR35432. llvm-svn: 325687
* revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280)Sanjay Patel2018-02-211-9/+36
| | | | | | | | There are too many perf regressions resulting from this, so we need to investigate (and add tests for) targets like ARM and AArch64 before trying to reinstate. llvm-svn: 325658
* [LV] Fix test checks, NFC.Alexey Bataev2018-02-202-140/+3506
| | | | llvm-svn: 325617
* [TTI CostModel] change default cost of FP ops to 1 (PR36280)Sanjay Patel2018-02-191-36/+9
| | | | | | | | | | | | | | | | | | This change was mentioned at least as far back as: https://bugs.llvm.org/show_bug.cgi?id=26837#c26 ...and I found a real program that is harmed by this: Himeno running on AMD Jaguar gets 6% slower with SLP vectorization: https://bugs.llvm.org/show_bug.cgi?id=36280 ...but the change here appears to solve that bug only accidentally. The div/rem costs for x86 look very wrong in some cases, but that's already true, so we can fix those in follow-up patches. There's also evidence that more cost model changes are needed to solve SLP problems as shown in D42981, but that's an independent problem (though the solution may be adjusted after this change is made). Differential Revision: https://reviews.llvm.org/D43079 llvm-svn: 325515
* [LV] Fix analyzeInterleaving when -pass-remarks enabledMircea Trofin2018-02-101-0/+43
| | | | | | | | | | | | | | | | | | | | | | | Summary: If -pass-remarks=loop-vectorize, atomic ops will be seen by analyzeInterleaving(), even though canVectorizeMemory() == false. This is because we are requesting extra analysis instead of bailing out. In such a case, we end up with a Group in both Load- and StoreGroups, and then we'll try to access freed memory when traversing LoadGroups after having had released the Group when iterating over StoreGroups. The fix is to include mayWriteToMemory() when validating that two instructions are the same kind of memory operation. Reviewers: mssimpso, davidxl Reviewed By: davidxl Subscribers: hsaito, fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D43064 llvm-svn: 324786
* [LoopVectorize] auto-generate complete checks; NFCSanjay Patel2018-02-081-5/+80
| | | | llvm-svn: 324611
* Verify profile data confirms large loop trip counts.Mircea Trofin2018-02-071-1/+121
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Loops with inequality comparers, such as: // unsigned bound for (unsigned i = 1; i < bound; ++i) {...} have getSmallConstantMaxTripCount report a large maximum static trip count - in this case, 0xffff fffe. However, profiling info may show that the trip count is much smaller, and thus counter-recommend vectorization. This change: - flips loop-vectorize-with-block-frequency on by default. - validates profiled loop frequency data supports vectorization, when static info appears to not counter-recommend it. Absence of profile data means we rely on static data, just as we've done so far. Reviewers: twoh, mkuper, davidxl, tejohnson, Ayal Reviewed By: davidxl Subscribers: bkramer, llvm-commits Differential Revision: https://reviews.llvm.org/D42946 llvm-svn: 324543
OpenPOWER on IntegriCloud