summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/LoopVectorize
Commit message (Collapse)AuthorAgeFilesLines
* Revert "[SCEV][NFC] Check NoWrap flags before lexicographical comparison of ↵Roman Tereshin2018-08-271-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SCEVs" This reverts r319889. Unfortunately, wrapping flags are not a part of SCEV's identity (they do not participate in computing a hash value or in equality comparisons) and in fact they could be assigned after the fact w/o rebuilding a SCEV. Grep for const_cast's to see quite a few of examples, apparently all for AddRec's at the moment. So, if 2 expressions get built in 2 slightly different ways: one with flags set in the beginning, the other with the flags attached later on, we may end up with 2 expressions which are exactly the same but have their operands swapped in one of the commutative N-ary expressions, and at least one of them will have "sorted by complexity" invariant broken. 2 identical SCEV's won't compare equal by pointer comparison as they are supposed to. A real-world reproducer is added as a regression test: the issue described causes 2 identical SCEV expressions to have different order of operands and therefore compare not equal, which in its turn prevents LoadStoreVectorizer from vectorizing a pair of consecutive loads. On a larger example (the source of the test attached, which is a bugpoint) I have seen even weirder behavior: adding a constant to an existing SCEV changes the order of the existing terms, for instance, getAddExpr(1, ((A * B) + (C * D))) returns (1 + (C * D) + (A * B)). Differential Revision: https://reviews.llvm.org/D40645 llvm-svn: 340777
* Avoid dbg.value use-before-def in a few tests (NFC)Vedant Kumar2018-08-211-6/+6
| | | | | | | | | | This is preparation for landing a use-before-def verifier for debug intrinsics (D46100). As a drive-by, remove `tail` from debug intrinsic calls because it doesn't mean anything in that context. llvm-svn: 340366
* NFC: update the test comments in LV test about early exit loopsAnna Thomas2018-08-211-2/+14
| | | | llvm-svn: 340337
* [LV] Vectorize loops where non-phi instructions used outside loopAnna Thomas2018-08-211-0/+147
| | | | | | | | | | | | | | | | | | | Summary: Follow up change to rL339703, where we now vectorize loops with non-phi instructions used outside the loop. Note that the cyclic dependency identification occurs when identifying reduction/induction vars. We also need to identify that we do not allow users where the PSCEV information within and outside the loop are different. This was the fix added in rL307837 for PR33706. Reviewers: Ayal, mkuper, fhahn Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D50778 llvm-svn: 340278
* NFC: Add loop vectorizer tests showing various control flow within loop that ↵Anna Thomas2018-08-211-0/+169
| | | | | | skip iterations llvm-svn: 340275
* add a missed case for binary op FMF propagation under select foldsMichael Berg2018-08-161-1/+1
| | | | llvm-svn: 339938
* [LV] Teach about non header phis that have uses outside the loopAnna Thomas2018-08-141-6/+231
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch teaches the loop vectorizer to vectorize loops with non header phis that have have outside uses. This is because the iteration dependence distance for these phis can be widened upto VF (similar to how we do for induction/reduction) if they do not have a cyclic dependence with header phis. When identifying reduction/induction/first order recurrence header phis, we already identify if there are any cyclic dependencies that prevents vectorization. The vectorizer is taught to extract the last element from the vectorized phi and update the scalar loop exit block phi to contain this extracted element from the vector loop. This patch can be extended to vectorize loops where instructions other than phis have outside uses. Reviewers: Ayal, mkuper, mssimpso, efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50579 llvm-svn: 339703
* NFC: Add a test to LV showing that reduction is not possible when reduction ↵Anna Thomas2018-08-131-0/+39
| | | | | | | | | | | | var is reset in the loop Added a test case to reduction showing where it's illegal to identify vectorize a loop. Resetting the reduction var during loop iterations disallows us from widening the dependency cycle to VF, thereby making it illegal to vectorize the loop. llvm-svn: 339605
* Revert "[LV][DebugInfo] Set DL to the middle block Icmp instruction"Anastasis Grammenos2018-07-271-33/+0
| | | | | | This reverts commit r338106. llvm-svn: 338109
* [LV][DebugInfo] Set DL to the middle block Icmp instructionAnastasis Grammenos2018-07-271-0/+33
| | | | | | | | Reviewers: hsaito Differential Revision: https://reviews.llvm.org/D49746 llvm-svn: 338106
* [LV] Fix for PR38110, LV encountered llvm_unreachable() Hideki Saito2018-07-241-0/+50
| | | | | | | | | | | | | | Summary: truncateToMinimalBitWidths() doesn't handle all Instructions and the worst case is compiler crash via llvm_unreachable(). Fix is to add a case to handle PHINode and changed the worst case to NO-OP (from compiler crash). Reviewers: sbaranga, mssimpso, hsaito Reviewed By: hsaito Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49461 llvm-svn: 337861
* [DebugInfo][LoopVectorize] Preserve DL in induction PHI and AddAnastasis Grammenos2018-07-101-0/+10
| | | | | | Differential Revision: https://reviews.llvm.org/D48968 llvm-svn: 336667
* Revert "[InstCombine] Delay foldICmpUsingKnownBits until simple transforms ↵Max Kazantsev2018-07-061-1/+2
| | | | | | are done" llvm-svn: 336410
* NFC - Various typo fixes in testsGabor Buella2018-07-041-1/+1
| | | | llvm-svn: 336268
* [DebugInfo][LoopVectorize] Preserve DL in generated phi instructionAnastasis Grammenos2018-07-041-0/+9
| | | | | | | | | When creating `phi` instructions to resume at the scalar part of the loop, copy the DebugLoc from the original phi over to the new one. Differential Revision: https://reviews.llvm.org/D48769 llvm-svn: 336256
* [InstCombine] Delay foldICmpUsingKnownBits until simple transforms are doneMax Kazantsev2018-07-031-2/+1
| | | | | | | | | | | | This patch changes order of transform in InstCombineCompares to avoid performing transforms based on ranges which produce complex bit arithmetics before more simple things (like folding with constants) are done. See PR37636 for the motivating example. Differential Revision: https://reviews.llvm.org/D48584 Reviewed By: spatel, lebedev.ri llvm-svn: 336172
* [AArch64] Add custom lowering for v4i8 trunc storeAdhemerval Zanella2018-06-271-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a custom trunc store lowering for v4i8 vector types. Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h) and default action for v4i8 is to extract each element and issue 4 byte stores. A better strategy would be to extended the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvectores. The construction: define void @foo(<4 x i16> %x, i8* nocapture %p) { %0 = trunc <4 x i16> %x to <4 x i8> %1 = bitcast i8* %p to <4 x i8>* store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2 ret void } Can be optimized from: umov w8, v0.h[3] umov w9, v0.h[2] umov w10, v0.h[1] umov w11, v0.h[0] strb w8, [x0, #3] strb w9, [x0, #2] strb w10, [x0, #1] strb w11, [x0] ret To: xtn v0.8b, v0.8h str s0, [x0] ret The patch also adjust the memory cost for autovectorization, so the C code: void foo (const int *src, int width, unsigned char *dst) { for (int i = 0; i < width; i++) *dst++ = *src++; } can be vectorized to: .LBB0_4: // %vector.body // =>This Inner Loop Header: Depth=1 ldr q0, [x0], #16 subs x12, x12, #4 // =4 xtn v0.4h, v0.4s xtn v0.8b, v0.8h st1 { v0.s }[0], [x2], #4 b.ne .LBB0_4 Instead of byte operations. llvm-svn: 335735
* Revert r335513: [SCEVExp] Advance found insertion pointFlorian Hahn2018-06-251-61/+0
| | | | llvm-svn: 335522
* Force vector width for scev-expander-debug.ll testFlorian Hahn2018-06-251-1/+1
| | | | llvm-svn: 335520
* [SCEVExp] Advance found insertion point until we find a non-dbg instruction.Florian Hahn2018-06-251-0/+61
| | | | | | | | | | | | | This avoids creating unnecessary casts if the IP used to be a dbg info intrinsic. Fixes PR37727. Reviewers: vsk, aprantl, sanjoy, efriedma Reviewed By: vsk, efriedma Differential Revision: https://reviews.llvm.org/D47874 llvm-svn: 335513
* [LoopVectorize] regenerate full checks; NFCSanjay Patel2018-06-211-7/+61
| | | | llvm-svn: 335257
* Move redundant-vf2-cost.ll test to X86 directoryDiego Caballero2018-06-151-0/+0
| | | | | | | | redundant-vf2-cost.ll is X86 specific. Moved from test/Transforms/LoopVectorize/redundant-vf2-cost.ll to test/Transforms/LoopVectorize/X86/redundant-vf2-cost.ll llvm-svn: 334854
* [LV] Prevent LV to run cost model twice for VF=2Diego Caballero2018-06-151-0/+34
| | | | | | | | | | | | | | This is a minor fix for LV cost model, where the cost for VF=2 was computed twice when the vectorization of the loop was forced without specifying a VF. Reviewers: xusx595, hsaito, fhahn, mkuper Reviewed By: hsaito, xusx595 Differential Revision: https://reviews.llvm.org/D48048 llvm-svn: 334840
* [LV] Fix PR36983. For a given recurrence, fix all phis in exit blockRoman Shirokiy2018-06-081-0/+24
| | | | | | | | | There could be more than one PHIs in exit block using same loop recurrence. Don't assume there is only one and fix each user. Differential Revision: https://reviews.llvm.org/D47788 llvm-svn: 334271
* [TargetLibraryInfo] add mappings from LLVM sin/cos intrinsics to SVML callsSanjay Patel2018-06-071-4/+4
| | | | | | | | | | | | | | | These weren't included in D19544 - probably just an oversight. D40044 made it more likely that we'll have LLVM math intrinsics rather than libcalls, so this bug was more easily exposed. As the tests/code show, we already have the complete mappings for pow/exp/log. I don't have any experience with SVML, so I don't know if anything else is missing. It's also not clear to me that we should be doing this transform in IR rather than DAG/isel, but that's a separate issue. Differential Revision: https://reviews.llvm.org/D47610 llvm-svn: 334211
* [ConstantFold] Disallow folding vector geps into bitcastsKarl-Johan Karlsson2018-06-011-3/+3
| | | | | | | | | | | | | | | | | | | Summary: Getelementptr returns a vector of pointers, instead of a single address, when one or more of its arguments is a vector. In such case it is not possible to simplify the expression by inserting a bitcast of operand(0) into the destination type, as it will create a bitcast between different sizes. Reviewers: majnemer, mkuper, mssimpso, spatel Reviewed By: spatel Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D46379 llvm-svn: 333783
* [LoopVectorize, x86] add tests to show missing SVML transforms; NFCSanjay Patel2018-05-311-344/+356
| | | | llvm-svn: 333707
* [LoopVectorize, x86] regenerate checks; NFCSanjay Patel2018-05-311-99/+403
| | | | | | I removed the 'fast' flag from the calls because that's not required. llvm-svn: 333695
* [VPlan] Reland r332654 and silence unused func warningDiego Caballero2018-05-211-0/+51
| | | | | | | | | | r332654 was reverted due to an unused function warning in release build. This commit includes the same code with the warning silenced. Differential Revision: https://reviews.llvm.org/D44338 llvm-svn: 332860
* Delete a test that was missed in the revert r332747.Amara Emerson2018-05-181-51/+0
| | | | | | r332747 originally reverted r332654 which added this test. llvm-svn: 332755
* [X86][CET] Changing -fcf-protection behavior to comply with gcc (LLVM part)Alexander Ivchenko2018-05-181-1/+1
| | | | | | | | | | | | | | | | | | | | This patch aims to match the changes introduced in gcc by https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html. The IBT feature definition is removed, with the IBT instructions being freely available on all X86 targets. The shadow stack instructions are also being made freely available, and the use of all these CET instructions is controlled by the module flags derived from the -fcf-protection clang option. The hasSHSTK option remains since clang uses it to determine availability of shadow stack instruction intrinsics, but it is no longer directly used. Comes with a clang patch (D46881). Patch by mike.dvoretsky Differential Revision: https://reviews.llvm.org/D46882 llvm-svn: 332705
* [LV][VPlan] Build plain CFG with simple VPInstructions for outer loops.Diego Caballero2018-05-171-0/+51
| | | | | | | | | | | | | | | | | | | Patch #3 from VPlan Outer Loop Vectorization Patch Series #1 (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). Expected to be NFC for the current inner loop vectorization path. It introduces the basic algorithm to build the VPlan plain CFG (single-level CFG, no hierarchical CFG (H-CFG), yet) in the VPlan-native vectorization path using VPInstructions. It includes: - VPlanHCFGBuilder: Main class to build the VPlan H-CFG (plain CFG without nested regions, for now). - VPlanVerifier: Main class with utilities to check the consistency of a H-CFG. - VPlanBlockUtils: Main class with utilities to manipulate VPBlockBases in VPlan. Reviewers: rengolin, fhahn, mkuper, mssimpso, a.elovikov, hfinkel, aprantl. Differential Revision: https://reviews.llvm.org/D44338 llvm-svn: 332654
* [LV] Add lit testcase for bitcast problem. NFCKarl-Johan Karlsson2018-05-091-0/+54
| | | | llvm-svn: 331878
* [DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.Shiva Chen2018-05-0918-28/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to set breakpoints on labels and list source code around labels, we need collect debug information for labels, i.e., label name, the function label belong, line number in the file, and the address label located. In order to keep these information in LLVM IR and to allow backend to generate debug information correctly. We create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3) We hope to keep debug information as much as possible even the code is optimized. So, we create a new kind of intrinsic for label metadata to avoid the metadata is eliminated with basic block. The intrinsic will keep existing if we keep it from optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1) It has only one argument, that is the DILabel metadata. The intrinsic will follow the label immediately. Backend could get the label metadata through the intrinsic's parameter. We also create DIBuilder API for labels to be used by Frontend. Frontend could use createLabel() to allocate DILabel objects, and use insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. llvm-svn: 331841
* [LV] Fix for PR37248, Broadcast codegen incorrectly assumed vector loop body ↵Hideki Saito2018-05-081-0/+42
| | | | | | | | | | | | | | | | | | | is single basic block Summary: Broadcast code generation emitted instructions in pre-header, while the instruction they are dependent on in the vector loop body. This resulted in an IL verification error ---- value used before defined. Reviewers: rengolin, fhahn, hfinkel Reviewed By: rengolin, fhahn Subscribers: dcaballe, Ka-Ka, llvm-commits Differential Revision: https://reviews.llvm.org/D46302 llvm-svn: 331799
* [LV] Move test/Transforms/LoopVectorize/pr23997.llDaniel Neilson2018-05-011-0/+0
| | | | | | | | | | | | | Summary: This fixes a build break with r331269. test/Transforms/LoopVectorize/pr23997.ll should be in: test/Transforms/LoopVectorize/X86/pr23997.ll llvm-svn: 331281
* [LV] Preserve inbounds on created GEPsDaniel Neilson2018-05-0111-251/+360
| | | | | | | | | | | | | | | | | | | Summary: This is a fix for PR23997. The loop vectorizer is not preserving the inbounds property of GEPs that it creates. This is inhibiting some optimizations. This patch preserves the inbounds property in the case where a load/store is being fed by an inbounds GEP. Reviewers: mkuper, javed.absar, hsaito Reviewed By: hsaito Subscribers: dcaballe, hsaito, llvm-commits Differential Revision: https://reviews.llvm.org/D46191 llvm-svn: 331269
* [LV][VPlan] Detect outer loops for explicit vectorization.Diego Caballero2018-04-243-0/+553
| | | | | | | | | | | | | | | | | Patch #2 from VPlan Outer Loop Vectorization Patch Series #1 (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). This patch introduces the basic infrastructure to detect, legality check and process outer loops annotated with hints for explicit vectorization. All these changes are protected under the feature flag -enable-vplan-native-path. This should make this patch NFC for the existing inner loop vectorizer. Reviewers: hfinkel, mkuper, rengolin, fhahn, aemerson, mssimpso. Differential Revision: https://reviews.llvm.org/D42447 llvm-svn: 330739
* [LV] Introduce TTI::getMinimumVFKrzysztof Parzyszek2018-04-132-0/+175
| | | | | | | | | | | The function getMinimumVF(ElemWidth) will return the minimum VF for a vector with elements of size ElemWidth bits. This value will only apply to targets for which TTI::shouldMaximizeVectorBandwidth returns true. The value of 0 indicates that there is no minimum VF. Differential Revision: https://reviews.llvm.org/D45271 llvm-svn: 330062
* [InstCombine] reassociate loop invariant GEP chains to enable LICMSebastian Pop2018-03-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | This change brings performance of zlib up by 10%. The example below is from a hot loop in longest_match() from zlib. do.body: %cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ] %idx.ext = zext i32 %cur_match.addr.0 to i64 %add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext %add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 %idx.ext1 %add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 -1 In this example %idx.ext1 is a loop invariant. It will be moved above the use of loop induction variable %idx.ext such that it can be hoisted out of the loop by LICM. The operands that have dependences carried by the loop will be sinked down in the GEP chain. This patch will produce the following output: do.body: %cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ] %idx.ext = zext i32 %cur_match.addr.0 to i64 %add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext1 %add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 -1 %add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 %idx.ext llvm-svn: 328539
* Revert r325687 (workaround for PR36032).Evgeny Stupachenko2018-03-224-5/+5
| | | | | | | | | | | | | | Summary: Revert r325687 workaround for PR36032 since a fix was committed in r326154. Reviewers: sbaranga Differential Revision: http://reviews.llvm.org/D44768 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 328257
* [LV] Let recordVectorLoopValueForInductionCast to check if IV was created ↵Andrei Elovikov2018-03-201-0/+39
| | | | | | | | | | | | | | | | | | | | | | from the cast. Summary: It turned out to be error-prone to expect the callers to handle that - better to leave the decision to this routine and make the required data to be explicitly passed to the function. This handles the case that was missed in the r322473 and fixes the assert mentioned in PR36524. Reviewers: dorit, mssimpso, Ayal, dcaballe Reviewed By: dcaballe Subscribers: Ka-Ka, hiraditya, dneilson, hsaito, llvm-commits Differential Revision: https://reviews.llvm.org/D43812 llvm-svn: 327960
* [LV] Adding test for r327109Renato Golin2018-03-091-0/+49
| | | | llvm-svn: 327155
* [SCEV] Smart range calculation for SCEVUnknown PhisMax Kazantsev2018-03-011-3/+43
| | | | | | | | | | The range of SCEVUnknown Phi which merges values `X1, X2, ..., XN` can be evaluated as `U(Range(X1), Range(X2), ..., Range(XN))`. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D43810 llvm-svn: 326418
* [ARM] add loop vectorizer test based on 482.sphinx3 from SPEC2006; NFCSanjay Patel2018-02-271-0/+165
| | | | | | | | This is a slight reduction of one of the benchmarks that suffered with D43079. Cost model changes should not cause this test to remain scalarized. llvm-svn: 326221
* Make test agnostic to cost modelAdam Nemet2018-02-271-1/+1
| | | | | | This was causing bot failures on greendragon llvm-svn: 326169
* Fix r326154 buildbots test failEvgeny Stupachenko2018-02-272-3/+2
| | | | | | | | | | Summary: Add specific mtriples to tests added in r326154. From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 326158
* Fix PR36032, PR35432Evgeny Stupachenko2018-02-272-0/+367
| | | | | | | | | | | | | | | Summary: The change fix an assert fail at ScalarEvolutionExpander.cpp: assert(ExitCount != SE.getCouldNotCompute() && "Invalid loop count"); Reviewers: sbaranga Differential Revision: http://reviews.llvm.org/D42604 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 326154
* [LV] Move isLegalMasked* functions from Legality to CostModelRenato Golin2018-02-262-2/+3
| | | | | | | | | | | | | | | | | | | | | | | All SIMD architectures can emulate masked load/store/gather/scatter through element-wise condition check, scalar load/store, and insert/extract. Therefore, bailing out of vectorization as legality failure, when they return false, is incorrect. We should proceed to cost model and determine profitability. This patch is to address the vectorizer's architectural limitation described above. As such, I tried to keep the cost model and vectorize/don't-vectorize behavior nearly unchanged. Cost model tuning should be done separately. Please see http://lists.llvm.org/pipermail/llvm-dev/2018-January/120164.html for RFC and the discussions. Closes D43208. Patch by: Hideki Saito <hideki.saito@intel.com> llvm-svn: 326079
* [LV] Fix test checks, NFCAlexey Bataev2018-02-211-76/+2363
| | | | llvm-svn: 325699
OpenPOWER on IntegriCloud