path: root/llvm/test/Transforms/LoopVectorize
Commit message (Author, Age, Files, Lines)
...
* Remove a duplicate test (Philip Reames, 2019-09-12, 1 file, -320/+14)
| | | | | | Turns out I'd already added exactly the same test under the name non_unit_stride. llvm-svn: 371777
* [LV] Update test case after r371768. (Florian Hahn, 2019-09-12, 1 file, -15/+143)
| | | | llvm-svn: 371769
* Precommit tests for generalization of load dereferenceability in loop (Philip Reames, 2019-09-12, 1 file, -10/+717)
| | | | llvm-svn: 371747
* [LV] Support invariant addresses in speculation logic (Philip Reames, 2019-09-12, 1 file, -153/+57)
| | | | | | | | | | Implement a TODO from rL371452, and handle loop invariant addresses in predicated blocks. If we can prove that the load is safe to speculate into the header, then we can avoid using a masked.load in favour of a normal load. This is mostly about vectorization robustness. In the common case, it's generally expected that LICM/LoadStorePromotion would have eliminated such loads entirely. Differential Revision: https://reviews.llvm.org/D67372 llvm-svn: 371745
* [Tests] Fix a typo in a test (Philip Reames, 2019-09-09, 1 file, -83/+96)
| | | | llvm-svn: 371456
* [Tests] Precommit test case for D67372 (Philip Reames, 2019-09-09, 1 file, -10/+302)
| | | | llvm-svn: 371455
* [LoopVectorize] Leverage speculation safety to avoid masked.loads (Philip Reames, 2019-09-09, 2 files, -18/+18)
| | | | | | | | | | If we're vectorizing a load in a predicated block, check to see if the load can be speculated rather than predicated. This allows us to generate a normal vector load instead of a masked.load. To do so, we must prove that all bytes accessed on any iteration of the original loop are dereferenceable, and that all loads (across all iterations) are properly aligned. This is equivalent to proving that hoisting the load into the loop header in the original scalar loop is safe. Note: There are a couple of code motion todos in the code. My intention is to wait about a day - to be sure this sticks - and then perform the NFC motion without further review. Differential Revision: https://reviews.llvm.org/D66688 llvm-svn: 371452
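The equivalence described in this commit can be pictured at the scalar level. The following is a minimal sketch and not code from the patch: the names `kLen`, `sumPredicated`, and `sumSpeculated` are hypothetical, and a fixed-size `std::array` stands in for memory that is provably dereferenceable on every iteration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical illustration only; none of these names come from LLVM.
constexpr std::size_t kLen = 8;

// Original shape: the load of b[i] sits inside a predicated block.
int sumPredicated(const std::array<int, kLen> &a,
                  const std::array<int, kLen> &b) {
  int sum = 0;
  for (std::size_t i = 0; i < kLen; ++i)
    if (a[i] > 0)
      sum += b[i]; // load executes only when the guard holds
  return sum;
}

// Speculated shape: b[i] is in bounds for every i in [0, kLen), so the load
// can run unconditionally and the guard turns into a select on the value.
int sumSpeculated(const std::array<int, kLen> &a,
                  const std::array<int, kLen> &b) {
  int sum = 0;
  for (std::size_t i = 0; i < kLen; ++i) {
    int loaded = b[i];              // unconditional load
    sum += (a[i] > 0) ? loaded : 0; // select replaces the branch
  }
  return sum;
}
```

Both functions return the same value for any input because the extra loads never touch memory outside the array; when that cannot be proven, the predicated form (a masked.load after vectorization) is still required.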
* [X86] Replace -mcpu with -mattr on some tests. (Craig Topper, 2019-09-06, 1 file, -1/+1)
| | | | llvm-svn: 371260
* [LV] Fix miscompiles by adding non-header PHI nodes to AllowedExit (Bjorn Pettersson, 2019-09-03, 1 file, -91/+21)
| | | | | | | | | | | | | | | | | | | | | | | Summary: Fold-tail currently supports reduction last-vector-value live-outs, but has yet to support last-scalar-value live-outs, including non-header phis. As it relies on AllowedExit in order to detect them and bail out, we need to add the non-header PHI nodes to AllowedExit, otherwise we end up with miscompiles. Solves https://bugs.llvm.org/show_bug.cgi?id=43166 Reviewers: fhahn, Ayal Reviewed By: fhahn, Ayal Subscribers: anna, hiraditya, rkruppe, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67074 llvm-svn: 370721
* [LV] Precommit test case showing miscompile from PR43166. NFC (Bjorn Pettersson, 2019-09-03, 1 file, -0/+235)
| | | | | | | | | | | | | | | | Summary: Precommit test case showing miscompile from PR43166. Reviewers: fhahn, Ayal Reviewed By: fhahn Subscribers: rkruppe, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67072 llvm-svn: 370720
* [LV] Fold tail by masking - handle reductions (Ayal Zaks, 2019-08-28, 1 file, -0/+56)
| | | | | | | | | | | | Allow vectorizing loops that have reductions when tail is folded by masking. A select is introduced in VPlan, choosing between the last value carried by the loop-exit/live-out instruction of the reduction, and the penultimate value carried by the reduction phi, according to the "i < n" mask of fold-tail. This select replaces the last value as the live-out value of the loop. Differential Revision: https://reviews.llvm.org/D66720 llvm-svn: 370173
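The role of the select can be simulated with plain scalar code. This is a rough sketch under stated assumptions, not the VPlan implementation: `kReductionVF`, `kPoisonStandIn`, and `sumFoldTail` are hypothetical names, the vector width is fixed at 4, and an arbitrary constant stands in for whatever an inactive lane would have loaded.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names and values, chosen for illustration only.
constexpr std::size_t kReductionVF = 4; // assumed vector width
constexpr int kPoisonStandIn = -999;    // stand-in for an inactive lane's load

int sumFoldTail(const std::vector<int> &data) {
  const std::size_t n = data.size();
  std::array<int, kReductionVF> phi{};     // reduction phi, one value per lane
  std::array<int, kReductionVF> liveOut{}; // value the select hands to the exit
  for (std::size_t base = 0; base < n; base += kReductionVF) {
    for (std::size_t lane = 0; lane < kReductionVF; ++lane) {
      const std::size_t i = base + lane;
      const bool active = i < n; // the "i < n" fold-tail mask
      const int loaded = active ? data[i] : kPoisonStandIn;
      const int next = phi[lane] + loaded; // loop-exit value of the reduction
      // Select: active lanes take the new value, inactive lanes keep the
      // value carried by the phi before this iteration.
      liveOut[lane] = active ? next : phi[lane];
      phi[lane] = next; // only the final iteration can have inactive lanes
    }
  }
  int total = 0; // horizontal reduction of the selected live-out
  for (int v : liveOut)
    total += v;
  return total;
}
```

For six elements with a width of 4, the second iteration has two inactive lanes; without the select their stand-in values would leak into the final sum.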
* Preland test cases for D66688 to make diffs clear. (Philip Reames, 2019-08-26, 1 file, -0/+1390)
| | | | llvm-svn: 369959
* [ARM] Don't pretend we know how to generate MVE VLDn (David Green, 2019-08-16, 1 file, -0/+416)
| | | | | | | | | | We don't yet know how to generate these instructions for MVE. And in the case of VLD3, we don't even have the instruction. For the moment don't tell the vectoriser that we have VLD4, just to end up serialising the results. Differential Revision: https://reviews.llvm.org/D66009 llvm-svn: 369101
* [LV] fold-tail predication should be respected even with assume_safety (Dorit Nuzman, 2019-08-15, 2 files, -11/+177)
| | | | | | | | | | | | | | | assume_safety implies that loads under "if's" can be safely executed speculatively (unguarded, unmasked). However this assumption holds only for the original user "if's", not those introduced by the compiler, such as the fold-tail "if" that guards us from loading beyond the original loop trip-count. Currently the combination of fold-tail and assume-safety pragmas results in ignoring the fold-tail predicate that guards the loads, generating unmasked loads. This patch fixes this behavior. Differential Revision: https://reviews.llvm.org/D66106 Reviewers: Ayal, hsaito, fhahn llvm-svn: 368973
* [LV] Fold-tail flag (Dorit Nuzman, 2019-08-14, 1 file, -1/+19)
| | | | | | | | | | | This is the compiler-flag equivalent of the Predicate pragma (https://reviews.llvm.org/D65197), to direct the vectorizer to fold the remainder-loop into the main-loop using predication. Differential Revision: https://reviews.llvm.org/D66108 Reviewers: Ayal, hsaito, fhahn, SjoerdMeije llvm-svn: 368801
* [ARM] Permit auto-vectorization using MVE (David Green, 2019-08-11, 1 file, -0/+5)
| | | | | | | | | | | | | With enough codegen complete, we can now correctly report the number and size of vector registers for MVE, allowing auto vectorisation. This also allows FP auto-vectorization for MVE without -Ofast/-ffast-math, due to support for IEEE FP arithmetic and parity between scalar and vector FP behaviour. Patch by David Sherwood. Differential Revision: https://reviews.llvm.org/D63728 llvm-svn: 368529
* [LoopVectorize][X86] Clamp interleave factor if we have a known constant trip count that is less than VF*interleave (Craig Topper, 2019-08-07, 1 file, -15/+35)
| | | | | | | | | | If we know the trip count, we should make sure the interleave factor won't cause the vectorized loop to exceed it. Improves one of the cases from PR42674 Differential Revision: https://reviews.llvm.org/D65896 llvm-svn: 368215
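The arithmetic behind such a clamp is small enough to sketch. The helper below is hypothetical and is not the code from D65896: the name `clampInterleave` and the convention that a trip count of 0 means "unknown" are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not LLVM's implementation. A tripCount of 0 is used
// here to mean "not known at compile time" (an assumption of this sketch).
unsigned clampInterleave(unsigned tripCount, unsigned vf, unsigned interleave) {
  assert(vf > 0 && interleave > 0);
  if (tripCount == 0)
    return interleave; // unknown trip count: leave the factor alone
  // One pass of the vector loop covers vf * interleave scalar iterations,
  // so the factor should not exceed tripCount / vf.
  const unsigned maxByTripCount = tripCount / vf;
  return std::max(1u, std::min(interleave, maxByTripCount));
}
```

With a trip count of 16 and VF 4, an interleave factor of 8 would need 32 iterations per pass and the vector body would never run; clamping to 4 lets it run exactly once.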
* [LoopVectorize][X86] Add test case for missed vectorization from PR42674. (Craig Topper, 2019-08-07, 1 file, -0/+41)
| | | | | | | We do end up vectorizing the code, but use an interleave factor that is too high and causes the vector code to be dead. llvm-svn: 368197
* [LV][NFC] Share the LV illegality reporting with LoopVectorize. (Hideki Saito, 2019-08-06, 1 file, -0/+27)
| | | | | | | | | | | | Reviewers: hsaito, fhahn, rengolin Reviewed By: rengolin Patch by psamolysov, thanks! Differential Revision: https://reviews.llvm.org/D62997 llvm-svn: 367980
* [LV] Fix test failure in a Release build. (Jay Foad, 2019-08-02, 1 file, -0/+1)
| | | | llvm-svn: 367666
* Moves the newly added test interleaved-accesses-waw-dependency.ll to X86 subdirectory. (Hideki Saito, 2019-08-02, 1 file, -0/+0)
| | | | | | ps4-buildslave1 reported a failure. The test has x86 triple. llvm-svn: 367659
* [LV] Avoid building interleaved group in presence of WAW dependency (Hideki Saito, 2019-08-02, 1 file, -0/+109)
| | | | | | | | | | | | Reviewers: hsaito, Ayal, fhahn, anna, mkazantsev Reviewed By: hsaito Patch by evrevnov, thanks! Differential Revision: https://reviews.llvm.org/D63981 llvm-svn: 367654
* [LV] Tail-Loop Folding (Sjoerd Meijer, 2019-08-01, 1 file, -0/+78)
| | | | | | | | | | | This allows folding of the scalar epilogue loop (the tail) into the main vectorised loop body when the loop is annotated with a "vector predicate" metadata hint. To fold the tail, instructions need to be predicated (masked), enabling/disabling lanes for the remainder iterations. Differential Revision: https://reviews.llvm.org/D65197 llvm-svn: 367592
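What folding the tail means for the loop shape can be shown with a scalar simulation. This is only a sketch: `addOneFolded` and its `vf` parameter are hypothetical, and the inner loop over lanes stands in for a single masked vector operation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simulation of a tail-folded loop; vf is the assumed vector
// width. There is no scalar epilogue: the last vector iteration simply runs
// with some lanes disabled.
void addOneFolded(std::vector<int> &data, std::size_t vf) {
  assert(vf > 0);
  const std::size_t n = data.size();
  for (std::size_t base = 0; base < n; base += vf) {
    for (std::size_t lane = 0; lane < vf; ++lane) {
      const std::size_t i = base + lane;
      if (i < n)      // per-lane mask: off for the remainder iterations
        data[i] += 1; // masked load, add, masked store
    }
  }
}
```

With six elements and a width of 4, the loop runs two vector iterations and the last two lanes of the second one are masked off, so nothing past the end of the buffer is read or written.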
* [LoopVectorize] Pass unfiltered list of arguments to getIntrinsicInstCost. (Florian Hahn, 2019-07-15, 1 file, -0/+30)
| | | | | | | We do not compute the scalarization overhead in getVectorIntrinsicCost and TTI::getIntrinsicInstrCost requires the full arguments list. llvm-svn: 366049
* [LV] Exclude loop-invariant inputs from scalar cost computation. (Florian Hahn, 2019-07-14, 1 file, -0/+109)
| | | | | | | | | | | | | | | | Loop invariant operands do not need to be scalarized, as we are using the values outside the loop. We should ignore them when computing the scalarization overhead. Fixes PR41294 Reviewers: hsaito, rengolin, dcaballe, Ayal Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D59995 llvm-svn: 366030
* Revert "[IRBuilder] Fold consistently for or/and whether constant is LHS or RHS" (Petr Hosek, 2019-07-07, 3 files, -41/+45)
| | | | | | | | | | This reverts commit r365260 which broke the following tests: Clang :: CodeGenCXX/cfi-mfcall.cpp Clang :: CodeGenObjC/ubsan-nullability.m LLVM :: Transforms/LoopVectorize/AArch64/pr36032.ll llvm-svn: 365284
* [IRBuilder] Fold consistently for or/and whether constant is LHS or RHS (Philip Reames, 2019-07-06, 3 files, -45/+41)
| | | | | | Without this, we have the unfortunate property that tests are dependent on the order of operands passed to the CreateOr and CreateAnd functions. In actual usage, we'd promptly optimize them away, but it made tests slightly more verbose than they should have been. llvm-svn: 365260
* [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion (Orlando Cazalet-Hyams, 2019-06-19, 5 files, -12/+102)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Bug: https://bugs.llvm.org/show_bug.cgi?id=39024 The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here: A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins. B) Instructions in the middle block have different line numbers which give the impression of another iteration. In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. I have set up a separate review D61933 for a fix which is required for this patch. Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse Reviewed By: hfinkel, jmorse Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D60831 > llvm-svn: 363046 llvm-svn: 363786
* [LV] Suppress vectorization in some nontemporal cases (Warren Ristow, 2019-06-17, 2 files, -5/+117)
| | | | | | | | | | | | | | | | | | | | | When considering a loop containing nontemporal stores or loads for vectorization, suppress the vectorization if the corresponding vectorized store or load with the alignment of the original scalar memory op is not supported with the nontemporal hint on the target. This adds two new functions: bool isLegalNTStore(Type *DataType, unsigned Alignment) const; bool isLegalNTLoad(Type *DataType, unsigned Alignment) const; to TTI, leaving the target independent default implementation as returning true, but with overriding implementations for X86 that check the legality based on available Subtarget features. This fixes https://llvm.org/PR40759 Differential Revision: https://reviews.llvm.org/D61764 llvm-svn: 363581
* [LV] Deny irregular types in interleavedAccessCanBeWidened (Bjorn Pettersson, 2019-06-17, 1 file, -0/+29)
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Prevent the loop vectorizer from creating loads/stores of vectors with "irregular" types when interleaving. An example of an irregular type is x86_fp80, which is 80 bits wide but may have an allocation size of 96 bits. So an array of x86_fp80 is not bitcast compatible with a vector of the same type. Not sure if interleavedAccessCanBeWidened is the best place for this check, but it solves the problem seen in the added test case. And it is the same kind of check that already exists in memoryInstructionCanBeWidened. Reviewers: fhahn, Ayal, craig.topper Reviewed By: fhahn Subscribers: hiraditya, rkruppe, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63386 llvm-svn: 363547
* [lit] Delete empty lines at the end of lit.local.cfg NFC (Fangrui Song, 2019-06-17, 4 files, -4/+0)
| | | | llvm-svn: 363538
* [SCEV] Pass NoWrapFlags when expanding an AddExpr (Sam Parker, 2019-06-14, 1 file, -1/+1)
| | | | | | | | | | | | InsertBinop now accepts NoWrapFlags, so pass them through when expanding a simple add expression. This is the first re-commit of the functional changes from rL362687, which was previously reverted. Differential Revision: https://reviews.llvm.org/D61934 llvm-svn: 363364
* Improve reduction intrinsics by overloading result value. (Sander de Smalen, 2019-06-13, 1 file, -3/+3)
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch uses the mechanism from D62995 to strengthen the definitions of the reduction intrinsics by letting the scalar result/accumulator type be overloaded from the vector element type. For example: ; The LLVM LangRef specifies that the scalar result must equal the ; vector element type, but this is not checked/enforced by LLVM. declare i32 @llvm.experimental.vector.reduce.or.i32.v4i32(<4 x i32> %a) This patch changes that into: declare i32 @llvm.experimental.vector.reduce.or.v4i32(<4 x i32> %a) Which has the type-constraint more explicit and causes LLVM to check the result type with the vector element type. Reviewers: RKSimon, arsenm, rnk, greened, aemerson Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D62996 llvm-svn: 363240
* LoopDistribute/LAA: Add tests to catch regressions (Matt Arsenault, 2019-06-12, 2 files, -0/+53)
| | | | | | | | | I broke 2 of these with a patch, but they were not covered by existing tests. https://reviews.llvm.org/D63035 llvm-svn: 363158
* Revert "[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion" (Orlando Cazalet-Hyams, 2019-06-12, 5 files, -102/+12)
| | | | | | | This reverts commit 1a0f7a2077b70c9864faa476e15b048686cf1ca7. See phabricator thread for D60831. llvm-svn: 363132
* [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion (Orlando Cazalet-Hyams, 2019-06-11, 5 files, -12/+102)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Bug: https://bugs.llvm.org/show_bug.cgi?id=39024 The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here: A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins. B) Instructions in the middle block have different line numbers which give the impression of another iteration. In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. I have set up a separate review D61933 for a fix which is required for this patch. Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse Reviewed By: hfinkel, jmorse Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D60831 llvm-svn: 363046
* Revert "[SCEV] Use wrap flags in InsertBinop" (Benjamin Kramer, 2019-06-06, 2 files, -2/+2)
| | | | | | This reverts commit r362687. Miscompiles llvm-profdata during selfhost. llvm-svn: 362699
* [SCEV] Use wrap flags in InsertBinop (Sam Parker, 2019-06-06, 2 files, -2/+2)
| | | | | | | | | | If the given SCEVExpr has no (un)signed flags attached to it, transfer these to the resulting instruction or use them to find an existing instruction. Differential Revision: https://reviews.llvm.org/D61934 llvm-svn: 362687
* Initial support for IBM MASS vector library (Nemanja Ivanovic, 2019-06-05, 4 files, -0/+1795)
| | | | | | | This is the LLVM portion of patch https://reviews.llvm.org/D59881. The clang portion is to follow. llvm-svn: 362568
* [CostModel][X86] Improve masked load/store AVX1/AVX2 costs (Simon Pilgrim, 2019-06-02, 1 file, -532/+890)
| | | | | | | | | | | | | | | | | | | | A mixture of internal tests and review of the scheduler models indicates we're overestimating the cost of a masked load, which we're estimating at 4x regular memory ops - more realistic values indicate that it's closer to 2x. Masked store costs are a lot more diverse but 8x is roughly in the middle of the range. e.g. SandyBridge defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>; defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>; defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; e.g. Btver2 defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>; defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>; defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>; defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>; Differential Revision: https://reviews.llvm.org/D61257 llvm-svn: 362338
* [LoopVectorize] Add FNeg instruction support (Craig Topper, 2019-05-30, 2 files, -16/+5)
| | | | | | Differential Revision: https://reviews.llvm.org/D62510 llvm-svn: 362124
* [LoopVectorize] Precommit tests for D62510. NFC (Craig Topper, 2019-05-30, 2 files, -0/+61)
| | | | llvm-svn: 362060
* [LV] Inform about exactly reason of loop illegality (Florian Hahn, 2019-05-30, 2 files, -27/+82)
| | | | | | | | | | | | | | | | | | | | | | | Currently, only the following information is provided by LoopVectorizer in the case when the CF of the loop is not legal for vectorization: LV: Can't vectorize the instructions or CFG LV: Not vectorizing: Cannot prove legality. But this information is not enough for the root cause analysis; what is exactly wrong with the loop should also be printed: LV: Not vectorizing: The exiting block is not the loop latch. Patch by Pavel Samolysov. Reviewers: mkuper, hsaito, rengolin, fhahn Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D62311 llvm-svn: 362056
* [SimplifyCFG] Added condition assumption for unreachable blocks (David Bolvansky, 2019-05-25, 1 file, -0/+10)
| | | | | | | | | | | | | | | | Summary: PR41688 Reviewers: spatel, efriedma, craig.topper, hfinkel, reames Reviewed By: hfinkel Subscribers: javed.absar, dmgreen, fhahn, hfinkel, reames, nikic, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61409 llvm-svn: 361707
* [LoopVectorize] Fix test by regenerating checks (Nikita Popov, 2019-05-25, 1 file, -10/+0)
| | | | llvm-svn: 361699
* [NFC] Make tests more robust for new optimizations (David Bolvansky, 2019-05-25, 1 file, -3/+13)
| | | | llvm-svn: 361697
* [NFC] Update test checks (David Bolvansky, 2019-05-25, 1 file, -79/+439)
| | | | llvm-svn: 361695
* [LoopVectorize] update test to be independent of instcombine; NFC (Sanjay Patel, 2019-05-24, 1 file, -13/+13)
| | | | | | | | This is a regression test for vectorization, so remove instcombine from the RUN line and adjust the comparison predicates to show what the vectorizer is creating rather than how instcombine cleans it up. llvm-svn: 361648
* [IR] allow fast-math-flags on select of FP values (Sanjay Patel, 2019-05-22, 2 files, -19/+19)
| | | | | | | | | | | | | | | | | | | | | | | | This is a minimal start to correcting a problem most directly discussed in PR38086: https://bugs.llvm.org/show_bug.cgi?id=38086 We have been hacking around a limitation for FP select patterns by using the fast-math-flags on the condition of the select rather than the select itself. This patch just allows FMF to appear with the 'select' opcode. No changes are needed to "FPMathOperator" because it already includes select-of-FP because that definition is based on the (return) value type. Once we have this ability, we can start correcting and adding IR transforms to use the FMF on a 'select' instruction. The instcombine and vectorizer test diffs only show that the IRBuilder change is behaving as expected by applying an FMF guard value to 'select'. For reference: rL241901 - allowed FMF with fcmp rL255555 - allowed FMF with FP calls Differential Revision: https://reviews.llvm.org/D61917 llvm-svn: 361401
* [LoopVectorizer] add tests for FP minmax; NFC (Sanjay Patel, 2019-05-12, 1 file, -0/+161)
| | | | llvm-svn: 360542