summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/LoopVectorize
Commit message (Collapse)AuthorAgeFilesLines
...
* Teach LoopVectorize about address space sizesMatt Arsenault2013-08-221-2/+29
| | | | llvm-svn: 188980
* Add a llvm.copysign intrinsicHal Finkel2013-08-191-0/+53
| | | | | | | | | | | | | | | | | | | | | This adds a llvm.copysign intrinsic; We already have Libfunc recognition for copysign (which is turned into the FCOPYSIGN SDAG node). In order to autovectorize calls to copysign in the loop vectorizer, we need a corresponding intrinsic as well. In addition to the expected changes to the language reference, the loop vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a few lists in LegalizeVector{Ops,Types} so that vector copysigns can be expanded. In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN be Expand for vector types. This seems correct for all in-tree targets, and I think is the right thing to do because, previously, there was no way to generate vector-values FCOPYSIGN nodes (and most targets don't specify an action for vector-typed FCOPYSIGN). llvm-svn: 188728
* [tests] Cleanup initialization of test suffixes.Daniel Dunbar2013-08-163-5/+0
| | | | | | | | | | | | | | | | | - Instead of setting the suffixes in a bunch of places, just set one master list in the top-level config. We now only modify the suffix list in a few suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py). - Aside from removing the need for a bunch of lit.local.cfg files, this enables 4 tests that were inadvertently being skipped (one in Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been XFAILED). - This commit also fixes a bunch of config files to use config.root instead of older copy-pasted code. llvm-svn: 188513
* Add ISD::FROUND for libm round()Hal Finkel2013-08-071-0/+52
| | | | | | | | | | | | | | | All libm floating-point rounding functions, except for round(), had their own ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm adding ISD::FROUND so that round() can be custom lowered as well. For the most part, this is straightforward. I've added an intrinsic and a matching ISD node just like those for nearbyint() and friends. The SelectionDAG pattern I've named frnd (because ISD::FP_ROUND has already claimed fround). This will be used by the PowerPC backend in a follow-up commit. llvm-svn: 187926
* LoopVectorize: Allow vectorization of loops with lifetime markersArnold Schwaighofer2013-08-061-0/+96
| | | | | | Patch by Marc Jessome! llvm-svn: 187825
* Use function attributes to indicate that we don't want to realign the stack.Bill Wendling2013-08-011-1/+1
| | | | | | | | Function attributes are the future! So just query whether we want to realign the stack directly from the function instead of through a random target options structure. llvm-svn: 187618
* Debug Info: update testing cases to pass verifier.Manman Ren2013-07-291-7/+8
| | | | llvm-svn: 187362
* Next batch of -disable-debug-info-verifier.Rafael Espindola2013-07-261-1/+1
| | | | | | These tests fail without it if pipefail is enabled. llvm-svn: 187205
* Catch more CHECK that can be converted to CHECK-LABEL in Transforms for ↵Stephen Lin2013-07-141-20/+20
| | | | | | | | | | | | | | | | | | | | | | easier debugging. No functionality change. This conversion was done with the following bash script: find test/Transforms -name "*.ll" | \ while read NAME; do echo "$NAME" if ! grep -q "^; *RUN: *llc" $NAME; then TEMP=`mktemp -t temp` cp $NAME $TEMP sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \ while read FUNC; do sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)define\([^@]*\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3define\4@$FUNC(/g" $TEMP done mv $TEMP $NAME fi done llvm-svn: 186269
* Update Transforms tests to use CHECK-LABEL for easier debugging. No ↵Stephen Lin2013-07-1446-154/+154
| | | | | | | | | | | | | | | | | | | | | | functionality change. This update was done with the following bash script: find test/Transforms -name "*.ll" | \ while read NAME; do echo "$NAME" if ! grep -q "^; *RUN: *llc" $NAME; then TEMP=`mktemp -t temp` cp $NAME $TEMP sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \ while read FUNC; do sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3@$FUNC(/g" $TEMP done mv $TEMP $NAME fi done llvm-svn: 186268
* LoopVectorizer: Disallow reductions whose header phi is used outside the loopArnold Schwaighofer2013-07-131-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an outside loop user of the reduction value uses the header phi node we cannot just reduce the vectorized phi value in the vector code epilog because we would loose VF-1 reductions. lp: p = phi (0, lv) lv = lv + 1 ... brcond , lp, outside outside: usr = add 0, p (Say the loop iterates two times, the value of p coming out of the loop is one). We cannot just transform this to: vlp: p = phi (<0,0>, lv) lv = lv + <1,1> .. brcond , lp, outside outside: p_reduced = p[0] + [1]; usr = add 0, p_reduced (Because the original loop iterated two times the vectorized loop would iterate one time, but p_reduced ends up being zero instead of one). We would have to execute VF-1 iterations in the scalar remainder loop in such cases. For now, just disable vectorization. PR16522 llvm-svn: 186256
* Make the new vectorizer test immune to TTIAndrew Trick2013-07-131-1/+1
| | | | llvm-svn: 186242
* LoopVectorize fix: LoopInfo must be valid when invoking utils like SCEVExpander.Andrew Trick2013-07-131-0/+113
| | | | | | | | | | | In general, one should always complete CFG modifications first, update CFG-based analyses, like Dominatores and LoopInfo, then generate instruction sequences. LoopVectorizer was creating a new loop, calling SCEVExpander to generate checks, then updating LoopInfo. I just changed the order. llvm-svn: 186241
* X86 cost model: Add cost for vectorized gather/scatherArnold Schwaighofer2013-07-121-0/+86
| | | | | | radar://14351991 llvm-svn: 186189
* ARM cost model: Add cost for gather/scatherArnold Schwaighofer2013-07-121-0/+88
| | | | | | | | | | Fixes a 35% degradation compared to unvectorized code in MiBench/automotive-susan and an equally serious regression on a private image processing benchmark. radar://14351991 llvm-svn: 186188
* LoopVectorize: Vectorize all accesses in address space zero with unit strideArnold Schwaighofer2013-07-111-0/+61
| | | | | | | | | | | We can vectorize them because in the case where we wrap in the address space the unvectorized code would have had to access a pointer value of zero which is undefined behavior in address space zero according to the LLVM IR semantics. (Thank you Duncan, for pointing this out to me). Fixes PR16592. llvm-svn: 186088
* LoopVectorize: Math functions only read rounding modeArnold Schwaighofer2013-07-011-0/+32
| | | | | | | | Math functions are mark as readonly because they read the floating point rounding mode. Because we don't vectorize loops that would contain function calls that set the rounding mode it is safe to ignore this memory read. llvm-svn: 185299
* LoopVectorize: Preserve debug location infoArnold Schwaighofer2013-06-281-0/+92
| | | | | | radar://14169017 llvm-svn: 185122
* LoopVectorize: Cache edge masks created during if-conversionArnold Schwaighofer2013-06-271-0/+243
| | | | | | | Otherwise, we end up with an exponential IR blowup. Fixes PR16472. llvm-svn: 185097
* LoopVectorize: Use vectorized loop invariant gep index anchored in loopArnold Schwaighofer2013-06-271-0/+36
| | | | | | | | | Use vectorized instruction instead of original instruction anchored in the original loop. Fixes PR16452 and t2075.c of PR16455. llvm-svn: 185081
* Fix spelling.Arnold Schwaighofer2013-06-271-1/+1
| | | | llvm-svn: 185052
* LoopVectorize: Don't store a reversed value in the vectorized value mapArnold Schwaighofer2013-06-271-0/+55
| | | | | | | | | | When we store values for reversed induction stores we must not store the reversed value in the vectorized value map. Another instruction might use this value. This fixes 3 test cases of PR16455. llvm-svn: 185051
* Reapply 184685 after the SetVector iteration order fix.Arnold Schwaighofer2013-06-243-2/+224
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This should hopefully have fixed the stage2/stage3 miscompare on the dragonegg testers. "LoopVectorize: Use the dependence test utility class We now no longer need alias analysis - the cases that alias analysis would handle are now handled as accesses with a large dependence distance. We can now vectorize loops with simple constant dependence distances. for (i = 8; i < 256; ++i) { a[i] = a[i+4] * a[i+8]; } for (i = 8; i < 256; ++i) { a[i] = a[i-4] * a[i-8]; } We would be able to vectorize about 200 more loops (in many cases the cost model instructs us no to) in the test suite now. Results on x86-64 are a wash. I have seen one degradation in ammp. Interestingly, the function in which we now vectorize a loop is never executed so we probably see some instruction cache effects. There is a 2% improvement in h264ref. There is one or the other TSCV loop kernel that speeds up. radar://13681598" llvm-svn: 184724
* Revert "LoopVectorize: Use the dependence test utility class"Arnold Schwaighofer2013-06-243-224/+2
| | | | | | | | This reverts commit cbfa1ca993363ca5c4dbf6c913abc957c584cbac. We are seeing a stage2 and stage3 miscompare on some dragonegg bots. llvm-svn: 184690
* LoopVectorize: Use the dependence test utility classArnold Schwaighofer2013-06-243-2/+224
| | | | | | | | | | | | | | | | | | | | | | | | | | | We now no longer need alias analysis - the cases that alias analysis would handle are now handled as accesses with a large dependence distance. We can now vectorize loops with simple constant dependence distances. for (i = 8; i < 256; ++i) { a[i] = a[i+4] * a[i+8]; } for (i = 8; i < 256; ++i) { a[i] = a[i-4] * a[i-8]; } We would be able to vectorize about 200 more loops (in many cases the cost model instructs us no to) in the test suite now. Results on x86-64 are a wash. I have seen one degradation in ammp. Interestingly, the function in which we now vectorize a loop is never executed so we probably see some instruction cache effects. There is a 2% improvement in h264ref. There is one or the other TSCV loop kernel that speeds up. radar://13681598 llvm-svn: 184685
* Fix for a regression caused by the LoopVectorizer whenPekka Jaaskelainen2013-06-171-0/+47
| | | | | | | vectorizing loops with memory accesses to non-zero address spaces. It simply dropped the AS info. Fixes PR16306. llvm-svn: 184103
* LoopVectorize: PHIs with only outside users should prevent vectorizationArnold Schwaighofer2013-05-311-0/+41
| | | | | | | | | | We check that instructions in the loop don't have outside users (except if they are reduction values). Unfortunately, we skipped this check for if-convertable PHIs. Fixes PR16184. llvm-svn: 183035
* Add support for llvm.vectorizer metadataPaul Redmond2013-05-286-10/+86
| | | | | | | | | | | | | | | | | | | - llvm.loop.parallel metadata has been renamed to llvm.loop to be more generic by making the root of additional loop metadata. - Loop::isAnnotatedParallel now looks for llvm.loop and associated llvm.mem.parallel_loop_access - document llvm.loop and update llvm.mem.parallel_loop_access - add support for llvm.vectorizer.width and llvm.vectorizer.unroll - document llvm.vectorizer.* metadata - add utility class LoopVectorizerHints for getting/setting loop metadata - use llvm.vectorizer.width=1 to indicate already vectorized instead of already_vectorized - update existing tests that used llvm.loop.parallel and llvm.vectorizer.already_vectorized Reviewed by: Nadav Rotem llvm-svn: 182802
* LoopVectorize: LoopSimplify can't canonicalize loops with an indirectbr in ↵Benjamin Kramer2013-05-241-0/+11
| | | | | | | | it, don't assert on those cases. Fixes PR16139. llvm-svn: 182656
* LoopVectorize: Make Value pointers that could be RAUW'ed a VHArnold Schwaighofer2013-05-221-0/+50
| | | | | | | | | | The Value pointers we store in the induction variable list can be RAUW'ed by a call to SCEVExpander::expandCodeFor, use a TrackingVH instead. Do the same thing in some other places where we store pointers that could potentially be RAUW'ed. Fixes PR16073. llvm-svn: 182485
* LoopVectorize: Handle single edge PHIsArnold Schwaighofer2013-05-181-0/+22
| | | | | | | | We might encouter single edge PHIs - handle them with an identity select. Fixes PR15990. llvm-svn: 182199
* LoopVectorize: Hoist conditional loads if possibleArnold Schwaighofer2013-05-151-0/+69
| | | | | | | | | | | | InstCombine can be uncooperative to vectorization and sink loads into conditional blocks. This prevents vectorization. Undo this optimization if there are unconditional memory accesses to the same addresses in the loop. radar://13815763 llvm-svn: 181860
* LoopVectorize: Handle loops with multiple forward inductionsArnold Schwaighofer2013-05-141-0/+30
| | | | | | | | | | | | We used to give up if we saw two integer inductions. After this patch, we base further induction variables on the chosen one like we do in the reverse induction and pointer induction case. Fixes PR15720. radar://13851975 llvm-svn: 181746
* LoopVectorize: Use the widest induction variable typeArnold Schwaighofer2013-05-111-0/+69
| | | | | | | | | | | | | | | | | | | | | | Use the widest induction type encountered for the cannonical induction variable. We used to turn the following loop into an empty loop because we used i8 as induction variable type and truncated 1024 to 0 as trip count. int a[1024]; void fail() { int reverse_induction = 1023; unsigned char forward_induction = 0; while ((reverse_induction) >= 0) { forward_induction++; a[reverse_induction] = forward_induction; --reverse_induction; } } radar://13862901 llvm-svn: 181667
* Add an additional testcase for PR15882.Nadav Rotem2013-05-101-0/+45
| | | | llvm-svn: 181646
* LoopVectorizer: Don't assert on the absence of induction variablesArnold Schwaighofer2013-05-091-0/+34
| | | | | | | | | A computable loop exit count does not imply the presence of an induction variable. Scalar evolution can return a value for an infinite loop. Fixes PR15926. llvm-svn: 181495
* LoopVectorizer: Improve reduction variable identificationArnold Schwaighofer2013-05-071-0/+119
| | | | | | | The two nested loops were confusing and also conservative in identifying reduction variables. This patch replaces them by a worklist based approach. llvm-svn: 181369
* LoopVectorize: getConsecutiveVector must respect signed arithmeticArnold Schwaighofer2013-05-071-0/+79
| | | | | | | | | | We were passing an i32 to ConstantInt::get where an i64 was needed and we must also pass the sign if we pass negatives numbers. The start index passed to getConsecutiveVector must also be signed. Should fix PR15882. llvm-svn: 181286
* LoopVectorize: Add support for floating point min/max reductionsArnold Schwaighofer2013-05-051-0/+480
| | | | | | | | | | Add support for min/max reductions when "no-nans-float-math" is enabled. This allows us to assume we have ordered floating point math and treat ordered and unordered predicates equally. radar://13723044 llvm-svn: 181144
* LoopVectorize: We don't need an identity element for min/max reductionsArnold Schwaighofer2013-05-051-1/+4
| | | | | | | | | | We can just use the initial element that feeds the reduction. max(max(x, y), z) == max(max(x,y), max(x,z)) radar://13723044 llvm-svn: 181141
* LoopVectorizer: Add support for if-conversion of PHINodes with 3+ incoming ↵Nadav Rotem2013-05-031-0/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | values. By supporting the vectorization of PHINodes with more than two incoming values we can increase the complexity of nested if statements. We can now vectorize this loop: int foo(int *A, int *B, int n) { for (int i=0; i < n; i++) { int x = 9; if (A[i] > B[i]) { if (A[i] > 19) { x = 3; } else if (B[i] < 4 ) { x = 4; } else { x = 5; } } A[i] = x; } } llvm-svn: 181037
* TBAA: remove !tbaa from testing cases if not used.Manman Ren2013-05-023-98/+84
| | | | | | | This will make it easier to turn on struct-path aware TBAA since the metadata format will change. llvm-svn: 180935
* TBAA: remove !tbaa from testing cases if not used.Manman Ren2013-04-3013-90/+44
| | | | | | | This will make it easier to turn on struct-path aware TBAA since the metadata format will change. llvm-svn: 180796
* LoopVectorizer: Calculate the number of pointers to disambiguate at runtime ↵Nadav Rotem2013-04-261-0/+84
| | | | | | based on the numbers of reads and writes. llvm-svn: 180593
* LoopVectorizer: No need to generate pointer disambiguation checks between ↵Nadav Rotem2013-04-251-0/+36
| | | | | | readonly pointers. llvm-svn: 180570
* LoopVectorize: Scalarize padded typesArnold Schwaighofer2013-04-241-0/+29
| | | | | | | | | | | | | | | | | | This patch disables memory-instruction vectorization for types that need padding bytes, e.g., x86_fp80 has 10 bytes store size with 6 bytes padding in darwin on x86_64. Because the load/store vectorization is performed by the bit casting to a packed vector, which has incompatible memory layout due to the lack of padding bytes, the present vectorizer produces inconsistent result for memory instructions of those types. This patch checks an equality of the AllocSize of a scalar type and allocated size for each vector element, to ensure that there is no padding bytes and the array can be read/written using vector operations. Patch by Daisuke Takahashi! Fixes PR15758. llvm-svn: 180196
* LoopVectorizer: Bail out if we don't have datalayout we need itArnold Schwaighofer2013-04-243-1/+8
| | | | llvm-svn: 180195
* LoopVectorizer: Fix 15830. When scalarizing and unrolling stores make sure ↵Nadav Rotem2013-04-231-0/+36
| | | | | | | | that the order in which the elements are scalarized is the same as the original order. This fixes a miscompilation in FreeBSD's regex library. llvm-svn: 180121
* Call the potentially costly isAnnotatedParallel() only once. Pekka Jaaskelainen2013-04-231-1/+2
| | | | | | Made the uniform write test's checks a bit stricter. llvm-svn: 180119
* Refuse to (even try to) vectorize loops which have uniform writes,Pekka Jaaskelainen2013-04-231-0/+58
| | | | | | | | | even if erroneously annotated with the parallel loop metadata. Fixes Bug 15794: "Loop Vectorizer: Crashes with the use of llvm.loop.parallel metadata" llvm-svn: 180081
OpenPOWER on IntegriCloud