summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/LoopVectorize
Commit message (Collapse)AuthorAgeFilesLines
* Some more tests for the global structure vectorizerRenato Golin2013-02-231-20/+596
| | | | llvm-svn: 175964
* More tests to global struct vectorizerRenato Golin2013-02-221-0/+146
| | | | llvm-svn: 175898
* Allow GlobalValues to vectorize with AliasAnalysisRenato Golin2013-02-211-0/+356
| | | | | | | | | | | | | | | | | | | | | Storing the load/store instructions with the values and inspect them using Alias Analysis to make sure they don't alias, since the GEP pointer operand doesn't take the offset into account. Trying hard to not add any extra cost to loads and stores that don't overlap on global values, AA is *only* calculated if all of the previous attempts failed. Using biggest vector register size as the stride for the vectorization access, as we're being conservative and the cost model (which calculates the real vectorization factor) is only run after the legalization phase. We might re-think this relationship in the future, but for now, I'd rather be safe than sorry. llvm-svn: 175818
* Forgot to 'svn add' the LoopVectorizer tests for the new parallel loop ↵Pekka Jaaskelainen2013-02-152-0/+166
| | | | | | metadata, sorry. llvm-svn: 175311
* Formatting.NAKAMURA Takumi2013-02-051-1/+1
| | | | llvm-svn: 174380
* llvm/test/Transforms/LoopVectorize/X86/vector_ptr_load_store.ll: "-debug" ↵NAKAMURA Takumi2013-02-051-0/+1
| | | | | | requires +Asserts. llvm-svn: 174379
* Loop Vectorizer: Handle pointer stores/loads in getWidestType()Arnold Schwaighofer2013-02-051-0/+149
| | | | | | | | | | | | | | | | | In the loop vectorizer cost model, we used to ignore stores/loads of a pointer type when computing the widest type within a loop. This meant that if we had only stores/loads of pointers in a loop we would return a widest type of 8bits (instead of 32 or 64 bit) and therefore a vector factor that was too big. Now, if we see a consecutive store/load of pointers we use the size of a pointer (from data layout). This problem occured in SingleSource/Benchmarks/Shootout-C++/hash.cpp (reduced test case is the first test in vector_ptr_load_store.ll). radar://13139343 llvm-svn: 174377
* Made the min-trip-count-switch test X86-specific to avoidPekka Jaaskelainen2013-01-311-0/+0
| | | | | | breakage with builds without X86-support. llvm-svn: 174052
* Adding simple cast cost to ARMRenato Golin2013-01-291-0/+114
| | | | | | | | | | | Changing ARMBaseTargetMachine to return ARMTargetLowering intead of the generic one (similar to x86 code). Tests showing which instructions were added to cast when necessary or cost zero when not. Downcast to 16 bits are not lowered in NEON, so costs are not there yet. llvm-svn: 173849
* LoopVectorize: convert TinyTripCountVectorThreshold constantPekka Jaaskelainen2013-01-291-0/+28
| | | | | | to a command line switch. llvm-svn: 173837
* Add support for reverse pointer induction variables. These are loops that ↵Nadav Rotem2013-01-233-0/+195
| | | | | | | | | | | | | contain pointers that count backwards. For example, this is the hot loop in BZIP: do { m = *--p; *p = ( ... ); } while (--n); llvm-svn: 173219
* LoopVectorizer: Implement a new heuristics for selecting the unroll factor.Nadav Rotem2013-01-201-0/+71
| | | | | | | We ignore the cpu frontend and focus on pipeline utilization. We do this because we don't have a good way to estimate the loop body size at the IR level. llvm-svn: 172964
* Change the cpu type in the test.Nadav Rotem2013-01-201-1/+1
| | | | llvm-svn: 172963
* LoopVectorizer: Emit memory checks into their own basic block.Benjamin Kramer2013-01-191-0/+4
| | | | | | | | | | | | | | This separates the check for "too few elements to run the vector loop" from the "memory overlap" check, giving a lot nicer code and allowing to skip the memory checks when we're not going to execute the vector code anyways. We still leave the decision of whether to emit the memory checks as branches or setccs, but it seems to be doing a good job. If ugly code pops up we may want to emit them as separate blocks too. Small speedup on MultiSource/Benchmarks/MallocBench/espresso. Most of this is legwork to allow multiple bypass blocks while updating PHIs, dominators and loop info. llvm-svn: 172902
* Move test that depends on the x86 target into a target-specific directory.Benjamin Kramer2013-01-161-0/+0
| | | | | | Should fix the arm buildbot (which only builds the arm target). llvm-svn: 172611
* Fix PR14547. Handle induction variables of small sizes smaller than i32 (i8 ↵Nadav Rotem2013-01-131-0/+35
| | | | | | and i16). llvm-svn: 172348
* ARM Cost Model: Modify the target independent cost model to askNadav Rotem2013-01-111-3/+3
| | | | | | | | the target if it supports the different CAST types. We didn't do this on X86 because of the different register sizes and types, but on ARM this makes sense. llvm-svn: 172245
* ARM Cost Model: We need to detect the max bitwidth of types in the loop in ↵Nadav Rotem2013-01-111-0/+52
| | | | | | | | | | | order to select the max vectorization factor. We don't have a detailed analysis on which values are vectorized and which stay scalars in the vectorized loop so we use another method. We look at reduction variables, loads and stores, which are the only ways to get information in and out of loop iterations. If the data types are extended and truncated then the cost model will catch the cost of the vector zext/sext/trunc operations. llvm-svn: 172178
* LoopVectorizer: Fix a bug in the vectorization of BinaryOperators. The ↵Nadav Rotem2013-01-101-0/+25
| | | | | | | | BinaryOperator can be folded to an Undef, and we don't want to set NSW flags to undef vals. PR14878 llvm-svn: 172079
* ARM Cost model: Use the size of vector registers and widest vectorizable ↵Nadav Rotem2013-01-093-2/+62
| | | | | | instruction to determine the max vectorization factor. llvm-svn: 172010
* ARM Cost Model: Add a basic vectorization unrolling test.Nadav Rotem2013-01-091-3/+10
| | | | llvm-svn: 171931
* Remove the -licm pass from the loop vectorizer test because the loop ↵Nadav Rotem2013-01-0923-25/+25
| | | | | | vectorizer does it now. llvm-svn: 171930
* Cost Model: Move the 'max unroll factor' variable to the TTI and add initial ↵Nadav Rotem2013-01-093-2/+31
| | | | | | Cost Model support on ARM. llvm-svn: 171928
* LoopVectorizer: Add support for floating point reductionsNadav Rotem2013-01-071-0/+29
| | | | llvm-svn: 171812
* LoopVectorizer: When we vectorizer and widen loops we process many elements ↵Nadav Rotem2013-01-071-0/+50
| | | | | | | | | at once. This is a good thing, except for small loops. On small loops post-loop that handles scalars (and runs slower) can take more time to execute than the rest of the loop. This patch disables widening of loops with a small static trip count. llvm-svn: 171798
* Fix a typo. Remove the duplicated test.Nadav Rotem2013-01-051-25/+0
| | | | llvm-svn: 171584
* iLoopVectorize: Non commutative operators can be used as reduction variables ↵Nadav Rotem2013-01-052-3/+31
| | | | | | | | as long as the reduction chain is used in the LHS. PR14803. llvm-svn: 171583
* Force a fixed unroll count on the target independent tests.Nadav Rotem2013-01-0527-27/+27
| | | | | | This should fix clang-native-arm-cortex-a9. Thanks Renato. llvm-svn: 171582
* Do not vectorize loops with subtraction reductionsPaul Redmond2013-01-042-1/+51
| | | | | | | | | Since subtraction does not commute the loop vectorizer incorrectly vectorizes reductions such as x = A[i] - x. Disabling for now. llvm-svn: 171537
* LoopVectorizer:Nadav Rotem2013-01-042-2/+58
| | | | | | | | 1. Add code to estimate register pressure. 2. Add code to select the unroll factor based on register pressure. 3. Add bits to TargetTransformInfo to provide the number of registers. llvm-svn: 171469
* LoopVectorizer: Test the unrolling flag.Nadav Rotem2013-01-031-0/+39
| | | | llvm-svn: 171446
* Avoid vectorization when the function has the "noimplicitflot" attribute.Nadav Rotem2013-01-021-0/+29
| | | | llvm-svn: 171429
* LoopVectorizer: Fix a bug in the code that updates the loop exiting block.Nadav Rotem2012-12-301-0/+29
| | | | | | | | | LCSSA PHIs may have undef values. The vectorizer updates values that are used by outside users such as PHIs. The bug happened because undefs are not loop values. This patch handles these PHIs. PR14725 llvm-svn: 171251
* If all of the write objects are identified then we can vectorize the loop ↵Nadav Rotem2012-12-261-0/+53
| | | | | | | | even if the read objects are unidentified. PR14719. llvm-svn: 171124
* LoopVectorizer: Optimize the vectorization of consecutive memory access when ↵Nadav Rotem2012-12-261-1/+2
| | | | | | the iteration step is -1 llvm-svn: 171114
* LoopVectorize: Enable vectorization of the fmuladd intrinsicHal Finkel2012-12-251-0/+60
| | | | llvm-svn: 171076
* Fix typo "Makre" -> "Make".Nick Lewycky2012-12-241-6/+4
| | | | llvm-svn: 171043
* LoopVectorizer: When checking for vectorizable types, also checkNadav Rotem2012-12-241-0/+29
| | | | | | | | the StoreInst operands. PR14705. llvm-svn: 171023
* LoopVectorizer: Fix an endless loop in the code that looks for reductions.Nadav Rotem2012-12-241-0/+44
| | | | | | | | The bug was in the code that detects PHIs in if-then-else block sequence. PR14701. llvm-svn: 171008
* CostModel: Change the default target-independent implementation for findingNadav Rotem2012-12-231-3/+3
| | | | | | | | the cost of arithmetic functions. We now assume that the cost of arithmetic operations that are marked as Legal or Promote is low, but ops that are marked as custom are higher. llvm-svn: 171002
* Loop Vectorizer: Update the cost model of scatter/gather operations and makeNadav Rotem2012-12-231-1/+4
| | | | | | them more expensive. llvm-svn: 170995
* Fix a bug in the code that checks if we can vectorize loops while using dynamicNadav Rotem2012-12-212-48/+110
| | | | | | | | | | memory bound checks. Before the fix we were able to vectorize this loop from the Livermore Loops benchmark: for ( k=1 ; k<n ; k++ ) x[k] = x[k-1] + y[k]; llvm-svn: 170811
* LoopVectorize: Fix a bug in the scalarization of instructions.Nadav Rotem2012-12-201-0/+48
| | | | | | | | | | Before if-conversion we could check if a value is loop invariant if it was declared inside the basic block. Now that loops have multiple blocks this check is incorrect. This fixes External/SPEC/CINT95/099_go/099_go llvm-svn: 170756
* Make TargetLowering::getTypeConversion more resilient against odd illegal MVTs.Benjamin Kramer2012-12-191-0/+22
| | | | | | | | | - An MVT can become an EVT when being split (e.g. v2i8 -> v1i8, the latter doesn't exist) - Return the scalar value when an MVT is scalarized (v1i64 -> i64) Fixes PR14639ff. llvm-svn: 170546
* LoopVectorize: Emit reductions as log2(vectorsize) shuffles + vector ops ↵Benjamin Kramer2012-12-181-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | instead of scalar operations. For example on x86 with SSE4.2 a <8 x i8> add reduction becomes movdqa %xmm0, %xmm1 movhlps %xmm1, %xmm1 ## xmm1 = xmm1[1,1] paddw %xmm0, %xmm1 pshufd $1, %xmm1, %xmm0 ## xmm0 = xmm1[1,0,0,0] paddw %xmm1, %xmm0 phaddw %xmm0, %xmm0 pextrb $0, %xmm0, %edx instead of pextrb $2, %xmm0, %esi pextrb $0, %xmm0, %edx addb %sil, %dl pextrb $4, %xmm0, %esi addb %dl, %sil pextrb $6, %xmm0, %edx addb %sil, %dl pextrb $8, %xmm0, %esi addb %dl, %sil pextrb $10, %xmm0, %edi pextrb $14, %xmm0, %edx addb %sil, %dil pextrb $12, %xmm0, %esi addb %dil, %sil addb %sil, %dl llvm-svn: 170439
* Teach the cost model about the optimization in r169904: Truncation of ↵Nadav Rotem2012-12-131-1/+1
| | | | | | induction variables costs the same as scalar trunc. llvm-svn: 170051
* LoopVectorizer: Use the "optsize" attribute to decide if we are allowed to ↵Nadav Rotem2012-12-121-0/+170
| | | | | | increase the function size. llvm-svn: 170004
* PR14574. Fix a bug in the code that calculates the mask the converted PHIs ↵Nadav Rotem2012-12-111-0/+44
| | | | | | in if-conversion. llvm-svn: 169916
* Loop Vectorize: optimize the vectorization of trunc(induction_var). The ↵Nadav Rotem2012-12-114-4/+33
| | | | | | truncation is now done on scalars. llvm-svn: 169904
* Fix PR14565. Don't if-convert loops that have switch statements in them.Nadav Rotem2012-12-111-0/+39
| | | | llvm-svn: 169813
OpenPOWER on IntegriCloud