path: root/llvm/test/Transforms/LoopVectorize
Commit message | Author | Age | Files | Lines
* LoopVectorizer: Add support for if-conversion of PHINodes with 3+ incoming ↵ | Nadav Rotem | 2013-05-03 | 1 | -0/+48
| values.
|
| By supporting the vectorization of PHINodes with more than two incoming
| values we can increase the complexity of nested if statements.
|
| We can now vectorize this loop:
|
| int foo(int *A, int *B, int n) {
|   for (int i=0; i < n; i++) {
|     int x = 9;
|     if (A[i] > B[i]) {
|       if (A[i] > 19) {
|         x = 3;
|       } else if (B[i] < 4 ) {
|         x = 4;
|       } else {
|         x = 5;
|       }
|     }
|     A[i] = x;
|   }
| }
|
| llvm-svn: 181037
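To make the idea of if-conversion concrete, here is a minimal sketch in C. It is a hand-written illustration, not the vectorizer's output: `foo_branchy` follows the loop from the commit message (with a `void` return type, an assumption made so the function is well-formed), and `foo_selects` is a hypothetical straight-line equivalent in which the three-way merge of `x` becomes a chain of selects.

```c
#include <stddef.h>

/* Branchy form, following the loop in the commit message.
   The void return type is an assumption for this sketch. */
void foo_branchy(int *A, const int *B, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int x = 9;
        if (A[i] > B[i]) {
            if (A[i] > 19) {
                x = 3;
            } else if (B[i] < 4) {
                x = 4;
            } else {
                x = 5;
            }
        }
        A[i] = x;
    }
}

/* Hypothetical if-converted form: every condition is computed
   unconditionally and the merge of x (a PHI with more than two
   incoming values in the branchy form) becomes nested selects. */
void foo_selects(int *A, const int *B, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int outer = A[i] > B[i];
        int c1 = A[i] > 19;
        int c2 = B[i] < 4;
        int inner = c1 ? 3 : (c2 ? 4 : 5);
        A[i] = outer ? inner : 9;
    }
}
```

Both functions should leave the same values in `A` for any input; the second form has no control flow inside the loop body, which is the shape a vectorizer can widen lane by lane.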
* TBAA: remove !tbaa from testing cases if not used. | Manman Ren | 2013-05-02 | 3 | -98/+84
| | | | | | | This will make it easier to turn on struct-path aware TBAA since the metadata format will change. llvm-svn: 180935
* TBAA: remove !tbaa from testing cases if not used. | Manman Ren | 2013-04-30 | 13 | -90/+44
| | | | | | | This will make it easier to turn on struct-path aware TBAA since the metadata format will change. llvm-svn: 180796
* LoopVectorizer: Calculate the number of pointers to disambiguate at runtime ↵ | Nadav Rotem | 2013-04-26 | 1 | -0/+84
| | | | | | based on the numbers of reads and writes. llvm-svn: 180593
* LoopVectorizer: No need to generate pointer disambiguation checks between ↵ | Nadav Rotem | 2013-04-25 | 1 | -0/+36
| | | | | | readonly pointers. llvm-svn: 180570
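As a rough illustration of why read-only pointers reduce the number of runtime checks: if only pairs that involve at least one written pointer need an overlap check, the count depends on the numbers of reads and writes. This is a sketch of the combinatorics only; the helper name `num_runtime_checks` is hypothetical, and the vectorizer's actual accounting may differ.

```c
/* Hypothetical helper: number of pointer pairs that need a runtime
   overlap check when read/read pairs are skipped.
   Pairs counted: write/write and read/write. */
unsigned num_runtime_checks(unsigned reads, unsigned writes) {
    unsigned write_write = writes ? writes * (writes - 1) / 2 : 0;
    unsigned read_write = reads * writes;
    return write_write + read_write;
}
```

With three read-only pointers and no writes, no checks are needed at all; adding a single written pointer requires one check against each read.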
* LoopVectorize: Scalarize padded types | Arnold Schwaighofer | 2013-04-24 | 1 | -0/+29
| | | | | | | | | | | | | | | | | | This patch disables memory-instruction vectorization for types that need padding bytes, e.g., x86_fp80 has 10 bytes store size with 6 bytes padding in darwin on x86_64. Because the load/store vectorization is performed by bit casting to a packed vector, which has an incompatible memory layout due to the lack of padding bytes, the present vectorizer produces inconsistent results for memory instructions of those types. This patch checks that the AllocSize of a scalar type equals the allocated size of each vector element, to ensure that there are no padding bytes and the array can be read/written using vector operations. Patch by Daisuke Takahashi! Fixes PR15758. llvm-svn: 180196
* LoopVectorizer: Bail out if we don't have DataLayout; we need it | Arnold Schwaighofer | 2013-04-24 | 3 | -1/+8
| | | | llvm-svn: 180195
* LoopVectorizer: Fix 15830. When scalarizing and unrolling stores make sure ↵ | Nadav Rotem | 2013-04-23 | 1 | -0/+36
| | | | | | | | that the order in which the elements are scalarized is the same as the original order. This fixes a miscompilation in FreeBSD's regex library. llvm-svn: 180121
* Call the potentially costly isAnnotatedParallel() only once. | Pekka Jaaskelainen | 2013-04-23 | 1 | -1/+2
| | | | | | Made the uniform write test's checks a bit stricter. llvm-svn: 180119
* Refuse to (even try to) vectorize loops which have uniform writes, | Pekka Jaaskelainen | 2013-04-23 | 1 | -0/+58
| | | | | | | | | even if erroneously annotated with the parallel loop metadata. Fixes Bug 15794: "Loop Vectorizer: Crashes with the use of llvm.loop.parallel metadata" llvm-svn: 180081
* LoopVectorizer: Recognize min/max reductions | Arnold Schwaighofer | 2013-04-18 | 1 | -0/+399
| | | | | | | | | | | | A min/max operation is represented by a select(cmp(lt/le/gt/ge, X, Y), X, Y) sequence in LLVM. If we see such a sequence we can treat it just as any other commutative binary instruction and reduce it. This appears to help bzip2 by about 1.5% on an imac12,2. radar://12960601 llvm-svn: 179773
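The compare-plus-select idiom described above can be sketched in C as follows. `min_scalar` is the plain loop; `min_lanes` is a hypothetical emulation of a four-lane vectorized reduction (the lane count `LANES` is an assumption), keeping one partial minimum per lane and combining the lanes after the loop. It only illustrates why a min behaves like any other commutative reduction; it is not the code the vectorizer emits.

```c
#include <stddef.h>

#define LANES 4  /* assumed vectorization factor for this sketch */

/* Scalar min reduction written as select(cmp(lt, X, Y), X, Y). */
int min_scalar(const int *a, size_t n, int init) {
    int m = init;
    for (size_t i = 0; i < n; i++)
        m = (a[i] < m) ? a[i] : m;
    return m;
}

/* Emulated vector reduction: one partial minimum per lane,
   a final cross-lane reduction, then a scalar tail loop. */
int min_lanes(const int *a, size_t n, int init) {
    int lane[LANES];
    size_t i = 0;
    for (int l = 0; l < LANES; l++)
        lane[l] = init;
    for (; i + LANES <= n; i += LANES)
        for (int l = 0; l < LANES; l++)
            lane[l] = (a[i + l] < lane[l]) ? a[i + l] : lane[l];
    int m = lane[0];
    for (int l = 1; l < LANES; l++)
        m = (lane[l] < m) ? lane[l] : m;
    for (; i < n; i++)  /* leftover elements */
        m = (a[i] < m) ? a[i] : m;
    return m;
}
```

Because min is associative and commutative, the order in which elements are combined does not matter, so both functions return the same value for the same input.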
* LoopVectorize: Use a set to avoid longer cycles in the reduction chain too. | Benjamin Kramer | 2013-04-18 | 1 | -0/+18
| | | | | | Fixes PR15748. llvm-svn: 179757
* LoopVectorizer: integer division is not a reduction operation | Arnold Schwaighofer | 2013-04-12 | 1 | -0/+24
| Don't classify idiv/udiv as a reduction operation. Integer division is lossy.
| For example : (1 / 2) * 4 != 4/2.
|
| Example:
|
| int a[] = { 2, 5, 2, 2}
| int x = 80;
|
| for()
|   x /= a[i];
|
| Scalar:
|   x /= 2 // = 40
|   x /= 5 // = 8
|   x /= 2 // = 4
|   x /= 2 // = 2
|
| Vectorized:
|   <80, 1> / <2,5> //= <40,0>
|   <40, 0> / <2,2> //= <20,0>
|
| 20*0 = 0
|
| radar://13640654
| llvm-svn: 179381
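The example from the commit message can be reproduced with a small C sketch. `div_scalar` divides in the original order; `div_two_lanes` is a hypothetical emulation of a two-lane "reduction" that starts the second lane at the identity value 1 and multiplies the lanes together at the end, as in the commit's vectorized trace. The function names and the assumption that `n` is even are mine.

```c
/* Scalar loop: x /= a[i] in program order. */
int div_scalar(int x, const int *a, int n) {
    for (int i = 0; i < n; i++)
        x /= a[i];
    return x;
}

/* Emulated two-lane version (n assumed even): lane 0 starts at x,
   lane 1 starts at 1, and the lanes are combined by multiplication. */
int div_two_lanes(int x, const int *a, int n) {
    int lane0 = x;
    int lane1 = 1;
    for (int i = 0; i + 1 < n; i += 2) {
        lane0 /= a[i];
        lane1 /= a[i + 1];
    }
    return lane0 * lane1;
}
```

With `a = {2, 5, 2, 2}` and `x = 80`, the scalar loop yields 2 while the two-lane version yields 0, because `1 / 5` truncates to 0 in the second lane.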
* LoopVectorizer: Pass OperandValueKind information to the cost model | Arnold Schwaighofer | 2013-04-04 | 1 | -0/+28
| | | | | | | | | | | Pass down the fact that an operand is going to be a vector of constants. This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86 back. It had degraded to scalar performance due to my previous shift cost change that made all shifts expensive on x86. radar://13576547 llvm-svn: 178809
* X86TTI: Add accurate costs for itofp operations, based on the actual ↵ | Benjamin Kramer | 2013-04-01 | 1 | -5/+4
| | | | | | instruction counts. llvm-svn: 178459
* LoopVectorizer: Insert some white space to make test case more readable | Arnold Schwaighofer | 2013-03-14 | 1 | -6/+10
| | | | | | Also remove some unneeded function attributes. llvm-svn: 177114
* Add missing asserts flag to test - it uses debug flags | Arnold Schwaighofer | 2013-03-14 | 1 | -1/+1
| | | | llvm-svn: 177102
* LoopVectorize: Invert case when we use a vector cmp value to query select cost | Arnold Schwaighofer | 2013-03-14 | 1 | -0/+62
| | | | | | | We generate a select with a vectorized condition argument when the condition is NOT loop invariant. Not the other way around. llvm-svn: 177098
* Test case hygiene. | Benjamin Kramer | 2013-03-09 | 1 | -20/+20
| | | | llvm-svn: 176772
* LoopVectorizer: Ignore dbg.value instructions | Arnold Schwaighofer | 2013-03-09 | 1 | -0/+70
| | | | | | | | | We want vectorization to happen at -g. Ignore calls to the dbg.value intrinsic and don't transfer them to the vectorized code. radar://13378964 llvm-svn: 176768
* Force cpu in test. | Benjamin Kramer | 2013-03-08 | 1 | -1/+1
| | | | llvm-svn: 176702
* Insert the reduction start value into the first bypass block to preserve ↵ | Benjamin Kramer | 2013-03-08 | 1 | -0/+35
| | | | | | | | domination. Fixes PR15344. llvm-svn: 176701
* X86 cost model: Adjust cost for custom lowered vector multiplies | Arnold Schwaighofer | 2013-03-02 | 1 | -2/+2
| This matters for example in the following matrix multiply:
|
| int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
|   int i, j, k, val;
|   for (i=0; i<rows; i++) {
|     for (j=0; j<cols; j++) {
|       val = 0;
|       for (k=0; k<cols; k++) {
|         val += m1[i][k] * m2[k][j];
|       }
|       m3[i][j] = val;
|     }
|   }
|   return(m3);
| }
|
| Taken from the test-suite benchmark Shootout.
|
| We estimate the cost of the multiply to be 2 while we generate 9 instructions
| for it and end up being quite a bit slower than the scalar version (48% on my
| machine).
|
| Also, properly differentiate between avx1 and avx2. On avx-1 we still split the
| vector into 2 128bits and handle the subvector muls like above with 9
| instructions.
| Only on avx-2 will we have a cost of 9 for v4i64.
|
| I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
| add instead of a mul because with a mul we now no longer vectorize. I did
| verify that the mul would be indeed more expensive when vectorized with 3
| kernels:
|
| for (i ...)
|   r += a[i] * 3;
| for (i ...)
|   m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
| and a matrix multiply.
|
| In each case the vectorized version was considerably slower.
|
| radar://13304919
| llvm-svn: 176403
* PR14448 - prevent the loop vectorizer from vectorizing the same loop twice. | Nadav Rotem | 2013-03-02 | 1 | -0/+75
| | | | | | | | | | The LoopVectorizer often runs multiple times on the same function due to inlining. When this happens the loop vectorizer often vectorizes the same loops multiple times, increasing code size and adding unneeded branches. With this patch, the vectorizer during vectorization puts metadata on scalar loops and marks them as 'already vectorized' so that it knows to ignore them when it sees them a second time. PR14448. llvm-svn: 176399
* LoopVectorize: Don't hang forever if a PHI only has skipped PHI uses. | Benjamin Kramer | 2013-03-01 | 1 | -0/+29
| | | | | | Fixes PR15384. llvm-svn: 176366
* LoopVectorize: Vectorize math builtin calls. | Benjamin Kramer | 2013-02-27 | 1 | -0/+24
| | | | | | | | | This properly asks TargetLibraryInfo if a call is available and if it is, it can be translated into the corresponding LLVM builtin. We don't vectorize sqrt() yet because I'm not sure about the semantics for negative numbers. The other intrinsics should be exact equivalents to the libm functions. Differential Revision: http://llvm-reviews.chandlerc.com/D465 llvm-svn: 176188
* Some more tests for the global structure vectorizer | Renato Golin | 2013-02-23 | 1 | -20/+596
| | | | llvm-svn: 175964
* More tests to global struct vectorizer | Renato Golin | 2013-02-22 | 1 | -0/+146
| | | | llvm-svn: 175898
* Allow GlobalValues to vectorize with AliasAnalysis | Renato Golin | 2013-02-21 | 1 | -0/+356
| | | | | | | | | | | | | | | | | | | | | Store the load/store instructions with the values and inspect them using Alias Analysis to make sure they don't alias, since the GEP pointer operand doesn't take the offset into account. Trying hard to not add any extra cost to loads and stores that don't overlap on global values, AA is *only* calculated if all of the previous attempts failed. Use the biggest vector register size as the stride for the vectorization access, as we're being conservative and the cost model (which calculates the real vectorization factor) is only run after the legalization phase. We might re-think this relationship in the future, but for now, I'd rather be safe than sorry. llvm-svn: 175818
* Forgot to 'svn add' the LoopVectorizer tests for the new parallel loop ↵ | Pekka Jaaskelainen | 2013-02-15 | 2 | -0/+166
| | | | | | metadata, sorry. llvm-svn: 175311
* Formatting. | NAKAMURA Takumi | 2013-02-05 | 1 | -1/+1
| | | | llvm-svn: 174380
* llvm/test/Transforms/LoopVectorize/X86/vector_ptr_load_store.ll: "-debug" ↵ | NAKAMURA Takumi | 2013-02-05 | 1 | -0/+1
| | | | | | requires +Asserts. llvm-svn: 174379
* Loop Vectorizer: Handle pointer stores/loads in getWidestType() | Arnold Schwaighofer | 2013-02-05 | 1 | -0/+149
| | | | | | | | | | | | | | | | | In the loop vectorizer cost model, we used to ignore stores/loads of a pointer type when computing the widest type within a loop. This meant that if we had only stores/loads of pointers in a loop we would return a widest type of 8 bits (instead of 32 or 64 bits) and therefore a vector factor that was too big. Now, if we see a consecutive store/load of pointers we use the size of a pointer (from data layout). This problem occurred in SingleSource/Benchmarks/Shootout-C++/hash.cpp (reduced test case is the first test in vector_ptr_load_store.ll). radar://13139343 llvm-svn: 174377
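The effect of the widest type on the vectorization factor can be illustrated with a small C sketch. It assumes the maximum factor is simply the vector register width divided by the widest element width, with 8 bits as the fallback when no access is seen; the function names and the 128-bit register width used in the example are assumptions, not the cost model's actual code.

```c
#include <stddef.h>

/* Hypothetical: widest element width (in bits) among the loop's
   memory accesses, falling back to 8 when nothing is recorded. */
unsigned widest_type_bits(const unsigned *access_bits, size_t n) {
    unsigned widest = 8;
    for (size_t i = 0; i < n; i++)
        if (access_bits[i] > widest)
            widest = access_bits[i];
    return widest;
}

/* Hypothetical: maximum vectorization factor for a given register width. */
unsigned max_vf(unsigned vector_register_bits, unsigned widest_bits) {
    return vector_register_bits / widest_bits;
}
```

If pointer accesses are ignored, a loop that only touches 64-bit pointers reports a widest type of 8 bits and a factor of 16 on a 128-bit register; counting the pointers gives a widest type of 64 bits and a factor of 2.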
* Made the min-trip-count-switch test X86-specific to avoid | Pekka Jaaskelainen | 2013-01-31 | 1 | -0/+0
| | | | | | breakage with builds without X86-support. llvm-svn: 174052
* Adding simple cast cost to ARM | Renato Golin | 2013-01-29 | 1 | -0/+114
| | | | | | | | | Changing ARMBaseTargetMachine to return ARMTargetLowering instead of the generic one (similar to x86 code). Tests showing which instructions were added to cast when necessary or cost zero when not. Downcasts to 16 bits are not lowered in NEON, so costs are not there yet. llvm-svn: 173849
* LoopVectorize: convert TinyTripCountVectorThreshold constant | Pekka Jaaskelainen | 2013-01-29 | 1 | -0/+28
| | | | | | to a command line switch. llvm-svn: 173837
* Add support for reverse pointer induction variables. These are loops that ↵ | Nadav Rotem | 2013-01-23 | 3 | -0/+195
| contain pointers that count backwards.
| For example, this is the hot loop in BZIP:
|
|   do {
|     m = *--p;
|     *p = ( ... );
|   } while (--n);
|
| llvm-svn: 173219
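A self-contained C sketch of such a loop is shown below. The commit message elides the expression stored back through `p`; doubling the loaded value is a placeholder chosen purely for illustration, and both function names are hypothetical. `double_forwards` is the index-based loop that touches the same elements, which is the kind of equivalence a vectorizer has to recognize.

```c
#include <stddef.h>

/* Reverse pointer induction: p starts one past the last element and
   counts backwards. n must be greater than zero. The stored
   expression (m * 2) is a placeholder, not the BZIP computation. */
void double_backwards(int *end, size_t n) {
    int *p = end;
    do {
        int m = *--p;
        *p = m * 2;
    } while (--n);
}

/* Forward, index-based loop over the same n elements. */
void double_forwards(int *base, size_t n) {
    for (size_t i = 0; i < n; i++)
        base[i] = base[i] * 2;
}
```

Calling `double_backwards(v + 4, 4)` on a four-element array `v` updates the same elements as `double_forwards(v, 4)`, just in the opposite order.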
* LoopVectorizer: Implement a new heuristic for selecting the unroll factor. | Nadav Rotem | 2013-01-20 | 1 | -0/+71
| | | | | | | We ignore the cpu frontend and focus on pipeline utilization. We do this because we don't have a good way to estimate the loop body size at the IR level. llvm-svn: 172964
* Change the cpu type in the test. | Nadav Rotem | 2013-01-20 | 1 | -1/+1
| | | | llvm-svn: 172963
* LoopVectorizer: Emit memory checks into their own basic block. | Benjamin Kramer | 2013-01-19 | 1 | -0/+4
| | | | | | | | | | | | | | This separates the check for "too few elements to run the vector loop" from the "memory overlap" check, giving a lot nicer code and allowing to skip the memory checks when we're not going to execute the vector code anyways. We still leave the decision of whether to emit the memory checks as branches or setccs, but it seems to be doing a good job. If ugly code pops up we may want to emit them as separate blocks too. Small speedup on MultiSource/Benchmarks/MallocBench/espresso. Most of this is legwork to allow multiple bypass blocks while updating PHIs, dominators and loop info. llvm-svn: 172902
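The ordering of the two bypass checks can be sketched in C as a decision function. This is an illustration under assumptions, not the generated control flow: the enum, the function name `choose_path`, and the byte-range overlap test are all hypothetical. The point it shows is that the trip-count check runs first, so the memory overlap check is skipped whenever the vector loop would not execute anyway.

```c
#include <stddef.h>
#include <stdint.h>

enum path { SCALAR_TOO_FEW, SCALAR_OVERLAP, VECTOR };

/* Hypothetical dispatcher for a loop copying n ints from src to dst
   with vectorization factor vf. */
enum path choose_path(const int *dst, const int *src, size_t n, size_t vf) {
    /* First bypass block: too few elements for one vector iteration. */
    if (n < vf)
        return SCALAR_TOO_FEW;

    /* Second bypass block: only reached when the vector loop could run.
       The byte ranges [d, d + bytes) and [s, s + bytes) must not overlap. */
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    uintptr_t bytes = (uintptr_t)(n * sizeof(int));
    if (d < s + bytes && s < d + bytes)
        return SCALAR_OVERLAP;

    return VECTOR;
}
```

For a short trip count the function returns before any address arithmetic is done, which mirrors the benefit described in the commit message.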
* Move test that depends on the x86 target into a target-specific directory. | Benjamin Kramer | 2013-01-16 | 1 | -0/+0
| | | | | | Should fix the arm buildbot (which only builds the arm target). llvm-svn: 172611
* Fix PR14547. Handle induction variables of sizes smaller than i32 (i8 ↵ | Nadav Rotem | 2013-01-13 | 1 | -0/+35
| | | | | | and i16). llvm-svn: 172348
* ARM Cost Model: Modify the target independent cost model to ask | Nadav Rotem | 2013-01-11 | 1 | -3/+3
| | | | | | | | the target if it supports the different CAST types. We didn't do this on X86 because of the different register sizes and types, but on ARM this makes sense. llvm-svn: 172245
* ARM Cost Model: We need to detect the max bitwidth of types in the loop in ↵ | Nadav Rotem | 2013-01-11 | 1 | -0/+52
| | | | | | | | | | | order to select the max vectorization factor. We don't have a detailed analysis on which values are vectorized and which stay scalars in the vectorized loop so we use another method. We look at reduction variables, loads and stores, which are the only ways to get information in and out of loop iterations. If the data types are extended and truncated then the cost model will catch the cost of the vector zext/sext/trunc operations. llvm-svn: 172178
* LoopVectorizer: Fix a bug in the vectorization of BinaryOperators. The ↵ | Nadav Rotem | 2013-01-10 | 1 | -0/+25
| | | | | | | | BinaryOperator can be folded to an Undef, and we don't want to set NSW flags to undef vals. PR14878 llvm-svn: 172079
* ARM Cost model: Use the size of vector registers and widest vectorizable ↵ | Nadav Rotem | 2013-01-09 | 3 | -2/+62
| | | | | | instruction to determine the max vectorization factor. llvm-svn: 172010
* ARM Cost Model: Add a basic vectorization unrolling test. | Nadav Rotem | 2013-01-09 | 1 | -3/+10
| | | | llvm-svn: 171931
* Remove the -licm pass from the loop vectorizer test because the loop ↵ | Nadav Rotem | 2013-01-09 | 23 | -25/+25
| | | | | | vectorizer does it now. llvm-svn: 171930
* Cost Model: Move the 'max unroll factor' variable to the TTI and add initial ↵ | Nadav Rotem | 2013-01-09 | 3 | -2/+31
| | | | | | Cost Model support on ARM. llvm-svn: 171928
* LoopVectorizer: Add support for floating point reductions | Nadav Rotem | 2013-01-07 | 1 | -0/+29
| | | | llvm-svn: 171812