path: root/llvm/test/Transforms/LoopVectorize/X86
Commit message | Author | Age | Files | Lines
* Disable unrolling in the loop vectorizer when disabled in the pass manager | Hal Finkel | 2013-08-28 | 1 | -0/+31
  When unrolling is disabled in the pass manager, the loop vectorizer should also not unroll loops. This will allow the -fno-unroll-loops option in Clang to behave as expected (even for vectorizable loops). The loop vectorizer's -force-vector-unroll option will (continue to) override the pass-manager setting (including -force-vector-unroll=0 to force use of the internal auto-selection logic).
  In order to test this, I added a flag to opt (-disable-loop-unrolling) to force disable unrolling through opt (the analog of -fno-unroll-loops in Clang). Also, this fixes a small bug in opt where the loop vectorizer was enabled only after the pass manager populated the queue of passes (the global_alias.ll test needed a slight update to the RUN line as a result of this fix).
  llvm-svn: 189499
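The precedence described in this commit message can be sketched as a small decision function. This is a minimal illustration, not the vectorizer's actual code: the function name, its parameters, and the convention that a factor of 1 means "no unrolling" are assumptions made for the example.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper, not LLVM's code: picks the unroll factor the
// vectorizer would use, where a factor of 1 is assumed to mean "no unrolling".
// forcedUnroll models an explicit -force-vector-unroll=N; an empty optional
// means the flag was not given.
unsigned chooseUnrollFactor(std::optional<unsigned> forcedUnroll,
                            bool unrollingDisabledInPassManager,
                            unsigned autoSelected) {
  assert(autoSelected >= 1 && "auto-selected factor must be at least 1");
  if (forcedUnroll) {
    // An explicit flag always wins; 0 requests the auto-selection logic.
    return *forcedUnroll == 0 ? autoSelected : *forcedUnroll;
  }
  // Without an explicit flag, honor the pass-manager setting.
  if (unrollingDisabledInPassManager)
    return 1;
  return autoSelected;
}
```

With unrolling disabled in the pass manager and no explicit flag, the sketch returns 1; passing an explicit 0 falls back to the auto-selected value even when the pass manager has disabled unrolling.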
* [tests] Cleanup initialization of test suffixes. | Daniel Dunbar | 2013-08-16 | 1 | -2/+0
  - Instead of setting the suffixes in a bunch of places, just set one master list in the top-level config. We now only modify the suffix list in a few suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).
  - Aside from removing the need for a bunch of lit.local.cfg files, this enables 4 tests that were inadvertently being skipped (one in Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been XFAILED).
  - This commit also fixes a bunch of config files to use config.root instead of older copy-pasted code.
  llvm-svn: 188513
* Update Transforms tests to use CHECK-LABEL for easier debugging. No functionality change. | Stephen Lin | 2013-07-14 | 9 | -21/+21
  This update was done with the following bash script:

    find test/Transforms -name "*.ll" | \
    while read NAME; do
      echo "$NAME"
      if ! grep -q "^; *RUN: *llc" $NAME; then
        TEMP=`mktemp -t temp`
        cp $NAME $TEMP
        sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
        while read FUNC; do
          sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3@$FUNC(/g" $TEMP
        done
        mv $TEMP $NAME
      fi
    done

  llvm-svn: 186268
* X86 cost model: Add cost for vectorized gather/scatter | Arnold Schwaighofer | 2013-07-12 | 1 | -0/+86
  radar://14351991
  llvm-svn: 186189
* Add support for llvm.vectorizer metadata | Paul Redmond | 2013-05-28 | 3 | -5/+5
  - llvm.loop.parallel metadata has been renamed to llvm.loop to be more generic by making the root of additional loop metadata.
  - Loop::isAnnotatedParallel now looks for llvm.loop and associated llvm.mem.parallel_loop_access
  - document llvm.loop and update llvm.mem.parallel_loop_access
  - add support for llvm.vectorizer.width and llvm.vectorizer.unroll
  - document llvm.vectorizer.* metadata
  - add utility class LoopVectorizerHints for getting/setting loop metadata
  - use llvm.vectorizer.width=1 to indicate already vectorized instead of already_vectorized
  - update existing tests that used llvm.loop.parallel and llvm.vectorizer.already_vectorized
  Reviewed by: Nadav Rotem
  llvm-svn: 182802
* TBAA: remove !tbaa from testing cases if not used. | Manman Ren | 2013-04-30 | 4 | -39/+26
  This will make it easier to turn on struct-path aware TBAA since the metadata format will change.
  llvm-svn: 180796
* LoopVectorize: Scalarize padded types | Arnold Schwaighofer | 2013-04-24 | 1 | -0/+29
  This patch disables memory-instruction vectorization for types that need padding bytes, e.g., x86_fp80 has 10 bytes store size with 6 bytes padding in darwin on x86_64. Because the load/store vectorization is performed by bit casting to a packed vector, which has an incompatible memory layout due to the lack of padding bytes, the present vectorizer produces inconsistent results for memory instructions of those types.
  This patch checks the equality of the AllocSize of a scalar type and the allocated size for each vector element, to ensure that there are no padding bytes and the array can be read/written using vector operations.
  Patch by Daisuke Takahashi!
  Fixes PR15758.
  llvm-svn: 180196
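The size check described in this commit message can be illustrated with a small sketch. It is not the patch itself: the struct and function names are hypothetical, and the sketch assumes that each element of a packed vector occupies exactly the scalar's store size.

```cpp
#include <cassert>

// Hypothetical description of a scalar type's memory footprint, in bytes.
struct ScalarLayout {
  unsigned storeSize;  // bytes actually written by a store
  unsigned allocSize;  // stride between consecutive array elements
};

// A packed vector lays its elements out back to back with no padding, so
// an array can only be reinterpreted as a vector when the array stride
// (allocSize) equals the space each vector element occupies (assumed here
// to be storeSize).
bool canVectorizeMemoryAccess(const ScalarLayout &ty) {
  assert(ty.storeSize <= ty.allocSize && "alloc size covers the stored bytes");
  return ty.allocSize == ty.storeSize;
}
```

For the x86_fp80 case quoted above (10 bytes stored, 6 bytes of padding, so a 16-byte stride) the check fails and the access would be scalarized, while a 4-byte integer with a 4-byte stride passes.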
* LoopVectorizer: Bail out if we don't have the datalayout we need | Arnold Schwaighofer | 2013-04-24 | 1 | -0/+2
  llvm-svn: 180195
* Call the potentially costly isAnnotatedParallel() only once. | Pekka Jaaskelainen | 2013-04-23 | 1 | -1/+2
  Made the uniform write test's checks a bit stricter.
  llvm-svn: 180119
* Refuse to (even try to) vectorize loops which have uniform writes, | Pekka Jaaskelainen | 2013-04-23 | 1 | -0/+58
  even if erroneously annotated with the parallel loop metadata.
  Fixes Bug 15794: "Loop Vectorizer: Crashes with the use of llvm.loop.parallel metadata"
  llvm-svn: 180081
* LoopVectorizer: Pass OperandValueKind information to the cost model | Arnold Schwaighofer | 2013-04-04 | 1 | -0/+28
  Pass down the fact that an operand is going to be a vector of constants.
  This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86 back. It had degraded to scalar performance due to my previous shift cost change that made all shifts expensive on x86.
  radar://13576547
  llvm-svn: 178809
* X86TTI: Add accurate costs for itofp operations, based on the actual instruction counts. | Benjamin Kramer | 2013-04-01 | 1 | -5/+4
  llvm-svn: 178459
* LoopVectorizer: Insert some white space to make test case more readable | Arnold Schwaighofer | 2013-03-14 | 1 | -6/+10
  Also remove some unneeded function attributes.
  llvm-svn: 177114
* Add missing asserts flag to test - it uses debug flags | Arnold Schwaighofer | 2013-03-14 | 1 | -1/+1
  llvm-svn: 177102
* LoopVectorize: Invert case when we use a vector cmp value to query select cost | Arnold Schwaighofer | 2013-03-14 | 1 | -0/+62
  We generate a select with a vectorized condition argument when the condition is NOT loop invariant. Not the other way around.
  llvm-svn: 177098
* Force cpu in test. | Benjamin Kramer | 2013-03-08 | 1 | -1/+1
  llvm-svn: 176702
* Insert the reduction start value into the first bypass block to preserve domination. | Benjamin Kramer | 2013-03-08 | 1 | -0/+35
  Fixes PR15344.
  llvm-svn: 176701
* X86 cost model: Adjust cost for custom lowered vector multiplies | Arnold Schwaighofer | 2013-03-02 | 1 | -2/+2
  This matters, for example, in the following matrix multiply:

    int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
      int i, j, k, val;
      for (i=0; i<rows; i++) {
        for (j=0; j<cols; j++) {
          val = 0;
          for (k=0; k<cols; k++) {
            val += m1[i][k] * m2[k][j];
          }
          m3[i][j] = val;
        }
      }
      return(m3);
    }

  Taken from the test-suite benchmark Shootout.
  We estimate the cost of the multiply to be 2 while we generate 9 instructions for it and end up being quite a bit slower than the scalar version (48% on my machine).
  Also, properly differentiate between avx1 and avx2. On avx-1 we still split the vector into 2 128bits and handle the subvector muls like above with 9 instructions. Only on avx-2 will we have a cost of 9 for v4i64.
  I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an add instead of a mul because with a mul we now no longer vectorize. I did verify that the mul would be indeed more expensive when vectorized with 3 kernels:

    for (i ...)
      r += a[i] * 3;
    for (i ...)
      m1[i] = m1[i] * 3; // This matches the test case in avx1.ll

  and a matrix multiply.
  In each case the vectorized version was considerably slower.
  radar://13304919
  llvm-svn: 176403
* Forgot to 'svn add' the LoopVectorizer tests for the new parallel loop metadata, sorry. | Pekka Jaaskelainen | 2013-02-15 | 2 | -0/+166
  llvm-svn: 175311
* Formatting. | NAKAMURA Takumi | 2013-02-05 | 1 | -1/+1
  llvm-svn: 174380
* llvm/test/Transforms/LoopVectorize/X86/vector_ptr_load_store.ll: "-debug" requires +Asserts. | NAKAMURA Takumi | 2013-02-05 | 1 | -0/+1
  llvm-svn: 174379
* Loop Vectorizer: Handle pointer stores/loads in getWidestType() | Arnold Schwaighofer | 2013-02-05 | 1 | -0/+149
  In the loop vectorizer cost model, we used to ignore stores/loads of a pointer type when computing the widest type within a loop. This meant that if we had only stores/loads of pointers in a loop we would return a widest type of 8 bits (instead of 32 or 64 bits) and therefore a vector factor that was too big.
  Now, if we see a consecutive store/load of pointers we use the size of a pointer (from data layout).
  This problem occurred in SingleSource/Benchmarks/Shootout-C++/hash.cpp (reduced test case is the first test in vector_ptr_load_store.ll).
  radar://13139343
  llvm-svn: 174377
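The effect described above can be reproduced with a small sketch. It is a simplification under stated assumptions, not the code of getWidestType(): the names are hypothetical, the maximum vector factor is assumed to be the vector register width divided by the widest type, the register is assumed to be 128 bits wide in the usage note, and the check for consecutive accesses is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record of one load or store in the loop body.
struct MemAccess {
  unsigned bits;   // width of the accessed value; ignored for pointers
  bool isPointer;  // true when the accessed value is itself a pointer
};

// Widest accessed type in bits. Passing pointerBits = 0 models the old
// behavior, in which pointer accesses contributed nothing.
unsigned widestTypeBits(const std::vector<MemAccess> &accesses,
                        unsigned pointerBits) {
  unsigned widest = 8;  // fallback when nothing wider is seen
  for (const MemAccess &a : accesses)
    widest = std::max(widest, a.isPointer ? pointerBits : a.bits);
  return widest;
}

// Assumed relation: how many elements of the widest type fit in a register.
unsigned maxVectorFactor(unsigned vectorRegisterBits, unsigned widestBits) {
  assert(widestBits > 0 && "widest type must have a nonzero width");
  return std::max(1u, vectorRegisterBits / widestBits);
}
```

For a loop that only touches pointers, the old behavior gives a widest type of 8 bits and, with a 128-bit register, a factor of 16; using a 64-bit pointer size instead gives a factor of 2.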
* Made the min-trip-count-switch test X86-specific to avoid | Pekka Jaaskelainen | 2013-01-31 | 1 | -0/+28
  breakage with builds without X86-support.
  llvm-svn: 174052
* LoopVectorizer: Implement a new heuristic for selecting the unroll factor. | Nadav Rotem | 2013-01-20 | 1 | -0/+71
  We ignore the cpu frontend and focus on pipeline utilization. We do this because we don't have a good way to estimate the loop body size at the IR level.
  llvm-svn: 172964
* Change the cpu type in the test. | Nadav Rotem | 2013-01-20 | 1 | -1/+1
  llvm-svn: 172963
* Move test that depends on the x86 target into a target-specific directory. | Benjamin Kramer | 2013-01-16 | 1 | -0/+170
  Should fix the arm buildbot (which only builds the arm target).
  llvm-svn: 172611
* ARM Cost model: Use the size of vector registers and widest vectorizable instruction to determine the max vectorization factor. | Nadav Rotem | 2013-01-09 | 2 | -2/+2
  llvm-svn: 172010
* Remove the -licm pass from the loop vectorizer test because the loop vectorizer does it now. | Nadav Rotem | 2013-01-09 | 1 | -2/+2
  llvm-svn: 171930
* Cost Model: Move the 'max unroll factor' variable to the TTI and add initial Cost Model support on ARM. | Nadav Rotem | 2013-01-09 | 1 | -2/+0
  llvm-svn: 171928
* LoopVectorizer: When we vectorize and widen loops we process many elements at once. | Nadav Rotem | 2013-01-07 | 1 | -0/+50
  This is a good thing, except for small loops. On small loops the post-loop that handles scalars (and runs slower) can take more time to execute than the rest of the loop. This patch disables widening of loops with a small static trip count.
  llvm-svn: 171798
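A minimal sketch of the kind of guard this commit describes is shown here. The cutoff value and all names are hypothetical; the commit message does not state the threshold it uses, and the treatment of unknown trip counts is an assumption.

```cpp
#include <cassert>
#include <optional>

// Hypothetical cutoff; the commit message does not give the real value.
constexpr unsigned kSmallTripCount = 16;

// Decide whether to widen a loop. An empty optional models a trip count
// that is not known statically; the sketch assumes such loops stay eligible.
bool shouldWiden(std::optional<unsigned> staticTripCount) {
  return !staticTripCount || *staticTripCount >= kSmallTripCount;
}

// Iterations left for the scalar post-loop after widening by vf.
unsigned scalarRemainder(unsigned tripCount, unsigned vf) {
  assert(vf >= 1 && "vector factor must be at least 1");
  return tripCount % vf;
}
```

As an example of the problem being avoided, a loop with 6 iterations widened by a factor of 4 runs one vector iteration and leaves 2 iterations for the scalar post-loop, so a third of the work still runs in the slower path.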
* LoopVectorizer: | Nadav Rotem | 2013-01-04 | 1 | -2/+19
  1. Add code to estimate register pressure.
  2. Add code to select the unroll factor based on register pressure.
  3. Add bits to TargetTransformInfo to provide the number of registers.
  llvm-svn: 171469
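One plausible way to turn a register-pressure estimate into an unroll factor is sketched below. The formula is an assumption made for illustration and is not taken from the commit: registers not held by loop-invariant values are divided among the values live at the busiest point of one iteration, and the result is capped and rounded down to a power of two. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical heuristic, not the commit's code.
// targetRegs:        registers the target provides (e.g. reported by TTI)
// loopInvariantRegs: registers pinned by values defined outside the loop
// maxLiveRegs:       peak number of simultaneously live values per iteration
// maxUnroll:         upper bound on the unroll factor
unsigned selectUnrollFactor(unsigned targetRegs, unsigned loopInvariantRegs,
                            unsigned maxLiveRegs, unsigned maxUnroll) {
  assert(maxUnroll >= 1 && "the cap must allow at least one copy");
  if (targetRegs <= loopInvariantRegs)
    return 1;  // no registers to spare, so do not unroll
  unsigned perCopy = std::max(maxLiveRegs, 1u);
  unsigned uf = (targetRegs - loopInvariantRegs) / perCopy;
  uf = std::min(uf, maxUnroll);
  // Round down to a power of two; the result is never below 1.
  unsigned pow2 = 1;
  while (pow2 * 2 <= uf)
    pow2 *= 2;
  return pow2;
}
```

With 16 registers, 1 of them holding a loop-invariant value, and 3 values live per iteration, the sketch computes 15 / 3 = 5 and rounds down to an unroll factor of 4.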
* Fix typo "Makre" -> "Make". | Nick Lewycky | 2012-12-24 | 1 | -6/+4
  llvm-svn: 171043
* LoopVectorizer: When checking for vectorizable types, also check | Nadav Rotem | 2012-12-24 | 1 | -0/+29
  the StoreInst operands.
  PR14705.
  llvm-svn: 171023
* CostModel: Change the default target-independent implementation for finding | Nadav Rotem | 2012-12-23 | 1 | -3/+3
  the cost of arithmetic functions. We now assume that the cost of arithmetic operations that are marked as Legal or Promote is low, but ops that are marked as custom are higher.
  llvm-svn: 171002
* Loop Vectorizer: Update the cost model of scatter/gather operations and make | Nadav Rotem | 2012-12-23 | 1 | -1/+4
  them more expensive.
  llvm-svn: 170995
* Make TargetLowering::getTypeConversion more resilient against odd illegal MVTs. | Benjamin Kramer | 2012-12-19 | 1 | -0/+22
  - An MVT can become an EVT when being split (e.g. v2i8 -> v1i8, the latter doesn't exist)
  - Return the scalar value when an MVT is scalarized (v1i64 -> i64)
  Fixes PR14639ff.
  llvm-svn: 170546
* Teach the cost model about the optimization in r169904: Truncation of induction variables costs the same as scalar trunc. | Nadav Rotem | 2012-12-13 | 1 | -1/+1
  llvm-svn: 170051
* Cost Model: add tables for some avx type-conversion hacks. | Nadav Rotem | 2012-11-06 | 1 | -1/+1
  llvm-svn: 167480
* Cost Model: Improve the accuracy of the zext/sext/trunc vector cost estimation. | Nadav Rotem | 2012-11-05 | 1 | -2/+2
  llvm-svn: 167412
* Implement the cost of abnormal x86 instruction lowering as a table. | Nadav Rotem | 2012-11-05 | 1 | -1/+1
  llvm-svn: 167395
* LoopVectorize: Preserve NSW, NUW and IsExact flags. | Nadav Rotem | 2012-10-31 | 1 | -1/+1
  llvm-svn: 167174
* Fix a bug in the cost calculation of vector casts. Detect situations where bitcasts cost zero. | Nadav Rotem | 2012-10-31 | 1 | -0/+48
  llvm-svn: 167170
* Add support for loops that don't start with Zero. | Nadav Rotem | 2012-10-31 | 1 | -0/+49
  This is important for loops in the LAPACK test-suite. These loops start at 1 because they are auto-converted from Fortran.
  llvm-svn: 167084
* 1. Fix a bug in getTypeConversion. When a *simple* type is split, we need to return the type of the split result. | Nadav Rotem | 2012-10-27 | 1 | -0/+62
  2. Change the maximum vectorization width from 4 to 8.
  3. A test for both.
  llvm-svn: 166864
* Refactor the VectorTargetTransformInfo interface. | Nadav Rotem | 2012-10-26 | 1 | -1/+1
  Add getCostXXX calls for different families of opcodes, such as casts, arithmetic, cmp, etc.
  Port the LoopVectorizer to the new API.
  The LoopVectorizer now finds instructions which will remain uniform after vectorization. It uses this information when calculating the cost of these instructions.
  llvm-svn: 166836
* Move the target-specific tests, which require specific backends, to dirs that only run if the target is present. | Nadav Rotem | 2012-10-26 | 2 | -0/+44
  llvm-svn: 166796