path: root/llvm/lib/Transforms/Vectorize
Commit message | Author | Age | Files | Lines
* LoopVectorizer: Change variable name Stride to ConsecutiveStride | Arnold Schwaighofer | 2013-04-24 | 1 | -6/+6
  This makes it easier to read the code. No functionality change.
  llvm-svn: 180197
* LoopVectorize: Scalarize padded types | Arnold Schwaighofer | 2013-04-24 | 1 | -1/+9
  This patch disables memory-instruction vectorization for types that need padding bytes; e.g., x86_fp80 has a 10-byte store size with 6 bytes of padding on Darwin x86_64. Because load/store vectorization is performed by bitcasting to a packed vector, which has an incompatible memory layout due to the lack of padding bytes, the present vectorizer produces inconsistent results for memory instructions of those types.
  This patch checks that the AllocSize of a scalar type equals the allocated size of each vector element, to ensure that there are no padding bytes and the array can be read/written using vector operations.
  Patch by Daisuke Takahashi! Fixes PR15758.
  llvm-svn: 180196
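The size check described in this commit can be sketched as follows. This is a minimal Python model, not the LLVM code: the helper names and the size table are assumptions made for illustration, with the x86_fp80 figures (10 bytes stored, 16 bytes allocated) taken from the message above.

```python
# Hypothetical size table: type name -> (size in bits, alloc size in bytes).
# The x86_fp80 numbers follow the commit message (10 bytes stored + 6 padding).
TYPE_SIZES = {
    "float": (32, 4),
    "double": (64, 8),
    "x86_fp80": (80, 16),
}


def vector_element_size(type_name, vf):
    """Bytes occupied by one element of a packed <vf x type> vector."""
    bits, _ = TYPE_SIZES[type_name]
    vector_store_bytes = (bits * vf + 7) // 8  # packed: no per-element padding
    return vector_store_bytes // vf


def can_vectorize_memory_access(type_name, vf):
    """True if array elements and packed vector elements share one layout."""
    _, alloc_bytes = TYPE_SIZES[type_name]
    return alloc_bytes == vector_element_size(type_name, vf)
```

In this model a `float` array passes the check at a vectorization factor of 4, while an `x86_fp80` array does not, so its loads and stores would be scalarized.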
* LoopVectorizer: Bail out if we don't have DataLayout; we need it | Arnold Schwaighofer | 2013-04-24 | 1 | -0/+5
  llvm-svn: 180195
* LoopVectorizer: Fix 15830. When scalarizing and unrolling stores make sure that the order in which the elements are scalarized is the same as the original order. | Nadav Rotem | 2013-04-23 | 1 | -4/+4
  This fixes a miscompilation in FreeBSD's regex library.
  llvm-svn: 180121
* Call the potentially costly isAnnotatedParallel() only once. | Pekka Jaaskelainen | 2013-04-23 | 1 | -3/+5
  Made the uniform write test's checks a bit stricter.
  llvm-svn: 180119
* Refuse to (even try to) vectorize loops which have uniform writes, even if erroneously annotated with the parallel loop metadata. | Pekka Jaaskelainen | 2013-04-23 | 1 | -9/+9
  Fixes Bug 15794: "Loop Vectorizer: Crashes with the use of llvm.loop.parallel metadata"
  llvm-svn: 180081
* Move C++ code out of the C headers and into either C++ headers or the C++ files themselves. | Eric Christopher | 2013-04-22 | 1 | -0/+1
  This enables people to use just a C compiler to interoperate with LLVM.
  llvm-svn: 180063
* SLPVectorize: Add support for vectorization of casts. | Nadav Rotem | 2013-04-21 | 1 | -0/+69
  llvm-svn: 179975
* SLPVectorizer: Fix a bug in the code that scans the tree in search of nodes with multiple users. | Nadav Rotem | 2013-04-21 | 1 | -0/+1
  We did not terminate the switch case and we executed the search routine twice.
  llvm-svn: 179974
* Fix PR15800. Do not try to vectorize vectors and structs. | Nadav Rotem | 2013-04-20 | 1 | -1/+10
  llvm-svn: 179960
* VecUtils: Clean up uses of dyn_cast. | Benjamin Kramer | 2013-04-20 | 1 | -4/+4
  llvm-svn: 179936
* SLPVectorizer: Strength reduce SmallVectors to ArrayRefs. | Benjamin Kramer | 2013-04-20 | 3 | -30/+28
  Avoids a couple of copies and allows more flexibility in the clients.
  llvm-svn: 179935
* SLPVectorizer: Reduce the compile time by eliminating the search for some of the more expensive patterns. | Nadav Rotem | 2013-04-20 | 1 | -1/+1
  After this change we will only check basic arithmetic trees that start at cmpinstr.
  llvm-svn: 179933
* Refactor tryToVectorizePair to a new method that supports vectorization of lists. | Nadav Rotem | 2013-04-20 | 1 | -0/+8
  llvm-svn: 179932
* Fix an unused variable warning. | Nadav Rotem | 2013-04-20 | 1 | -0/+1
  llvm-svn: 179931
* SLPVectorizer: Improve the cost model for loop invariant broadcast values. | Nadav Rotem | 2013-04-20 | 3 | -11/+28
  llvm-svn: 179930
* Report the number of stores that were found in the debug message. | Nadav Rotem | 2013-04-20 | 1 | -6/+8
  llvm-svn: 179929
* Fix the header comment. | Nadav Rotem | 2013-04-20 | 2 | -2/+2
  llvm-svn: 179928
* Use 64bit arithmetic for calculating distance between pointers. | Nadav Rotem | 2013-04-20 | 1 | -2/+2
  llvm-svn: 179927
* LoopVectorizer: Use matcher from PatternMatch.h for the min/max patterns | Arnold Schwaighofer | 2013-04-19 | 1 | -104/+102
  Also make some static functions class functions to avoid having to mention the class namespace for enums all the time. No functionality change intended.
  llvm-svn: 179886
* Fix a -Wdocumentation warning | Dmitri Gribenko | 2013-04-18 | 1 | -1/+1
  llvm-svn: 179789
* LoopVectorizer: Recognize min/max reductions | Arnold Schwaighofer | 2013-04-18 | 1 | -34/+209
  A min/max operation is represented by a select(cmp(lt/le/gt/ge, X, Y), X, Y) sequence in LLVM. If we see such a sequence we can treat it just as any other commutative binary instruction and reduce it.
  This appears to help bzip2 by about 1.5% on an imac12,2.
  radar://12960601
  llvm-svn: 179773
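To illustrate why the select(cmp(lt, X, Y), X, Y) pattern can be reduced like any other commutative operation, here is a minimal Python sketch. The function names, the lane count, and the remainder handling are assumptions for illustration and do not mirror the LoopVectorizer code.

```python
def select(cond, x, y):
    """Model of the IR select instruction: x when cond holds, otherwise y."""
    return x if cond else y


def scalar_min(values, start):
    """Scalar loop: acc = select(cmp(lt, v, acc), v, acc) on every iteration."""
    acc = start
    for v in values:
        acc = select(v < acc, v, acc)
    return acc


def vectorized_min(values, start, vf=4):
    """Keep vf partial minima, then combine the lanes after the loop."""
    lanes = [start] * vf
    main = len(values) - len(values) % vf
    for i in range(0, main, vf):
        chunk = values[i:i + vf]
        lanes = [select(c < l, c, l) for c, l in zip(chunk, lanes)]
    # Horizontal reduction across the lanes.
    acc = lanes[0]
    for lane in lanes[1:]:
        acc = select(lane < acc, lane, acc)
    # Scalar remainder loop for the leftover elements.
    for v in values[main:]:
        acc = select(v < acc, v, acc)
    return acc
```

Because min is associative and commutative, splitting the work across lanes and combining the partial results at the end gives the same answer as the scalar loop.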
* LoopVectorize: Use a set to avoid longer cycles in the reduction chain too. | Benjamin Kramer | 2013-04-18 | 1 | -8/+6
  Fixes PR15748.
  llvm-svn: 179757
* SLPVectorizer: Make it a function pass and add code for hoisting the vector-gather sequence out of loops. | Nadav Rotem | 2013-04-15 | 3 | -159/+254
  llvm-svn: 179562
* SLPVectorizer: Add support for vectorizing trees that start at compare instructions. | Nadav Rotem | 2013-04-15 | 1 | -21/+40
  llvm-svn: 179504
* Miscellaneous cleanups for VecUtils.h | Benjamin Kramer | 2013-04-14 | 1 | -9/+6
  llvm-svn: 179483
* SLP: Document the scalarization cost method. | Nadav Rotem | 2013-04-14 | 1 | -3/+10
  llvm-svn: 179479
* SLPVectorizer: Add support for trees that don't start at binary operators, and add the cost of extracting values from the roots of the tree. | Nadav Rotem | 2013-04-14 | 3 | -7/+25
  llvm-svn: 179475
* SLPVectorizer: add initial support for reduction variable vectorization. | Nadav Rotem | 2013-04-14 | 3 | -7/+95
  llvm-svn: 179470
* SLPVectorizer: add support for vectorization of diamond shaped trees. | Nadav Rotem | 2013-04-12 | 2 | -46/+254
  We now perform a preliminary traversal of the graph to collect values with multiple users and check where the users came from.
  llvm-svn: 179414
* Add debug prints. | Nadav Rotem | 2013-04-12 | 1 | -1/+5
  llvm-svn: 179412
* LoopVectorizer: integer division is not a reduction operation | Arnold Schwaighofer | 2013-04-12 | 1 | -2/+0
  Don't classify idiv/udiv as a reduction operation. Integer division is lossy.
  For example: (1 / 2) * 4 != 4 / 2.

  Example:
    int a[] = { 2, 5, 2, 2}
    int x = 80;
    for() x /= a[i];

  Scalar:
    x /= 2 // = 40
    x /= 5 // = 8
    x /= 2 // = 4
    x /= 2 // = 2

  Vectorized:
    <80, 1> / <2,5> //= <40,0>
    <40, 0> / <2,2> //= <20,0>
    20*0 = 0

  radar://13640654
  llvm-svn: 179381
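The example in this commit message can be reproduced with a small Python model. The function names are made up for illustration, the inputs are assumed non-negative (so Python's floor division matches C's truncating division), and the array length is assumed to be a multiple of the vector width.

```python
def scalar_divide(x, a):
    """Reference semantics: divide x by each element in order."""
    for d in a:
        x //= d
    return x


def vectorized_divide(x, a, vf=2):
    """Wrongly treat integer division as a reduction with identity 1."""
    lanes = [x] + [1] * (vf - 1)  # lane 0 holds the start value
    for i in range(0, len(a), vf):
        lanes = [lane // d for lane, d in zip(lanes, a[i:i + vf])]
    # Combine the lanes by multiplication, as a "reduction" would.
    result = 1
    for lane in lanes:
        result *= lane
    return result
```

With `a = [2, 5, 2, 2]` and `x = 80`, the scalar loop yields 2 while the lane-wise version yields 0, which is the miscompile the commit removes.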
* Rename the C function to create a SLPVectorizerPass to something sane and expose it in the header file. | Benjamin Kramer | 2013-04-11 | 1 | -2/+2
  llvm-svn: 179272
* Make the SLP store-merger less paranoid about function calls. | Nadav Rotem | 2013-04-10 | 1 | -4/+0
  We check for function calls when we check if it is safe to sink instructions.
  llvm-svn: 179207
* We require DataLayout for analyzing the size of stores. | Nadav Rotem | 2013-04-10 | 2 | -1/+6
  llvm-svn: 179206
* Add support for bottom-up SLP vectorization infrastructure. | Nadav Rotem | 2013-04-09 | 5 | -0/+707
  This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations. The infrastructure has three potential users:

  1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]).
  2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute.
  3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization.

  This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code:

    void SAXPY(int *x, int *y, int a, int i) {
      x[i]   = a * x[i]   + y[i];
      x[i+1] = a * x[i+1] + y[i+1];
      x[i+2] = a * x[i+2] + y[i+2];
      x[i+3] = a * x[i+3] + y[i+3];
    }

  llvm-svn: 179117
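As a rough picture of what vectorizing the SAXPY function means, the Python sketch below contrasts the four scalar statements with a single 4-wide operation. It only models the data flow; the function names and the `width` parameter are assumptions, and fixed-width integer overflow is ignored.

```python
def saxpy_scalar(x, y, a, i):
    """Direct transcription of the four consecutive scalar statements."""
    x[i] = a * x[i] + y[i]
    x[i + 1] = a * x[i + 1] + y[i + 1]
    x[i + 2] = a * x[i + 2] + y[i + 2]
    x[i + 3] = a * x[i + 3] + y[i + 3]


def saxpy_slp(x, y, a, i, width=4):
    """The same work as one wide load per array, one wide multiply-add, one wide store."""
    xv = x[i:i + width]  # consecutive loads merged into one vector load
    yv = y[i:i + width]
    x[i:i + width] = [a * xe + ye for xe, ye in zip(xv, yv)]  # one vector store
```

The transformation is valid here because the four statements touch consecutive elements and do not depend on one another.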
* LoopVectorizer: Pass OperandValueKind information to the cost model | Arnold Schwaighofer | 2013-04-04 | 1 | -2/+13
  Pass down the fact that an operand is going to be a vector of constants.
  This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86 back. It had degraded to scalar performance due to my previous shift cost change that made all shifts expensive on x86.
  radar://13576547
  llvm-svn: 178809
* LoopVectorize: Invert case when we use a vector cmp value to query select cost | Arnold Schwaighofer | 2013-03-14 | 1 | -1/+1
  We generate a select with a vectorized condition argument when the condition is NOT loop invariant. Not the other way around.
  llvm-svn: 177098
* BBVectorize: Fixup debugging statements | Hal Finkel | 2013-03-10 | 1 | -2/+2
  After the recent data-structure improvements, a couple of debugging statements were broken (printing pointer values).
  llvm-svn: 176791
* Remove a source of nondeterminism from the LoopVectorizer. | Benjamin Kramer | 2013-03-09 | 1 | -1/+1
  This made us emit runtime checks in a random order. Hopefully bootstrap miscompares will go away now.
  llvm-svn: 176775
* LoopVectorizer: Ignore all dbg intrinsics | Arnold Schwaighofer | 2013-03-09 | 1 | -6/+6
  Ignore all DbgInfoIntrinsic instructions instead of just DbgValueInst.
  llvm-svn: 176769
* LoopVectorizer: Ignore dbg.value instructions | Arnold Schwaighofer | 2013-03-09 | 1 | -2/+11
  We want vectorization to happen at -g. Ignore calls to the dbg.value intrinsic and don't transfer them to the vectorized code.
  radar://13378964
  llvm-svn: 176768
* Insert the reduction start value into the first bypass block to preserve domination. | Benjamin Kramer | 2013-03-08 | 1 | -1/+1
  Fixes PR15344.
  llvm-svn: 176701
* PR14448 - prevent the loop vectorizer from vectorizing the same loop twice. | Nadav Rotem | 2013-03-02 | 1 | -0/+18
  The LoopVectorizer often runs multiple times on the same function due to inlining. When this happens the loop vectorizer often vectorizes the same loops multiple times, increasing code size and adding unneeded branches. With this patch, the vectorizer puts metadata on scalar loops during vectorization, marking them as 'already vectorized' so that it knows to ignore them when it sees them a second time.
  PR14448.
  llvm-svn: 176399
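The marking scheme described here can be pictured with a toy pass in Python. Everything in it is hypothetical: loops are plain dictionaries, and the metadata key `already_vectorized` is an invented stand-in for whatever the real pass attaches to the scalar loop.

```python
ALREADY_VECTORIZED = "already_vectorized"  # hypothetical metadata key


def run_loop_vectorizer(loops):
    """Vectorize unmarked loops and mark them; return the names handled this run."""
    handled = []
    for loop in loops:
        if loop["metadata"].get(ALREADY_VECTORIZED):
            continue  # seen on an earlier run: leave it alone
        handled.append(loop["name"])  # stands in for the real transformation
        loop["metadata"][ALREADY_VECTORIZED] = True
    return handled
```

Running the pass twice over the same list of loops handles each loop only on the first run, which is the behaviour the commit aims for when inlining causes the vectorizer to revisit a function.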
* LoopVectorize: Don't hang forever if a PHI only has skipped PHI uses. | Benjamin Kramer | 2013-03-01 | 1 | -1/+8
  Fixes PR15384.
  llvm-svn: 176366
* LoopVectorize: Vectorize math builtin calls. | Benjamin Kramer | 2013-02-27 | 1 | -50/+137
  This properly asks TargetLibraryInfo if a call is available and if it is, it can be translated into the corresponding LLVM builtin. We don't vectorize sqrt() yet because I'm not sure about the semantics for negative numbers. The other intrinsics should be exact equivalents to the libm functions.
  Differential Revision: http://llvm-reviews.chandlerc.com/D465
  llvm-svn: 176188
* Allow GlobalValues to vectorize with AliasAnalysis | Renato Golin | 2013-02-21 | 1 | -35/+154
  Store the load/store instructions with the values and inspect them using Alias Analysis to make sure they don't alias, since the GEP pointer operand doesn't take the offset into account.
  Trying hard to not add any extra cost to loads and stores that don't overlap on global values, AA is *only* calculated if all of the previous attempts failed.
  Use the biggest vector register size as the stride for the vectorization access, as we're being conservative and the cost model (which calculates the real vectorization factor) is only run after the legalization phase. We might re-think this relationship in the future, but for now, I'd rather be safe than sorry.
  llvm-svn: 175818
* BBVectorize: Fix an invalid reference bug | Hal Finkel | 2013-02-17 | 1 | -4/+7
  This fixes PR15289. This bug was introduced (recently) in r175215; collecting all std::vector references for candidate pairs to delete at once is invalid because subsequent lookups in the owning DenseMap could invalidate the references.
  bugpoint was able to reduce a useful test case. Unfortunately, because whether or not this asserts depends on memory layout, this test case will sometimes appear to produce valid output. Nevertheless, running under valgrind will reveal the error.
  llvm-svn: 175397
* BBVectorize: Call a DAG a DAG instead of a tree | Hal Finkel | 2013-02-15 | 1 | -84/+84
  Several functions and variable names used the term 'tree' to refer to what is actually a DAG. Correcting this mistake will, hopefully, prevent confusion in the future.
  No functionality change intended.
  llvm-svn: 175278
* BBVectorize: Cap the number of candidate pairs in each instruction group | Hal Finkel | 2013-02-15 | 1 | -1/+9
  For some basic blocks, it is possible to generate many candidate pairs for relatively few pairable instructions. When many (tens of thousands) of these pairs are generated for a single instruction group, the time taken to generate and rank the different vectorization plans can become quite large. As a result, we now cap the number of candidate pairs within each instruction group. This is done by closing out the group once the threshold is reached (set now at 3000 pairs).
  Although this will limit the overall compile-time impact, this may not be the best way to achieve this result. It might be better, for example, to prune excessive candidate pairs after the fact to prevent the generation of short, but highly-connected groups. We can experiment with this in the future.
  This change reduces the overall compile-time slowdown of the csa.ll test case in PR15222 to ~5x. If 5x is still considered too large, a lower limit can be used as the default.
  This represents a functionality change, but only for very large inputs (thus, there is no regression test).
  llvm-svn: 175251
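The idea of closing out a group once a pair threshold is reached can be modelled roughly as below. This is a simplified Python sketch under stated assumptions, not the BBVectorize algorithm: the function name, the `is_pairable` callback, and the grouping loop are invented for illustration, and only the 3000-pair default comes from the commit message.

```python
def group_candidate_pairs(instructions, is_pairable, max_pairs=3000):
    """Collect candidate pairs in order, starting a new group once the cap is hit."""
    groups = []
    current = []
    start = 0  # first instruction belonging to the current group
    for j in range(len(instructions)):
        for i in range(start, j):
            if is_pairable(instructions[i], instructions[j]):
                current.append((instructions[i], instructions[j]))
        if len(current) >= max_pairs:
            # Close out the group; later instructions begin a fresh one.
            groups.append(current)
            current = []
            start = j + 1
    if current:
        groups.append(current)
    return groups
```

In this model ten mutually pairable instructions produce 45 pairs in one group without a cap, but only 13 pairs spread over three groups with a cap of 6. That shows both the compile-time saving and the lost pairing opportunities the commit message mentions.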