summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/LoopVectorize
Commit message (Collapse)AuthorAgeFilesLines
* InstCombine: Teach most integer add/sub/mul/div combines how to deal with ↵Benjamin Kramer2014-01-191-4/+4
| | | | | | vectors. llvm-svn: 199602
* LoopVectorizer: A reduction that has multiple uses of the reduction value is notArnold Schwaighofer2014-01-191-0/+42
| | | | | | | | | | | | | | a reduction. Really. Under certain circumstances (the use list of an instruction has to be set up right - hence the extra pass in the test case) we would not recognize when a value in a potential reduction cycle was used multiple times by the reduction cycle. Fixes PR18526. radar://15851149 llvm-svn: 199570
* LoopVectorize: Only strip casts from integer types when replacing symbolicArnold Schwaighofer2014-01-151-0/+37
| | | | | | | | strides Fixes PR18480. llvm-svn: 199291
* Fix broken CHECK lines.Benjamin Kramer2014-01-111-2/+2
| | | | llvm-svn: 199016
* LoopVectorizer: Handle strided memory accesses by versioningArnold Schwaighofer2014-01-102-5/+57
| | | | | | | | | | | | | | | for (i = 0; i < N; ++i) A[i * Stride1] += B[i * Stride2]; We take loops like this and check that the symbolic strides 'Strided1/2' are one and drop to the scalar loop if they are not. This is currently disabled by default and hidden behind the flag 'enable-mem-access-versioning'. radar://13075509 llvm-svn: 198950
* LoopVectorizer: Don't if-convert constant expressions that can trapArnold Schwaighofer2013-12-171-0/+63
| | | | | | | | | | A phi node operand or an instruction operand could be a constant expression that can trap (division). Check that we don't vectorize such cases. PR16729 radar://15653590 llvm-svn: 197449
* force vector width via cpu on vectorizer metadata enableRenato Golin2013-12-071-11/+11
| | | | llvm-svn: 196669
* Move test to X86 dirRenato Golin2013-12-051-0/+0
| | | | | | | Test is platform independent, but I don't want to force vector-width, or that could spoil the pragma test. llvm-svn: 196539
* Add #pragma vectorize enable/disable to LLVMRenato Golin2013-12-051-0/+175
| | | | | | | | | | | | | | | | | | | | | | | | The intended behaviour is to force vectorization on the presence of the flag (either turn on or off), and to continue the behaviour as expected in its absence. Tests were added to make sure the all cases are covered in opt. No tests were added in other tools with the assumption that they should use the PassManagerBuilder in the same way. This patch also removes the outdated -late-vectorize flag, which was on by default and not helping much. The pragma metadata is being attached to the same place as other loop metadata, but nothing forbids one from attaching it to a function (to enable #pragma optimize) or basic blocks (to hint the basic-block vectorizers), etc. The logic should be the same all around. Patches to Clang to produce the metadata will be produced after the initial implementation is agreed upon and committed. Patches to other vectorizers (such as SLP and BB) will be added once we're happy with the pass manager changes. llvm-svn: 196537
* Correct word hyphenationsAlp Toker2013-12-051-1/+1
| | | | | | | This patch tries to avoid unrelated changes other than fixing a few hyphen-related ambiguities and contractions in nearby lines. llvm-svn: 196471
* opt: Mirror vectorization presets of clangArnold Schwaighofer2013-12-031-0/+28
| | | | | | | | | | clang enables vectorization at optimization levels > 1 and size level < 2. opt should behave similarily. Loop vectorization and SLP vectorization can be disabled with the flags -disable-(loop/slp)-vectorization. llvm-svn: 196294
* LoopVectorizer: Truncate i64 trip counts of i32 phis if necessaryArnold Schwaighofer2013-11-261-0/+39
| | | | | | | | | | | In signed arithmetic we could end up with an i64 trip count for an i32 phi. Because it is signed arithmetic we know that this is only defined if the i32 does not wrap. It is therefore safe to truncate the i64 trip count to a i32 value. Fixes PR18049. llvm-svn: 195787
* Debug Info: update testing cases to specify the debug info version number.Manman Ren2013-11-222-1/+4
| | | | | | | | We are going to drop debug info without a version number or with a different version number, to make sure we don't crash when we see bitcode files with different debug info metadata format. llvm-svn: 195504
* SLPVectorizer: Fix stale for Value pointer arrayArnold Schwaighofer2013-11-191-0/+33
| | | | | | | | | | | | | | | We are slicing an array of Value pointers and process those slices in a loop. The problem is that we might invalidate a later slice by vectorizing a former slice. Use a WeakVH to track the pointer. If the pointer is deleted or RAUW'ed we can tell. The test case will only fail when running with libgmalloc. radar://15498655 llvm-svn: 195162
* LoopVectorizer: Extend the induction variable to a larger typeArnold Schwaighofer2013-11-181-0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | In some case the loop exit count computation can overflow. Extend the type to prevent most of those cases. The problem is loops like: int main () { int a = 1; char b = 0; lbl: a &= 4; b--; if (b) goto lbl; return a; } The backedge count is 255. The induction variable type is i8. If we add one to 255 to get the exit count we overflow to zero. To work around this issue we extend the type of the induction variable to i32 in the case of i8 and i16. PR17532 llvm-svn: 195008
* LoopVectorizer: Use abi alignment for accesses with no alignmentArnold Schwaighofer2013-11-151-0/+33
| | | | | | | | | | | When we vectorize a scalar access with no alignment specified, we have to set the target's abi alignment of the scalar access on the vectorized access. Using the same alignment of zero would be wrong because most targets will have a bigger abi alignment for vector types. This probably fixes PR17878. llvm-svn: 194876
* Scalarize select vector arguments when extracted.Matt Arsenault2013-11-041-29/+29
| | | | | | | | When the elements are extracted from a select on vectors or a vector select, do the select on the extracted scalars from the input if there is only one use. llvm-svn: 194013
* LoopVectorizer: Perform redundancy elimination on induction variablesArnold Schwaighofer2013-11-013-5/+42
| | | | | | | | | | | | | | | | | | | | When the loop vectorizer was part of the SCC inliner pass manager gvn would run after the loop vectorizer followed by instcombine. This way redundancy (multiple uses) were removed and instcombine could perform scalarization on the induction variables. Having moved the loop vectorizer to later we no longer run any form of redundancy elimination before we perform instcombine. This caused vectorized induction variables to survive that did not before. On a recent iMac this helps linpack back from 6000Mflops to 7000Mflops. This should also help lpbench and paq8p. I ran a Release (without Asserts) build over the test-suite and did not see any negative impact on compile time. radar://15339680 llvm-svn: 193891
* LoopVectorize: Look for consecutive acces in GEPs with trailing zero indicesBenjamin Kramer2013-11-011-0/+42
| | | | | | | If we have a pointer to a single-element struct we can still build wide loads and stores to it (if there is no padding). llvm-svn: 193860
* LoopVectorizer: If dependency checks fail try runtime checksArnold Schwaighofer2013-11-011-0/+28
| | | | | | | | | | | | When a dependence check fails we can still try to vectorize loops with runtime array bounds checks. This helps linpack to vectorize a loop in dgefa. And we are back to 2x of the scalar performance on a corei7-avx. radar://15339680 llvm-svn: 193853
* ARM cost model: Unaligned vectorized double stores are expensiveArnold Schwaighofer2013-10-291-9/+9
| | | | | | | | | Updated a test case that assumed that <2 x double> would vectorize to use <4 x float>. radar://15338229 llvm-svn: 193574
* LoopVectorizer: Don't attempt to vectorize extractelement instructionsHal Finkel2013-10-251-0/+35
| | | | | | | | | | | | | | | The loop vectorizer does not currently understand how to vectorize extractelement instructions. The existing check, which excluded all vector-valued instructions, did not catch extractelement instructions because it checked only the return value. As a result, vectorization would proceed, producing illegal instructions like this: %58 = extractelement <2 x i32> %15, i32 0 %59 = extractelement i32 %58, i32 0 where the second extractelement is illegal because its first operand is not a vector. llvm-svn: 193434
* I had to move and removeRenato Golin2013-10-241-46/+0
| | | | llvm-svn: 193355
* Fix broken builds by moving test to x86 dirRenato Golin2013-10-241-0/+46
| | | | llvm-svn: 193351
* Mark vector loops as already vectorizedRenato Golin2013-10-242-2/+49
| | | | | | | | Make sure we mark all loops (scalar and vector) when vectorizing, so that we don't try to vectorize them anymore. Also, set unroll to 1, since this is what we check for on early exit. llvm-svn: 193349
* LoopVectorize: External uses must use the last value in a reduction cycleArnold Schwaighofer2013-10-071-0/+27
| | | | | | | | | Otherwise, we don't perform operations that would have been performed on the scalar version. Fixes PR17498. llvm-svn: 192133
* Don't use runtime bounds check between address spaces.Matt Arsenault2013-10-022-0/+377
| | | | | | | | | Don't vectorize with a runtime check if it requires a comparison between pointers with different address spaces. The values can't be assumed to be directly comparable. Previously it would create an illegal bitcast. llvm-svn: 191862
* Fix missing CHECK-LABELsMatt Arsenault2013-10-0217-33/+33
| | | | llvm-svn: 191853
* TBAA: update tbaa format from scalar format to struct-path aware format.Manman Ren2013-09-301-10/+11
| | | | llvm-svn: 191690
* TBAA: remove !tbaa from testing cases when they are not needed.Manman Ren2013-09-302-9/+3
| | | | llvm-svn: 191689
* Revert "LoopVectorizer: Only allow vectorization of intrinsics."Arnold Schwaighofer2013-09-231-20/+53
| | | | | | | | | | | | | | Revert 191122 - with extra checks we are allowed to vectorize math library function calls. Standard library indentifiers are reserved names so functions with external linkage must not overrided them. However, functions with internal linkage can. Therefore, we can vectorize calls to math library functions with a check for external linkage and matching signature. This matches what we do during SelectionDAG building. llvm-svn: 191206
* LoopVectorizer: Only allow vectorization of intrinsics. We can't know for ↵Nadav Rotem2013-09-211-3/+28
| | | | | | | | sure that the functions 'abs' or 'round' are the functions from libm. rdar://15012650 llvm-svn: 191122
* Name the XCore target-specific subdirectories canonically.Chandler Carruth2013-09-182-0/+0
| | | | llvm-svn: 190940
* A couple of tests, in llvm/test/Transforms/*/xcore, are XCore-specific. They ↵NAKAMURA Takumi2013-09-181-0/+3
| | | | | | should be excluded when XCore is not built. llvm-svn: 190938
* Prevent LoopVectorizer and SLPVectorizer running if the target has no vector ↵Robert Lytton2013-09-181-0/+23
| | | | | | | | | | registers. XCore target: Add XCoreTargetTransformInfo This is where getNumberOfRegisters() resides, which in turn returns the number of vector registers (=0). llvm-svn: 190936
* Don't vectorize if there are outside loop users of the induction variable.Arnold Schwaighofer2013-09-161-0/+30
| | | | | | | | | | | | We would have to compute the pre increment value, either by computing it on every loop iteration or by splitting the edge out of the loop and inserting a computation for it there. For now, just give up vectorizing such loops. Fixes PR17179. llvm-svn: 190790
* Fix missing CHECK-LABELsMatt Arsenault2013-09-101-5/+5
| | | | llvm-svn: 190426
* Debug Info Testing: update context from empty string to null.Manman Ren2013-09-081-2/+2
| | | | | | Context should be either null or MDNode. llvm-svn: 190267
* Debug Info Testing: updated to use NULL instead of "i32 0" in a few fields.Manman Ren2013-09-062-3/+3
| | | | | | | | Field 2 of DIType (Context), field 9 of DIDerivedType (TypeDerivedFrom), field 12 of DICompositeType (ContainingType), fields 2, 7, 12 of DISubprogram (Context, Type, ContainingType). llvm-svn: 190205
* Disable unrolling in the loop vectorizer when disabled in the pass managerHal Finkel2013-08-282-1/+32
| | | | | | | | | | | | | | | | | When unrolling is disabled in the pass manager, the loop vectorizer should also not unroll loops. This will allow the -fno-unroll-loops option in Clang to behave as expected (even for vectorizable loops). The loop vectorizer's -force-vector-unroll option will (continue to) override the pass-manager setting (including -force-vector-unroll=0 to force use of the internal auto-selection logic). In order to test this, I added a flag to opt (-disable-loop-unrolling) to force disable unrolling through opt (the analog of -fno-unroll-loops in Clang). Also, this fixes a small bug in opt where the loop vectorizer was enabled only after the pass manager populated the queue of passes (the global_alias.ll test needed a slight update to the RUN line as a result of this fix). llvm-svn: 189499
* Debug Info: add an identifier field to DICompositeType.Manman Ren2013-08-262-3/+3
| | | | | | | | | | | | | | | | | | DICompositeType will have an identifier field at position 14. For now, the field is set to null in DIBuilder. For DICompositeTypes where the template argument field (the 13th field) was optional, modify DIBuilder to make sure the template argument field is set. Now DICompositeType has 15 fields. Update DIBuilder to use NULL instead of "i32 0" for null value of a MDNode. Update verifier to check that DICompositeType has 15 fields and the last field is null or a MDString. Update testing cases to include an extra field for DICompositeType. The identifier field will be used by type uniquing so a front end can genearte a DICompositeType with a unique identifer. llvm-svn: 189282
* LoopVectorize: Implement partial loop unrolling when vectorization is not ↵Nadav Rotem2013-08-261-0/+39
| | | | | | | | | | | | | | | | profitable. This patch enables unrolling of loops when vectorization is legal but not profitable. We add a new class InnerLoopUnroller, that extends InnerLoopVectorizer and replaces some of the vector-specific logic with scalars. This patch does not introduce any runtime regressions and improves the following workloads: SingleSource/Benchmarks/Shootout/matrix -22.64% SingleSource/Benchmarks/Shootout-C++/matrix -13.06% External/SPEC/CINT2006/464_h264ref/464_h264ref -3.99% SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding -1.95% llvm-svn: 189281
* [Debug Info Tests] Update testing cases.Manman Ren2013-08-221-9/+9
| | | | | | | | | A single metadata will not span multiple lines. This also helps me with my script to automatic update the testing cases. A debug info testing case should have a llvm.dbg.cu. Do not use hard-coded id for debug nodes. llvm-svn: 189033
* Teach LoopVectorize about address space sizesMatt Arsenault2013-08-221-2/+29
| | | | llvm-svn: 188980
* Add a llvm.copysign intrinsicHal Finkel2013-08-191-0/+53
| | | | | | | | | | | | | | | | | | | | | This adds a llvm.copysign intrinsic; We already have Libfunc recognition for copysign (which is turned into the FCOPYSIGN SDAG node). In order to autovectorize calls to copysign in the loop vectorizer, we need a corresponding intrinsic as well. In addition to the expected changes to the language reference, the loop vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a few lists in LegalizeVector{Ops,Types} so that vector copysigns can be expanded. In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN be Expand for vector types. This seems correct for all in-tree targets, and I think is the right thing to do because, previously, there was no way to generate vector-values FCOPYSIGN nodes (and most targets don't specify an action for vector-typed FCOPYSIGN). llvm-svn: 188728
* [tests] Cleanup initialization of test suffixes.Daniel Dunbar2013-08-163-5/+0
| | | | | | | | | | | | | | | | | - Instead of setting the suffixes in a bunch of places, just set one master list in the top-level config. We now only modify the suffix list in a few suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py). - Aside from removing the need for a bunch of lit.local.cfg files, this enables 4 tests that were inadvertently being skipped (one in Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been XFAILED). - This commit also fixes a bunch of config files to use config.root instead of older copy-pasted code. llvm-svn: 188513
* Add ISD::FROUND for libm round()Hal Finkel2013-08-071-0/+52
| | | | | | | | | | | | | | | All libm floating-point rounding functions, except for round(), had their own ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm adding ISD::FROUND so that round() can be custom lowered as well. For the most part, this is straightforward. I've added an intrinsic and a matching ISD node just like those for nearbyint() and friends. The SelectionDAG pattern I've named frnd (because ISD::FP_ROUND has already claimed fround). This will be used by the PowerPC backend in a follow-up commit. llvm-svn: 187926
* LoopVectorize: Allow vectorization of loops with lifetime markersArnold Schwaighofer2013-08-061-0/+96
| | | | | | Patch by Marc Jessome! llvm-svn: 187825
* Use function attributes to indicate that we don't want to realign the stack.Bill Wendling2013-08-011-1/+1
| | | | | | | | Function attributes are the future! So just query whether we want to realign the stack directly from the function instead of through a random target options structure. llvm-svn: 187618
* Debug Info: update testing cases to pass verifier.Manman Ren2013-07-291-7/+8
| | | | llvm-svn: 187362
OpenPOWER on IntegriCloud