path: root/llvm/test/Transforms/SLPVectorizer
Commit message | Author | Age | Files | Lines
...
* ARM cost model: Account for zero cost scalar SROA instructions | Arnold Schwaighofer | 2013-10-29 | 1 | -0/+52
    By vectorizing a series of srl, or, ... instructions we have obfuscated the intention so much that the backend does not know how to fold this code away.
    radar://15336950
    llvm-svn: 193573
* SLPVectorizer: Don't vectorize volatile memory operations | Arnold Schwaighofer | 2013-10-16 | 1 | -0/+43
    radar://15231682
    Reapply r192799, http://lab.llvm.org:8011/builders/lldb-x86_64-debian-clang/builds/8226 showed that the bot is still broken even with this out.
    llvm-svn: 192820
* Revert "SLPVectorizer: Don't vectorize volatile memory operations" | Arnold Schwaighofer | 2013-10-16 | 1 | -43/+0
    This speculatively reverts commit 192799. It might have broken a linux buildbot.
    llvm-svn: 192816
* SLPVectorizer: Don't vectorize volatile memory operations | Arnold Schwaighofer | 2013-10-16 | 1 | -0/+43
    radar://15231682
    llvm-svn: 192799
* SLPVectorizer: Sort PHINodes based on their opcode | Arnold Schwaighofer | 2013-10-12 | 1 | -2/+34
    Before this patch we relied on the order of phi nodes when we looked for phi nodes of the same type. This could prevent vectorization of cases where there was a phi node of a second type in between phi nodes of some type.
    This is important for vectorization of an internal graphics kernel. On the test suite + external on x86_64 (and on a run on armv7s) it showed no impact on either performance or compile time.
    radar://15024459
    llvm-svn: 192537
* SLPVectorizer: Sort inputs to commutative binary operations | Arnold Schwaighofer | 2013-10-04 | 1 | -0/+234
    Sort the operands of the other entries in the current vectorization root according to the first entry's operands opcodes.

      %conv0 = uitofp ...
      %load0 = load float ...

      = fmul %conv0, %load0
      = fmul %load0, %conv1
      = fmul %load0, %conv2

    Make sure that we recursively vectorize <%conv0, %conv1, %conv2> and <%load0, %load0, %load0>.
    This makes it more likely to obtain vectorizable trees. We have to be careful when we sort that we don't destroy 'good' existing ordering implied by source order.
    radar://15080067
    llvm-svn: 191977
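The operand sorting described in the commit above can be pictured with a small model. This is only a sketch under stated assumptions, not the vectorizer's code: operands are modeled as hypothetical `(opcode, name)` tuples, and the helper name `reorder_commutative_operands` is invented for illustration. An entry is swapped only when swapping lines up strictly more opcodes with the first entry, so ties keep source order.

```python
def reorder_commutative_operands(bundle):
    """Align operands of commutative ops with the first entry's opcodes.

    bundle: list of (lhs, rhs) pairs; each operand is an (opcode, name) tuple.
    Returns (left, right): the two operand lists that would be vectorized.
    This is a toy model, not the LLVM implementation.
    """
    first_lhs, first_rhs = bundle[0]
    left, right = [first_lhs], [first_rhs]
    for lhs, rhs in bundle[1:]:
        # Count how many operand opcodes match the first entry in each order.
        keep_score = (lhs[0] == first_lhs[0]) + (rhs[0] == first_rhs[0])
        swap_score = (rhs[0] == first_lhs[0]) + (lhs[0] == first_rhs[0])
        # Swap only on a strict improvement, so existing order wins ties.
        if swap_score > keep_score:
            lhs, rhs = rhs, lhs
        left.append(lhs)
        right.append(rhs)
    return left, right


# The three fmul instructions from the commit message.
bundle = [
    (("uitofp", "%conv0"), ("load", "%load0")),
    (("load", "%load0"), ("uitofp", "%conv1")),
    (("load", "%load0"), ("uitofp", "%conv2")),
]
left, right = reorder_commutative_operands(bundle)
print([name for _, name in left])
print([name for _, name in right])
```

With the commit's example as input, the left list collects the three conversions and the right list collects the repeated load, which are the two bundles the message says should be vectorized recursively.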
* Apply slp vectorization on fully-vectorizable tree of height 2 | Yi Jiang | 2013-10-02 | 1 | -17/+130
    llvm-svn: 191852
* SLPVectorizer: Make store chain finding more aggressive with GetUnderlyingObject. | Benjamin Kramer | 2013-10-02 | 1 | -0/+21
    This recursively strips all GEPs like the existing code. It also handles bitcasts and other operations that do not change the pointer value.
    llvm-svn: 191847
* TBAA: update tbaa format from scalar format to struct-path aware format. | Manman Ren | 2013-09-30 | 1 | -4/+5
    llvm-svn: 191690
* TBAA: remove !tbaa from testing cases when they are not needed. | Manman Ren | 2013-09-30 | 2 | -13/+6
    llvm-svn: 191689
* IRBuilder: Add RAII objects to reset insertion points or fast math flags. | Benjamin Kramer | 2013-09-30 | 1 | -1/+1
    Inspired by the object from the SLPVectorizer. This found a minor bug in the debug loc restoration in the vectorizer where the location of a following instruction was attached instead of the location from the original instruction.
    llvm-svn: 191673
* Fix SLPVectorizer using wrong address space for load/store | Matt Arsenault | 2013-09-27 | 2 | -0/+69
    llvm-svn: 191564
* Transforms: Use getFirstNonPHI to set the insertion point for PHIs | Justin Bogner | 2013-09-27 | 1 | -0/+31
    We were previously using getFirstInsertionPt to insert PHI instructions when vectorizing, but getFirstInsertionPt also skips past landingpads, causing this to generate invalid IR. We can avoid this issue by using getFirstNonPHI instead.
    llvm-svn: 191526
* SLPVectorize: Put horizontal reductions feeding a store under separate flag | Arnold Schwaighofer | 2013-09-25 | 1 | -8/+10
    Put them under a separate flag for experimentation. They are more likely to interfere with loop vectorization which happens later in the pass pipeline.
    llvm-svn: 191371
* Test case for r191314. | Yi Jiang | 2013-09-24 | 1 | -0/+27
    Some supplemental information for r191314: We would like to make sure SLP Vectorizer will not try to vectorize tiny trees even with a negative threshold so we set the cost to INT_MAX.
    llvm-svn: 191327
* Reapply "SLPVectorizer: Handle more horizontal reductions (disabled)" | Arnold Schwaighofer | 2013-09-21 | 1 | -0/+415
    Reapply r191108 with a fix for a memory corruption error I introduced. Of course, we can't reference the scalars that we replace by vectorizing and then call their eraseFromParent method. I only 'needed' the scalars to get the DebugLoc. Just store the DebugLoc before actually vectorizing instead. As a nice side effect, this also simplifies the interface between BoUpSLP and the HorizontalReduction class to returning a value pointer (the vectorized tree root).
    radar://14607682
    llvm-svn: 191123
* Revert "SLPVectorizer: Handle more horizontal reductions (disabled)" | Arnold Schwaighofer | 2013-09-21 | 1 | -415/+0
    This reverts commit r191108. The horizontal.ll test case fails under libgmalloc. Thanks Shuxin for pointing this out to me.
    llvm-svn: 191121
* SLPVectorizer: Handle more horizontal reductions (disabled) | Arnold Schwaighofer | 2013-09-20 | 1 | -0/+415
    Match reductions starting at binary operation feeding into a phi. The code handles trees like

      r += v1 + v2 + v3 ...

    and

      r += v1
      r += v2
      ...

    and

      r *= v1 + v2 + ...

    We currently only handle associative operations (add, fadd fast).
    The code can now also handle reductions feeding into stores.

      a[i] = v1 + v2 + v3 + ...

    The code is currently disabled behind the flag "-slp-vectorize-hor". The cost model for most architectures is not there yet.
    I found one opportunity of a horizontal reduction feeding a phi in TSVC (LoopRerolling-flt) and there are several opportunities where reductions feed into stores.
    radar://14607682
    llvm-svn: 191108
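To illustrate what vectorizing a reduction such as `r += v1 + v2 + v3 ...` involves, here is a toy model of one common strategy: add register-wide chunks lane by lane, then fold the lanes together by halving. It is a sketch, not the pass's implementation; the function name `horizontal_reduce_add` and the `width` parameter are hypothetical. Because the additions happen in a different order than in the scalar source, this is only valid for associative operations, which matches the commit's restriction to add and fadd fast.

```python
def horizontal_reduce_add(values, width=4):
    """Sum `values` the way a vectorized reduction might (toy model).

    Assumes `width` is a power of two and len(values) is a non-zero
    multiple of `width`. Not the LLVM implementation.
    """
    if width <= 0 or width & (width - 1):
        raise ValueError("width must be a power of two")
    if not values or len(values) % width:
        raise ValueError("len(values) must be a non-zero multiple of width")

    # Step 1: "vertical" adds, one vector-wide chunk at a time.
    acc = list(values[:width])
    for start in range(width, len(values), width):
        chunk = values[start:start + width]
        acc = [a + b for a, b in zip(acc, chunk)]

    # Step 2: "horizontal" fold, adding the upper half onto the lower half
    # until one lane remains (log2(width) steps).
    while len(acc) > 1:
        half = len(acc) // 2
        acc = [acc[i] + acc[i + half] for i in range(half)]
    return acc[0]


total = horizontal_reduce_add([1, 2, 3, 4, 5, 6, 7, 8], width=4)
print(total)
```

With integer inputs the result equals the plain sum. With floating-point inputs the reordering can change rounding, which is why a strict `fadd` would not qualify.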
* Name the XCore target-specific subdirectories canonically. | Chandler Carruth | 2013-09-18 | 2 | -0/+0
    llvm-svn: 190940
* A couple of tests, in llvm/test/Transforms/*/xcore, are XCore-specific. They should be excluded when XCore is not built. | NAKAMURA Takumi | 2013-09-18 | 1 | -0/+3
    llvm-svn: 190938
* Prevent LoopVectorizer and SLPVectorizer running if the target has no vector registers. | Robert Lytton | 2013-09-18 | 1 | -0/+24
    XCore target: Add XCoreTargetTransformInfo
    This is where getNumberOfRegisters() resides, which in turn returns the number of vector registers (=0).
    llvm-svn: 190936
* SLPVectorizer: Don't vectorize phi nodes that use invoke values | Arnold Schwaighofer | 2013-09-17 | 1 | -0/+62
    We can't insert an insertelement after an invoke. We would have to split a critical edge. So when we see a phi node that uses an invoke we just give up.
    radar://14990770
    llvm-svn: 190871
* Debug Info Testing: updated to use NULL instead of "i32 0" in a few fields. | Manman Ren | 2013-09-06 | 1 | -1/+1
    Field 2 of DIType (Context), field 9 of DIDerivedType (TypeDerivedFrom), field 12 of DICompositeType (ContainingType), fields 2, 7, 12 of DISubprogram (Context, Type, ContainingType).
    llvm-svn: 190205
* In this patch we are trying to do two things: | Yi Jiang | 2013-09-03 | 3 | -13/+141
    1) If the width of a vectorization list candidate is bigger than the vector reg width, we will break it down to fit the vector reg.
    2) We do not vectorize widths that are not a power of two.
    The performance results show it will help some spec benchmarks: mesa improved 6.97% and ammp improved 1.54%.
    llvm-svn: 189830
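The two rules in the commit above can be sketched as follows. This is a minimal model under assumptions: the helper name `split_candidates`, the 128-bit default register width, and the choice to simply skip a non-power-of-two tail are all illustrative and not taken from the patch.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


def split_candidates(scalars, elem_bits, vector_reg_bits=128):
    """Break a candidate list into chunks that fit one vector register.

    Chunks whose length is below 2 or not a power of two are skipped,
    mirroring rule 2. Toy model only, not the LLVM implementation.
    """
    lanes = max(vector_reg_bits // elem_bits, 1)
    chunks = []
    for start in range(0, len(scalars), lanes):
        chunk = scalars[start:start + lanes]
        if len(chunk) >= 2 and is_power_of_two(len(chunk)):
            chunks.append(chunk)
    return chunks


# Eight 32-bit candidates with a 128-bit register give two 4-wide chunks.
print(split_candidates(list(range(8)), elem_bits=32))
# Seven candidates: the 3-element tail is not a power of two and is skipped.
print(split_candidates(list(range(7)), elem_bits=32))
```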
* Fix inserting instructions before last in bundle. | Matt Arsenault | 2013-08-26 | 1 | -1/+1
    The builder inserts from before the insert point, not after, so this would insert before the last instruction in the bundle instead of after it.
    I'm not sure if this can actually be a problem with any of the current insertions.
    llvm-svn: 189285
* Debug Info: add an identifier field to DICompositeType. | Manman Ren | 2013-08-26 | 1 | -1/+1
    DICompositeType will have an identifier field at position 14. For now, the field is set to null in DIBuilder. For DICompositeTypes where the template argument field (the 13th field) was optional, modify DIBuilder to make sure the template argument field is set. Now DICompositeType has 15 fields.
    Update DIBuilder to use NULL instead of "i32 0" for null value of a MDNode.
    Update verifier to check that DICompositeType has 15 fields and the last field is null or a MDString.
    Update testing cases to include an extra field for DICompositeType.
    The identifier field will be used by type uniquing so a front end can generate a DICompositeType with a unique identifier.
    llvm-svn: 189282
* Forgot to add slp threshold to test | Matt Arsenault | 2013-08-26 | 1 | -1/+2
    llvm-svn: 189248
* Vectorize starting from insertelements building a vector | Matt Arsenault | 2013-08-26 | 1 | -0/+196
    llvm-svn: 189233
* [Debug Info Tests] Update testing cases. | Manman Ren | 2013-08-22 | 1 | -6/+6
    A single metadata will not span multiple lines. This also helps me with my script to automatically update the testing cases.
    A debug info testing case should have a llvm.dbg.cu.
    Do not use hard-coded id for debug nodes.
    llvm-svn: 189033
* Teach the SLP vectorizer the correct way to check for consecutive access using GEPs. | Chandler Carruth | 2013-08-22 | 1 | -1/+39
    Previously, it used a number of different heuristics for analyzing the GEPs. Several of these were conservatively correct, but failed to fall back to SCEV even when SCEV might have given a reasonable answer. One was simply incorrect in how it was formulated.
    There was good code already to recursively evaluate the constant offsets in GEPs, look through pointer casts, etc. In a previous commit I gathered this into a form that code like the SLP vectorizer can use, which allows all of this code to become quite simple.
    There is some performance (compile time) concern here at first glance as we're directly attempting to walk both pointers' constant GEP chains. However, a couple of thoughts:
    1) In the very common cases where there is a dynamic pointer, and a second pointer at a constant offset (usually a stride) from it, this code will actually not do any unnecessary work.
    2) InstCombine and other passes work very hard to collapse constant GEPs, so it will be rare that we iterate here for a long time.
    That said, if there remain performance problems here, there are some obvious things that can improve the situation immensely. Doing a vectorizer-pass-wide memoizer for each individual layer of pointer values, their base values, and the constant offset is likely to be able to completely remove redundant work and strictly limit the scaling of the work to scrape these GEPs. Since this optimization was not done on the prior version (which would still benefit from it), I've not done it here. But if folks have benchmarks that slow down it should be straightforward for them to add.
    I've added a test case, but I'm not really confident of the amount of testing done for different access patterns, strides, and pointer manipulation.
    llvm-svn: 189007
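The idea of walking both pointers' constant GEP chains can be modeled in a few lines. This is a sketch under assumptions, not the LLVM code: pointers are represented as hypothetical nested tuples, either `("base", name)` or `("gep", inner_pointer, constant_byte_offset)`, and the SCEV fallback mentioned in the message is not modeled.

```python
def strip_constant_offsets(ptr):
    """Peel constant-offset GEPs; return (base pointer, total byte offset)."""
    offset = 0
    while ptr[0] == "gep":
        _, inner, const_off = ptr
        offset += const_off
        ptr = inner
    return ptr, offset


def is_consecutive(a, b, elem_size):
    """True if `b` addresses the element directly after `a` (toy model)."""
    base_a, off_a = strip_constant_offsets(a)
    base_b, off_b = strip_constant_offsets(b)
    # Same underlying object, and exactly one element apart.
    return base_a == base_b and off_b - off_a == elem_size


p = ("base", "%p")
q = ("base", "%q")
print(is_consecutive(p, ("gep", p, 4), elem_size=4))              # adjacent i32s
print(is_consecutive(p, ("gep", ("gep", p, 2), 2), elem_size=4))  # chained GEPs
print(is_consecutive(p, ("gep", q, 4), elem_size=4))              # different base
```

The loop runs once per GEP layer, which is the cost the message discusses; chains are usually short when earlier passes have already collapsed constant GEPs.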
* SLPVectorizer: Fix invalid iterator errors | Arnold Schwaighofer | 2013-08-20 | 1 | -0/+30
    Update iterator when the SLP vectorizer changes the instructions in the basic block by restarting the traversal of the basic block.
    Patch by Yi Jiang!
    Fixes PR 16899.
    llvm-svn: 188832
* [tests] Cleanup initialization of test suffixes. | Daniel Dunbar | 2013-08-16 | 2 | -3/+0
    - Instead of setting the suffixes in a bunch of places, just set one master list in the top-level config. We now only modify the suffix list in a few suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).
    - Aside from removing the need for a bunch of lit.local.cfg files, this enables 4 tests that were inadvertently being skipped (one in Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been XFAILED).
    - This commit also fixes a bunch of config files to use config.root instead of older copy-pasted code.
    llvm-svn: 188513
* Fix PR16797 - Support PHINodes with multiple inputs from the same basic block. | Nadav Rotem | 2013-08-12 | 1 | -0/+41
    Do not generate new vector values for the same entries because we know that the incoming values from the same block must be identical.
    llvm-svn: 188185
* SLPVectorizer: Fix PR16777. PHInodes may use multiple extracted values that come from different blocks. | Nadav Rotem | 2013-08-02 | 1 | -0/+35
    Thanks Alexey Samsonov.
    llvm-svn: 187663
* Add the C source code to the test to make it easier to update when debug info changes. | Nadav Rotem | 2013-07-29 | 1 | -0/+9
    Thanks Eric.
    llvm-svn: 187368
* SLPVectorizer: update the debug location for the new instructions. | Nadav Rotem | 2013-07-29 | 1 | -0/+82
    llvm-svn: 187363
* Don't vectorize when the attribute NoImplicitFloat is used. | Nadav Rotem | 2013-07-29 | 1 | -0/+25
    llvm-svn: 187340
* SLP Vectorizer: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize. | Nadav Rotem | 2013-07-26 | 1 | -1/+3
    llvm-svn: 187267
* SLP Vectorizer: Disable the vectorization of non power of two chains, such as <3 x float>, because we don't have a good cost model for these types. | Nadav Rotem | 2013-07-26 | 2 | -33/+39
    llvm-svn: 187265
* When we vectorize across multiple basic blocks we may vectorize PHINodes that create a cycle. | Nadav Rotem | 2013-07-22 | 1 | -0/+58
    We already break the cycle on phi-nodes, but arithmetic operations are still duplicated. This patch adds code that checks if the operation that we are vectorizing was vectorized during the visit of the operands and uses this value if it can.
    llvm-svn: 186883
* PR16628: Fix a bug in the code that merges compares. | Nadav Rotem | 2013-07-15 | 1 | -0/+27
    Compares return i1 but they compare different types.
    llvm-svn: 186359
* Update Transforms tests to use CHECK-LABEL for easier debugging. No functionality change. | Stephen Lin | 2013-07-14 | 14 | -19/+19
    This update was done with the following bash script:

      find test/Transforms -name "*.ll" | \
      while read NAME; do
        echo "$NAME"
        if ! grep -q "^; *RUN: *llc" $NAME; then
          TEMP=`mktemp -t temp`
          cp $NAME $TEMP
          sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
          while read FUNC; do
            sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3@$FUNC(/g" $TEMP
          done
          mv $TEMP $NAME
        fi
      done

    llvm-svn: 186268
* SLPVectorizer: Sink and enable CSE for ExtractElements. | Nadav Rotem | 2013-07-12 | 3 | -4/+4
    llvm-svn: 186145
* SLPVectorize: Replace the code that checks for vectorization candidates in successor blocks with code that scans PHINodes. | Nadav Rotem | 2013-07-12 | 1 | -0/+74
    Before we could vectorize PHINodes, scanning successors was a good way of finding candidates. Now we can vectorize the phinodes, which is simpler.
    llvm-svn: 186139
* Consolidate more lit tests. | Nadav Rotem | 2013-07-11 | 3 | -62/+54
    llvm-svn: 186063
* Consolidate some of the lit tests. | Nadav Rotem | 2013-07-11 | 4 | -75/+57
    llvm-svn: 186062
* Consolidate some of the lit tests. | Nadav Rotem | 2013-07-11 | 5 | -61/+191
    llvm-svn: 186060
* Fix PR16571, which is a bug in the code that checks that all of the types in the bundle are uniform. | Nadav Rotem | 2013-07-09 | 1 | -0/+22
    llvm-svn: 185970
* SLPVectorizer: Implement DCE as part of vectorization. | Nadav Rotem | 2013-07-07 | 16 | -8/+645
    This is a complete rewrite of the bottom-up vectorization class. Before this commit we scanned the instruction tree 3 times: first in search of merge points for the trees, second for estimating the cost, and finally for vectorization. There was a lot of code duplication and adding the DCE exposed bugs. The new design is simpler and DCE was a part of the design.
    In this implementation we build the tree once. After that we estimate the cost by scanning the different entries in the constructed tree (in any order). The vectorization phase also works on the built tree.
    llvm-svn: 185774
* SLP Vectorizer: Add support for trees with external users. | Nadav Rotem | 2013-06-28 | 3 | -6/+117
    To support this we have to insert 'extractelement' instructions to pick the right lane. We had this functionality before but I removed it when we moved to the multi-block design because it was too complicated.
    llvm-svn: 185230