path: root/llvm/test/Transforms/LoopVectorize/runtime-check.ll
Commit history (commit message, author, date, files changed, lines -/+):
* [InstCombine] Revert rL341831: relax one-use check in foldICmpAddConstant() (PR44100) (Roman Lebedev, 2019-12-02, 1 file, -16/+16)

  rL341831 moved the one-use check higher up, restricting a few folds that
  produced a single instruction from two instructions to the case where the
  inner instruction would go away.

  Original commit message:
  > InstCombine: move hasOneUse check to the top of foldICmpAddConstant
  >
  > There were two combines not covered by the check before now,
  > neither of which actually differed from normal in the benefit analysis.
  >
  > The most recent seems to be because it was just added at the top of the
  > function (naturally). The older is from way back in 2008 (r46687)
  > when we just didn't put those checks in so routinely, and has been
  > diligently maintained since.

  From the commit message alone, there doesn't seem to be a deeper motivation or
  a deeper problem it was trying to solve, other than 'fixing the wrong one-use
  check'. As I briefly discussed on IRC with Tim, the original motivation can no
  longer be recovered; too much time has passed. However, I believe the original
  fold was doing the right thing: we should perform such a transformation even
  if the inner `add` will not go away. That will still unchain the comparison
  from the `add`; it will no longer need to wait for the `add` to compute. Doing
  so doesn't seem to break any particular idioms, at least as far as I can see.

  References https://bugs.llvm.org/show_bug.cgi?id=44100
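Not from the commit itself: a minimal IR sketch (hypothetical values) of the situation being argued. Even when the add has a second use and therefore cannot be removed, folding the constant into the compare unchains the icmp from the add:

    define i1 @unchain(i32 %x, i32* %p) {
      %add = add nuw i32 %x, 5
      store i32 %add, i32* %p        ; second use keeps the add alive
      %cmp = icmp ugt i32 %add, 10   ; with the relaxed check this still folds
      ret i1 %cmp                    ; to: icmp ugt i32 %x, 5
    }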
* [LV] Forced vectorization with runtime checks and OptForSize (Sjoerd Meijer, 2019-09-24, 1 file, -1/+31)

  When vectorisation is forced with a pragma, we optimise for min size, and we
  need to emit runtime memory checks, then allow this code growth and don't run
  into an assert like we currently do.

  This is the result of D65197 and D66803, and was a use-case not really
  considered before. If this now happens, we emit an optimisation remark warning
  about the code-size expansion, which can be avoided by not forcing
  vectorisation or possibly by source-code modifications.

  Differential Revision: https://reviews.llvm.org/D67764

  llvm-svn: 372694
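Not from the commit itself: a minimal sketch (hypothetical function and metadata IDs) of how a source-level '#pragma clang loop vectorize(enable)' surfaces in IR as llvm.loop.vectorize.enable loop metadata, which is the "forced vectorisation" the message refers to:

    define void @store_zero(i32* %a, i64 %n) {
    entry:
      br label %loop

    loop:
      %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
      %gep = getelementptr inbounds i32, i32* %a, i64 %i
      store i32 0, i32* %gep
      %i.next = add nuw i64 %i, 1
      %done = icmp eq i64 %i.next, %n
      br i1 %done, label %exit, label %loop, !llvm.loop !0

    exit:
      ret void
    }

    !0 = distinct !{!0, !1}
    !1 = !{!"llvm.loop.vectorize.enable", i1 true}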
* Revert "Temporarily Revert "Add basic loop fusion pass.""Eric Christopher2019-04-171-0/+179
| | | | | | | | The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552
* Temporarily Revert "Add basic loop fusion pass." (Eric Christopher, 2019-04-17, 1 file, -179/+0)

  As it's causing some bot failures (and per request from kbarton).

  This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

  llvm-svn: 358546
* [InstCombine] Don't transform ((C1 OP zext(X)) & C2) -> zext((C1 OP X) & C2) if either zext or OP has another use (Craig Topper, 2019-03-21, 1 file, -4/+2)

  If they have other users we'll just end up increasing the instruction count.

  We might be able to weaken this to only one of them having a single use if we
  can prove that the and will be removed.

  Fixes PR41164.

  Differential Revision: https://reviews.llvm.org/D59630

  llvm-svn: 356690
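Not from the commit: a hypothetical before/after sketch of the pattern the ((C1 OP zext(X)) & C2) -> zext((C1 OP X) & C2) fold rewrites; after this patch the rewrite only fires when both the zext and the inner op are single-use:

    ; Before: ((42 - zext(X)) & 15) computed in i32.
    define i32 @before(i8 %x) {
      %z  = zext i8 %x to i32
      %op = sub i32 42, %z
      %r  = and i32 %op, 15
      ret i32 %r
    }

    ; After: zext((42 - X) & 15), with the arithmetic narrowed to i8.
    define i32 @after(i8 %x) {
      %op = sub i8 42, %x
      %a  = and i8 %op, 15
      %r  = zext i8 %a to i32
      ret i32 %r
    }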
* [LAA] Avoid generating RT checks for known deps preventing vectorization. (Florian Hahn, 2018-12-20, 1 file, -2/+4)

  If we found unsafe dependences other than 'unknown', we already know at
  compile time that they are unsafe and the runtime checks should always fail.
  So we can avoid generating them in those cases.

  This should have no negative impact on performance, as the runtime checks that
  would be created previously should always fail. As a sanity check, I measured
  the test-suite, spec2k and spec2k6 and there were no regressions.

  Reviewers: Ayal, anemet, hsaito

  Reviewed By: Ayal

  Differential Revision: https://reviews.llvm.org/D55798

  llvm-svn: 349794
* [LAA] Introduce enum for vectorization safety status (NFC). (Florian Hahn, 2018-12-18, 1 file, -0/+40)

  This patch adds a VectorizationSafetyStatus enum, which will be extended in a
  follow-up patch to distinguish between 'safe with runtime checks' and 'known
  unsafe' dependences.

  Reviewers: anemet, anna, Ayal, hsaito

  Reviewed By: Ayal

  Differential Revision: https://reviews.llvm.org/D54892

  llvm-svn: 349556
* InstCombine: move hasOneUse check to the top of foldICmpAddConstant (Tim Northover, 2018-09-10, 1 file, -1/+1)

  There were two combines not covered by the check before now, neither of which
  actually differed from normal in the benefit analysis.

  The most recent seems to be because it was just added at the top of the
  function (naturally). The older is from way back in 2008 (r46687) when we just
  didn't put those checks in so routinely, and has been diligently maintained
  since.

  llvm-svn: 341831
* [InstCombine] Fold icmp ugt/ult (add nuw X, C2), C --> icmp ugt/ult X, (C - C2) (Nicola Zaghen, 2018-09-04, 1 file, -1/+1)

  Support for sgt/slt was added in rL294898; this adds the same cases also for
  unsigned compares.

  This is the Alive proof: https://rise4fun.com/Alive/nyY

  Differential Revision: https://reviews.llvm.org/D50972

  llvm-svn: 341353
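Not from the commit: a minimal before/after sketch with hypothetical constants. Because of the nuw flag, %x + 7 cannot wrap, so (%x + 7) <u 20 is equivalent to %x <u 13:

    define i1 @before(i32 %x) {
      %add = add nuw i32 %x, 7
      %cmp = icmp ult i32 %add, 20
      ret i1 %cmp
    }

    define i1 @after(i32 %x) {
      %cmp = icmp ult i32 %x, 13
      ret i1 %cmp
    }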
* [LoopVectorize] regenerate full checks; NFC (Sanjay Patel, 2018-06-21, 1 file, -7/+61)

  llvm-svn: 335257
* [DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label. (Shiva Chen, 2018-05-09, 1 file, -1/+1)

  In order to set breakpoints on labels and list source code around labels, we
  need to collect debug information for labels, i.e., the label name, the
  function the label belongs to, the line number in the file, and the address at
  which the label is located. To keep this information in LLVM IR and allow the
  backend to generate debug information correctly, we create a new kind of
  metadata for labels, DILabel. The format of DILabel is

    !DILabel(scope: !1, name: "foo", file: !2, line: 3)

  We hope to keep as much debug information as possible even when the code is
  optimized, so we create a new kind of intrinsic for the label metadata to
  avoid the metadata being eliminated along with its basic block. The intrinsic
  will keep existing as long as we keep it from being optimized out. The format
  of the intrinsic is

    llvm.dbg.label(metadata !1)

  It has only one argument, the DILabel metadata, and it immediately follows the
  label. The backend can get the label metadata through the intrinsic's
  parameter.

  We also create a DIBuilder API for labels to be used by frontends. A frontend
  can use createLabel() to allocate DILabel objects, and insertLabel() to insert
  the llvm.dbg.label intrinsic in LLVM IR.

  Differential Revision: https://reviews.llvm.org/D45024

  Patch by Hsiangkai Wang.

  llvm-svn: 331841
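Not from the commit: a small, hypothetical module sketch (in roughly that era's textual syntax, with made-up file/function names and metadata numbering) showing how the pieces fit together; the llvm.dbg.label call immediately follows the IR label and points at its DILabel node:

    define void @foo() !dbg !4 {
    entry:
      br label %done

    done:                                   ; source-level label "done"
      call void @llvm.dbg.label(metadata !7), !dbg !8
      ret void
    }

    declare void @llvm.dbg.label(metadata)

    !llvm.dbg.cu = !{!0}
    !llvm.module.flags = !{!2, !3}

    !0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, emissionKind: FullDebug)
    !1 = !DIFile(filename: "t.c", directory: "/tmp")
    !2 = !{i32 2, !"Dwarf Version", i32 4}
    !3 = !{i32 2, !"Debug Info Version", i32 3}
    !4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5, isDefinition: true, unit: !0)
    !5 = !DISubroutineType(types: !6)
    !6 = !{null}
    !7 = !DILabel(scope: !4, name: "done", file: !1, line: 3)
    !8 = !DILocation(line: 3, scope: !4)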
* [LV] Test once if vector trip count is zero, instead of twice (Ayal Zaks, 2017-07-19, 1 file, -1/+1)

  Generate a single test to decide if there are enough iterations to jump to the
  vectorized loop, or else go to the scalar remainder loop. This test compares
  the Scalar Trip Count: if STC < VF * UF, go to the scalar loop. If
  requiresScalarEpilogue() holds, at least one iteration must remain scalar; the
  rest can be used to form vector iterations. So in this case the test instead
  checks if (STC - 1) < VF * UF by comparing STC <= VF * UF, and goes to the
  scalar loop if so. Otherwise the vector loop is entered for at least one
  vector iteration.

  This test covers the case where incrementing the backedge-taken count will
  overflow, leading to an incorrect trip count of zero. In this (rare) case we
  will also avoid the vector loop and jump to the scalar loop.

  This patch simplifies the existing tests and effectively removes the
  basic block originally named "min.iters.checked", leaving the single test in
  block "vector.ph".

  Original observation and initial patch by Evgeny Stupachenko.

  Differential Revision: https://reviews.llvm.org/D34150

  llvm-svn: 308421
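Not from the commit: a minimal sketch (hypothetical names, assuming VF * UF = 8) of the single guard that remains in "vector.ph":

    define void @guard(i64 %scalar.trip.count) {
    vector.ph:
      %min.iters.check = icmp ult i64 %scalar.trip.count, 8
      br i1 %min.iters.check, label %scalar.ph, label %vector.body

    vector.body:                      ; at least one vector iteration runs
      br label %scalar.ph

    scalar.ph:                        ; scalar remainder loop
      ret void
    }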
* [LV] Remove triples from target-independent vectorizer tests. NFC. (Michael Kuperstein, 2016-10-06, 1 file, -1/+0)

  Vectorizer tests in the target-independent directory should not have a target
  triple. If a test really needs to query a specific backend, it belongs in the
  right target subdirectory (which "REQUIRES" the right backend). Otherwise, it
  should not specify a triple.

  llvm-svn: 283512
* [PR27284] Reverse the ownership between DICompileUnit and DISubprogram. (Adrian Prantl, 2016-04-15, 1 file, -3/+2)

  Currently each Function points to a DISubprogram and DISubprogram has a scope
  field. For member functions the scope is a DICompositeType. DIScopes point to
  the DICompileUnit to facilitate type uniquing.

  Distinct DISubprograms (with isDefinition: true) are not part of the type
  hierarchy and cannot be uniqued. This change removes the subprograms list from
  DICompileUnit and instead adds a pointer to the owning compile unit to
  distinct DISubprograms. This would make it easy for ThinLTO to strip unneeded
  DISubprograms and their transitively referenced debug info.

  Motivation
  ----------

  Materializing DISubprograms is currently the most expensive operation when
  doing a ThinLTO build of clang. We want the DISubprogram to be stored in a
  separate Bitcode block (or the same block as the function body) so we can
  avoid having to expensively deserialize all DISubprograms together with the
  global metadata. If a function has been inlined into another subprogram we
  need to store a reference to the block containing the inlined subprogram.

  Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script
  that updates LLVM IR testcases to the new format.

  http://reviews.llvm.org/D19034
  <rdar://problem/25256815>

  llvm-svn: 266446
* [DebugInfo] Fix tests so that each subprogram belongs to a CU. (Davide Italiano, 2016-04-05, 1 file, -0/+8)

  llvm-svn: 265490
* [LV] Don't bail to MiddleBlock if a runtime check fails, bail to ScalarPH instead (James Molloy, 2015-09-02, 1 file, -2/+2)

  We were bailing to two places if our runtime checks failed. If the initial
  overflow check failed, we'd go to ScalarPH. If any other check failed, we'd go
  to MiddleBlock. This caused us to have to have an extra PHI per induction and
  reduction as the vector loop's exit block was not dominated by its latch.

  There's no need to have this behavior - if we just always go to ScalarPH we
  can get rid of a bunch of complexity.

  llvm-svn: 246637
* DI: Require subprogram definitions to be distinct (Duncan P. N. Exon Smith, 2015-08-28, 1 file, -1/+1)

  As a follow-up to r246098, require `DISubprogram` definitions
  (`isDefinition: true`) to be 'distinct'. Specifically, add an assembler check,
  a verifier check, and bitcode upgrading logic to combat testcase bitrot after
  the `DIBuilder` change.

  While working on the testcases, I realized that
  test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore. Its
  purpose was to check for a corner case in PR22792 where two subprogram
  definitions match exactly and share the same metadata node. The new verifier
  check, requiring that subprogram definitions are 'distinct', precludes that
  possibility.

  I updated almost all the IR with the following script:

    git grep -l -E -e '= !DISubprogram\(.* isDefinition: true' |
      grep -v test/Bitcode |
      xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true\)/= distinct \1/'

  Likely some variant of this would work for out-of-tree testcases.

  llvm-svn: 246327
* [LoopVectorize] Use ReplaceInstWithInst() helper where appropriate. (Alexey Samsonov, 2015-07-01, 1 file, -15/+30)

  This is mostly an NFC, which increases code readability (instead of saving the
  old terminator, generating a new one in front of the old, and deleting the
  old, we just call a function). However, it would additionally copy the debug
  location from the old instruction to the replacement, which would help
  PR23837.

  llvm-svn: 241197
* [opaque pointer type] Add textual IR support for explicit type parameter to load instruction (David Blaikie, 2015-02-27, 1 file, -3/+3)

  Essentially the same as the GEP change in r230786. A similar migration script
  can be used to update test cases, though a few more test case
  improvements/changes were required this time around (r229269-r229278):

    import fileinput
    import sys
    import re

    pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

    for line in sys.stdin:
      sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

  Reviewers: rafael, dexonsmith, grosser

  Differential Revision: http://reviews.llvm.org/D7649

  llvm-svn: 230794
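For illustration only (not part of the commit message), the textual change for load in a tiny hypothetical function:

    ; old syntax:  %v = load i32* %p
    ; new syntax:
    define i32 @read(i32* %p) {
      %v = load i32, i32* %p
      ret i32 %v
    }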
* [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction (David Blaikie, 2015-02-27, 1 file, -4/+4)

  One of several parallel first steps to remove the target type of pointers,
  replacing them with a single opaque pointer type. This adds an explicit type
  parameter to the gep instruction so that when the first parameter becomes an
  opaque pointer type, the type to gep through is still available to the
  instructions.

  * This doesn't modify gep operators, only instructions (operators will be
    handled separately).
  * Textual IR changes only. Bitcode (including upgrade) and changing the
    in-memory representation will be in separate changes.
  * geps of vectors are transformed as:
        getelementptr <4 x float*> %x, ...
      ->getelementptr float, <4 x float*> %x, ...
    Then, once the opaque pointer type is introduced, this will ultimately look
    like:
        getelementptr float, <4 x ptr> %x
    with the unambiguous interpretation that it is a vector of pointers to
    float.
  * address spaces remain on the pointer, not the type:
        getelementptr float addrspace(1)* %x
      ->getelementptr float, float addrspace(1)* %x
    Then, eventually:
        getelementptr float, ptr addrspace(1) %x

  Importantly, the massive amount of test case churn has been automated by the
  same crappy python code. I had to manually update a few test cases that
  wouldn't fit the script's model (r228970, r229196, r229197, r229198).

  The python script just massages stdin and writes the result to stdout; I then
  wrapped that in a shell script to handle replacing files, then used the usual
  find+xargs to migrate all the files.

  update.py:

    import fileinput
    import sys
    import re

    ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
    normrep = re.compile(r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")

    def conv(match, line):
      if not match:
        return line
      line = match.groups()[0]
      if len(match.groups()[5]) == 0:
        line += match.groups()[2]
      line += match.groups()[3]
      line += ", "
      line += match.groups()[1]
      line += "\n"
      return line

    for line in sys.stdin:
      if line.find("getelementptr ") == line.find("getelementptr inbounds"):
        if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
          line = conv(re.match(ibrep, line), line)
      elif line.find("getelementptr ") != line.find("getelementptr ("):
        line = conv(re.match(normrep, line), line)
      sys.stdout.write(line)

  apply.sh:

    for name in "$@"
    do
      python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
      rm -f "$name.tmp"
    done

  The actual commands:

    From llvm/src:
      find test/ -name *.ll | xargs ./apply.sh
    From llvm/src/tools/clang:
      find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
    From llvm/src/tools/polly:
      find test/ -name *.ll | xargs ./apply.sh

  After that, check-all (with llvm, clang, clang-tools-extra, lld, compiler-rt,
  and polly all checked out).

  The extra 'rm' in the apply.sh script is due to a few files in clang's test
  suite using interesting unicode stuff that my python script was throwing
  exceptions on. None of those files needed to be migrated, so it seemed
  sufficient to ignore those cases.

  Reviewers: rafael, dexonsmith, grosser

  Differential Revision: http://reviews.llvm.org/D7636

  llvm-svn: 230786
* Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. (Sanjay Patel, 2014-09-10, 1 file, -1/+1)

  "Unroll" is not the appropriate name for this variable. Clang already uses the
  term "interleave" in pragmas and metadata for this.

  Differential Revision: http://reviews.llvm.org/D5066

  llvm-svn: 217528
* LoopVectorizer: If dependency checks fail try runtime checks (Arnold Schwaighofer, 2013-11-01, 1 file, -0/+28)

  When a dependence check fails we can still try to vectorize loops with runtime
  array bounds checks.

  This helps linpack to vectorize a loop in dgefa. And we are back to 2x of the
  scalar performance on a corei7-avx.

  radar://15339680

  llvm-svn: 193853
* Reapply 184685 after the SetVector iteration order fix. (Arnold Schwaighofer, 2013-06-24, 1 file, -1/+1)

  This should hopefully have fixed the stage2/stage3 miscompare on the dragonegg
  testers.

  "LoopVectorize: Use the dependence test utility class

  We now no longer need alias analysis - the cases that alias analysis would
  handle are now handled as accesses with a large dependence distance.

  We can now vectorize loops with simple constant dependence distances.

    for (i = 8; i < 256; ++i) {
      a[i] = a[i+4] * a[i+8];
    }

    for (i = 8; i < 256; ++i) {
      a[i] = a[i-4] * a[i-8];
    }

  We would be able to vectorize about 200 more loops (in many cases the cost
  model instructs us not to) in the test suite now. Results on x86-64 are a
  wash.

  I have seen one degradation in ammp. Interestingly, the function in which we
  now vectorize a loop is never executed so we probably see some instruction
  cache effects. There is a 2% improvement in h264ref. There is one or the other
  TSCV loop kernel that speeds up.

  radar://13681598"

  llvm-svn: 184724
* Revert "LoopVectorize: Use the dependence test utility class"Arnold Schwaighofer2013-06-241-1/+1
| | | | | | | | This reverts commit cbfa1ca993363ca5c4dbf6c913abc957c584cbac. We are seeing a stage2 and stage3 miscompare on some dragonegg bots. llvm-svn: 184690
* LoopVectorize: Use the dependence test utility class (Arnold Schwaighofer, 2013-06-24, 1 file, -1/+1)

  We now no longer need alias analysis - the cases that alias analysis would
  handle are now handled as accesses with a large dependence distance.

  We can now vectorize loops with simple constant dependence distances.

    for (i = 8; i < 256; ++i) {
      a[i] = a[i+4] * a[i+8];
    }

    for (i = 8; i < 256; ++i) {
      a[i] = a[i-4] * a[i-8];
    }

  We would be able to vectorize about 200 more loops (in many cases the cost
  model instructs us not to) in the test suite now. Results on x86-64 are a
  wash.

  I have seen one degradation in ammp. Interestingly, the function in which we
  now vectorize a loop is never executed so we probably see some instruction
  cache effects. There is a 2% improvement in h264ref. There is one or the other
  TSCV loop kernel that speeds up.

  radar://13681598

  llvm-svn: 184685
* TBAA: remove !tbaa from testing cases if not used. (Manman Ren, 2013-04-30, 1 file, -6/+2)

  This will make it easier to turn on struct-path aware TBAA since the metadata
  format will change.

  llvm-svn: 180796
* LoopVectorizer: Emit memory checks into their own basic block. (Benjamin Kramer, 2013-01-19, 1 file, -0/+4)

  This separates the check for "too few elements to run the vector loop" from
  the "memory overlap" check, giving a lot nicer code and allowing to skip the
  memory checks when we're not going to execute the vector code anyways.

  We still leave the decision of whether to emit the memory checks as branches
  or setccs, but it seems to be doing a good job. If ugly code pops up we may
  want to emit them as separate blocks too. Small speedup on
  MultiSource/Benchmarks/MallocBench/espresso.

  Most of this is legwork to allow multiple bypass blocks while updating PHIs,
  dominators and loop info.

  llvm-svn: 172902
* Remove the -licm pass from the loop vectorizer test because the loop vectorizer does it now. (Nadav Rotem, 2013-01-09, 1 file, -1/+1)

  llvm-svn: 171930
* Force a fixed unroll count on the target independent tests. (Nadav Rotem, 2013-01-05, 1 file, -1/+1)

  This should fix clang-native-arm-cortex-a9. Thanks Renato.

  llvm-svn: 171582
* Add support for memory runtime check. When we can, we calculate array bounds. (Nadav Rotem, 2012-11-09, 1 file, -0/+36)

  If the arrays are found to be disjoint then we run the vectorized version of
  the loop. If they are not, we run the scalar code.

  llvm-svn: 167608
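Not from the commit: a minimal sketch (hypothetical names, written in the later explicit-type IR syntax rather than the 2012-era syntax) of the kind of runtime overlap check this adds; the accessed ranges are compared, and the scalar loop is taken when they may overlap:

    define void @memcheck(i32* %a, i32* %b, i64 %n) {
    entry:
      %a.end = getelementptr i32, i32* %a, i64 %n
      %b.end = getelementptr i32, i32* %b, i64 %n
      %bound0 = icmp ult i32* %a, %b.end
      %bound1 = icmp ult i32* %b, %a.end
      %conflict = and i1 %bound0, %bound1
      br i1 %conflict, label %scalar.ph, label %vector.ph

    vector.ph:                        ; ranges are disjoint: run the vector loop
      ret void

    scalar.ph:                        ; ranges may overlap: run the scalar loop
      ret void
    }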