path: root/llvm/test/Transforms/SimplifyCFG/SpeculativeExec.ll
Commit message | Author | Age | Files | Lines
* [SimplifyCFG] add tests for possible FP speculative select; NFC | Sanjay Patel | 2019-11-17 | 1 | -36/+99
  It doesn't seem that there are any perf/param knobs that can be turned to create selects for the FP variants of the tests, but that may not always be true in the future. If it changes, we should propagate FMF.
* [SimplifyCFG] FoldTwoEntryPHINode(): consider *total* speculation cost, not per-BB cost | Roman Lebedev | 2019-09-16 | 1 | -5/+1
  Summary:
  Previously, if the threshold was 2, we were willing to speculatively execute 2 cheap instructions in both basic blocks (thus we were willing to speculatively execute cost = 4), but weren't willing to speculate when one BB had 3 instructions and the other one had no instructions, even though that would have a total cost of 3.

  This looks inconsistent to me. I don't think `cmov`-like instructions will start executing until both of its inputs are available: https://godbolt.org/z/zgHePf
  So I don't see why the existing behavior is the correct one.

  Also, let's add its own `cl::opt` for this threshold, with default=4, so it is not stricter than the previous threshold: it will allow folding when there are 2 BBs each with cost=2. And since the logic has changed, it will also allow folding when one BB has cost=3 and the other cost=1, or there is only one BB with cost=4.

  This is an alternative solution to D65148:
  This fix is mainly motivated by the `signbit-like-value-extension.ll` test. That pattern comes up in JPEG decoding, see e.g. `Figure F.12 – Extending the sign bit of a decoded value in V` of `ITU T.81` (JPEG specification). That branch is not predictable, and it is within the innermost loop, so the fact that that pattern ends up being stuck with a branch instead of `select` (i.e. `CMOV` for x86) is unlikely to be beneficial.

  This has great results on the final assembly (vanilla test-suite + RawSpeed): (metric pass - D67240)
  | metric                                 | old     | new     | delta | %      |
  | x86-mi-counting.NumMachineFunctions    | 37720   | 37721   | 1     | 0.00%  |
  | x86-mi-counting.NumMachineBasicBlocks  | 773545  | 771181  | -2364 | -0.31% |
  | x86-mi-counting.NumMachineInstructions | 7488843 | 7486442 | -2401 | -0.03% |
  | x86-mi-counting.NumUncondBR            | 135770  | 135543  | -227  | -0.17% |
  | x86-mi-counting.NumCondBR              | 423753  | 422187  | -1566 | -0.37% |
  | x86-mi-counting.NumCMOV                | 24815   | 25731   | 916   | 3.69%  |
  | x86-mi-counting.NumVecBlend            | 17      | 17      | 0     | 0.00%  |

  We significantly decrease basic block count, notably decrease instruction count, significantly decrease branch count and very significantly increase `cmov` count.

  Performance-wise, unsurprisingly, this has great effect on target RawSpeed benchmark. I'm seeing 5 **major** improvements:
  ```
  Benchmark  Time  CPU  Time Old  Time New  CPU Old  CPU New
  ------------------------------------------------------------------------------------------------------------------------
  Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_pvalue  0.0000  0.0000  U Test, Repetitions: 49 vs 49
  Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_mean  -0.3064  -0.3064  226.9913  157.4452  226.9800  157.4384
  Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_median  -0.3057  -0.3057  226.8407  157.4926  226.8282  157.4828
  Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_stddev  -0.4985  -0.4954  0.3051  0.1530  0.3040  0.1534
  Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_pvalue  0.0000  0.0000  U Test, Repetitions: 49 vs 49
  Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_mean  -0.1747  -0.1747  80.4787  66.4227  80.4771  66.4146
  Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_median  -0.1742  -0.1743  80.4686  66.4542  80.4690  66.4436
  Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_stddev  +0.6089  +0.5797  0.0670  0.1078  0.0673  0.1062
  Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_pvalue  0.0000  0.0000  U Test, Repetitions: 49 vs 49
  Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_mean  -0.1598  -0.1598  171.6996  144.2575  171.6915  144.2538
  Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_median  -0.1598  -0.1597  171.7109  144.2755  171.7018  144.2766
  Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_stddev  +0.4024  +0.3850  0.0847  0.1187  0.0848  0.1175
  Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_pvalue  0.0000  0.0000  U Test, Repetitions: 49 vs 49
  Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_mean  -0.0550  -0.0551  280.3046  264.8800  280.3017  264.8559
  Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_median  -0.0554  -0.0554  280.2628  264.7360  280.2574  264.7297
  Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_stddev  +0.7005  +0.7041  0.2779  0.4725  0.2775  0.4729
  Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_pvalue  0.0000  0.0000  U Test, Repetitions: 49 vs 49
  Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_mean  -0.0354  -0.0355  316.7396  305.5208  316.7342  305.4890
  Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_median  -0.0354  -0.0356  316.6969  305.4798  316.6917  305.4324
  Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_stddev  +0.0493  +0.0330  0.3562  0.3737  0.3563  0.3681
  ```

  That being said, it's always best-effort, so there will likely be cases where this worsens things.

  Reviewers: efriedma, craig.topper, dmgreen, jmolloy, fhahn, Carrot, hfinkel, chandlerc
  Reviewed By: jmolloy
  Subscribers: xbolva00, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67318
  llvm-svn: 372009
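The threshold change described in the commit message above can be illustrated with a small sketch. This is not the SimplifyCFG code; the function names `old_rule_allows` and `new_rule_allows` are hypothetical, and each block's cost is simply modeled as an integer. The thresholds (2 per block before, 4 in total after) are the ones the message states.

```python
# Toy model of the two speculation rules described in the commit message.
# Function names are hypothetical; costs are plain integers, one per basic block.

def old_rule_allows(costs, per_bb_threshold=2):
    """Old rule: every block is checked against the threshold on its own."""
    return all(cost <= per_bb_threshold for cost in costs)


def new_rule_allows(costs, total_threshold=4):
    """New rule: the summed cost of all speculated blocks is checked once."""
    return sum(costs) <= total_threshold


if __name__ == "__main__":
    # (cost of first block, cost of second block)
    for costs in [(2, 2), (3, 0), (3, 1), (4, 0), (3, 2)]:
        print(costs, "old:", old_rule_allows(costs), "new:", new_rule_allows(costs))
```

Under this model the (2, 2) case is accepted by both rules, while (3, 0), (3, 1) and (4, 0) are accepted only by the total-cost rule, which matches the cases the message lists.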
* [SimplifyCFG][NFC] Autogenerate two tests | Roman Lebedev | 2019-09-07 | 1 | -4/+4
  llvm-svn: 371310
* Revert "Temporarily Revert "Add basic loop fusion pass."" | Eric Christopher | 2019-04-17 | 1 | -0/+121
  The reversion apparently deleted the test/Transforms directory. Will be re-reverting again.
  llvm-svn: 358552
* Temporarily Revert "Add basic loop fusion pass." | Eric Christopher | 2019-04-17 | 1 | -121/+0
  As it's causing some bot failures (and per request from kbarton).
  This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
  llvm-svn: 358546
* [SimplifyCFG] Regenerate some test cases using update_test_checks.py to prepare for an upcoming commit. NFC | Craig Topper | 2017-10-31 | 1 | -40/+64
  A future commit will change how some of the value names in the IR are generated which causes these tests to break in their current form. The script generates checks with regular expressions so it should be immune.
  llvm-svn: 317023
* Revert "Revert "Strip metadata when speculatively hoisting instructions (r252604)" | Igor Laevsky | 2015-11-18 | 1 | -0/+26
  Failing clang test is now fixed by the r253458.
  llvm-svn: 253459
* Revert "Strip metadata when speculatively hoisting instructions" | Renato Golin | 2015-11-10 | 1 | -26/+0
  This reverts commit r252604, as it broke all ARM and AArch64 buildbots, as well as some x86, et al.
  llvm-svn: 252623
* Strip metadata when speculatively hoisting instructions | Igor Laevsky | 2015-11-10 | 1 | -0/+26
  This is a fix for PR24059.

  When we are hoisting an instruction above some condition it may turn out that metadata on this instruction was control dependent on the condition. This metadata becomes invalid and we need to drop it.

  This patch should cover the most obvious places of speculative execution (which I have found by grepping isSafeToSpeculativelyExecute). I think there are more cases but at least this change covers the severe ones.

  Differential Revision: http://reviews.llvm.org/D14398
  llvm-svn: 252604
* [opaque pointer type] Add textual IR support for explicit type parameter to load instruction | David Blaikie | 2015-02-27 | 1 | -2/+2
  Essentially the same as the GEP change in r230786.

  A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278)

    import fileinput
    import sys
    import re

    pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

    for line in sys.stdin:
        sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

  Reviewers: rafael, dexonsmith, grosser
  Differential Revision: http://reviews.llvm.org/D7649
  llvm-svn: 230794
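To see what the migration script in the commit message above does to a single line, here is a small sketch that wraps the same pattern and replacement string in a function. The wrapper name `migrate_load` and the sample IR lines are illustrative assumptions; only the regular expression and the replacement come from the commit message.

```python
import re

# Pattern and replacement string copied from the commit message.
PAT = re.compile(
    r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*"
    r"($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast"
    r"|inttoptr|\[\[[a-zA-Z]|\{\{).*$)"
)


def migrate_load(line: str) -> str:
    """Rewrite an old-style 'load T* %p' into 'load T, T* %p' (hypothetical helper)."""
    return PAT.sub(r"\1, \2\3*\4", line)


if __name__ == "__main__":
    # Sample lines are made up for illustration.
    for old in [
        "  %x = load i32* %p, align 4",
        "  %v = load volatile i32* %p",
        "  %a = load i32 addrspace(1)* %q",
        "  ret i32 %x",
    ]:
        print(migrate_load(old))
```

The second capture group holds the loaded type, so the replacement repeats it in front of the pointer type; lines without a `load` are left unchanged.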
* [TTI] Improved cost heuristic for cttz/ctlz calls. | Andrea Di Biagio | 2015-02-11 | 1 | -16/+0
  This patch is a follow-up of r228826 (see code-review: D7506).

  Now that SimplifyCFG uses TargetTransformInfo for cost analysis, we have to fix the cost heuristic for intrinsic calls to cttz/ctlz. This patch defines method 'getIntrinsicCost' in BasicTTIImpl: now, BasicTTIImpl queries TLI to check if a call to cttz/ctlz is cheap for the target.

  Added test cases in Transforms/SimplifyCFG/X86 to verify that on x86, SimplifyCFG only speculates a call to cttz/ctlz if it is cheap.

  Differential Revision: http://reviews.llvm.org/D7554
  llvm-svn: 228829
* Update Transforms tests to use CHECK-LABEL for easier debugging. No functionality change. | Stephen Lin | 2013-07-14 | 1 | -3/+3
  This update was done with the following bash script:

    find test/Transforms -name "*.ll" | \
    while read NAME; do
      echo "$NAME"
      if ! grep -q "^; *RUN: *llc" $NAME; then
        TEMP=`mktemp -t temp`
        cp $NAME $TEMP
        sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
        while read FUNC; do
          sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3@$FUNC(/g" $TEMP
        done
        mv $TEMP $NAME
      fi
    done

  llvm-svn: 186268
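The inner `sed` substitution in the script above is dense, so here is a rough Python rendering of that one expression, applied to a single line. It is a sketch only: the helper name `add_check_label` and the sample lines are assumptions, and the regular expression is a direct translation of the `sed` pattern to Python syntax, not something taken from the LLVM tree.

```python
import re


def add_check_label(line: str, func: str) -> str:
    """Turn '; <PREFIX>: @func' into '; <PREFIX>-LABEL: @func(' (hypothetical helper).

    Only lines that end right after the function name (optionally followed by
    '(' or spaces) are rewritten, mirroring the trailing '$' in the sed pattern.
    """
    pattern = re.compile(
        r";(.*)([A-Za-z0-9_]*):( *)@" + re.escape(func) + r"([( ]*)$"
    )
    return pattern.sub(
        lambda m: f";{m.group(1)}{m.group(2)}-LABEL:{m.group(3)}@{func}(", line
    )


if __name__ == "__main__":
    # Sample lines are made up for illustration.
    for old in [
        "; CHECK: @test1",
        "; CHECK: @test1(",
        "; CHECK: call void @test1(i32 %x)",
        "; CHECK: @test10",
    ]:
        print(add_check_label(old, "test1"))
```

In this sketch the first two lines become `; CHECK-LABEL: @test1(`, while the call line and the line naming a different function are left alone.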
* Re-revert r173342, without losing the compile time improvements, flat | Chandler Carruth | 2013-01-27 | 1 | -115/+0
  out bug fixes, or functionality preserving refactorings.
  llvm-svn: 173610
* Switch this code away from Value::isUsedInBasicBlock. That code either | Chandler Carruth | 2013-01-25 | 1 | -0/+63
  loops over instructions in the basic block or the use-def list of the value, neither of which are really efficient when repeatedly querying about values in the same basic block.

  What's more, we already know that the CondBB is small, and so we can do a much more efficient test by counting the uses in CondBB, and seeing if those account for all of the uses.

  Finally, we shouldn't blanket fail on any such instruction, instead we should conservatively assume that those instructions are part of the cost.

  Note that this actually fixes a bug in the pass because isUsedInBasicBlock has a really terrible bug in it. I'll fix that in my next commit, but the fix for it would make this code suddenly take the compile time hit I thought it already was taking, so I wanted to go ahead and migrate this code to a faster & better pattern.

  The bug in isUsedInBasicBlock was also causing other tests to test the wrong thing entirely: for example we weren't actually disabling speculation for floating point operations as intended (and tested), but the test passed because we failed to speculate them due to the isUsedInBasicBlock failure.

  llvm-svn: 173417
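The use-counting idea in the commit message above can be sketched with a toy model. This is not LLVM's data structure or API: a block is modeled as a list of operand lists, the value's total use count is assumed to be already known, and the name `uses_confined_to_block` is hypothetical.

```python
# Toy model: a basic block is a list of instructions, and each instruction is
# just the list of operand names it reads. Names here are hypothetical.

def uses_confined_to_block(value, block_operands, total_uses):
    """Return True if every use of `value` sits inside the given (small) block.

    Only the block is scanned; the result follows from comparing the uses
    found there against the total use count the caller already has.
    """
    uses_here = sum(operands.count(value) for operands in block_operands)
    return uses_here == total_uses


if __name__ == "__main__":
    cond_bb = [["%x", "1"], ["%y", "%x"]]  # two instructions, both read %x
    print(uses_confined_to_block("%x", cond_bb, total_uses=2))
    print(uses_confined_to_block("%x", cond_bb, total_uses=3))
```

The point of the pattern is that the scan is bounded by the size of the small block, not by the length of the value's full use list.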
* Reapply chandlerc's r173342 now that the miscompile it was triggering is fixed. | Benjamin Kramer | 2013-01-24 | 1 | -0/+29
  Original commit message:

  Plug TTI into the speculation logic, giving it a real cost interface that can be specialized by targets.

  The goal here is not to be more aggressive, but to just be more accurate with very obvious cases. There are instructions which are known to be truly free and which were not being modeled as such in this code -- see the regression test which is distilled from an inner loop of zlib.

  Everywhere the TTI cost model is insufficiently conservative I've added explicit checks with FIXME comments to go add proper modelling of these cost factors.

  If this causes regressions, the likely solution is to make TTI even more conservative in its cost estimates, but test cases will help here.

  llvm-svn: 173357
* Revert r173342 temporarily. It appears to cause a very late miscompile | Chandler Carruth | 2013-01-24 | 1 | -29/+0
  of stage2 in a bootstrap. Still investigating....
  llvm-svn: 173343
* Plug TTI into the speculation logic, giving it a real cost interface | Chandler Carruth | 2013-01-24 | 1 | -0/+29
  that can be specialized by targets.

  The goal here is not to be more aggressive, but to just be more accurate with very obvious cases. There are instructions which are known to be truly free and which were not being modeled as such in this code -- see the regression test which is distilled from an inner loop of zlib.

  Everywhere the TTI cost model is insufficiently conservative I've added explicit checks with FIXME comments to go add proper modelling of these cost factors.

  If this causes regressions, the likely solution is to make TTI even more conservative in its cost estimates, but test cases will help here.

  llvm-svn: 173342
* Address a large chunk of this FIXME by accumulating the cost for | Chandler Carruth | 2013-01-24 | 1 | -0/+42
  unfolded constant expressions rather than checking each one independently.
  llvm-svn: 173341
* Switch the constant expression speculation cost evaluation away from | Chandler Carruth | 2013-01-24 | 1 | -0/+22
  a cost function that seems both a bit ad-hoc and also poorly suited to evaluating constant expressions.

  Notably, it is missing any support for trivial expressions such as 'inttoptr'. I could fix this routine, but it isn't clear to me all of the constraints its other users are operating under.

  The core protection that seems relevant here is avoiding the formation of a select instruction with a further chain of select operations in a constant expression operand. Just explicitly encode that constraint.

  Also, update the comments and organization here to make it clear where this needs to go -- this should be driven off of real cost measurements which take into account the number of constant expressions and the depth of the constant expression tree.

  llvm-svn: 173340
* Revert r56315. When the instruction to speculate is a load, this | Dan Gohman | 2012-01-05 | 1 | -1/+1
  code can incorrectly move the load across a store. This never happens in practice today, but only because the current heuristics accidentally preclude it.
  llvm-svn: 147623
* Make some intrinsics safe to speculatively execute. | Nick Lewycky | 2011-12-21 | 1 | -3/+28
  llvm-svn: 147036
* fix a bunch of spurious failures for people whose home directory | Chris Lattner | 2009-09-11 | 1 | -2/+2
  is sabre.
  llvm-svn: 81528
* Use opt -S instead of piping bitcode output through llvm-dis. | Dan Gohman | 2009-09-08 | 1 | -2/+2
  llvm-svn: 81257
* Change these tests to feed the assembly files to opt directly, instead | Dan Gohman | 2009-09-08 | 1 | -2/+2
  of using llvm-as, now that opt supports this.
  llvm-svn: 81226
* Speculatively execute a block when the block is the then part of a triangle shape and it contains a single, side effect free, cheap instruction. | Evan Cheng | 2008-06-07 | 1 | -0/+21
  The branch is eliminated by adding a select instruction. i.e.

  Turn
    BB:
        %t1 = icmp
        br i1 %t1, label %BB1, label %BB2
    BB1:
        %t3 = add %t2, c
        br label BB2
    BB2:
  =>
    BB:
        %t1 = icmp
        %t4 = add %t2, c
        %t3 = select i1 %t1, %t2, %t3

  llvm-svn: 52073
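The triangle-to-select rewrite in the commit message above can be modeled in a few lines of Python to show why the two forms compute the same value. This is a sketch under an assumption the message does not spell out: that the value reaching BB2 is the result of the `add` when the branch to BB1 is taken, and `%t2` otherwise. Note that the `select` line as quoted names `%t3` as both its result and an operand, so the sketch follows the assumed intent, not the literal text. The function names are hypothetical.

```python
# Toy model of the triangle-shaped CFG and its speculated form.
# t2 and c are plain integers; cond stands in for the icmp result %t1.

def with_branch(cond: bool, t2: int, c: int) -> int:
    """Before: the add runs only on the path through BB1."""
    result = t2
    if cond:                # br i1 %t1, label %BB1, label %BB2
        result = t2 + c     # BB1: the single cheap, side effect free instruction
    return result           # value seen in BB2


def with_select(cond: bool, t2: int, c: int) -> int:
    """After: the add is executed unconditionally, then a select picks a value."""
    t4 = t2 + c                  # speculated add, hoisted into BB
    return t4 if cond else t2    # select on the branch condition


if __name__ == "__main__":
    for cond in (True, False):
        print(cond, with_branch(cond, 10, 5), with_select(cond, 10, 5))
```

Because the hoisted instruction has no side effects, executing it on the path where its result is discarded changes nothing observable, which is why the commit restricts the transform to side effect free, cheap instructions.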