summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Switch to select optimization for two-case switchesMarcello Maggioni2014-10-141-0/+170
| | | | | | | This is the same optimization of r219233 with modifications to support PHIs with multiple incoming edges from the same block and a test to check that this condition is handled. llvm-svn: 219656
* Revert r219223, it creates invalid PHI nodes.Joerg Sonnenberger2014-10-121-170/+0
| | | | llvm-svn: 219587
* SimplifyCFG: Don't convert phis into selects if we could remove undef behaviorArnold Schwaighofer2014-10-101-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | instead We used to transform this: define void @test6(i1 %cond, i8* %ptr) { entry: br i1 %cond, label %bb1, label %bb2 bb1: br label %bb2 bb2: %ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ] store i8 2, i8* %ptr.2, align 8 ret void } into this: define void @test6(i1 %cond, i8* %ptr) { %ptr.2 = select i1 %cond, i8* null, i8* %ptr store i8 2, i8* %ptr.2, align 8 ret void } because the simplifycfg transformation into selects would happen to happen before the simplifycfg transformation that removes unreachable control flow (We have 'unreachable control flow' due to the store to null which is undefined behavior). The existing transformation that removes unreachable control flow in simplifycfg is: /// If BB has an incoming value that will always trigger undefined behavior /// (eg. null pointer dereference), remove the branch leading here. static bool removeUndefIntroducingPredecessor(BasicBlock *BB) Now we generate: define void @test6(i1 %cond, i8* %ptr) { store i8 2, i8* %ptr.2, align 8 ret void } I did not see any impact on the test-suite + externals. rdar://18596215 llvm-svn: 219462
* Two case switch to select optimizationMarcello Maggioni2014-10-071-0/+170
| | | | | | | | | | | | | | | | | | | | | This optimization tries to convert switch instructions that are used to select a value with only 2 unique cases + default block to a select or a couple of selects (depending if the default block is reachable or not). The typical case this optimization wants to be able to optimize is this one: Example: switch (a) { case 10: %0 = icmp eq i32 %a, 10 return 10; %1 = select i1 %0, i32 10, i32 4 case 20: ----> %2 = icmp eq i32 %a, 20 return 2; %3 = select i1 %2, i32 2, i32 %1 default: return 4; } It also sets the base for further optimizations that are planned and being reviewed. llvm-svn: 219223
* [SimplifyCFG] threshold for folding branches with common destinationJingyue Wu2014-09-301-66/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch adds a threshold that controls the number of bonus instructions allowed for folding branches with common destination. The original code allows at most one bonus instruction. With this patch, users can customize the threshold to allow multiple bonus instructions. The default threshold is still 1, so that the code behaves the same as before when users do not specify this threshold. The motivation of this change is that tuning this threshold significantly (up to 25%) improves the performance of some CUDA programs in our internal code base. In general, branch instructions are very expensive for GPU programs. Therefore, it is sometimes worth trading more arithmetic computation for a more straightened control flow. Here's a reduced example: __global__ void foo(int a, int b, int c, int d, int e, int n, const int *input, int *output) { int sum = 0; for (int i = 0; i < n; ++i) sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i]; *output = sum; } The select statement in the loop body translates to two branch instructions "if ((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination. With the default threshold, SimplifyCFG is unable to fold them, because computing the condition of the second branch "(i | c) ^ d > e" requires two bonus instructions. With the threshold increased, SimplifyCFG can fold the two branches so that the loop body contains only one branch, making the code conceptually look like: sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i]; Increasing the threshold significantly improves the performance of this particular example. In the configuration where both conditions are guaranteed to be true, increasing the threshold from 1 to 2 improves the performance by 18.24%. Even in the configuration where the first condition is false and the second condition is true, which favors shortcuts, increasing the threshold from 1 to 2 still improves the performance by 4.35%. We are still looking for a good threshold and maybe a better cost model than just counting the number of bonus instructions. However, according to the above numbers, we think it is at least worth adding a threshold to enable more experiments and tuning. Let me know what you think. Thanks! Test Plan: Added one test case to check the threshold is in effect Reviewers: nadav, eliben, meheff, resistor, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, llvm-commits Differential Revision: http://reviews.llvm.org/D5529 llvm-svn: 218711
* Remove dead code in SimplifyCFGJingyue Wu2014-09-151-43/+0
| | | | | | | | | | | | | | | | | | | | | | | Summary: UsedByBranch is always true according to how BonusInst is defined. Test Plan: Passes check-all, and also verified if (BonusInst && !UsedByBranch) { ... } is never entered during check-all. Reviewers: resistor, nadav, jingyue Reviewed By: jingyue Subscribers: llvm-commits, eliben, meheff Differential Revision: http://reviews.llvm.org/D5324 llvm-svn: 217824
* Make use of @llvm.assume in ValueTracking (computeKnownBits, etc.)Hal Finkel2014-09-071-29/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change, which allows @llvm.assume to be used from within computeKnownBits (and other associated functions in ValueTracking), adds some (optional) parameters to computeKnownBits and friends. These functions now (optionally) take a "context" instruction pointer, an AssumptionTracker pointer, and also a DomTree pointer, and most of the changes are just to pass this new information when it is easily available from InstSimplify, InstCombine, etc. As explained below, the significant conceptual change is that known properties of a value might depend on the control-flow location of the use (because we care that the @llvm.assume dominates the use because assumptions have control-flow dependencies). This means that, when we ask if bits are known in a value, we might get different answers for different uses. The significant changes are all in ValueTracking. Two main changes: First, as with the rest of the code, new parameters need to be passed around. To make this easier, I grouped them into a structure, and I made internal static versions of the relevant functions that take this structure as a parameter. The new code does as you might expect, it looks for @llvm.assume calls that make use of the value we're trying to learn something about (often indirectly), attempts to pattern match that expression, and uses the result if successful. By making use of the AssumptionTracker, the process of finding @llvm.assume calls is not expensive. Part of the structure being passed around inside ValueTracking is a set of already-considered @llvm.assume calls. This is to prevent a query using, for example, the assume(a == b), to recurse on itself. The context and DT params are used to find applicable assumptions. An assumption needs to dominate the context instruction, or come after it deterministically. In this latter case we only handle the specific case where both the assumption and the context instruction are in the same block, and we need to exclude assumptions from being used to simplify their own ephemeral values (those which contribute only to the assumption) because otherwise the assumption would prove its feeding comparison trivial and would be removed. This commit adds the plumbing and the logic for a simple masked-bit propagation (just enough to write a regression test). Future commits add more patterns (and, correspondingly, more regression tests). llvm-svn: 217342
* Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid ↵Craig Topper2014-08-211-1/+1
| | | | | | needing to mention the size. llvm-svn: 216158
* Revert "Repace SmallPtrSet with SmallPtrSetImpl in function arguments to ↵Craig Topper2014-08-181-1/+1
| | | | | | | | avoid needing to mention the size." Getting a weird buildbot failure that I need to investigate. llvm-svn: 215870
* Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid ↵Craig Topper2014-08-171-1/+1
| | | | | | needing to mention the size. llvm-svn: 215868
* Introduce a helper to combine instruction metadata.Rafael Espindola2014-08-151-0/+8
| | | | | | | | | Replace the old code in GVN and BBVectorize with it. Update SimplifyCFG to use it. Patch by Björn Steinbrink! llvm-svn: 215723
* [SimplifyCFG] fix accessing deleted PHINodes in switch-to-table conversion.Manman Ren2014-08-021-1/+4
| | | | | | | | | When we have a covered lookup table, make sure we don't delete PHINodes that are cached in PHIs. rdar://17887153 llvm-svn: 214642
* SimplifyCFG: Avoid miscompilations due to removed lifetime intrinsics.Rafael Espindola2014-07-301-1/+1
| | | | | | | | | | | The lifetime intrinsics need some work in order to make it clear which optimizations are or are not valid. For now dropping this optimization avoids a miscompilation. Patch by Björn Steinbrink. llvm-svn: 214336
* Feedback from Hans on r213815. No functionaility change.Manman Ren2014-07-241-10/+11
| | | | llvm-svn: 213895
* Fixing an MSVC conversion warning about implicitly converting the shift ↵Aaron Ballman2014-07-241-1/+1
| | | | | | results to 64-bits. No functional change intended. llvm-svn: 213863
* SimplifyCFG: fix a bug in switch to table conversionManman Ren2014-07-231-4/+13
| | | | | | | | | | | | | | | | | | | We use gep to access the global array "switch.table", and the table index should be treated as unsigned. When the highest bit is 1, this commit zero-extends the index to an integer type with larger size. For a switch on i2, we used to generate: %switch.tableidx = sub i2 %0, -2 getelementptr inbounds [4 x i64]* @switch.table, i32 0, i2 %switch.tableidx It is incorrect when %switch.tableidx is 2 or 3. The fix is to generate %switch.tableidx = sub i2 %0, -2 %switch.tableidx.zext = zext i2 %switch.tableidx to i3 getelementptr inbounds [4 x i64]* @switch.table, i32 0, i3 %switch.tableidx.zext rdar://17735071 llvm-svn: 213815
* Revert "[C++11] Add predecessors(BasicBlock *) / successors(BasicBlock *) ↵Duncan P. N. Exon Smith2014-07-211-23/+25
| | | | | | | | | iterator ranges." This reverts commit r213474 (and r213475), which causes a miscompile on a stage2 LTO build. I'll reply on the list in a moment. llvm-svn: 213562
* [C++11] Add predecessors(BasicBlock *) / successors(BasicBlock *) iterator ↵Manuel Jacob2014-07-201-25/+23
| | | | | | | | | | | | | | | | | | ranges. Summary: This patch introduces two new iterator ranges and updates existing code to use it. No functional change intended. Test Plan: All tests (make check-all) still pass. Reviewers: dblaikie Reviewed By: dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4481 llvm-svn: 213474
* Feeding isSafeToSpeculativelyExecute its DataLayout pointerHal Finkel2014-07-101-25/+27
| | | | | | | | | | | | | | isSafeToSpeculativelyExecute can optionally take a DataLayout pointer. In the past, this was mainly used to make better decisions regarding divisions known not to trap, and so was not all that important for users concerned with "cheap" instructions. However, now it also helps look through bitcasts for dereferencable loads, and will also be important if/when we add a dereferencable pointer attribute. This is some initial work to feed a DataLayout pointer through to callers of isSafeToSpeculativelyExecute, generally where one was already available. llvm-svn: 212720
* Fix for PR17073 ( http://llvm.org/pr17073 ), simplifycfg illegally hoists an ↵Sanjay Patel2014-07-071-3/+20
| | | | | | | | | | operation in a phi node that can trap. This patch adds to an existing loop over phi nodes in SimplifyCondBranchToCondBranch() to check for trapping ops and bails out of the optimization if we find one of those. The test cases verify that trapping ops are not hoisted and non-trapping ops are still optimized as expected. llvm-svn: 212490
* fixed some typos in commentsSanjay Patel2014-07-061-4/+4
| | | | llvm-svn: 212423
* Minor stylistic fix in SimplifyCFG (test commit)Marcello Maggioni2014-07-031-1/+2
| | | | llvm-svn: 212259
* Don't build switch tables for dllimport and TLS variables in GEPsHans Wennborg2014-06-261-2/+3
| | | | | | | This is a follow-up to r211331, which failed to notice that we were returning early from ValidLookupTableConstant for GEPs. llvm-svn: 211753
* Don't build switch lookup tables for dllimport or TLS variablesHans Wennborg2014-06-201-0/+4
| | | | | | | | | | | | | We would previously put dllimport variables in switch lookup tables, which doesn't work because the address cannot be used in a constant initializer. This is basically the same problem that we have in PR19955. Putting TLS variables in switch tables also desn't work, because the address of such a variable is not constant. Differential Revision: http://reviews.llvm.org/D4220 llvm-svn: 211331
* Make bitcast, extractelement, and insertelement considered cheap for ↵Matt Arsenault2014-05-301-0/+3
| | | | | | | | | | speculation. This helps more branches into selects. On R600, vectors are cheap and anything that helps remove branches is very good. llvm-svn: 209914
* Rename ComputeMaskedBits to computeKnownBits. "Masked" has beenJay Foad2014-05-141-1/+1
| | | | | | inappropriate since it lost its Mask parameter in r154011. llvm-svn: 208811
* Add ExtractValue instruction to SimplifyCFG's ComputeSpeculationCostLouis Gerbarg2014-05-091-0/+1
| | | | | | | | | | | | | Since ExtractValue is not included in ComputeSpeculationCost CFGs containing ExtractValueInsts cannot be simplified. In particular this interacts with InstCombineCompare's tendency to insert add.with.overflow intrinsics for certain idiomatic math operations, preventing optimization. This patch adds ExtractValue to the ComputeSpeculationCost. Test case included rdar://14853450 llvm-svn: 208434
* [C++] Use 'nullptr'. Transforms edition.Craig Topper2014-04-251-73/+74
| | | | llvm-svn: 207196
* [Modules] Fix potential ODR violations by sinking the DEBUG_TYPEChandler Carruth2014-04-221-1/+2
| | | | | | | | | | | | | | | | | definition below all of the header #include lines, lib/Transforms/... edition. This one is tricky for two reasons. We again have a couple of passes that define something else before the includes as well. I've sunk their name macros with the DEBUG_TYPE. Also, InstCombine contains headers that need DEBUG_TYPE, so now those headers #define and #undef DEBUG_TYPE around their code, leaving them well formed modular headers. Fixing these headers was a large motivation for all of these changes, as "leaky" macros of this form are hard on the modules implementation. llvm-svn: 206844
* Allow switch-to-lookup table for tables with holes by adding bitmask checkHans Wennborg2014-03-121-5/+61
| | | | | | | | | | | | | | | | | | | | | | | | This allows us to generate table lookups for code such as: unsigned test(unsigned x) { switch (x) { case 100: return 0; case 101: return 1; case 103: return 2; case 105: return 3; case 107: return 4; case 109: return 5; case 110: return 6; default: return f(x); } } Since cases 102, 104, etc. are not constants, the lookup table has holes in those positions. We therefore guard the table lookup with a bitmask check. Patch by Jasper Neumann! llvm-svn: 203694
* SimplifyCFG: Simplify the weight scaling algorithm.Benjamin Kramer2014-03-091-15/+6
| | | | | | No change in functionality. llvm-svn: 203413
* [C++11] Add range based accessors for the Use-Def chain of a Value.Chandler Carruth2014-03-091-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This requires a number of steps. 1) Move value_use_iterator into the Value class as an implementation detail 2) Change it to actually be a *Use* iterator rather than a *User* iterator. 3) Add an adaptor which is a User iterator that always looks through the Use to the User. 4) Wrap these in Value::use_iterator and Value::user_iterator typedefs. 5) Add the range adaptors as Value::uses() and Value::users(). 6) Update *all* of the callers to correctly distinguish between whether they wanted a use_iterator (and to explicitly dig out the User when needed), or a user_iterator which makes the Use itself totally opaque. Because #6 requires churning essentially everything that walked the Use-Def chains, I went ahead and added all of the range adaptors and switched them to range-based loops where appropriate. Also because the renaming requires at least churning every line of code, it didn't make any sense to split these up into multiple commits -- all of which would touch all of the same lies of code. The result is still not quite optimal. The Value::use_iterator is a nice regular iterator, but Value::user_iterator is an iterator over User*s rather than over the User objects themselves. As a consequence, it fits a bit awkwardly into the range-based world and it has the weird extra-dereferencing 'operator->' that so many of our iterators have. I think this could be fixed by providing something which transforms a range of T&s into a range of T*s, but that *can* be separated into another patch, and it isn't yet 100% clear whether this is the right move. However, this change gets us most of the benefit and cleans up a substantial amount of code around Use and User. =] llvm-svn: 203364
* [Modules] Move the ConstantRange class into the IR library. This isChandler Carruth2014-03-041-1/+1
| | | | | | | | | | a bit surprising, as the class is almost entirely abstracted away from any particular IR, however it encodes the comparsion predicates which mutate ranges as ICmp predicate codes. This is reasonable as they're used for both instructions and constants. Thus, it belongs in the IR library with instructions and constants. llvm-svn: 202838
* [Modules] Move the NoFolder into the IR library as it createsChandler Carruth2014-03-041-1/+1
| | | | | | instructions. llvm-svn: 202834
* [Modules] Move CFG.h to the IR library as it defines graph traits overChandler Carruth2014-03-041-1/+1
| | | | | | IR types. llvm-svn: 202827
* [Modules] Move the LLVM IR pattern match header into the IR library, itChandler Carruth2014-03-041-1/+1
| | | | | | obviously is coupled to the IR. llvm-svn: 202818
* [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.Benjamin Kramer2014-03-021-2/+2
| | | | | | Remove the old functions. llvm-svn: 202636
* Rename many DataLayout variables from TD to DL.Rafael Espindola2014-02-211-70/+70
| | | | | | | | | I am really sorry for the noise, but the current state where some parts of the code use TD (from the old name: TargetData) and other parts use DL makes it hard to write a patch that changes where those variables come from and how they are passed along. llvm-svn: 201827
* Fix pr14893.Rafael Espindola2014-01-281-0/+8
| | | | | | | | | | | When simplifycfg moves an instruction, it must drop metadata it doesn't know is still valid with the preconditions changes. In particular, it must drop the range and tbaa metadata. The patch implements this with an utility function to drop all metadata not in a white list. llvm-svn: 200322
* PGO branch weight: keep halving the weights until they can fit intoManman Ren2014-01-271-12/+13
| | | | | | | | | | uint32. When folding branches to common destination, the updated branch weights can exceed uint32 by more than factor of 2. We should keep halving the weights until they can fit into uint32. llvm-svn: 200262
* Fix known typosAlp Toker2014-01-241-3/+3
| | | | | | | Sweep the codebase for common typos. Includes some changes to visible function names that were misspelt. llvm-svn: 200018
* Switch-to-lookup tables: set threshold to 3 casesHans Wennborg2014-01-151-5/+3
| | | | | | | | | | | | | | | There has been an old FIXME to find the right cut-off for when it's worth analyzing and potentially transforming a switch to a lookup table. The switches always have two or more cases. I could not measure any speed-up by transforming a switch with two cases. A switch with three cases gets a nice speed-up, and I couldn't measure any compile-time regression, so I think this is the right threshold. In a Clang self-host, this causes 480 new switches to be transformed, and reduces the final binary size with 8 KB. llvm-svn: 199294
* Switch-to-lookup tables: Don't require a result for the defaultHans Wennborg2014-01-121-12/+25
| | | | | | | | | | | | | | | | | | | | | | | case when the lookup table doesn't have any holes. This means we can build a lookup table for switches like this: switch (x) { case 0: return 1; case 1: return 2; case 2: return 3; case 3: return 4; default: exit(1); } The default case doesn't yield a constant result here, but that doesn't matter, since a default result is only necessary for filling holes in the lookup table, and this table doesn't have any holes. This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB off the resulting clang binary. llvm-svn: 199025
* Transforms: Don't create bad weights when eliminating dead casesJustin Bogner2013-12-201-1/+1
| | | | | | | | | If we happen to eliminate every case in a switch that has branch weights, we currently try to create metadata for the one remaining branch, triggering an assert. Instead, we need to check that the metadata we're trying to create is sensible. llvm-svn: 197791
* FoldBranchToCommonDest merges branches into a single branch with or/and of ↵Nadav Rotem2013-11-121-2/+7
| | | | | | the condition. It has a heuristics for estimating when some of the dependencies are processed by out-of-order processors. This patch adds another rule to the heuristics that says that if the "BonusInstruction" that we speculatively execute is used by the condition of the second branch then it is okay to hoist it. This change exposes more opportunities for other passes to transform the code. It does not matter that much that we if-convert the code because the selectiondag builder splits or/and branches into multiple branches when profitable. llvm-svn: 194524
* SimplifyCFG: Use existing constant folding logic when forming switch tables.Benjamin Kramer2013-11-121-31/+20
| | | | | | Both simpler and more powerful than the hand-rolled folding logic. llvm-svn: 194475
* SimplifyCFG has a heuristics for out-of-order processors that decides when ↵Nadav Rotem2013-11-101-1/+1
| | | | | | | | it is worthwhile to merge branches. It tries to estimate if the operands of the instruction that we want to hoist are ready. This commit marks function arguments as 'ready' because they require no calculation. This boosts libquantum and a few other workloads from the testsuite. llvm-svn: 194346
* SimplifyCFG: Don't duplicate calls to functions marked noduplicate v2Tom Stellard2013-10-211-0/+15
| | | | | | | v2: - Use CI->cannotDuplicate() llvm-svn: 193115
* Teach SimplifyCFG about address spacesMatt Arsenault2013-10-211-5/+9
| | | | llvm-svn: 193104
* Fix the predecessor removal logic in r193045.Michael Gottesman2013-10-211-11/+9
| | | | | | Additionally some small comment/stylistic fixes are included as well. llvm-svn: 193068
OpenPOWER on IntegriCloud