path: root/llvm/lib/Transforms
Commit message | Author | Date | Files | Lines
* Switch the select to branch transformation on by default. | Benjamin Kramer | 2012-05-06 | 1 | -3/+4
  The primitive conservative heuristic seems to give a slight overall improvement while not regressing stuff. Make it available to wider testing. If you notice any speed regressions (or significant code size regressions) let me know!
  llvm-svn: 156258
* Remove trailing spaces. | Jakub Staszak | 2012-05-06 | 1 | -60/+60
  llvm-svn: 156257
* CodeGenPrepare: Add a transform to turn selects into branches in some cases. | Benjamin Kramer | 2012-05-05 | 1 | -0/+84
  This came up when a change in block placement formed a cmov and slowed down a hot loop by 50%:
      ucomisd (%rdi), %xmm0
      cmovbel %edx, %esi
  cmov is a really bad choice in this context because it doesn't get branch prediction. If we emit it as a branch, an out-of-order CPU can do a better job (if the branch is predicted right) and avoid waiting for the slow load+compare instruction to finish. Of course it won't help if the branch is unpredictable, but those are really rare in practice.
  This patch uses a dumb conservative heuristic: it turns all cmovs that have one use and a direct memory operand into branches. cmovs usually save some code size, so we disable the transform in -Os mode. In-order architectures are unlikely to benefit as well; those are included in the "predictableSelectIsExpensive" flag.
  It would be better to reuse branch probability info here, but BPI doesn't support select instructions currently. It would make sense to use the same heuristics as the if-converter pass, which does the opposite direction of this transform.
  The test suite shows a small improvement here and there on corei7-level machines, but the actual results depend a lot on the microarchitecture used. The transformation is currently disabled by default and available by passing the -enable-cgp-select2branch flag to the code generator.
  Thanks to Chandler for the initial test case, and to Evan Cheng for providing comments and test-suite numbers that were more stable than mine :)
  llvm-svn: 156234
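  A hypothetical source-level sketch of the shape this heuristic targets (not taken from the patch or its tests): a compare against a value loaded straight from memory feeding a select, roughly the shape of the asm above. Emitted as a branch, a correct prediction lets the CPU run ahead instead of waiting on the load + compare.

      // Hypothetical example, assuming x86-style lowering: without the transform
      // this select may become ucomisd (load + compare) feeding a cmov, so the
      // result must wait for the load; as a branch it can be predicted instead.
      int pick(const double *p, double x, int a, int b) {
          return (*p < x) ? a : b;
      }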
* Small fix in InstCombineCasts.cpp. Restored "alloca + bitcast" reducing for the case when the alloca's size is calculated within an "add/sub/... nsw". | Stepan Dyatkovskiy | 2012-05-05 | 1 | -1/+1
  Also added a fix to the 2011-06-13-nsw-alloca.ll test.
  llvm-svn: 156231
* Teach the code extractor how to extract a sequence of blocks from | Chandler Carruth | 2012-05-04 | 1 | -7/+32
  RegionInfo's RegionNode. This mirrors the logic for automating the extraction from a Loop.
  llvm-svn: 156208
* Factor the computation of input and output sets into a public interface | Chandler Carruth | 2012-05-04 | 1 | -35/+34
  of the CodeExtractor utility. This allows speculatively computing input and output sets to measure the likely size impact of the code extraction. Sadly, these sets cannot be reused -- we mutate the function prior to forming the final sets used by the actual extraction.
  The interface has been revamped slightly to make it easier to use correctly, by making the interface const and sinking the computation of the number of exit blocks into the full extraction function, away from the rest of this logic, which just computed two output parameters.
  llvm-svn: 156168
* Rather than trying to gracefully handle input sequences with repeated | Chandler Carruth | 2012-05-04 | 1 | -1/+1
  blocks, assert that this doesn't happen. We don't want to bother trying to support this call pattern as it isn't necessary.
  llvm-svn: 156167
* Fix a goof with my previous commit by completely returning when we | Chandler Carruth | 2012-05-04 | 1 | -1/+1
  detect an ineligible block rather than just breaking out of the loop.
  llvm-svn: 156166
* Hoist a safety assert from the extraction method into the construction | Chandler Carruth | 2012-05-04 | 1 | -9/+13
  of the extractor itself.
  llvm-svn: 156164
* Move the CodeExtractor utility to a dedicated header file / source file, | Chandler Carruth | 2012-05-04 | 3 | -166/+114
  and expose it as a utility class rather than as free function wrappers. The simple free-function interface works well for the bugpoint-specific pass's uses of code extraction, but in an upcoming patch for more advanced code extraction, they simply don't expose a rich enough interface. I need to expose various stages of the process of doing the code extraction and query information to decide whether or not to actually complete the extraction or give up.
  Rather than build up a new predicate model and pass that into these functions, just take the class that was actually implementing the functions and lift it up into a proper interface that can be used to perform code extraction. The interface is cleaned up and re-documented to work better in a header. It is also now set up to accept the blocks to be extracted in the constructor rather than in a method.
  In passing, this essentially reverts my previous commit here exposing a block-level query for eligibility of extraction. That is no longer necessary with the richer interface, as clients can query the extraction object for eligibility directly. This will reduce the number of walks of the input basic block sequence by quite a bit, which is useful if this enters the normal optimization pipeline.
  llvm-svn: 156163
* Add 'landingpad' instructions to the list of instructions to ignore. | Bill Wendling | 2012-05-04 | 1 | -7/+9
  Also combine the code in the 'assert' statement.
  llvm-svn: 156155
* A pile of long overdue refactorings here. There are some very, *very* | Chandler Carruth | 2012-05-04 | 1 | -1/+1
  minor behavior changes with this, but nothing I have seen evidence of in the wild or expect to be meaningful. The real goal is unifying our logic and simplifying the interfaces. A summary of the changes follows:
  - Make 'callIsSmall' actually accept a call site so it can handle intrinsics, and simplify callers appropriately.
  - Nuke a completely bogus declaration of 'callIsSmall' that was still lurking in InlineCost.h... No idea how this got missed.
  - Teach 'isInstructionFree' about the various more intelligent 'free' heuristics that got added to the inline cost analysis during review and testing. This mostly surrounds int->ptr and ptr->int casts.
  - Switch most of the interesting parts of the inline cost analysis that were essentially computing 'is this instruction free?' to use the code metrics routine instead. This way we won't keep duplicating logic.
  All of this is motivated by the desire to allow other passes to compute a roughly equivalent 'cost' metric for a particular basic block as the inline cost analysis does. Sadly, re-using the same analysis for both is really messy because only the actual inline cost analysis is ever going to go to the contortions required for simplification, SROA analysis, etc.
  llvm-svn: 156140
* Factor the logic for testing whether a basic block is viable for code | Chandler Carruth | 2012-05-03 | 1 | -14/+21
  extraction into a public interface. Also clean it up and apply it more consistently such that we check for landing pads *anywhere* in the extracted code, not just in single-block extraction. This will be used to guide decisions in passes that are planning to eventually perform a round of code extraction.
  llvm-svn: 156114
* remove calls to calloc if the allocated memory is not used (it was already being done for malloc) | Nuno Lopes | 2012-05-03 | 1 | -1/+1
  fix a few typos found by Chad in my previous commit
  llvm-svn: 156110
* add support for calloc to objectsize lowering | Nuno Lopes | 2012-05-03 | 1 | -5/+17
  llvm-svn: 156102
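  One way to observe this kind of lowering from source, as a rough sketch only (it assumes a GCC/Clang-style toolchain where __builtin_object_size is lowered through llvm.objectsize, and the folding depends on optimization level; the function name is made up):

      #include <cstdlib>

      // With calloc understood by objectsize lowering, the size query below can
      // fold to the allocation size (16 * 4 = 64) instead of the "unknown" value.
      std::size_t calloc_objsize() {
          char *p = static_cast<char *>(std::calloc(16, 4));
          std::size_t n = __builtin_object_size(p, 0);  // expected to become a constant
          std::free(p);
          return n;
      }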
* replace 'break's with 'return 0' in visitCallInst code for objectsize, since there is no need to fall back to visitCallSite | Nuno Lopes | 2012-05-03 | 1 | -5/+5
  This gives a 0.9% improvement in a test case.
  llvm-svn: 156069
* Whitespace cleanup. | Bill Wendling | 2012-05-02 | 1 | -87/+80
  llvm-svn: 156034
* [tsan] typo and style (thanks to Nick Lewycky) | Kostya Serebryany | 2012-05-02 | 1 | -9/+9
  llvm-svn: 155986
* The value held in the vector may be RAUW'ed by some of the canonicalization | Bill Wendling | 2012-05-02 | 1 | -2/+3
  methods. Use a weak value handle to keep up with this.
  PR12245
  llvm-svn: 155984
* An instruction in a loop is not guaranteed to be executed just because the loop | Nick Lewycky | 2012-05-01 | 1 | -0/+5
  has no exit blocks. Fixes PR12706!
  llvm-svn: 155884
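  A minimal sketch of the kind of situation the statement describes (illustrative only, not the PR12706 test case): the loop below never exits, yet the division is still guarded by a condition, so it is not guaranteed to execute and must not be hoisted or otherwise treated as if it were.

      // The division only runs when *flag is set; an infinite loop does not
      // make every instruction in its body "guaranteed to execute".
      void spin(volatile int *flag, int num, int den) {
          while (true) {
              if (*flag)
                  (void)(num / den);   // may never execute; den could be 0
          }
      }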
* Add support for llvm.arm.neon.vmull* intrinsics to InstCombine. Fixes | Lang Hames | 2012-05-01 | 1 | -0/+51
  <rdar://problem/11291436>. This is a second attempt at a fix for this, the first was r155468. Thanks to Chandler, Bob and others for the feedback that helped me improve this.
  llvm-svn: 155866
* Second attempt at PR12573: | Bill Wendling | 2012-04-30 | 2 | -13/+28
  Allow the "SplitCriticalEdge" function to split the edge to a landing pad. If the pass is *sure* that it thinks it knows what it's doing, then it may go ahead and specify that the landing pad can have its critical edge split. The loop unswitch pass is one of these passes. It will split the critical edges of all edges coming from a loop to a landing pad not within the loop. Doing so will retain important loop analysis information, such as loop simplify.
  llvm-svn: 155817
* Use an ArrayRef instead of explicit vector type. | Bill Wendling | 2012-04-30 | 1 | -8/+5
  llvm-svn: 155816
* Remove hack from r154987. The problem persists even with it, so it's not even a good hack. | Bill Wendling | 2012-04-30 | 1 | -11/+1
  llvm-svn: 155813
* Make sure HoistInsertPosition finds a position that is dominated by all | Rafael Espindola | 2012-04-30 | 1 | -1/+1
  inputs.
  llvm-svn: 155809
* Don't vectorize target-specific types (ppc_fp128, x86_fp80, etc.). | Hal Finkel | 2012-04-27 | 1 | -0/+6
  Target-specific types should not be vectorized. As a practical matter, these types are already register matched (at least in the x86 case), and codegen does not always work correctly (at least in the ppc case, and this is not worth fixing because ppc_fp128 is currently broken and will probably go away soon).
  llvm-svn: 155729
* Change recurse depth limit to uint32 to fix warning. | David Blaikie | 2012-04-27 | 1 | -1/+1
  llvm-svn: 155727
* Miscellaneous accumulated cleanups. | Dan Gohman | 2012-04-27 | 1 | -71/+57
  llvm-svn: 155725
* Add an early bailout to IsValueFullyAvailableInBlock from deeply nested blocks. | Mon P Wang | 2012-04-27 | 1 | -3/+12
  The limit is set to an arbitrary recursion depth of 1000 to avoid stack overflow issues. <rdar://problem/11286839>.
  llvm-svn: 155722
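  A generic sketch of the bailout idea (names and structure are illustrative only, not the actual IsValueFullyAvailableInBlock code): a recursive walk over predecessor blocks simply returns a conservative "not available" answer once it passes an arbitrary depth cap, trading a little precision for bounded stack usage.

      #include <vector>

      constexpr unsigned kMaxRecurseDepth = 1000;  // arbitrary cap, as in the commit

      // Toy "fully available" walk over predecessor lists; a real implementation
      // would also memoize per-block answers instead of revisiting them.
      bool fullyAvailable(const std::vector<std::vector<int>> &preds,
                          const std::vector<bool> &availableIn,
                          int block, unsigned depth = 0) {
          if (availableIn[block]) return true;         // value produced here
          if (depth > kMaxRecurseDepth) return false;  // bail out conservatively
          if (preds[block].empty()) return false;      // reached entry without a def
          for (int p : preds[block])
              if (!fullyAvailable(preds, availableIn, p, depth + 1))
                  return false;
          return true;
      }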
* [asan] small optimization: do not emit "x+0" instructions | Kostya Serebryany | 2012-04-27 | 1 | -3/+4
  llvm-svn: 155701
* [tsan] Atomic support for ThreadSanitizer, patch by Dmitry Vyukov | Kostya Serebryany | 2012-04-27 | 1 | -33/+152
  llvm-svn: 155698
* Break up getProfitableChainIncrement(). | Jakob Stoklund Olesen | 2012-04-26 | 1 | -39/+47
  The required checks are moved to ChainInstruction() itself and the policy decisions are moved to IVChain::isProfitableInc(). Also cache the ExprBase in IVChain to avoid frequent recomputations.
  No functional change intended.
  llvm-svn: 155676
* Turn IVChain into a struct. | Jakob Stoklund Olesen | 2012-04-26 | 1 | -19/+42
  No functional change intended.
  llvm-svn: 155675
* Add instcombine patterns for the following transformations: | Chad Rosier | 2012-04-26 | 2 | -0/+19
  (x & y) | (x ^ y) -> x | y
  (x & y) + (x ^ y) -> x | y
  Patch by Manman Ren.
  rdar://10770603
  llvm-svn: 155674
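  Both rewrites rest on a simple bit-level identity: x & y holds the bits set in both operands, x ^ y the bits set in exactly one, and together they cover x | y with no overlap, which is why addition and OR give the same result. A quick self-contained check (illustrative only, not part of the patch):

      #include <cassert>
      #include <cstdint>

      int main() {
          uint32_t x = 0xDEADBEEF, y = 0x12345678;
          // The AND and XOR terms never share a set bit, so OR and ADD coincide.
          assert(((x & y) | (x ^ y)) == (x | y));
          assert(((x & y) + (x ^ y)) == (x | y));
          return 0;
      }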
* Teach the reassociate pass to fold chains of multiplies with repeated | Chandler Carruth | 2012-04-26 | 1 | -10/+247
  elements to minimize the number of multiplies required to compute the final result. This uses a heuristic to attempt to form near-optimal binary exponentiation-style multiply chains. While there are some cases it misses, it seems to do at least a decent job on a very diverse range of inputs.
  Initial benchmarks show no interesting regressions, and an 8% improvement on SPASS. Let me know if any other interesting results (in either direction) crop up!
  Credit to Richard Smith for the core algorithm, and for helping code the patch itself.
  llvm-svn: 155616
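  For intuition, this is the classic square-and-multiply idea: a product with a factor repeated n times needs only O(log n) multiplies. A sketch of the scheme (illustrative only; the pass operates on reassociated multiply trees in the IR, not on a runtime loop like this):

      #include <cstdint>

      // x*x*x*x*x*x*x*x (x^8) takes 3 multiplies this way instead of 7.
      uint64_t pow_by_squaring(uint64_t x, unsigned n) {
          uint64_t result = 1;
          while (n) {
              if (n & 1) result *= x;  // fold in this bit of the exponent
              x *= x;                  // square for the next bit
              n >>= 1;
          }
          return result;
      }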
* Print IV chain numbers while collecting them. | Jakob Stoklund Olesen | 2012-04-25 | 1 | -4/+5
  llvm-svn: 155567
* Reverting r155468. Chris and Chandler have convinced me that it's dangerous and | Lang Hames | 2012-04-25 | 1 | -35/+0
  in poor taste. Talking through some alternate solutions with Chandler.
  llvm-svn: 155530
* Simplify the known retain count tracking; use a boolean state instead | Dan Gohman | 2012-04-25 | 1 | -41/+34
  of a precise count. Also, move RRInfo's Partial field into PtrState, now that it won't increase the size.
  llvm-svn: 155513
* Build custom predecessor and successor lists for each basic block. | Dan Gohman | 2012-04-24 | 1 | -115/+101
  These lists exclude invoke unwind edges and loop backedges which are being ignored. This makes it easier to ignore them consistently.
  llvm-svn: 155500
* Add support for llvm.arm.neon.vmull* intrinsics to InstCombine. This fixes | Lang Hames | 2012-04-24 | 1 | -0/+35
  <rdar://problem/11291436>.
  llvm-svn: 155468
* Reapply r155136 after fixing PR12599. | Jakob Stoklund Olesen | 2012-04-23 | 1 | -39/+35
  Original commit message:
  Defer some shl transforms to DAGCombine.
  The shl instruction is used to represent multiplication by a constant power of two as well as bitwise left shifts. Some InstCombine transformations would turn an shl instruction into a bit mask operation, making it difficult for later analysis passes to recognize the constant multiplication.
  Disable those shl transformations, deferring them to DAGCombine time. An 'shl X, C' instruction is now treated mostly the same way as 'mul X, C'.
  These transformations are deferred:
    (X >>? C) << C    --> X & (-1 << C)              (when X >> C has multiple uses)
    (X >>? C1) << C2  --> X << (C2-C1) & (-1 << C2)  (when C2 > C1)
    (X >>? C1) << C2  --> X >>? (C1-C2) & (-1 << C2) (when C1 > C2)
  The corresponding exact transformations are preserved, just like div-exact + mul:
    (X >>?,exact C) << C    --> X
    (X >>?,exact C1) << C2  --> X << (C2-C1)
    (X >>?,exact C1) << C2  --> X >>?,exact (C1-C2)
  The disabled transformations could also prevent the instruction selector from recognizing rotate patterns in hash functions and cryptographic primitives. I have a test case for that, but it is too fragile.
  llvm-svn: 155362
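  The first deferred rewrite is easy to sanity-check at the source level for logical (unsigned) shifts: shifting right and then left by the same amount just clears the low bits, which is exactly the mask form. A standalone check (illustrative only, not related to the PR12599 fix itself):

      #include <cassert>
      #include <cstdint>

      int main() {
          uint32_t x = 0xABCD1234;
          for (unsigned c = 0; c < 32; ++c)
              // (x >> c) << c keeps the high bits and zeroes the low c bits,
              // the same as masking with -1 << c (written here as ~0u << c).
              assert(((x >> c) << c) == (x & (~0u << c)));
          return 0;
      }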
* Fix issue 67 by checking that the interface functions weren't redefined in the compiled source file. | Alexander Potapenko | 2012-04-23 | 1 | -4/+18
  llvm-svn: 155346
* [tsan] use llvm/ADT/Statistic.h for tsan stats | Kostya Serebryany | 2012-04-23 | 1 | -40/+17
  llvm-svn: 155341
* Revert r155136 "Defer some shl transforms to DAGCombine." | Jakob Stoklund Olesen | 2012-04-20 | 1 | -35/+39
  While the patch was perfect and defect free, it exposed a really nasty bug in X86 SelectionDAG that caused an llc crash when compiling lencod. I'll put the patch back in after fixing the SelectionDAG problem.
  llvm-svn: 155181
* Put this expensive check below the less expensive ones. | Bill Wendling | 2012-04-19 | 1 | -9/+9
  llvm-svn: 155166
* Avoid a bug in the path count computation, preventing an infinite | Dan Gohman | 2012-04-19 | 1 | -1/+1
  loop repeatedly making the same change. This is for rdar://11256239.
  llvm-svn: 155160
* Defer some shl transforms to DAGCombine. | Jakob Stoklund Olesen | 2012-04-19 | 1 | -39/+35
  The shl instruction is used to represent multiplication by a constant power of two as well as bitwise left shifts. Some InstCombine transformations would turn an shl instruction into a bit mask operation, making it difficult for later analysis passes to recognize the constant multiplication.
  Disable those shl transformations, deferring them to DAGCombine time. An 'shl X, C' instruction is now treated mostly the same way as 'mul X, C'.
  These transformations are deferred:
    (X >>? C) << C    --> X & (-1 << C)              (when X >> C has multiple uses)
    (X >>? C1) << C2  --> X << (C2-C1) & (-1 << C2)  (when C2 > C1)
    (X >>? C1) << C2  --> X >>? (C1-C2) & (-1 << C2) (when C1 > C2)
  The corresponding exact transformations are preserved, just like div-exact + mul:
    (X >>?,exact C) << C    --> X
    (X >>?,exact C1) << C2  --> X << (C2-C1)
    (X >>?,exact C1) << C2  --> X >>?,exact (C1-C2)
  The disabled transformations could also prevent the instruction selector from recognizing rotate patterns in hash functions and cryptographic primitives. I have a test case for that, but it is too fragile.
  llvm-svn: 155136
* Don't crash on code where the user put __attribute__((constructor)) on | Dan Gohman | 2012-04-18 | 1 | -1/+5
  a function with arguments. This fixes rdar://11265785.
  llvm-svn: 155073
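  The offending input is simply source like the sketch below (GNU attribute syntax; the actual test case behind the radar is not shown here): a function marked as a global constructor even though it takes parameters, which the optimizer has to tolerate rather than crash on when it walks the registered constructors.

      // A constructor function is normally void(void); giving it a parameter is
      // questionable, but the optimizer must not crash on it.
      __attribute__((constructor))
      void odd_ctor(int unused) { (void)unused; }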
* Use a heavy hammer to fix PR12573. | Bill Wendling | 2012-04-18 | 1 | -0/+9
  If the loop contains invoke instructions whose unwind edge escapes the loop, then don't try to unswitch the loop. Doing so may cause the unwind edge to be split, which is not only non-trivial but also doesn't preserve loop-simplify information.
  Fixes PR12573
  llvm-svn: 154987
* loop-reduce: Add an early bailout to catch extremely large loops. | Andrew Trick | 2012-04-18 | 1 | -0/+17
  This introduces a threshold of 200 IV Users, which is very conservative but should be sufficient to avoid a serious compile-time sink or stack overflow. The llvm test-suite with LTO never exceeds 190 users per loop.
  The bug doesn't relate to a specific type of loop. Checking in an arbitrary giant loop as a unit test would be silly. Fixes rdar://11262507.
  llvm-svn: 154983