path: root/llvm/lib/Transforms
Commit message | Author | Age | Files | Lines
* Make sure that value handle users see the transformation of an indirect call ↵Nick Lewycky2014-02-201-0/+2
| | | | | | to a direct call. This is important for the CallGraph iteration. Patch by Björn Steinbrink! llvm-svn: 201822
* Add back r201608, r201622, r201624 and r201625Rafael Espindola2014-02-191-11/+5
| | | | | | | | | | | | | | r201608 made llvm correctly handle private globals with MachO. r201622 fixed a bug in it and r201624 and r201625 were changes for using private linkage, assuming that llvm would do the right thing. They all got reverted because r201608 introduced a crash in LTO. This patch includes a fix for that. The issue was that TargetLoweringObjectFile now has to be initialized before we can mangle names of private globals. This is trivially true during the normal codegen pipeline (the asm printer does it), but LTO has to do it manually. llvm-svn: 201700
* This reverts commit r201625 and r201624.Rafael Espindola2014-02-191-5/+11
| | | | | | | Since r201608 got reverted, it is not safe to use private linkage in these cases until it is committed back. llvm-svn: 201688
* X86 CodeGenPrep: sink shufflevectors before shiftsTim Northover2014-02-191-0/+72
| | | | | | | | | | | | | | | | | On x86, shifting a vector by a scalar is significantly cheaper than shifting a vector by another fully general vector. Unfortunately, because SelectionDAG operates on just one basic block at a time, the shufflevector instruction that reveals whether the right-hand side of a shift *is* really a scalar is often not visible to CodeGen when it's needed. This adds another handler to CodeGenPrepare, to sink any useful shufflevector instructions down to the basic block where they're used, predicated on a target hook (since on other architectures, doing so will often just introduce extra real work). rdar://problem/16063505 llvm-svn: 201655
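  A rough sketch of the pattern this targets (illustrative only; names and types are made up): a splat shuffle built in one block feeds a vector shift in another block, so SelectionDAG never sees that the shift amount is really a scalar:

      define <4 x i32> @shift_in_other_block(<4 x i32> %v, i32 %amt, i1 %cond) {
      entry:
        ; Build a splat of the scalar shift amount here ...
        %ins = insertelement <4 x i32> undef, i32 %amt, i32 0
        %splat = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> zeroinitializer
        br i1 %cond, label %use, label %done

      use:
        ; ... but use it here. Sinking the insert/shuffle pair into this block
        ; lets the x86 backend pick the cheaper shift-by-scalar form.
        %sh = shl <4 x i32> %v, %splat
        ret <4 x i32> %sh

      done:
        ret <4 x i32> %v
      }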
* Now that llvm always does the right thing with private, use it.Rafael Espindola2014-02-191-11/+5
| | | | llvm-svn: 201625
* Rename some member variables from TD to DL.Rafael Espindola2014-02-181-9/+9
| | | | | | TargetData was renamed DataLayout back in r165242. llvm-svn: 201581
* GlobalMerge: move "-global-merge" option to the pass itself.Tim Northover2014-02-181-0/+8
| | | | | | | It's rather odd to have the flag enabling and disabling this pass only affect a single target. llvm-svn: 201559
* fix for null VectorizedValue assertion in the SLP Vectorizer (in function ↵Gerolf Hoflehner2014-02-171-2/+4
| | | | | | vectorizeTree()). radar://16064178 llvm-svn: 201501
* fixed typo in comment as my test commitGerolf Hoflehner2014-02-161-1/+1
| | | | llvm-svn: 201486
* [CodeGenPrepare][AddressingModeMatcher] Give up on type promotion if theQuentin Colombet2014-02-141-3/+33
| | | | | | | transformation does not bring any immediate benefits and would introduce an illegal operation. llvm-svn: 201439
* Trivial cleanup: reuse existing variable.Rafael Espindola2014-02-141-2/+1
| | | | | | | | Extracted while trying to understand http://llvm-reviews.chandlerc.com/D1764. Patch by Matt Arsenault. llvm-svn: 201425
* Do more addrspacecast transforms that happen for bitcast.Matt Arsenault2014-02-141-6/+12
| | | | | | Makes addrspacecast (gep) do addrspacecast (gep) instead. llvm-svn: 201376
* InstCombine: Replace custom constant folding code with ConstantExpr.Benjamin Kramer2014-02-131-26/+11
| | | | llvm-svn: 201352
* Reduce code duplication resulting from the ConstantVector/ConstantDataVector ↵Benjamin Kramer2014-02-132-22/+9
| | | | | | | | split. No intended functionality change. llvm-svn: 201344
* GlobalOpt: Aliases don't have sections, don't copy them when replacingReid Kleckner2014-02-131-1/+2
| | | | | | | | | | | | | | | | | | | | | As defined in LangRef, aliases do not have sections. However, LLVM's GlobalAlias class inherits from GlobalValue, which means we can read and set its section. We should probably ban that as a separate change, since it doesn't make much sense for an alias to have a section that differs from its aliasee. Fixes PR18757, where the section was being lost on the global in code from Clang like: extern "C" { __attribute__((used, section("CUSTOM"))) static int in_custom_section; } Reviewers: rafael.espindola Differential Revision: http://llvm-reviews.chandlerc.com/D2758 llvm-svn: 201286
* Remove a very old instcombine where we would turn sequences of selects intoOwen Anderson2014-02-121-25/+0
| | | | | | | | | | | | | logical operations on the i1's driving them. This is a bad idea for every target I can think of (confirmed with micro tests on all of: x86-64, ARM, AArch64, Mips, and PowerPC) because it forces the i1 to be materialized into a general purpose register, whereas consuming it directly into a select generally allows it to exist only transiently in a predicate or flags register. Chandler ran a set of performance tests with this change, and reported no measurable change on x86-64. llvm-svn: 201275
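  For context, a minimal sketch of the (now removed) fold; the first function keeps the i1 transient in a select, while the second shows the logical form the fold produced:

      define i1 @select_form(i1 %a, i1 %b) {
        ; The i1 is consumed directly by a select and can live in a
        ; predicate/flags register.
        %r = select i1 %a, i1 %b, i1 false
        ret i1 %r
      }

      define i1 @logic_form(i1 %a, i1 %b) {
        ; The removed fold rewrote the select above into this equivalent
        ; 'and', which materializes the i1 values in general purpose registers.
        %r = and i1 %a, %b
        ret i1 %r
      }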
* [Vectorizer] Add a new 'OperandValueKind' in TargetTransformInfo calledAndrea Di Biagio2014-02-123-7/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | 'OK_NonUniformConstValue' to identify operands which are constants but not constant splats. The cost model now allows returning 'OK_NonUniformConstValue' for non splat operands that are instances of ConstantVector or ConstantDataVector. With this change, targets are now able to compute different costs for instructions with non-uniform constant operands. For example, On X86 the cost of a vector shift may vary depending on whether the second operand is a uniform or non-uniform constant. This patch applies the following changes: - The cost model computation now takes into account non-uniform constants; - The cost of vector shift instructions has been improved in X86TargetTransformInfo analysis pass; - BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish between non-uniform and uniform constant operands. Added a new test to verify that the output of opt '-cost-model -analyze' is valid in the following configurations: SSE2, SSE4.1, AVX, AVX2. llvm-svn: 201272
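  To make the distinction concrete, an illustrative pair of shifts (not from the patch); only the second operand differs:

      define <4 x i32> @shift_uniform(<4 x i32> %x) {
        ; Splat constant shift amount: a uniform constant operand.
        %r = shl <4 x i32> %x, <i32 3, i32 3, i32 3, i32 3>
        ret <4 x i32> %r
      }

      define <4 x i32> @shift_nonuniform(<4 x i32> %x) {
        ; Per-lane constant shift amounts: reported as the new
        ; OK_NonUniformConstValue operand kind.
        %r = shl <4 x i32> %x, <i32 1, i32 2, i32 3, i32 4>
        ret <4 x i32> %r
      }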
* InstCombine: Teach icmp merging about the equivalence of bit tests and ↵Benjamin Kramer2014-02-111-23/+38
| | | | | | | | UGE/ULT with a power of 2. This happens in bitfield code. While there, reorganize the existing code a bit. llvm-svn: 201176
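  A sketch of the equivalence being taught (illustrative, not from the patch): an unsigned comparison against a power of two is the same question as a mask test on the high bits:

      define i1 @ult_form(i32 %x) {
        ; x <u 16 ...
        %c = icmp ult i32 %x, 16
        ret i1 %c
      }

      define i1 @bittest_form(i32 %x) {
        ; ... holds exactly when every bit above bit 3 is clear.
        %hi = and i32 %x, -16            ; -16 == ~15: masks bits 4 and up
        %c = icmp eq i32 %hi, 0
        ret i1 %c
      }

  Knowing the two forms are interchangeable lets the icmp-merging code combine such range checks with neighbouring bit tests, which is the shape bitfield accesses tend to produce.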
* [LPM] Switch LICM to actively use LCSSA in addition to preserving it.Chandler Carruth2014-02-111-152/+90
| | | | | | | | | | | | | | | | | | | | | | | Fixes PR18753 and PR18782. This is necessary for LICM to preserve LCSSA correctly and efficiently. There is still some active discussion about whether we should be using LCSSA, but we can't just immediately stop using it and we *need* LICM to preserve it while we are using it. We can restore the old SSAUpdater driven code if and when there is a serious effort to remove the reliance on LCSSA from all of the loop passes. However, this also serves as a great example of why LCSSA is very nice to have. This change significantly simplifies the process of sinking instructions for LICM, and makes it quite a bit less expensive. It wouldn't even be as complex as it is except that I had to start the process of removing the big recursive LCSSA formation hammer in order to switch even this much of the re-forming code to asserting that LCSSA was preserved. I'll fully remove that next just to tidy things up until the LCSSA debate settles one way or the other. llvm-svn: 201148
* [CodeGenPrepare] Undo changes that happened for the profitability check.Quentin Colombet2014-02-111-0/+7
| | | | | | | | | | | | | | | The addressing mode matcher checks at some point the profitability of folding an instruction into the addressing mode. When the instruction to be folded has several uses, it checks that the instruction can be folded in each use. To do so, it creates a new matcher for each use and checks if the instruction is in the list of the matched instructions of this new matcher. The new matchers may promote some instructions and this has to be undone to keep the state of the original matcher consistent. A test case will follow. <rdar://problem/16020230> llvm-svn: 201121
* [LPM] A terribly simple fix to a terribly complex bug: PR18773.Chandler Carruth2014-02-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The crux of the issue is that LCSSA doesn't preserve stateful alias analyses. Before r200067, LICM didn't cause LCSSA to run in the LTO pass manager, where LICM runs essentially without any of the other loop passes. As a consequence the globalmodref-aa pass run before that loop pass manager was able to survive the loop pass manager and be used by DSE to eliminate stores in the function called from the loop body in Adobe-C++/loop_unroll (and similar patterns in other benchmarks). When LICM was taught to preserve LCSSA it had to require it as well. This caused it to be run in the loop pass manager and because it did not preserve AA, the stateful AA was lost. Most of LLVM's AA isn't stateful and so this didn't manifest in most cases. Also, in most cases LCSSA was already running, and so there was no interesting change. The real kicker is that LCSSA by its definition (injecting PHI nodes only) trivially preserves AA! All we need to do is mark it, and then everything goes back to working as intended. It probably was blocking some other weird cases of stateful AA but the only one I have is a 1000-line IR test case from loop_unroll, so I don't really have a good test case here. Hopefully this fixes the regressions on performance that have been seen since that revision. llvm-svn: 201104
* Make succ_iterator a real random access iterator and clean up a couple of users.Benjamin Kramer2014-02-101-2/+1
| | | | llvm-svn: 201088
* [asan] support for FreeBSD, LLVM part. patch by Viktor KutuzovKostya Serebryany2014-02-101-2/+7
| | | | llvm-svn: 201067
* LoopVectorizer: Keep track of conditional store basic blocksArnold Schwaighofer2014-02-081-0/+4
| | | | | | | | | | | | | Before conditional store vectorization/unrolling we had only one vectorized/unrolled basic block. After adding support for conditional store vectorization this will not only be one block but multiple basic blocks. The last block would have the back-edge. I updated the code to use a vector of basic blocks instead of a single basic block and fixed the users to use the last entry in this vector. But, I forgot to add the basic blocks to this vector! Fixes PR18724. llvm-svn: 201028
* [Constant Hoisting] Fix insertion point for constant materialization.Juergen Ributzka2014-02-081-18/+21
| | | | | | | | | The bitcast instruction during constant materialization was not placed correctly in the presence of phi nodes. This commit fixes the insertion point to be in the idom instead. This fixes PR18768 llvm-svn: 201009
* [Constant Hoisting] Don't update the use list while traversing it - DOH!Juergen Ributzka2014-02-081-5/+16
| | | | | This fix first traverses the whole use list of the constant expression and keeps track of the instructions that need to be updated. Then it performs the fixup afterwards. llvm-svn: 201008
* [CodeGenPrepare] Move away sign extensions that get in the way of addressingQuentin Colombet2014-02-061-14/+808
| | | | | | | | | | | | | | | | | | | | | | | | | | mode. Basically the idea is to transform code like this:
      %idx = add nsw i32 %a, 1
      %sextidx = sext i32 %idx to i64
      %gep = gep i8* %myArray, i64 %sextidx
      load i8* %gep
    Into:
      %sexta = sext i32 %a to i64
      %idx = add nsw i64 %sexta, 1
      %gep = gep i8* %myArray, i64 %idx
      load i8* %gep
    That way the computation can be folded into the addressing mode. This transformation is done as part of the addressing mode matcher. If the matching fails (not profitable, addressing mode not legal, etc.), the matcher will revert the related promotions. <rdar://problem/15519855> llvm-svn: 200947
* A memcpy out of a fresh alloca is a no-op, delete it. Patch by Patrick Walton!Nick Lewycky2014-02-061-1/+11
| | | | llvm-svn: 200907
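  A minimal sketch of the case (hypothetical names, 3.4-era memcpy signature): nothing is ever stored to the alloca before the copy, so the memcpy only moves undefined bytes and can be deleted outright:

      declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32, i1)

      define void @copy_from_fresh(i8* %dst) {
        %tmp = alloca [16 x i8]
        %src = getelementptr inbounds [16 x i8]* %tmp, i64 0, i64 0
        ; %tmp is never written before this point, so the copy is a no-op.
        call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 16, i32 1, i1 false)
        ret void
      }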
* Set default of inlinecold-threshold to 225.Manman Ren2014-02-061-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | 225 is the default value of inline-threshold. This change will make sure we have the same inlining behavior as prior to r200886. As Chandler points out, even though we don't have code in our testing suite that uses cold attribute, there are larger applications that do use cold attribute. r200886 + this commit intend to keep the same behavior as prior to r200886. We can later on tune the inlinecold-threshold. The main purpose of r200886 is to help performance of instrumentation based PGO before we actually hook up inliner with analysis passes such as BPI and BFI. For instrumentation based PGO, we try to increase inlining of hot functions and reduce inlining of cold functions by setting inlinecold-threshold. Another option suggested by Chandler is to use a boolean flag that controls if we should use OptSizeThreshold for cold functions. The default value of the boolean flag should not change the current behavior. But it gives us less freedom in controlling inlining of cold functions. llvm-svn: 200898
* Disable most IR-level transform passes on functions marked 'optnone'.Paul Robinson2014-02-0629-0/+89
| | | | | | Ideally only those transform passes that run at -O0 remain enabled; in reality we get as close as we reasonably can. Passes are responsible for disabling themselves; it's not the job of the pass manager to do it for them. llvm-svn: 200892
* Inliner uses a smaller inline threshold for callees with cold attribute.Manman Ren2014-02-051-0/+11
| | | | | | | | Added command line option inlinecold-threshold to set threshold for inlining functions with cold attribute. Listen to the cold attribute when it would decrease the inline threshold. llvm-svn: 200886
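  As a small illustrative sketch (not from the patch): a callee carrying the cold attribute is now judged against the lower inlinecold-threshold instead of the regular inline threshold:

      define void @error_path() cold {
        ; Rarely executed code; inlining it mostly costs code size.
        ret void
      }

      define void @caller(i1 %failed) {
        br i1 %failed, label %bad, label %ok
      bad:
        call void @error_path()          ; considered with inlinecold-threshold
        ret void
      ok:
        ret void
      }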
* SimplifyLibCalls: Push TLI through the exp2->ldexp transform.Benjamin Kramer2014-02-041-29/+29
| | | | | | For the odd case of platforms with exp2 available but not ldexp. llvm-svn: 200795
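  A sketch of the shape being rewritten (illustrative; the pass checks guards such as the integer width, and with this change consults TLI for whether ldexp is actually available):

      declare double @exp2(double)
      declare double @ldexp(double, i32)

      define double @before(i32 %n) {
        ; exp2((double)n) ...
        %conv = sitofp i32 %n to double
        %r = call double @exp2(double %conv)
        ret double %r
      }

      define double @after(i32 %n) {
        ; ... becomes ldexp(1.0, n) when TLI reports ldexp as available.
        %r = call double @ldexp(double 1.000000e+00, i32 %n)
        ret double %r
      }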
* cleanup: scc_iterator consumers should use isAtEndDuncan P. N. Exon Smith2014-02-042-5/+3
| | | | | | | | | | | | | | No functional change. Updated loops from: for (I = scc_begin(), E = scc_end(); I != E; ++I) to: for (I = scc_begin(); !I.isAtEnd(); ++I) for teh win. llvm-svn: 200789
* OS X: the correct function is __sincospif_stret, not __sincospi_stretfTim Northover2014-02-041-2/+2
| | | | | | rdar://problem/13729466 llvm-svn: 200771
* Add strchr(p, 0) -> p + strlen(p) to SimplifyLibCallsKai Nacke2014-02-041-3/+4
| | | | | Add the missing transformation strchr(p, 0) -> p + strlen(p) to SimplifyLibCalls and remove the ToDo comment. Reviewer: Duncan P.N. Exon Smith llvm-svn: 200736
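  A minimal before/after sketch of the new fold (hypothetical function, typed-pointer IR of the period):

      declare i8* @strchr(i8*, i32)
      declare i64 @strlen(i8* nocapture)

      define i8* @find_nul(i8* %p) {
        ; strchr(p, 0) ...
        %r = call i8* @strchr(i8* %p, i32 0)
        ret i8* %r
      }

      define i8* @find_nul_folded(i8* %p) {
        ; ... becomes p + strlen(p), a pointer to the terminating NUL.
        %len = call i64 @strlen(i8* %p)
        %r = getelementptr inbounds i8* %p, i64 %len
        ret i8* %r
      }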
* Self-memcpy-elision and memcpy of constant byte to memset transforms don't ↵Nick Lewycky2014-02-041-4/+7
| | | | | | care how many bytes you were trying to transfer. Sink that safety test after those transforms. Noticed by inspection. llvm-svn: 200726
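  For instance (illustrative only), a self-copy is removable even when the length is not a constant, so the constant-length safety check should not run before that fold:

      declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32, i1)

      define void @self_copy(i8* %p, i64 %n) {
        ; Source and destination are identical: deletable regardless of %n.
        call void @llvm.memcpy.p0i8.p0i8.i64(i8* %p, i8* %p, i64 %n, i32 1, i1 false)
        ret void
      }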
* inalloca: Don't remove dead arguments in the presence of inalloca argsReid Kleckner2014-02-031-0/+7
| | | | | | | | | | | | It disturbs the layout of the parameters in memory and registers, leading to problems in the backend. The plan for optimizing internal inalloca functions going forward is to essentially SROA the argument memory and demote any captured arguments (things that aren't trivially written by a load or store) to an indirect pointer to a static alloca. llvm-svn: 200717
* Lower llvm.expect intrinsic correctly for i1Duncan P. N. Exon Smith2014-02-021-5/+18
| | | | | | | | | | LowerExpectIntrinsic previously only understood the idiom of an expect intrinsic followed by a comparison with zero. For llvm.expect.i1, the comparison would be stripped by the early-cse pass. Patch by Daniel Micay. llvm-svn: 200664
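  For context, the i1 form handled now is an expect call feeding a branch directly, with no comparison against zero in between; a minimal sketch (the branch-weight numbers below are purely illustrative):

      declare i1 @llvm.expect.i1(i1, i1)

      define void @use_expect(i1 %cmp) {
      entry:
        %hint = call i1 @llvm.expect.i1(i1 %cmp, i1 true)
        br i1 %hint, label %likely, label %unlikely
      likely:
        ret void
      unlikely:
        ret void
      }

      ; After LowerExpectIntrinsic the call disappears and the branch carries
      ; profile metadata along the lines of:
      ;   br i1 %cmp, label %likely, label %unlikely, !prof !0
      ;   !0 = metadata !{metadata !"branch_weights", i32 64, i32 4}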
* LoopVectorizer: Enable unrolling of conditional stores and the load/storeArnold Schwaighofer2014-02-021-3/+3
| | | | | | unrolling heuristic by default. Benchmarking on x86_64 (thanks Chandler!) and ARM has shown those options speed up some benchmarks while not causing any interesting regressions. llvm-svn: 200621
* [LPM] Apply a really big hammer to fix PR18688 by recursively reformingChandler Carruth2014-02-011-5/+18
| | | | | | | | | | | | | | | | | | LCSSA when we promote to SSA registers inside of LICM. Currently, this is actually necessary. The promotion logic in LICM uses SSAUpdater which doesn't understand how to place LCSSA PHI nodes. Teaching it to do so would be a very significant undertaking. It may be worthwhile and I've left a FIXME about this in the code as well as starting a thread on llvmdev to try to figure out the right long-term solution. For now, the PR needs to be fixed. Short of using the promotion SSAUpdater to place both the LCSSA PHI nodes and the promoted PHI nodes, I don't see a cleaner or cheaper way of achieving this. Fortunately, LCSSA is relatively lazy and sparse -- it should only update instructions which need it. We can also skip the recursive variant when we don't promote to SSA values. llvm-svn: 200612
* Remove some unused #includesEli Bendersky2014-02-011-2/+0
| | | | llvm-svn: 200611
* Revert "[SLPV] Recognize vectorizable intrinsics during SLP vectorization ..."Reid Kleckner2014-02-011-86/+3
| | | | | | | This reverts commit r200576. It broke 32-bit self-host builds by vectorizing two calls to @llvm.bswap.i64, which we then fail to expand. llvm-svn: 200602
* [SLPV] Recognize vectorizable intrinsics during SLP vectorization andChandler Carruth2014-01-311-3/+86
| | | | | | | | | | transform accordingly. Based on similar code from Loop vectorization. Subsequent commits will include vectorization of function calls to vector intrinsics and form function calls to vector library calls. Patch by Raul Silvera! (Much delayed due to my not running dcommit) llvm-svn: 200576
* [vectorizer] Tweak the way we do small loop runtime unrolling in theChandler Carruth2014-01-311-15/+22
| | | | | | | | | | | | | | loop vectorizer to not do so when runtime pointer checks are needed and share code with the new (not yet enabled) load/store saturation runtime unrolling. Also ensure that we only consider the runtime checks when the loop hasn't already been vectorized. If it has, the runtime check cost has already been paid. I've fleshed out a test case to cover the scalar unrolling as well as the vector unrolling and comment clearly why we are or aren't following the pattern. llvm-svn: 200530
* Fix a bug in gcov instrumentation introduced by r195513. <rdar://15930350>Bob Wilson2014-01-311-1/+8
| | | | | | | | | | The entry block of a function starts with all the static allocas. The change in r195513 splits the block before those allocas, which has the effect of turning them into dynamic allocas. That breaks all sorts of things. Change to split after the initial allocas, and also add a comment explaining why the block is split. llvm-svn: 200515
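  To make the fix concrete, a hypothetical entry block; the counter-update code has to be inserted after the leading static allocas, because splitting in front of them turns them into dynamic allocas:

      define void @instrumented() {
      entry:
        ; Static allocas must stay at the top of the entry block.
        %x = alloca i32
        %buf = alloca [32 x i8]
        ; The split (and the gcov counter update) belongs here, after them.
        store i32 0, i32* %x
        ret void
      }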
* [LPM] Fix PR18643, another scary place where loop transforms failed toChandler Carruth2014-01-292-49/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | preserve loop simplify of enclosing loops. The problem here starts with LoopRotation which ends up cloning code out of the latch into the new preheader it is building. This can create a new edge from the preheader into the exit block of the loop which breaks LoopSimplify form. The code tries to fix this by splitting the critical edge between the latch and the exit block to get a new exit block that only the latch dominates. This sadly isn't sufficient. The exit block may be an exit block for multiple nested loops. When we clone an edge from the latch of the inner loop to the new preheader being built in the outer loop, we create an exiting edge from the outer loop to this exit block. Despite breaking the LoopSimplify form for the inner loop, this is fine for the outer loop. However, when we split the edge from the inner loop to the exit block, we create a new block which is in neither the inner nor outer loop as the new exit block. This is a predecessor to the old exit block, and so the split itself takes the outer loop out of LoopSimplify form. We need to split every edge entering the exit block from inside a loop nested more deeply than the exit block in order to preserve all of the loop simplify constraints. Once we try to do that, a problem with splitting critical edges surfaces. Previously, we tried a very brute-force approach to updating LoopSimplify form by re-computing it for all exit blocks. We don't need to do this, and doing this much will sometimes but not always overlap with the LoopRotate bug fix. Instead, the code needs to specifically handle the cases which can start to violate LoopSimplify -- they aren't that common. We need to see if the destination of the split edge was a loop exit block in simplified form for the loop of the source of the edge. For this to be true, all the predecessors need to be in the exact same loop as the source of the edge being split. If the dest block was originally in this form, we have to split all of the edges back into this loop to recover it. The old mechanism of doing this was conservatively correct because at least *one* of the exiting blocks it rewrote was the DestBB and so the DestBB's predecessors were fixed. But this is a much more targeted way of doing it. Making it targeted is important, because ballooning the set of edges touched prevents LoopRotate from being able to split edges *it* needs to split to preserve loop simplify in a coherent way -- the critical edge splitting would sometimes find the other edges in need of splitting but not others. Many, *many* thanks for help from Nick reducing these test cases mightily. And helping lots with the analysis here as this one was quite tricky to track down. llvm-svn: 200393
* [LPM] Fix PR18642, a pretty nasty bug in IndVars that "never mattered"Chandler Carruth2014-01-291-7/+23
| | | | | | | | | | | | | | | | | | | | | | because of the inside-out run of LoopSimplify in the LoopPassManager and the fact that LoopSimplify couldn't be "preserved" across two independent LoopPassManagers. Anyways, in that case, IndVars wasn't correctly preserving an LCSSA PHI node because it thought it was rewriting (via SCEV) the incoming value to a loop invariant value. While it may well be invariant for the current loop, it may be rewritten in terms of an enclosing loop's values. This in and of itself is fine, as the LCSSA PHI node in the enclosing loop for the inner loop value we're rewriting will have its own LCSSA PHI node if used outside of the enclosing loop. With me so far? Well, the current loop and the enclosing loop may share an exiting block and exit block, and when they do they also share LCSSA PHI nodes. In this case, it's not valid to RAUW through the LCSSA PHI node. Expected crazy test included. llvm-svn: 200372
* LoopVectorizer: Don't count the induction variable multiple timesArnold Schwaighofer2014-01-291-0/+9
| | | | | When estimating register pressure, don't count the induction variable multiple times. It is unlikely to be unrolled. This is currently disabled and hidden behind a flag ("enable-ind-var-reg-heur"). llvm-svn: 200371
* Fix pr14893.Rafael Espindola2014-01-281-0/+8
| | | | | | | When simplifycfg moves an instruction, it must drop metadata it doesn't know is still valid when the preconditions change. In particular, it must drop the range and tbaa metadata. The patch implements this with a utility function to drop all metadata not in a white list. llvm-svn: 200322
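  A small illustrative example (not from the patch): if the load below is moved up into the entry block, the !range annotation has to be dropped, because it described the loaded value only under the guard that simplifycfg just removed; the same reasoning applies to !tbaa:

      define i32 @guarded_load(i32* %p, i1 %ok) {
      entry:
        br i1 %ok, label %do.load, label %exit
      do.load:
        ; Hoisting this load past the branch invalidates !0 as a guarantee.
        %v = load i32* %p, align 4, !range !0
        br label %exit
      exit:
        %r = phi i32 [ %v, %do.load ], [ 0, %entry ]
        ret i32 %r
      }

      !0 = metadata !{i32 0, i32 100}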
* [vectorizer] Completely disable the block frequency guidance of the loopChandler Carruth2014-01-281-3/+13
| | | | | | | | | | | | | | | vectorizer, placing it behind an off-by-default flag. It turns out that block frequency isn't what we want at all, here or elsewhere. This has been I think a nagging feeling for several of us working with it, but Arnold has given some really nice simple examples where the results are so comprehensively wrong that they aren't useful. I'm planning to email the dev list with a summary of why its not really useful and a couple of ideas about how to better structure these types of heuristics. llvm-svn: 200294