path: root/llvm/test/Transforms
Commit message (Author, Date; Files changed, Lines -removed/+added; llvm-svn revision)
* Teach reassociate to commute FMul's and FAdd's in order to canonicalize the
  order of their operands across instructions. This allows for greater CSE
  opportunities. (Owen Anderson, 2012-05-07; 1 file, -0/+16; llvm-svn: 156323)
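  For illustration, a minimal IR sketch (hypothetical, not the committed test)
  of why operand-order canonicalization helps CSE:

      ; before reassociate: the operand orders differ, so CSE sees two
      ; distinct expressions
      %m1 = fmul float %a, %b
      %m2 = fmul float %b, %a
      %s  = fadd float %m1, %m2
      ; after canonicalizing, both multiplies read 'fmul float %a, %b'
      ; and CSE can replace %m2 with %m1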
* Small fix in InstCombineCasts.cpp: restored "alloca + bitcast" reduction for
  the case when the alloca's size is computed within an "add/sub/... nsw"
  expression. Also fixed the 2011-06-13-nsw-alloca.ll test.
  (Stepan Dyatkovskiy, 2012-05-05; 1 file, -2/+5; llvm-svn: 156231)
* Remove calls to calloc if the allocated memory is not used (this was already
  being done for malloc), and fix a few typos found by Chad in my previous
  commit. (Nuno Lopes, 2012-05-03; 1 file, -2/+2; llvm-svn: 156110)
* Add support for calloc to objectsize lowering. (Nuno Lopes, 2012-05-03;
  1 file, -0/+20; llvm-svn: 156102)
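  As a hedged sketch (hypothetical IR in the typed-pointer syntax of the era;
  the function and names are illustrative, not from the commit), the lowering
  can now fold llvm.objectsize for calloc'ed memory, since the allocation size
  is the product of the two arguments:

      declare i8* @calloc(i64, i64)
      declare i64 @llvm.objectsize.i64(i8*, i1)

      define i64 @known_size() {
        %p = call i8* @calloc(i64 4, i64 8)
        ; with calloc support this folds to the constant 32 (4 * 8)
        %size = call i64 @llvm.objectsize.i64(i8* %p, i1 false)
        ret i64 %size
      }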
* The value held in the vector may be RAUW'ed by some of the canonicalization
  methods. Use a weak value handle to keep up with this. PR12245.
  (Bill Wendling, 2012-05-02; 1 file, -0/+50; llvm-svn: 155984)
* An instruction in a loop is not guaranteed to be executed just because the
  loop has no exit blocks. Fixes PR12706! (Nick Lewycky, 2012-05-01; 1 file,
  -0/+22; llvm-svn: 155884)
* Add support for llvm.arm.neon.vmull* intrinsics to InstCombine. Fixes
  <rdar://problem/11291436>. This is a second attempt at a fix for this; the
  first was r155468. Thanks to Chandler, Bob and others for the feedback that
  helped me improve this. (Lang Hames, 2012-05-01; 1 file, -0/+68;
  llvm-svn: 155866)
* Just mark the sign bit as known zero, rather than any other irrelevant bits
  known zero in the LHS. Fixes PR12541. (Duncan Sands, 2012-04-30; 1 file,
  -0/+12; llvm-svn: 155818)
* Second attempt at PR12573: allow the "SplitCriticalEdge" function to split
  the edge to a landing pad. If the pass is *sure* that it knows what it's
  doing, then it may go ahead and specify that the landing pad can have its
  critical edge split. The loop unswitch pass is one of these passes. It will
  split all critical edges coming from a loop to a landing pad not within the
  loop. Doing so retains important loop analysis information, such as loop
  simplify. (Bill Wendling, 2012-04-30; 1 file, -0/+101; llvm-svn: 155817)
* Make sure HoistInsertPosition finds a position that is dominated by all
  inputs. (Rafael Espindola, 2012-04-30; 1 file, -0/+34; llvm-svn: 155809)
* Don't vectorize target-specific types (ppc_fp128, x86_fp80, etc.).
  Target-specific types should not be vectorized. As a practical matter, these
  types are already register matched (at least in the x86 case), and codegen
  does not always work correctly (at least in the ppc case, and this is not
  worth fixing because ppc_fp128 is currently broken and will probably go away
  soon). (Hal Finkel, 2012-04-27; 1 file, -0/+18; llvm-svn: 155729)
* Reapply r155682, making constant folding more consistent, with a fix to work
  properly with how the code handles all-undef PHI nodes. (Dan Gohman,
  2012-04-27; 2 files, -0/+30; llvm-svn: 155721)
* Revert r155682, "Use ConstantExpr::getExtractElement when constant-folding
  vectors". It broke the stage2 build; stage1/clang sometimes crashed.
  (NAKAMURA Takumi, 2012-04-27; 1 file, -20/+0; llvm-svn: 155699)
* Use ConstantExpr::getExtractElement when constant-folding vectors instead of
  getAggregateElement. This has the advantage of being more consistent and
  allowing higher-level constant folding to proceed even if an inner extract
  element cannot be folded.
  Make ConstantFoldInstruction call ConstantFoldConstantExpression on the
  instruction's operands, making it more consistent with
  ConstantFoldConstantExpression itself. This makes sure that ConstantExprs
  get TargetData-aware folding before being handed off as operands for further
  folding. This causes more expressions to be folded, but due to a known
  shortcoming in constant folding, this currently has the side effect of
  stripping a few more nuw and inbounds flags in the non-targetdata side of
  constant-fold-gep.ll. This is mostly harmless. This fixes rdar://11324230.
  (Dan Gohman, 2012-04-27; 1 file, -0/+20; llvm-svn: 155682)
* Add instcombine patterns for the following transformations:
      (x & y) | (x ^ y) -> x | y
      (x & y) + (x ^ y) -> x | y
  Patch by Manman Ren. rdar://10770603 (Chad Rosier, 2012-04-26; 1 file,
  -0/+24; llvm-svn: 155674)
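  A minimal hypothetical test for the first pattern (names illustrative, not
  from the commit): the and keeps the bits common to x and y while the xor
  keeps the differing bits, so or'ing them together yields the union x | y:

      define i32 @or_of_and_xor(i32 %x, i32 %y) {
        %and = and i32 %x, %y
        %xor = xor i32 %x, %y
        %res = or i32 %and, %xor
        ; instcombine folds this to:  %res = or i32 %x, %y
        ret i32 %res
      }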
* Teach the reassociate pass to fold chains of multiplies with repeated
  elements to minimize the number of multiplies required to compute the final
  result. This uses a heuristic to attempt to form near-optimal binary
  exponentiation-style multiply chains. While there are some cases it misses,
  it seems to do at least a decent job on a very diverse range of inputs.
  Initial benchmarks show no interesting regressions, and an 8% improvement on
  SPASS. Let me know if any other interesting results (in either direction)
  crop up! Credit to Richard Smith for the core algorithm, and for helping
  code the patch itself. (Chandler Carruth, 2012-04-26; 1 file, -0/+99;
  llvm-svn: 155616)
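  For illustration (hypothetical, not the committed test), a*a*a*a takes three
  multiplies when computed as a left-to-right chain, but only two when rebuilt
  as a binary exponentiation chain:

      define i32 @pow4(i32 %a) {
        %m1 = mul i32 %a, %a    ; a^2
        %m2 = mul i32 %m1, %a   ; a^3
        %m3 = mul i32 %m2, %a   ; a^4, three multiplies total
        ret i32 %m3
      }
      ; reassociate can rebuild this with two multiplies:
      ;   %sq = mul i32 %a, %a    ; a^2
      ;   %p4 = mul i32 %sq, %sq  ; a^4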
* Actually delete now-empty file. (Chandler Carruth, 2012-04-25; 1 file,
  -0/+0; llvm-svn: 155532)
* Reverting r155468. Chris and Chandler have convinced me that it's dangerous
  and in poor taste. Talking through some alternate solutions with Chandler.
  (Lang Hames, 2012-04-25; 1 file, -68/+0; llvm-svn: 155530)
* ConstantFoldSelectInstruction swapped the operands of the select. Fixes
  PR12592. Patch by Matt Pharr. (Nadav Rotem, 2012-04-24; 1 file, -0/+13;
  llvm-svn: 155480)
* Add support for llvm.arm.neon.vmull* intrinsics to InstCombine. This fixes
  <rdar://problem/11291436>. (Lang Hames, 2012-04-24; 1 file, -0/+68;
  llvm-svn: 155468)
* Fix a crash on valid (if UB) bitcode that is produced for some global
  constants in C++11 mode. I have no idea why it required such particular
  circumstances to get here; the code seems clearly to rely upon unchecked
  assumptions. Specifically, when we decide to form an index into a struct
  type, we may have gone through (at least one) zero-length array indexing
  round, which would have left the offset un-adjusted, and thus not
  necessarily valid for use when indexing the struct type.
  This is just a canonicalization step, so the correct thing is to refuse to
  canonicalize nonsensical GEPs of this form. Implemented, and test case
  added. Fixes PR12642. Pair debugged and coded with Richard Smith. =] I
  credit him with most of the debugging, and with preventing me from writing
  the wrong code. (Chandler Carruth, 2012-04-24; 1 file, -0/+5;
  llvm-svn: 155466)
* Reapply r155136 after fixing PR12599. (Jakob Stoklund Olesen, 2012-04-23;
  6 files, -17/+165; llvm-svn: 155362)
  Original commit message: Defer some shl transforms to DAGCombine.
  The shl instruction is used to represent multiplication by a constant power
  of two as well as bitwise left shifts. Some InstCombine transformations
  would turn an shl instruction into a bit mask operation, making it difficult
  for later analysis passes to recognize the constant multiplication. Disable
  those shl transformations, deferring them to DAGCombine time. An 'shl X, C'
  instruction is now treated mostly the same way as 'mul X, C'.
  These transformations are deferred:
      (X >>? C) << C   --> X & (-1 << C)               (when X >> C has multiple uses)
      (X >>? C1) << C2 --> X << (C2-C1) & (-1 << C2)   (when C2 > C1)
      (X >>? C1) << C2 --> X >>? (C1-C2) & (-1 << C2)  (when C1 > C2)
  The corresponding exact transformations are preserved, just like
  div-exact + mul:
      (X >>?,exact C) << C   --> X
      (X >>?,exact C1) << C2 --> X << (C2-C1)
      (X >>?,exact C1) << C2 --> X >>?,exact (C1-C2)
  The disabled transformations could also prevent the instruction selector
  from recognizing rotate patterns in hash functions and cryptographic
  primitives. I have a test case for that, but it is too fragile.
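  A hypothetical sketch of the first deferred transform (illustrative only,
  not from the committed tests): when the shifted value has another use,
  folding the shl into a mask used to hide the multiply-by-16 structure from
  IR-level analyses:

      define i32 @round_down_plus(i32 %x) {
        %shr = lshr i32 %x, 4    ; %shr has a second use below
        %shl = shl i32 %shr, 4   ; equivalent to x & -16
        ; instcombine previously rewrote %shl as 'and i32 %x, -16';
        ; that rewrite is now deferred to DAGCombine
        %sum = add i32 %shl, %shr
        ret i32 %sum
      }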
* Tidy up this test more (Chandler Carruth, 2012-04-22; 2 files, -22/+23;
  llvm-svn: 155311):
  1) Make the checked assertions a bit more precise. We really want the
     canonical forms coming out of reassociate to be exactly what is expected.
  2) Remove other passes, and switch the test to actually directly check that
     reassociate makes the important transforms and canonicalizations.
  3) Fold in a related test case now that we're using FileCheck. Make the same
     tidying changes to it.
* FileCheck-ize a test, and tidy it up a touch. (Chandler Carruth, 2012-04-22;
  1 file, -9/+14; llvm-svn: 155310)
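  For readers unfamiliar with the idiom, FileCheck-izing a Transforms test
  means replacing grep-based RUN lines with CHECK patterns. A minimal
  hypothetical example (not the actual test touched here):

      ; RUN: opt < %s -instcombine -S | FileCheck %s

      define i32 @test(i32 %x) {
      ; CHECK: @test
      ; CHECK-NOT: add
      ; CHECK: ret i32 %x
        %y = add i32 %x, 0
        ret i32 %y
      }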
* Revert r155136, "Defer some shl transforms to DAGCombine." While the patch
  was perfect and defect free, it exposed a really nasty bug in X86
  SelectionDAG that caused an llc crash when compiling lencod. I'll put the
  patch back in after fixing the SelectionDAG problem. (Jakob Stoklund Olesen,
  2012-04-20; 6 files, -165/+17; llvm-svn: 155181)
* Avoid a bug in the path count computation, preventing an infinite loop
  repeatedly making the same change. This is for rdar://11256239. (Dan Gohman,
  2012-04-19; 1 file, -0/+48; llvm-svn: 155160)
* Defer some shl transforms to DAGCombine. (Jakob Stoklund Olesen, 2012-04-19;
  6 files, -17/+165; llvm-svn: 155136)
  The shl instruction is used to represent multiplication by a constant power
  of two as well as bitwise left shifts. Some InstCombine transformations
  would turn an shl instruction into a bit mask operation, making it difficult
  for later analysis passes to recognize the constant multiplication. Disable
  those shl transformations, deferring them to DAGCombine time. An 'shl X, C'
  instruction is now treated mostly the same way as 'mul X, C'.
  These transformations are deferred:
      (X >>? C) << C   --> X & (-1 << C)               (when X >> C has multiple uses)
      (X >>? C1) << C2 --> X << (C2-C1) & (-1 << C2)   (when C2 > C1)
      (X >>? C1) << C2 --> X >>? (C1-C2) & (-1 << C2)  (when C1 > C2)
  The corresponding exact transformations are preserved, just like
  div-exact + mul:
      (X >>?,exact C) << C   --> X
      (X >>?,exact C1) << C2 --> X << (C2-C1)
      (X >>?,exact C1) << C2 --> X >>?,exact (C1-C2)
  The disabled transformations could also prevent the instruction selector
  from recognizing rotate patterns in hash functions and cryptographic
  primitives. I have a test case for that, but it is too fragile.
* Extract the broken part of an XFAILed test into its own file. (Jakob
  Stoklund Olesen, 2012-04-19; 2 files, -94/+93; llvm-svn: 155081)
* FileCheckize. (Jakob Stoklund Olesen, 2012-04-18; 1 file, -2/+59;
  llvm-svn: 155010)
* Nobody likes shifty instructions, but that was a bit strong. (Jakob Stoklund
  Olesen, 2012-04-18; 1 file, -1/+1; llvm-svn: 155009)
* FileCheckify and un-XFAIL the SimplifyLibCalls/floor test. This fixes the
  build on MSVC. (Joe Groff, 2012-04-18; 1 file, -10/+31; llvm-svn: 154970)
* Move win32 SimplifyLibcall test under Transforms. (Joe Groff, 2012-04-18;
  1 file, -0/+275; llvm-svn: 154967)
* Flip the new block-placement pass to be on by default. (Chandler Carruth,
  2012-04-16; 1 file, -1/+1; llvm-svn: 154816)
  This is mostly to test the waters. I'd like to get results from FNT build
  bots and other bots running on non-x86 platforms. This feature has been
  pretty heavily tested over the last few months by me, and it fixes several
  of the execution time regressions caused by the inlining work by preventing
  inlining decisions from radically impacting block layout. I've seen very
  large improvements in the yacr2 and ackermann benchmarks, along with the
  expected noise across all of the benchmark suite whenever code layout
  changes. I've analyzed all of the regressions and fixed them, or found them
  to be impossible to fix. See my email to llvmdev for more details.
  I'd like for this to be in 3.1 as it complements the inliner changes, but if
  any failures are showing up or anyone has concerns, it is just a flag flip
  and so can be easily turned off. I'm switching it on tonight to try and get
  at least one run through various folks' performance suites in case SPEC or
  something else has serious issues with it. I'll watch bots and revert if
  anything shows up.
* Fix an error in BBVectorize important for vectorizing pointer types. When
  vectorizing pointer types, it is important to realize that potential pairs
  cannot be connected via the address pointer argument of a load or store.
  This is because, even after vectorization, the address is still a scalar:
  the address of the higher half of the pair is implicit from the address of
  the lower half (it need not be, and should not be, explicitly computed).
  (Hal Finkel, 2012-04-14; 1 file, -0/+23; llvm-svn: 154735)
* Enhance BBVectorize to handle pointer values more properly and to vectorize
  GEPs. (Hal Finkel, 2012-04-14; 1 file, -0/+81; llvm-svn: 154734)
* Add support to BBVectorize for vectorizing selects. (Hal Finkel, 2012-04-13;
  1 file, -0/+30; llvm-svn: 154700)
* Consider ObjC runtime calls such as objc_storeWeak that make a copy of their
  argument as "escape" points for the objc_retainBlock optimization. This
  fixes rdar://11229925. (Dan Gohman, 2012-04-13; 1 file, -0/+131;
  llvm-svn: 154682)
* Use the new Use-aware dominates method to apply the objc runtime library
  return value optimization for phi uses. Even when the phi itself is not
  dominated, the specific use may be dominated. (Dan Gohman, 2012-04-13;
  1 file, -0/+18; llvm-svn: 154647)
* Don't move objc_autorelease calls past autorelease pool boundaries when
  optimizing autorelease calls on phi nodes with null operands. This fixes
  rdar://11207070. (Dan Gohman, 2012-04-13; 1 file, -4/+78; llvm-svn: 154642)
* Fix PR12513: loop unrolling breaks with indirect branches. Take this
  opportunity to generalize the indirectbr bailout logic for loop
  transformations. CFG transformations will never get indirectbr right, and
  there's no point trying. (Andrew Trick, 2012-04-10; 2 files, -8/+43;
  llvm-svn: 154386)
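  A hedged sketch of why indirectbr forces a bailout (hypothetical IR, not
  from the committed tests): the successor taken depends on a runtime address,
  so a loop transform cannot split, redirect, or clone individual edges
  without rewriting every blockaddress value that might reach the branch:

      define void @f(i8* %addr) {
      entry:
        br label %loop
      loop:
        ; the loop's backedge is an indirect branch; unrolling would
        ; need to clone this edge, which cannot be done safely
        indirectbr i8* %addr, [label %loop, label %exit]
      exit:
        ret void
      }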
* Teach InstCombine to nuke a common alloca pattern -- an alloca which has
  GEPs, bit casts, and stores reaching it but no other instructions. These
  often show up during the iterative processing of the inliner, SROA, and DCE.
  Once we hit this point, we can completely remove the alloca. These were
  actually showing up in the final, fully optimized code in a bunch of inliner
  tests I've been working on, and notably they show up after LLVM finishes
  optimizing away all function calls involved in hash_combine(a, b).
  (Chandler Carruth, 2012-04-08; 1 file, -0/+44; llvm-svn: 154285)
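  A minimal hypothetical instance of the pattern (illustrative, not the
  committed test, using the typed-pointer IR syntax of the time): the alloca
  is only ever written, never read, so the entire chain is dead:

      define void @dead_alloca(i64 %v) {
        %buf = alloca [2 x i64]
        %p = bitcast [2 x i64]* %buf to i64*
        store i64 %v, i64* %p
        %q = getelementptr [2 x i64]* %buf, i32 0, i32 1
        store i64 0, i64* %q
        ; nothing ever loads from %buf, so instcombine can delete the
        ; stores, the GEP, the bitcast, and the alloca itself
        ret void
      }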
* Fix ValueTracking to conclude that debug intrinsics are safe to speculate.
  Without this, loop rotate (among many other places) would suddenly stop
  working in the presence of debug info. I found this looking at loop rotate,
  and have augmented its tests with a reduction out of a very hot loop in
  yacr2 where failing to do this rotation costs sometimes more than 10% in
  runtime performance, perturbing numerous downstream optimizations.
  This should have no impact on performance without debug info, but the change
  in performance when debug info is enabled can be extreme. As a consequence
  (and this is how I got to this yak), any profiling of performance problems
  should be treated with deep suspicion -- it may have been wildly inaccurate
  if debug info was enabled for profiling. =/ Just a heads up. (Chandler
  Carruth, 2012-04-07; 1 file, -4/+44; llvm-svn: 154263)
* Sink the collection of return instructions until after *all* simplification
  has been performed. This is a bit less efficient (requires another ilist
  walk of the basic blocks) but shouldn't matter in practice. More
  importantly, it's just too much work to keep track of all the various ways
  the return instructions can be mutated while simplifying them. This fixes
  yet another crasher, reported by Daniel Dunbar. (Chandler Carruth,
  2012-04-06; 1 file, -0/+37; llvm-svn: 154179)
* Tweak this test to ensure the inliner did indeed fire. Thanks to Richard
  Smith for pointing this out in review. (Chandler Carruth, 2012-04-06;
  1 file, -0/+1; llvm-svn: 154178)
* Actually finish this sentence in the comment the way I intended. Thanks to
  Matt for pointing this out. (Chandler Carruth, 2012-04-06; 1 file, -1/+1;
  llvm-svn: 154158)
* Sink the return instruction collection until after we're done deleting dead
  code, including dead return instructions in some cases. Otherwise, we end up
  having a bogus pointer to a return instruction that blows up much further
  down the road. It turns out that this pattern is both simpler to code,
  easier to update in the face of enhancements to the inliner cleanup, and
  likely cheaper given that it won't add dead instructions to the list. Thanks
  to John Regehr's numerous test cases for teasing this out. (Chandler
  Carruth, 2012-04-06; 1 file, -0/+37; llvm-svn: 154157)
* Fix accidentally inverted logic from r152803, and make the testcase slightly
  less trivial. This fixes rdar://11171718. (Dan Gohman, 2012-04-05; 1 file,
  -0/+6; llvm-svn: 154118)
* Add a testcase for r154007: when a function has the optsize attribute, the
  loop should be unrolled according to the value of OptSizeUnrollThreshold.
  (Hongbin Zheng, 2012-04-04; 1 file, -0/+35; llvm-svn: 154014)
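  A hypothetical test resembling the scenario (names and constants are
  illustrative, not from the commit): under optsize, the unroller consults the
  smaller OptSizeUnrollThreshold before expanding this loop:

      define void @test(i32* %a) optsize {
      entry:
        br label %loop
      loop:
        %i = phi i32 [ 0, %entry ], [ %inc, %loop ]
        %p = getelementptr i32* %a, i32 %i
        store i32 %i, i32* %p
        %inc = add i32 %i, 1
        %cmp = icmp slt i32 %inc, 4
        ; a small constant trip count: unrollable, but only if the
        ; resulting size fits under OptSizeUnrollThreshold
        br i1 %cmp, label %loop, label %exit
      exit:
        ret void
      }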
* Always compute all the bits in ComputeMaskedBits. This allows us to keep
  passing reduced masks to SimplifyDemandedBits, but know about all the bits
  if SimplifyDemandedBits fails. This allows instcombine to simplify cases
  like the one in the included testcase. (Rafael Espindola, 2012-04-04;
  1 file, -0/+15; llvm-svn: 154011)
* Quick fix for PR12343 (http://llvm.org/bugs/show_bug.cgi?id=12343): we have
  no trivial way to split edges that come from an indirect branch. We can do
  it with some tricks, but that should be discussed further, and it is still
  dangerous because indirect branches are hard to control. This fix forbids
  this case for unswitching. (Stepan Dyatkovskiy, 2012-04-02; 1 file, -0/+46;
  llvm-svn: 153879)