path: root/llvm/lib/Transforms
Commit message | Author | Age | Files | Lines
...
* [InstCombine] Do an about-face on how LLVM canonicalizes (cast (load ...)) and (load (cast ...)): canonicalize toward the former. | Chandler Carruth | 2014-10-18 | 1 | -72/+43

Historically, we've tried to load using the type of the *pointer*, and tried to match that type as closely as possible, removing as many pointer casts as we could and trading them for bitcasts of the loaded value. This is deeply and fundamentally wrong. Repeat after me: memory does not have a type! This was a hard lesson for me to learn working on SROA.

There is only one thing that should actually drive the type used for a pointer, and that is the type which we need to use to load from that pointer. Matching up pointer types to the loaded value types is very useful because it minimizes the physical size of the IR required for no-op casts. Similarly, the only thing that should drive the type used for a loaded value is *how that value is used*! Again, this minimizes casts. And in fact, the *only* thing motivating types in any part of LLVM's IR are the types used by the operations in the IR. We should match them as closely as possible.

I've ended up removing some tests here as they were testing bugs or behavior that is no longer present. Mostly though, this is just cleanup to let the tests continue to function as intended.

The only fallout I've found so far from this change was SROA, and I have fixed it to not be impeded by the different type of load. If you find more places where this change causes optimizations not to fire, those too are likely bugs where we are assuming that the type of pointers is "significant" for optimization purposes.

llvm-svn: 220138

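For illustration, a minimal hedged sketch of the new canonical form (hypothetical values; 2014-era IR syntax, not taken from the commit):

    ; old canonical form: cast the pointer, load with the use's type
    %p.cast = bitcast i64* %p to double*
    %v      = load double* %p.cast
    ; new canonical form: load with the pointer's own type, cast the value
    %v.int  = load i64* %p
    %v      = bitcast i64 %v.int to double
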
* [SROA] Change how SROA does vector-based promotion of allocas to handle cases where the alloca type, the load types, and the store types used all disagree. | Chandler Carruth | 2014-10-18 | 1 | -44/+128

Previously, the only way that vector-based promotion occurred was if the alloca type was a vector type. This was one of the *very* few remaining uses of the alloca's type to guide SROA/mem2reg left in LLVM. It turns out it was a bad idea. The alloca type can change very easily based on the mixture of types loaded and stored to that alloca. We shouldn't be relying on it as a signal for very much. Instead, the source of truth should be loads and stores. We should canonicalize the loads and stores as much as possible and then rely on them exclusively in SROA.

When looking at loads and stores, we may find many different candidate vector types. This change will let SROA try all of them to find a vector type which is a viable way to promote the entire alloca to a vector register.

With this change, it becomes possible to do better canonicalization and optimization of loads and stores without breaking SROA in random ways, and that should allow fixing a core source of performance loss in hot numerical loops such as those in Eigen.

llvm-svn: 220116

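As a hedged illustration (hypothetical IR, not from the commit) of the kind of disagreement described — the alloca, load, and store types all differ, yet a single vector type may still cover every access:

    %a  = alloca [16 x i8]                        ; alloca type: array
    %sp = bitcast [16 x i8]* %a to <4 x float>*
    store <4 x float> %v, <4 x float>* %sp        ; stored as <4 x float>
    %lp = bitcast [16 x i8]* %a to <2 x i64>*
    %l  = load <2 x i64>* %lp                     ; loaded as <2 x i64>
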
* [msan] Fix handling of byval arguments with large alignment. | Evgeniy Stepanov | 2014-10-17 | 1 | -1/+2

MSan param-tls slots are 8-byte aligned. This change clips alignment of memcpy into param-tls to 8.

llvm-svn: 220101

* Revert "TRE: make TRE a bit more aggressive"Rafael Espindola2014-10-171-2/+8
| | | | | | | | | This reverts commit r219899. This also updates byval-tail-call.ll to make it clear what was breaking. Adding r219899 again will cause the load/store to disappear. llvm-svn: 220093
* [DSE] Remove no-data-layout-only type-based overlap checking | Hal Finkel | 2014-10-17 | 1 | -8/+1

DSE's overlap checking contained special logic, used only when no DataLayout was available, which inferred a complete overwrite when the pointee types were equal. This logic seems fine for regular loads/stores, but does not work for memcpy and friends. Instead of fixing this, I'm just removing it. Philosophically, transformations should not contain enhanced behavior used only when data layout is lacking (data layout should be strictly additive), and maintaining these rarely-tested code paths seems not worthwhile at this stage.

Credit to Aliaksei Zasenka for the bug report and the diagnosis. The test case (slightly reduced from that provided by Aliaksei) replaces the original contents of test/Transforms/DeadStoreElimination/no-targetdata.ll -- a few other tests have been updated to have a data layout.

llvm-svn: 220035

* [SROA] Switch the common variable name for the 'AllocaSlices' class to 'AS'. | Chandler Carruth | 2014-10-16 | 1 | -40/+42

Using 'S' for this was a terrible idea. Arguably, 'AS' is not much better, but it at least follows the idea of using initialisms and removes active confusion between the AllocaSlices variable and a Slice variable.

llvm-svn: 219963

* [SROA] More range-based cleanups to SROA, these brought to you by clang-modernize. | Chandler Carruth | 2014-10-16 | 1 | -25/+12

I did have to clean up the variable types and whitespace a bit because the use of auto made the code much less readable here.

llvm-svn: 219962

* [SROA] Switch a couple of overly complex iterator accessors to just be ArrayRef accessors. | Chandler Carruth | 2014-10-16 | 1 | -26/+10

I think it even came up in review that this was over-engineered, and indeed it was. Time to un-build it.

llvm-svn: 219958

* [SROA] Start more deeply moving SROA to use ranges rather than just iterators. | Chandler Carruth | 2014-10-16 | 1 | -45/+42

There are a ton of places where it essentially wants ranges rather than just iterators. This is just the first step that adds the core slice range typedefs and uses them in a couple of places. I still have to explicitly construct them because they've not been punched throughout the entire set of code. More range-based cleanups incoming.

llvm-svn: 219955

* Allow call-slot optzn for destinations with a suitable dereferenceable attribute | Bjorn Steinbrink | 2014-10-16 | 1 | -14/+16

Summary: Currently, call slot optimization requires that if the destination is an argument, the argument has the sret attribute. This is to ensure that the memory access won't trap. In addition to sret, we can also allow the optimization to happen for arguments that have the new dereferenceable attribute, which gives the same guarantee.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5832

llvm-svn: 219950

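A hedged sketch of the pattern (hypothetical IR; function names invented): a memcpy from a local temporary into a dereferenceable destination argument can now be elided by writing into the destination directly:

    define void @f(i32* dereferenceable(4) %dst) {
      %tmp = alloca i32
      call void @g(i32* %tmp)            ; %tmp is filled in by @g
      %t = bitcast i32* %tmp to i8*
      %d = bitcast i32* %dst to i8*
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* %d, i8* %t, i64 4, i32 4, i1 false)
      ret void
    }
    ; call slot optimization may rewrite this to call @g(i32* %dst) directly,
    ; since dereferenceable(4) guarantees the write through %dst won't trap
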
* fold: sqrt(x * x * y) -> fabs(x) * sqrt(y) | Sanjay Patel | 2014-10-16 | 1 | -1/+87

If a square root call has an FP multiplication argument that can be reassociated, then we can hoist a repeated factor out of the square root call and into a fabs(). In the simplest case, this:

    y = sqrt(x * x);

becomes this:

    y = fabs(x);

This patch relies on an earlier optimization in instcombine or reassociate to put the multiplication tree into a canonical form, so we don't have to search over every permutation of the multiplication tree.

Because there are no IR-level FastMathFlags for intrinsics (PR21290), we have to use function-level attributes to do this optimization. This needs to be fixed for both the intrinsics and in the backend.

Differential Revision: http://reviews.llvm.org/D5787

llvm-svn: 219944

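A hedged IR sketch of the general case (value names invented; per the note above, the real patch keys off function-level fast-math attributes rather than instruction flags):

    %xx  = fmul fast double %x, %x
    %arg = fmul fast double %xx, %y
    %r   = call double @llvm.sqrt.f64(double %arg)
    ; ... can become ...
    %ax = call double @llvm.fabs.f64(double %x)
    %sy = call double @llvm.sqrt.f64(double %y)
    %r  = fmul fast double %ax, %sy
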
* Reapply r219832 - InstCombine: Narrow switch instructions using known bits. | Akira Hatanaka | 2014-10-16 | 1 | -0/+31

The code committed in r219832 asserted when it attempted to shrink a switch statement whose type was wider than 64 bits.

llvm-svn: 219902

* TRE: make TRE a bit more aggressive | Saleem Abdulrasool | 2014-10-16 | 1 | -8/+2

Make tail recursion elimination a bit more aggressive. This allows us to get tail recursion on functions that are just branches to a different function. The fact that the function takes a byval argument does not restrict it from being optimised into just a tail call.

llvm-svn: 219899

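As a hedged sketch (hypothetical IR) of the kind of forwarding function this targets — note that this change was reverted in r220093 above:

    declare void @callee(i32* byval %p)

    define void @forwarder(i32* byval %p) {
    entry:
      ; with this change, the call below could be marked 'tail'
      ; despite the byval argument
      tail call void @callee(i32* byval %p)
      ret void
    }
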
* Revert r219832. | Akira Hatanaka | 2014-10-16 | 1 | -31/+0

llvm-svn: 219884

* Preserve non-byval pointer alignment attributes using @llvm.assume when inlining | Hal Finkel | 2014-10-15 | 1 | -0/+45

For pointer-typed function arguments, enhanced alignment can be asserted using the 'align' attribute. When inlining, if this enhanced alignment information is not otherwise available, preserve it using @llvm.assume-based alignment assumptions.

llvm-svn: 219876

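A hedged sketch of an @llvm.assume-based alignment assumption for a 32-byte-aligned pointer (hypothetical names; the exact emitted pattern may differ):

    %ptrint    = ptrtoint i8* %p to i64
    %maskedptr = and i64 %ptrint, 31
    %maskcond  = icmp eq i64 %maskedptr, 0
    call void @llvm.assume(i1 %maskcond)
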
* Fixing the build failure due to compiler warnings and unnecessary disambiguation. | Chris Bieneman | 2014-10-15 | 1 | -3/+2

llvm-svn: 219861

* Defining a new API for debug options that doesn't rely on static global cl::opts. | Chris Bieneman | 2014-10-15 | 1 | -9/+17

Summary: This is based on the discussions from the LLVMDev thread:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075886.html

Reviewers: chandlerc

Reviewed By: chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5389

llvm-svn: 219854

* InstCombine: Narrow switch instructions using known bits. | Akira Hatanaka | 2014-10-15 | 1 | -0/+31

Truncate the operands of a switch instruction to a narrower type if the upper bits are known to be all ones or zeros.

rdar://problem/17720004

llvm-svn: 219832

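A hedged sketch of the narrowing (hypothetical IR; assumes the upper 24 bits of %x are known to be zero):

    switch i32 %x, label %default [
      i32 1, label %bb1
      i32 2, label %bb2
    ]
    ; ... can become, with the operand and case values truncated ...
    %x.tr = trunc i32 %x to i8
    switch i8 %x.tr, label %default [
      i8 1, label %bb1
      i8 2, label %bb2
    ]
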
* [SLPVectorize] Basic ephemeral-value awareness | Hal Finkel | 2014-10-15 | 1 | -3/+30

The SLP vectorizer should not vectorize ephemeral values. These are used to express information to the optimizer, and vectorizing them does not lead to faster code (because the ephemeral values are dropped prior to code generation, vectorized or not), and obscures the information the instructions are attempting to communicate (the logic that interprets the arguments to @llvm.assume generically does not understand vectorized conditions).

Also, uses by ephemeral values are free (because they, and the necessary extractelement instructions, will be dropped prior to code generation).

llvm-svn: 219816

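A hedged example of an ephemeral value (hypothetical IR): %cmp and the instructions feeding only it exist solely to inform the optimizer via @llvm.assume, and are dropped before code generation, so vectorizing them buys nothing:

    %ptrint = ptrtoint float* %p to i64
    %masked = and i64 %ptrint, 31
    %cmp    = icmp eq i64 %masked, 0
    call void @llvm.assume(i1 %cmp)
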
* No need to cache this unused variable. | Eric Christopher | 2014-10-14 | 1 | -3/+1

Patch by Ehsan Akhgari.

llvm-svn: 219749

* [LoopVectorize] Ignore @llvm.assume for cost estimates and legality | Hal Finkel | 2014-10-14 | 1 | -3/+32

A few minor changes to prevent @llvm.assume from interfering with loop vectorization. First, treat @llvm.assume like the lifetime intrinsics, which are scalarized (but don't otherwise interfere with the legality checking). Second, ignore the cost of ephemeral instructions in the loop (these will go away anyway during CodeGen).

Alignment assumptions and other uses of @llvm.assume can often end up inside of loops that should be vectorized (this is not uncommon for assumptions generated by __attribute__((align_value(n))), for example).

llvm-svn: 219741

* Optimize away fabs() calls when input is squared (known positive). | Sanjay Patel | 2014-10-14 | 1 | -1/+30

Eliminate library calls and intrinsic calls to fabs when the input is a squared value. Note that no unsafe-math / fast-math assumptions are needed for this optimization.

Differential Revision: http://reviews.llvm.org/D5777

llvm-svn: 219717

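A hedged sketch (hypothetical IR): a squared value is never negative (or is NaN, whose sign is irrelevant here), so the fabs is redundant:

    %sq = fmul double %x, %x
    %r  = call double @fabs(double %sq)
    ; ... folds to ...
    %r  = fmul double %x, %x
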
* InstCombine: Don't miscompile X % ((Pow2 << A) >>u B) | David Majnemer | 2014-10-14 | 1 | -7/+4

We assumed that A must be greater than B because the right hand side of a remainder operator must be nonzero. However, it is possible for A to be less than B if Pow2 is a power of two greater than 1.

Take for example:

    i32 %A = 0
    i32 %B = 31
    i32 Pow2 = 2147483648

((Pow2 << 0) >>u 31) is non-zero but A is less than B.

This fixes PR21274.

llvm-svn: 219713

* Switch to select optimization for two-case switches | Marcello Maggioni | 2014-10-14 | 1 | -0/+170

This is the same optimization as r219223, with modifications to support PHIs with multiple incoming edges from the same block and a test to check that this condition is handled.

llvm-svn: 219656

* fix formatting; NFC | Sanjay Patel | 2014-10-14 | 1 | -33/+25

llvm-svn: 219645

* Add some optional passes around the vectorizer to both better prepare the IR going into it and to clean up the IR produced by the vectorizers. | Chandler Carruth | 2014-10-14 | 1 | -1/+32

Note that these are *off by default* right now while folks collect data on whether the performance tradeoff is reasonable.

In a build of the 'opt' binary, I see about 2% compile time regression due to this change on average. This is in my mind essentially the worst expected case: very little of the opt binary is going to *benefit* from these extra passes.

I've seen several benchmarks improve in performance by small amounts due to running these passes, and there are certain (rare) cases where these passes make a huge difference by either enabling the vectorizer at all or by hoisting runtime checks out of the outer loop. My primary motivation is to prevent people from seeing runtime check overhead in benchmarks where the existing passes and optimizers would be able to eliminate that.

I've chosen the sequence of passes based on the kinds of things that seem likely to be relevant for the code at each stage: rotating loops for the vectorizer, finding correlated values, loop invariants, and unswitching opportunities from any runtime checks, and cleaning up commonalities exposed by the SLP vectorizer.

I'll be pinging existing threads where some of these issues have come up and will start new threads to get folks to benchmark and collect data on whether this is the right tradeoff or we should do something else.

llvm-svn: 219644

* InstCombine: Fix miscompile in X % -Y -> X % Y transform | David Majnemer | 2014-10-13 | 1 | -6/+6

We assumed that negation operations of the form (0 - %Z) resulted in a negative number. This isn't true if %Z was originally negative. Substituting the negative number into the remainder operation may result in undefined behavior because the dividend might be INT_MIN.

This fixes PR21256.

llvm-svn: 219639

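A hedged counterexample (hypothetical values): the fold X % (0 - %Z) -> X % %Z is wrong when %Z is negative:

    %neg = sub i32 0, %z          ; %z = -1  -->  %neg = 1
    %rem = srem i32 %x, %neg      ; %x = INT_MIN: INT_MIN % 1 is well defined (0)
    ; folding to 'srem i32 %x, %z' would evaluate INT_MIN % -1,
    ; which is undefined behavior
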
* InstCombine: Don't miscompile (x lshr C1) udiv C2 | David Majnemer | 2014-10-13 | 1 | -4/+10

We have a transform that changes:

    (x lshr C1) udiv C2

into:

    x udiv (C2 << C1)

However, it is unsafe to do so if C2 << C1 discards any of C2's bits.

This fixes PR21255.

llvm-svn: 219634

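A hedged counterexample in i8 for brevity (hypothetical values): with C1 = 4 and C2 = 33, C2 << C1 is 528, which truncates to 16 in i8, discarding C2's top bit:

    %s = lshr i8 %x, 4           ; %s is at most 15
    %d = udiv i8 %s, 33          ; always 0, since 15 /u 33 = 0
    ; folding to 'udiv i8 %x, 16' would be wrong: %x = 255 gives 15, not 0
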
* Revert r219223, it creates invalid PHI nodes. | Joerg Sonnenberger | 2014-10-12 | 1 | -170/+0

llvm-svn: 219587

* InstCombine: Turn (x != 0 & x <u C) into the canonical range check form (x-1 <u C-1) | Benjamin Kramer | 2014-10-12 | 1 | -0/+2

llvm-svn: 219585

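A hedged sketch with C = 100 (hypothetical IR); the subtraction wraps for x = 0, so the single unsigned compare covers both conditions:

    %c1  = icmp ne i32 %x, 0
    %c2  = icmp ult i32 %x, 100
    %and = and i1 %c1, %c2
    ; ... becomes the single canonical range check ...
    %sub = add i32 %x, -1        ; x - 1 wraps to UINT_MAX for x = 0
    %chk = icmp ult i32 %sub, 99
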
* InstCombine: Simplify commonIDivTransforms | David Majnemer | 2014-10-12 | 1 | -86/+76

A helper routine, MultiplyOverflows, was a less efficient reimplementation of APInt's smul_ov and umul_ov. While we are here, clean up the code so it's more uniform.

No functionality change intended.

llvm-svn: 219583

* InstCombine: Don't fold (X <<s log(INT_MIN)) /s INT_MIN to X | David Majnemer | 2014-10-11 | 1 | -1/+2

Consider the case where X is 2:

    (2 <<s 31) /s -2147483648

is zero, but we would fold to X. Note that this is valid when we are in the unsigned domain because we require NUW: 2 <<u 31 results in poison.

This fixes PR21245.

llvm-svn: 219568

* InstCombine, InstSimplify: (%X /s C1) /s C2 isn't always 0 when C1 * C2 overflows | David Majnemer | 2014-10-11 | 1 | -5/+4

Consider:

    C1 = INT_MIN
    C2 = -1

C1 * C2 overflows without a doubt, but consider the following:

    %x = i32 INT_MIN

This means that (%X /s C1) is 1 and (%X /s C1) /s C2 is -1.

N.B. Move the unsigned version of this transform to InstSimplify; it doesn't create any new instructions.

This fixes PR21243.

llvm-svn: 219567

* InstCombine: mul to shl shouldn't preserve nsw | David Majnemer | 2014-10-11 | 1 | -2/+0

Consider:

    mul nsw i32 %x, -2147483648

This instruction will not result in poison if %x is 1. However, if we transform this into:

    shl nsw i32 %x, 31

then we will be generating poison because we just shifted into the sign bit.

This fixes PR21242.

llvm-svn: 219566

* [SCEV] Fix one more caller blindly passing the latch to SCEV's getSmallConstantTripCount even when it isn't the exiting block. | Chandler Carruth | 2014-10-11 | 1 | -2/+1

I missed this in my first audit, very sorry. This was found in LNT and elsewhere. I don't have a test case, but it was completely obvious from inspection that this was the problem. I'll see if I can reduce a test case, but I'm not really hopeful, and the value seems quite low.

llvm-svn: 219562

* [SCEV] Add some asserts to the recently improved trip count computation routines and fix all of the bugs they expose. | Chandler Carruth | 2014-10-11 | 2 | -10/+11

I hit a test case that crashed even without these asserts due to passing a non-exiting latch to the ExitingBlock parameter of the trip count computation machinery. However, when I add the nice asserts, it turns out we have plenty of coverage of these bugs, they just didn't manifest in crashers.

The core problem seems to stem from an assumption that the latch *is* the exiting block. While this is often true, and somewhat the "normal" way to think about loops, it isn't necessarily true. The correct way to call the trip count routines in a *generic* fashion (that is, without a particular exit in mind) is to just use the loop's single exiting block if it has one. The trip count can't be computed generically unless it does. This works great for the loop vectorizer.

The loop unroller actually *wants* to select the latch when it has to choose between multiple exits because for unrolling it is the latch trips that matter. But if this is the desire, it needs to explicitly guard for non-exiting latches and check for the generic trip count in that case.

I've added the asserts, and added convenience APIs for querying the trip count generically that check for a single exit block. I've kept the APIs consistent between computing trip count and trip multiples.

Thanks to Mark for the help debugging and tracking down the *right* fix here!

llvm-svn: 219550

* SimplifyCFG: Don't convert phis into selects if we could remove undef behavior instead | Arnold Schwaighofer | 2014-10-10 | 1 | -0/+13

We used to transform this:

    define void @test6(i1 %cond, i8* %ptr) {
    entry:
      br i1 %cond, label %bb1, label %bb2
    bb1:
      br label %bb2
    bb2:
      %ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ]
      store i8 2, i8* %ptr.2, align 8
      ret void
    }

into this:

    define void @test6(i1 %cond, i8* %ptr) {
      %ptr.2 = select i1 %cond, i8* null, i8* %ptr
      store i8 2, i8* %ptr.2, align 8
      ret void
    }

because the simplifycfg transformation into selects would happen to happen before the simplifycfg transformation that removes unreachable control flow (we have 'unreachable control flow' due to the store to null, which is undefined behavior).

The existing transformation that removes unreachable control flow in simplifycfg is:

    /// If BB has an incoming value that will always trigger undefined behavior
    /// (eg. null pointer dereference), remove the branch leading here.
    static bool removeUndefIntroducingPredecessor(BasicBlock *BB)

Now we generate:

    define void @test6(i1 %cond, i8* %ptr) {
      store i8 2, i8* %ptr, align 8
      ret void
    }

I did not see any impact on the test-suite + externals.

rdar://18596215

llvm-svn: 219462

* [Reassociate] Don't canonicalize X - undef to X + (-undef). | Chad Rosier | 2014-10-09 | 1 | -0/+4

Phabricator Revision: http://reviews.llvm.org/D5674

PR21205

llvm-svn: 219434

* [InstCombine] Fix wrong folding of constant comparisons involving ashr and negative values. | Andrea Di Biagio | 2014-10-09 | 1 | -1/+2

This patch fixes a bug in method InstCombiner::FoldCmpCstShrCst where we wrongly computed the distance between the highest bits set of two negative values.

This fixes PR21222.

Differential Revision: http://reviews.llvm.org/D5700

llvm-svn: 219406

* Use triple's isiOS() and isOSDarwin() methods. | Bob Wilson | 2014-10-09 | 1 | -1/+1

These methods are already used in lots of places. This makes things more consistent. NFC.

llvm-svn: 219386

* Inliner: Non-local functions in COMDATs shouldn't be dropped | David Majnemer | 2014-10-08 | 1 | -0/+7

A function with discardable linkage cannot be discarded if it's a member of a COMDAT group without considering all the other COMDAT members as well. This sort of thing is already handled by GlobalOpt/GlobalDCE.

This fixes PR21206.

llvm-svn: 219335

* Revert "[InstCombine] re-commit r218721 with fix for pr21199"Justin Bogner2014-10-083-147/+8
| | | | | | | | | | | | | This seems to cause a miscompile when building clang, which causes a bootstrapped clang to fail or crash in several of its tests. See: http://lab.llvm.org:8013/builders/clang-x86_64-darwin11-RA/builds/1184 http://bb.pgr.jp/builders/clang-3stage-x86_64-linux/builds/7813 This reverts commit r219282. llvm-svn: 219317
* Format spacing and remove extra lines to comply with standards. NFC. | Suyog Sarda | 2014-10-08 | 1 | -5/+6

Differential Revision: http://reviews.llvm.org/D5649

llvm-svn: 219286

* GlobalOpt: Don't drop unused members of a Comdat | David Majnemer | 2014-10-08 | 1 | -8/+20

A linkonce_odr member of a COMDAT shouldn't be dropped if we need to keep the entire COMDAT group.

This fixes PR21191.

llvm-svn: 219283

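A hedged sketch (hypothetical IR; comdat syntax approximated): @live is referenced elsewhere, so the whole group — including the otherwise-unused @helper — must be kept:

    $grp = comdat any
    @live   = linkonce_odr global i32 1, comdat($grp)
    @helper = linkonce_odr global i32 2, comdat($grp)   ; unused, but not droppable
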
* [InstCombine] re-commit r218721 with fix for pr21199 | Gerolf Hoflehner | 2014-10-08 | 3 | -8/+147

The icmp-select-icmp optimization targets select-icmp.eq only. This is now ensured by testing the branch predicate explicitly. This commit also includes the test case for pr21199.

llvm-svn: 219282

* Revert r219175 - [InstCombine] re-commit r218721 icmp-select-icmp optimization | Hans Wennborg | 2014-10-08 | 3 | -151/+8

This seems to have caused PR21199.

llvm-svn: 219264

* DebugInfo+DFSan: Ensure that debug info references to llvm::Functions remain pointing to the underlying function when wrappers are created | David Blaikie | 2014-10-07 | 1 | -0/+10

This is somewhat the inverse of how similar bugs in DAE and ArgPromo manifested and were addressed. In those passes, individual call sites were visited explicitly, and then the old function was deleted. This left the debug info with a null llvm::Function* that needed to be updated to point to the new function.

In the case of DFSan, it RAUWs the old function with the wrapper, which includes debug info. So now the debug info refers to the wrapper, which doesn't actually have any instructions with debug info in it, so it is ignored entirely - resulting in a DW_TAG_subprogram with no high/low pc, etc. Instead, fix up the debug info to refer to the original function after the RAUW messed it up.

Reviewed/discussed with Peter Collingbourne on the llvm-dev mailing list.

llvm-svn: 219249

* LoopUnroll: Create sub-loops in LoopInfo | Duncan P. N. Exon Smith | 2014-10-07 | 1 | -1/+29

`LoopUnrollPass` says that it preserves `LoopInfo` -- make it so. In particular, tell `LoopInfo` about copies of inner loops when unrolling the outer loop.

Conservatively, also tell `ScalarEvolution` to forget about the original versions of these loops, since their inputs may have changed.

Fixes PR20987.

llvm-svn: 219241

* LoopUnroll: Only check for ScalarEvolution analysis once, NFC | Duncan P. N. Exon Smith | 2014-10-07 | 1 | -7/+4

A follow-up commit will add a use in a tight loop. We might as well just find it once anyway.

llvm-svn: 219239

* Two case switch to select optimization | Marcello Maggioni | 2014-10-07 | 1 | -0/+170

This optimization tries to convert switch instructions that are used to select a value with only 2 unique cases + default block to a select or a couple of selects (depending if the default block is reachable or not).

The typical case this optimization wants to be able to optimize is this one:

    switch (a) {
    case 10:                %0 = icmp eq i32 %a, 10
      return 10;            %1 = select i1 %0, i32 10, i32 4
    case 20:        ---->   %2 = icmp eq i32 %a, 20
      return 2;             %3 = select i1 %2, i32 2, i32 %1
    default:
      return 4;
    }

It also sets the base for further optimizations that are planned and being reviewed.

llvm-svn: 219223
