summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/InstCombine
Commit message (Collapse)AuthorAgeFilesLines
* Reorders two transforms that collide with each otherDavid Majnemer2013-04-141-8/+8
| | | | | | | | | | | | | | | | | | | | | | One performs: (X == 13 | X == 14) -> X-13 <u 2 The other: (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1 The problem is that there are certain values of C1 and C2 that trigger both transforms but the first one blocks out the second, this generates suboptimal code. Reordering the transforms should be better in every case and allows us to do interesting stuff like turn: %shr = lshr i32 %X, 4 %and = and i32 %shr, 15 %add = add i32 %and, -14 %tobool = icmp ne i32 %add, 0 into: %and = and i32 %X, 240 %tobool = icmp ne i32 %and, 224 llvm-svn: 179493
* InstCombine: Check the operand types before merging fcmp ord & fcmp ord.Benjamin Kramer2013-04-121-0/+3
| | | | | | Fixes PR15737. llvm-svn: 179417
* Simplify (A & ~B) in icmp if A is a power of 2David Majnemer2013-04-121-0/+9
| | | | | | | | The transform will execute like so: (A & ~B) == 0 --> (A & B) != 0 (A & ~B) != 0 --> (A & B) == 0 llvm-svn: 179386
* Optimize icmp involving addition betterDavid Majnemer2013-04-111-0/+49
| | | | | | | | | | | | | | | | | | | | | | | | Allows LLVM to optimize sequences like the following: %add = add nsw i32 %x, 1 %cmp = icmp sgt i32 %add, %y into: %cmp = icmp sge i32 %x, %y as well as: %add1 = add nsw i32 %x, 20 %add2 = add nsw i32 %y, 57 %cmp = icmp sge i32 %add1, %add2 into: %add = add nsw i32 %y, 37 %cmp = icmp sle i32 %cmp, %x llvm-svn: 179316
* Fix for wrong instcombine on vector insert/extractBenjamin Kramer2013-04-111-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When trying to collapse sequences of insertelement/extractelement instructions into single shuffle instructions, there is one specific case where the Instruction Combiner wrongly updates the resulting Mask of shuffle indexes. The problem is in function CollectShuffleElments. If we have a sequence of insert/extract element instructions like the one below: %tmp1 = extractelement <4 x float> %LHS, i32 0 %tmp2 = insertelement <4 x float> %RHS, float %tmp1, i32 1 %tmp3 = extractelement <4 x float> %RHS, i32 2 %tmp4 = insertelement <4 x float> %tmp2, float %tmp3, i32 3 Where: . %RHS will have a mask of [4,5,6,7] . %LHS will have a mask of [0,1,2,3] The Mask of shuffle indexes is wrongly computed to [4,1,6,7] instead of [4,0,6,7]. When analyzing %tmp2 in order to compute the Mask for the resulting shuffle instruction, the algorithm forgets to update the mask index at position 1 with the index associated to the element extracted from %LHS by instruction %tmp1. Patch by Andrea DiBiagio! llvm-svn: 179291
* Tidy up a bit. No functional change.Jim Grosbach2013-04-059-259/+261
| | | | llvm-svn: 178915
* Check if Type is a vector before calling function Type::getVectorNumElements.Akira Hatanaka2013-03-281-3/+4
| | | | llvm-svn: 178208
* Make InstCombineCasts.cpp:OptimizeIntToFloatBitCast endian safe.Ulrich Weigand2013-03-261-1/+9
| | | | | | | | | | | | | The OptimizeIntToFloatBitCast converts shift-truncate sequences into extractelement operations. The computation of the element index to be used in the resulting operation is currently only correct for little-endian targets. This commit fixes the element index computation to be correct for big-endian targets as well. If the target byte order is unknown, the optimization cannot be performed at all. llvm-svn: 178031
* Fix a bug in fast-math fadd/fsub simplification. Shuxin Yang2013-03-251-10/+43
| | | | | | | | | | | The problem is that the code mistakenly took for granted that following constructor is able to create an APFloat from a *SIGNED* integer: APFloat::APFloat(const fltSemantics &ourSemantics, integerPart value) rdar://13486998 llvm-svn: 177906
* Address issues found by Duncan during post-commit review of r177856.Arnaud A. de Grandmaison2013-03-251-32/+19
| | | | llvm-svn: 177863
* InstCombine: simplify comparisons to zero of (shl %x, Cst) or (mul %x, Cst)Arnaud A. de Grandmaison2013-03-251-0/+83
| | | | | | | | This simplification happens at 2 places : - using the nsw attribute when the shl / mul is used by a sign test - when the shl / mul is compared for (in)equality to zero llvm-svn: 177856
* InstCombine: Improve the result bitvect type when folding (cmp pred (load ↵Arnaud A. de Grandmaison2013-03-221-11/+20
| | | | | | | | | | | | | | | (gep GV, i)) C) to a bit test. The original code used i32, and i64 if legal. This introduced unneeded casts when they aren't legal, or when the index variable i has another type. In order of preference: try to use i's type; use the smallest fitting legal type (using an added DataLayout method); default to i32. A testcase checks that this works when the index gep operand is i16. Patch by : Ahmed Bougacha <ahmed.bougacha@gmail.com> Reviewed by : Duncan llvm-svn: 177712
* Perform factorization as a last resort of unsafe fadd/fsub simplification.Shuxin Yang2013-03-141-5/+91
| | | | | | | | | | | | | | | Rules include: 1)1 x*y +/- x*z => x*(y +/- z) (the order of operands dosen't matter) 2) y/x +/- z/x => (y +/- z)/x The transformation is disabled if the new add/sub expr "y +/- z" is a denormal/naz/inifinity. rdar://12911472 llvm-svn: 177088
* Fix a performance regression when combining to smaller types in icmp (shl ↵Arnaud A. de Grandmaison2013-03-131-3/+4
| | | | | | | | %v, C1), C2 : Only combine when the shl is only used by the icmp llvm-svn: 176950
* Simplify code. No functionality change.Jakub Staszak2013-03-091-2/+2
| | | | llvm-svn: 176765
* InstCombine: Don't shrink allocas when combining with a bitcast.Jim Grosbach2013-03-061-0/+6
| | | | | | | | | | When considering folding a bitcast of an alloca into the alloca itself, make sure we don't shrink the amount of memory being allocated, or things rapidly go sideways. rdar://13324424 llvm-svn: 176547
* Fix a bug in instcombine for fmul in fast math mode.Quentin Colombet2013-02-281-3/+3
| | | | | | | | | | | | | | | The instcombine recognized pattern looks like: a = b * c d = a +/- Cst or a = b * c d = Cst +/- a When creating the new operands for fadd or fsub instruction following the related fmul, the first operand was created with the second original operand (M0 was created with C1) and the second with the first (M1 with Opnd0). The fix consists in creating the new operands with the appropriate original operand, i.e., M0 with Opnd0 and M1 with C1. llvm-svn: 176300
* The transform is:Bill Wendling2013-02-161-0/+14
| | | | | | | | | | | | | | | (or (bool?A:B),(bool?C:D)) --> (bool?(or A,C):(or B,D)) By the time the OR is visited, both the SELECTs have been visited and not optimized and the OR itself hasn't been transformed so we do this transform in the hopes that the new ORs will be optimized. The transform is explicitly disabled for vector-selects until "codegen matures to handle them better". Patch by Muhammad Tauqir! llvm-svn: 175380
* Fix refactoring mistake in "Teach InstCombine to work with smaller legal ↵Arnaud A. de Grandmaison2013-02-151-1/+1
| | | | | | types..." llvm-svn: 175273
* Teach InstCombine to work with smaller legal types in icmp (shl %v, C1), C2Arnaud A. de Grandmaison2013-02-151-0/+19
| | | | | | | | | It enables to work with a smaller constant, which is target friendly for those which can compare to immediates. It also avoids inserting a shift in favor of a trunc, which can be free on some targets. This used to work until LLVM-3.1, but regressed with the 3.2 release. llvm-svn: 175270
* Fix commentArnaud A. de Grandmaison2013-02-131-2/+2
| | | | | | visitSExt is an adapted copy of the related visitZExt method, so adapt the comment accordingly. llvm-svn: 175019
* Optimization: bitcast (<1 x ...> insertelement ..., X, ...) to ... ==> ↵Michael Ilseman2013-02-111-5/+16
| | | | | | bitcast X to ... llvm-svn: 174905
* Revert "Have InstCombine call SipmlifyCall when handling calls. Test case ↵Andrew Trick2013-02-081-6/+0
| | | | | | | | | | included." This reverts commit 3854a5d90fee52af1065edbed34521fff6cdc18d. This causes a clang unit test to hang: vtable-available-externally.cpp. llvm-svn: 174692
* Have InstCombine call SipmlifyCall when handling calls. Test case included.Michael Ilseman2013-02-071-0/+6
| | | | llvm-svn: 174675
* Preserve fast-math flags after reassociation and commutation. Update test casesMichael Ilseman2013-02-071-5/+20
| | | | llvm-svn: 174571
* InstCombine: Fix and simplify the inttoptr side too.Benjamin Kramer2013-02-051-13/+8
| | | | llvm-svn: 174438
* InstCombine: Harden code to work with vectors of pointers and simplify it a bit.Benjamin Kramer2013-02-051-11/+7
| | | | | | Found by running instcombine on a fabricated test case for the constant folder. llvm-svn: 174430
* Revert r174152. The shift amount may overflow and in that case this ↵Nadav Rotem2013-02-011-6/+0
| | | | | | transformation is illegal. llvm-svn: 174156
* Optimize shift lefts of a constant by a value plus constant into a single shift.Nadav Rotem2013-02-011-0/+6
| | | | llvm-svn: 174152
* Convert typeIncompatible to return an AttributeSet.Bill Wendling2013-01-301-3/+10
| | | | | | | There are still places which treat the Attribute object as a collection of attributes. I'm systematically removing them. llvm-svn: 173990
* InstCombine: canonicalize sext-and --> selectNadav Rotem2013-01-301-0/+28
| | | | | | | | sext-not-and --> select. Patch by Muhammad Tauqir Ahmad. llvm-svn: 173901
* Use the AttributeSet instead of AttributeWithIndex.Bill Wendling2013-01-271-25/+18
| | | | | | | In the future, AttributeWithIndex won't be used anymore. Besides, it exposes the internals of the AttributeSet to outside users, which isn't goodness. llvm-svn: 173602
* Remove some introspection functions.Bill Wendling2013-01-251-6/+8
| | | | | | | | The 'getSlot' function and its ilk allow introspection into the AttributeSet class. However, that class should be opaque. Allow access through accessor methods instead. llvm-svn: 173522
* Use the new 'getSlotIndex' method to retrieve the attribute's slot index.Bill Wendling2013-01-251-1/+1
| | | | llvm-svn: 173499
* Remove trailing whitespace.Craig Topper2013-01-241-134/+134
| | | | llvm-svn: 173322
* Revert "InstCombine: Clean up weird code that talks about a modulus that's ↵Benjamin Kramer2013-01-231-1/+6
| | | | | | | | | long gone." This causes crashes during the build of compiler-rt during selfhost. Add a testcase for coverage. llvm-svn: 173279
* InstCombine: Clean up weird code that talks about a modulus that's long gone.Benjamin Kramer2013-01-231-6/+1
| | | | | | | This does the right thing unless the multiplication overflows, but the old code didn't handle that case either. llvm-svn: 173276
* Remove the last of uses that use the Attribute object as a collection of ↵Bill Wendling2013-01-231-13/+21
| | | | | | | | | attributes. Collections of attributes are handled via the AttributeSet class now. This finally frees us up to make significant changes to how attributes are structured. llvm-svn: 173228
* Have AttributeSet::getRetAttributes() return an AttributeSet instead of ↵Bill Wendling2013-01-211-4/+4
| | | | | | | | | Attribute. This further restricts the use of the Attribute class to the Attribute family of classes. llvm-svn: 173098
* Make AttributeSet::getFnAttributes() return an AttributeSet instead of an ↵Bill Wendling2013-01-211-6/+7
| | | | | | | | | Attribute. This is more code to isolate the use of the Attribute class to that of just holding one attribute instead of a collection of attributes. llvm-svn: 173094
* Transform (sub 0, (zext bool to A)) to (sext bool to A) andPaul Redmond2013-01-211-0/+10
| | | | | | | | | (sub 0, (sext bool to A)) to (zext bool to A). Patch by Muhammad Ahmad Reviewed by Duncan Sands llvm-svn: 173093
* Use AttributeSet accessor methods instead of Attribute accessor methods.Bill Wendling2013-01-181-3/+3
| | | | | | | Further encapsulation of the Attribute object. Don't allow direct access to the Attribute object as an aggregate. llvm-svn: 172853
* Push some more methods down to hide the use of the Attribute class.Bill Wendling2013-01-181-2/+2
| | | | | | | | Because the Attribute class is going to stop representing a collection of attributes, limit the use of it as an aggregate in favor of using AttributeSet. This replaces some of the uses for querying the function attributes. llvm-svn: 172844
* Check for less than 0 in shuffle mask instead of -1. It's more consistent ↵Craig Topper2013-01-181-1/+1
| | | | | | with other code related to shuffles and easier to implement in compiled code. llvm-svn: 172788
* Remove trailing whitespace. Remove new lines between closing brace and 'else'Craig Topper2013-01-181-7/+5
| | | | llvm-svn: 172784
* Teach InstCombine to optimize extract of a value from a vector add operation ↵Nadav Rotem2013-01-151-0/+9
| | | | | | with a constant zero. llvm-svn: 172576
* 1. Hoist minus sign as high as possible in an attempt to revealShuxin Yang2013-01-151-31/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | some optimization opportunities (in the enclosing supper-expressions). rule 1. (-0.0 - X ) * Y => -0.0 - (X * Y) if expression "-0.0 - X" has only one reference. rule 2. (0.0 - X ) * Y => -0.0 - (X * Y) if expression "0.0 - X" has only one reference, and the instruction is marked "noSignedZero". 2. Eliminate negation (The compiler was already able to handle these opt if the 0.0s are replaced with -0.0.) rule 3: (0.0 - X) * (0.0 - Y) => X * Y rule 4: (0.0 - X) * C => X * -C if the expr is flagged "noSignedZero". 3. Rule 5: (X*Y) * X => (X*X) * Y if X!=Y and the expression is flagged with "UnsafeAlgebra". The purpose of this transformation is two-fold: a) to form a power expression (of X). b) potentially shorten the critical path: After transformation, the latency of the instruction Y is amortized by the expression of X*X, and therefore Y is in a "less critical" position compared to what it was before the transformation. 4. Remove the InstCombine code about simplifiying "X * select". The reasons are following: a) The "select" is somewhat architecture-dependent, therefore the higher level optimizers are not able to precisely predict if the simplification really yields any performance improvement or not. b) The "select" operator is bit complicate, and tends to obscure optimization opportunities. It is btter to keep it as low as possible in expr tree, and let CodeGen to tackle the optimization. llvm-svn: 172551
* Remove trailing spaces.Jakub Staszak2013-01-142-40/+40
| | | | llvm-svn: 172489
* This change is to implement following rules under the condition C_A and/or C_RShuxin Yang2013-01-141-8/+127
| | | | | | | | | | | | | | | | | | | | | --------------------------------------------------------------------------- C_A: reassociation is allowed C_R: reciprocal of a constant C is appropriate, which means - 1/C is exact, or - reciprocal is allowed and 1/C is neither a special value nor a denormal. ----------------------------------------------------------------------------- rule1: (X/C1) / C2 => X / (C2*C1) (if C_A) => X * (1/(C2*C1)) (if C_A && C_R) rule 2: X*C1 / C2 => X * (C1/C2) if C_A rule 3: (X/Y)/Z = > X/(Y*Z) (if C_A && at least one of Y and Z is symbolic value) rule 4: Z/(X/Y) = > (Z*Y)/X (similar to rule3) rule 5: C1/(X*C2) => (C1/C2) / X (if C_A) rule 6: C1/(X/C2) => (C1*C2) / X (if C_A) rule 7: C1/(C2/X) => (C1/C2) * X (if C_A) llvm-svn: 172488
* Fix Casting BugDavid Greene2013-01-141-1/+3
| | | | | | Add a const version of getFpValPtr to avoid a cast-away-const warning. llvm-svn: 172467
OpenPOWER on IntegriCloud