summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* [x86] Revert r212324 which was too aggressive w.r.t. allowing undefChandler Carruth2014-07-071-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lanes in vector splats. The core problem here is that undef lanes can't *unilaterally* be considered to contribute to splats. Their handling needs to be more cautious. There is also a reported failure of the nightly testers (thanks Tobias!) that may well stem from the same core issue. I'm going to fix this theoretical issue, factor the APIs a bit better, and then verify that I don't see anything bad with Tobias's reduction from the test suite before recommitting. Original commit message for r212324: [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for any constant, constant FP, or undef splat and to tolerate any undef lanes in a splat, then replace all uses of isSplatVector in X86's lowering with it. This fixes issues where undef lanes in an otherwise splat vector would prevent the splat logic from firing. It is a touch more awkward to use this interface, but it is much more accurate. Suggestions for better interface structuring welcome. With this fix, the code generated with the widening legalization strategy for widen_cast-4.ll is *dramatically* improved as the special lowering strategies for a v16i8 SRA kick in even though the high lanes are undef. We also get a slightly different choice for broadcasting an aligned memory location, and use vpshufd instead of vbroadcastss. This looks like a minor win for pipelining and domain crossing, but a minor loss for the number of micro-ops. I suspect its a wash, but folks can easily tweak the lowering if they want. llvm-svn: 212475
* [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work forChandler Carruth2014-07-041-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | any constant, constant FP, or undef splat and to tolerate any undef lanes in a splat, then replace all uses of isSplatVector in X86's lowering with it. This fixes issues where undef lanes in an otherwise splat vector would prevent the splat logic from firing. It is a touch more awkward to use this interface, but it is much more accurate. Suggestions for better interface structuring welcome. With this fix, the code generated with the widening legalization strategy for widen_cast-4.ll is *dramatically* improved as the special lowering strategies for a v16i8 SRA kick in even though the high lanes are undef. We also get a slightly different choice for broadcasting an aligned memory location, and use vpshufd instead of vbroadcastss. This looks like a minor win for pipelining and domain crossing, but a minor loss for the number of micro-ops. I suspect its a wash, but folks can easily tweak the lowering if they want. llvm-svn: 212324
* Fix ppcf128 component access on little-endian systemsUlrich Weigand2014-07-031-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PowerPC 128-bit long double data type (ppcf128 in LLVM) is in fact a pair of two doubles, where one is considered the "high" or more-significant part, and the other is considered the "low" or less-significant part. When a ppcf128 value is stored in memory or a register pair, the high part always comes first, i.e. at the lower memory address or in the lower-numbered register, and the low part always comes second. This is true both on big-endian and little-endian PowerPC systems. (Similar to how with a complex number, the real part always comes first and the imaginary part second, no matter the byte order of the system.) This was implemented incorrectly for little-endian systems in LLVM. This commit fixes three related issues: - When printing an immediate ppcf128 constant to assembler output in emitGlobalConstantFP, emit the high part first on both big- and little-endian systems. - When lowering a ppcf128 type to a pair of f64 types in SelectionDAG (which is used e.g. when generating code to load an argument into a register pair), use correct low/high part ordering on little-endian systems. - In a related issue, because lowering ppcf128 into a pair of f64 must operate differently from lowering an int128 into a pair of i64, bitcasts between ppcf128 and int128 must not be optimized away by the DAG combiner on little-endian systems, but must effect a word-swap. Reviewed by Hal Finkel. llvm-svn: 212274
* Revert "SelectionDAG: Enable (and (setcc x), (setcc y)) -> (setcc (and x, ↵Tom Stellard2014-06-121-4/+4
| | | | | | | | | y)) for vectors" This reverts commit r210540, adds a testcase for the regression it caused, and marks the R600 test it was supposed to fix as XFAIL. llvm-svn: 210792
* SelectionDAG: Don't use MVT::Other to determine legality of ISD::SELECT_CCTom Stellard2014-06-101-16/+5
| | | | | | | | | | | | | The SelectionDAG bad a special case for ISD::SELECT_CC, where it would allow targets to specify: setOperationAction(ISD::SELECT_CC, MVT::Other, Expand); to indicate that they wanted to expand ISD::SELECT_CC for all types. This wasn't applied correctly everywhere, and it makes writing new DAG patterns with ISD::SELECT_CC difficult. llvm-svn: 210541
* SelectionDAG: Enable (and (setcc x), (setcc y)) -> (setcc (and x, y)) for ↵Tom Stellard2014-06-101-4/+4
| | | | | | | | | | vectors This prevents a future commit from regressing: test/CodeGen/R600/setcc-equivalent.ll llvm-svn: 210540
* [X86] Add target combine rules for horizontal add/sub.Andrea Di Biagio2014-06-091-0/+21
| | | | | | | | | | | | | | | | | | | | This patch adds new target specific combine rules to identify horizontal add/sub idioms from BUILD_VECTOR dag nodes. This patch also teaches the DAGCombiner how to canonicalize sequences of insert_vector_elt dag nodes according to the following rule: (insert_vector_elt (insert_vector_elt A, I0), I1) -> (insert_vecto_elt (insert_vector_elt A, I1), I0) This new canonicalization rule only triggers if the inner insert_vector dag node has exactly one use; also, both indices must be known constants, and I1 < I0. This last rule made it possible to write a simpler algorithm to identify horizontal add/sub patterns because now we don't have to worry about the ordering of insert_vector_elt dag nodes. llvm-svn: 210477
* [DAG] Expose NoSignedWrap, NoUnsignedWrap and Exact flags to SelectionDAG.Andrea Di Biagio2014-06-091-3/+10
| | | | | | | | | | | | | This patch modifies SelectionDAGBuilder to construct SDNodes with associated NoSignedWrap, NoUnsignedWrap and Exact flags coming from IR BinaryOperator instructions. Added a new SDNode type called 'BinaryWithFlagsSDNode' to allow accessing nsw/nuw/exact flags during codegen. Patch by Marcello Maggioni. llvm-svn: 210467
* [X86] Add two combine rules to simplify dag nodes introduced during type ↵Andrea Di Biagio2014-05-301-0/+21
| | | | | | | | | | | | | | | | | | | | | | | legalization when promoting nodes with illegal vector type. This patch teaches the backend how to simplify/canonicalize dag node sequences normally introduced by the backend when promoting certain dag nodes with illegal vector type. This patch adds two new combine rules: 1) fold (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) -> (shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>) 2) fold (BINOP (shuffle (A, Undef, <Mask>)), (shuffle (B, Undef, <Mask>))) -> (shuffle (BINOP A, B), Undef, <Mask>). Both rules are only triggered on the type-legalized DAG. In particular, rule 1. is a target specific combine rule that attempts to sink a bitconvert into the operands of a binary operation. Rule 2. is a target independet rule that attempts to move a shuffle immediately after a binary operation. llvm-svn: 209930
* Convert a vselect into a concat_vector if possibleFilipe Cabecinhas2014-05-301-0/+61
| | | | | | | | | | | | | | | | | | | | | Summary: If both vector args to vselect are concat_vectors and the condition is constant and picks half a vector from each argument, convert the vselect into a concat_vectors. Added a test. The ConvertSelectToConcatVector is assuming it doesn't get vselects with arguments of, for example, <undef, undef, true, true>. Those get taken care of in the checks above its call. Reviewers: nadav, delena, grosbach, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3916 llvm-svn: 209929
* [x86] Fold extract_vector_elt of a load into the Load's address computation.Michael J. Spencer2014-05-291-90/+124
| | | | | | | | An address only use of an extract element of a load can be simplified to a load. Without this the result of the extract element is spilled to the stack so that an address is available. llvm-svn: 209788
* Revert "[DAGCombiner] Split up an indexed load if only the base pointer ↵Hal Finkel2014-05-281-30/+7
| | | | | | | | | | value is live" This reverts r208640 (I've just XFAILed the test) because it broke ppc64/Linux self-hosting. Because nearly every regression test triggers a segfault, I hope this will be easy to fix. llvm-svn: 209747
* Rename ComputeMaskedBits to computeKnownBits. "Masked" has beenJay Foad2014-05-141-8/+8
| | | | | | inappropriate since it lost its Mask parameter in r154011. llvm-svn: 208811
* [DAGCombiner] Split up an indexed load if only the base pointer value is liveAdam Nemet2014-05-121-7/+30
| | | | | | | | | | | | | | | | | | | | Right now the load may not get DCE'd because of the side-effect of updating the base pointer. This can happen if we lower a read-modify-write of an illegal larger type (e.g. i48) such that the modification only affects one of the subparts (the lower i32 part but not the higher i16 part). See the testcase. In order to spot the dead load we need to revisit it when SimplifyDemandedBits decided that the value of the load is masked off. This is the CommitTargetLoweringOpt piece. I checked compile time with ARM64 by sending SPEC bitcode files through llc. No measurable change. Fixes <rdar://problem/16031651> llvm-svn: 208640
* DAGCombine: prevent formation of illegal ConstantFP nodes.Tim Northover2014-05-021-5/+10
| | | | llvm-svn: 207850
* [ARM64] Prevent bit extraction to be adjusted by following shiftWeiming Zhao2014-04-301-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | For pattern like ((x >> C1) & Mask) << C2, DAG combiner may convert it into (x >> (C1-C2)) & (Mask << C2), which makes pattern matching of ubfx more difficult. For example: Given %shr = lshr i64 %x, 4 %and = and i64 %shr, 15 %arrayidx = getelementptr inbounds [8 x [64 x i64]]* @arr, i64 0, %i64 2, i64 %and %0 = load i64* %arrayidx With current shift folding, it takes 3 instrs to compute base address: lsr x8, x0, #1 and x8, x8, #0x78 add x8, x9, x8 If using ubfx, it only needs 2 instrs: ubfx x8, x0, #4, #4 add x8, x9, x8, lsl #3 This fixes bug 19589 llvm-svn: 207702
* Tidy up whitespace.Jim Grosbach2014-04-291-7/+7
| | | | llvm-svn: 207583
* Convert AddNodeIDNode and SelectionDAG::getNodeIfExiists to use ↵Craig Topper2014-04-271-1/+1
| | | | | | ArrayRef<SDValue> llvm-svn: 207383
* Convert one last signature of getNode to take an ArrayRef of SDUse.Craig Topper2014-04-271-4/+4
| | | | llvm-svn: 207376
* DAGCombiner: Simplify code a bit, make more transforms work with vectors.Benjamin Kramer2014-04-261-58/+37
| | | | llvm-svn: 207338
* Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>.Craig Topper2014-04-261-40/+24
| | | | llvm-svn: 207327
* Rip out X86-specific vector SDIV lowering, make the corresponding ↵Benjamin Kramer2014-04-261-13/+24
| | | | | | DAGCombiner transform work on vectors. llvm-svn: 207316
* DAGCombiner: Turn divs of vector splats into vectorized multiplications.Benjamin Kramer2014-04-261-5/+37
| | | | | | | | | | | | Otherwise the legalizer would just scalarize everything. Support for mulhi in the targets isn't that great yet so on most targets we get exactly the same scalarized output. Add a test for x86 vector udiv. I had to disable the mulhi nodes on ARM because there aren't any patterns for it. As far as I know ARM has instructions for getting the high part of a multiply so this should be fixed. llvm-svn: 207315
* Fix an infinite loop bug in DAG Combine about keeping transfering between ↵Hao Liu2014-04-221-9/+7
| | | | | | ANY_EXTEND and SIGN_EXTEND. llvm-svn: 206873
* [Modules] Remove potential ODR violations by sinking the DEBUG_TYPEChandler Carruth2014-04-221-1/+2
| | | | | | | | | | | | define below all header includes in the lib/CodeGen/... tree. While the current modules implementation doesn't check for this kind of ODR violation yet, it is likely to grow support for it in the future. It also removes one layer of macro pollution across all the included headers. Other sub-trees will follow. llvm-svn: 206837
* DAGCombiner: don't optimise non-existant litpool loadTim Northover2014-04-161-1/+3
| | | | | | | | | | | This particular DAG combine is designed to kick in when both ConstantFPs will end up being loaded via a litpool, however those nodes have a semi-legal status, dictated by isFPImmLegal so in some cases there wouldn't have been a litpool in the first place. Don't try to be clever in those circumstances. Picked up while merging some AArch64 tests. llvm-svn: 206365
* Revert r191049/r191059 as it can produce wrong code (see PR17975).Robert Lougher2014-04-151-21/+0
| | | | | | It has already been reverted on the 3.4 branch in r196521. llvm-svn: 206311
* Break PseudoSourceValue out of the Value hierarchy. It is now the root of ↵Nick Lewycky2014-04-151-110/+39
| | | | | | its own tree containing FixedStackPseudoSourceValue (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't). Anything that needs to use either a PseudoSourceValue* and Value* is strongly encouraged to use a MachinePointerInfo instead. llvm-svn: 206255
* [C++11] More 'nullptr' conversion. In some cases just using a boolean check ↵Craig Topper2014-04-141-59/+58
| | | | | | instead of comparing to nullptr. llvm-svn: 206142
* Reenable use of TBAA during CodeGenHal Finkel2014-04-121-7/+1
| | | | | | | | | | | | | | | | | | | | We had disabled use of TBAA during CodeGen (even when otherwise using AA) because the ptrtoint/inttoptr used by CGP for address sinking caused BasicAA to miss basic type punning that it should catch (and, thus, we'd fail to override TBAA when we should). However, when AA is in use during CodeGen, CGP now uses normal GEPs and bitcasts, instead of ptrtoint/inttoptr, when doing address sinking. As a result, BasicAA should be able to make us do the right thing in the face of type-punning, and it seems safe to enable use of TBAA again. self-hosting seems fine on PPC64/Linux on the P7, with TBAA enabled and -misched=shuffle. Note: We still don't update TBAA when merging stack slots, although because BasicAA should now catch all such cases, this is no longer a blocking issue. Nevertheless, I plan to commit code to deal with this properly in the near future. llvm-svn: 206093
* [c++11] Range'ify use list loops in DAGCombiner.Jim Grosbach2014-04-111-18/+7
| | | | llvm-svn: 206014
* [DAGCombiner] DAG combine does not know how to combine indexed loads withQuentin Colombet2014-04-091-2/+5
| | | | | | | | | | | sign/zero/any extensions. However a few places were not checking properly the property of the load and were turning an indexed load into a regular extended load. Therefore the indexed value was lost during the process and this was triggering an assertion. <rdar://problem/16389332> llvm-svn: 205923
* Bug 19348: Check for legal ExtLoad operation before foldingMatt Arsenault2014-04-081-9/+12
| | | | | | | | (aext (zextload x)) -> (aext (truncate (*extload x))) Patch by Stanislav Mekhanoshin! llvm-svn: 205805
* Make isSetCCEquivalent respect the TargetBooleanContentsMatt Arsenault2014-04-011-19/+22
| | | | llvm-svn: 205336
* Look at shuffles of build_vectors in DAGCombiner::visitEXTRACT_VECTOR_ELTHal Finkel2014-03-311-7/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | When the loop vectorizer vectorizes code that uses the loop induction variable, we often end up with IR like this: %b1 = insertelement <2 x i32> undef, i32 %v, i32 0 %b2 = shufflevector <2 x i32> %b1, <2 x i32> undef, <2 x i32> zeroinitializer %i = add <2 x i32> %b2, <i32 2, i32 3> If the add in this example is not legal (as is the case on PPC with VSX), it will be scalarized, and we'll end up with a number of extract_vector_elt nodes with the vector shuffle as the input operand, and that vector shuffle is fed by one or more build_vector nodes. By the time that vector operations are expanded, visitEXTRACT_VECTOR_ELT will not create new extract_vector_elt by looking through the vector shuffle (to make sure that no illegal operations are created), and so the extract_vector_elt -> vector shuffle -> build_vector is never simplified to an operand of the build vector. By looking at build_vectors through a shuffle we fix this particular situation, preventing a vector from being built, only to be deconstructed again (for the scalarized add) -- an expensive proposition when this all needs to be done via the stack. We probably want a more comprehensive fix here where we look back recursively through any shuffles to any build_vectors or scalar_to_vectors, etc. but that can come later. llvm-svn: 205179
* [DAG] Fix an assertion failure caused by an invalid cast in method ↵Andrea Di Biagio2014-03-221-1/+1
| | | | | | | | | | | | 'BuildVectorSDNode::isConstantSplat' This patch renames method 'isConstantSplat' as 'getConstantSplatValue' (mainly for consistency reasons), and rewrites its logic to ensure that we always perform a legal 'cast<ConstantSDNode>'. Added test shift-combine-crash.ll to verify that DAGCombiner no longer crashes with an assertion failure in the attempt to simplify a vector shift by a vector of all undef counts. llvm-svn: 204536
* [DAGCombiner] teach how to simplify xor/and/or nodes according to the ↵Andrea Di Biagio2014-03-181-21/+52
| | | | | | | | | | | | | | following rules: 1) (AND (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (AND (A, B), C, Mask) 2) (OR (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (OR (A, B), C, Mask) 3) (XOR (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (XOR (A, B), V_0, Mask) 4) (AND (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (C, AND (A, B), Mask) 5) (OR (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (C, OR (A, B), Mask) 6) (XOR (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (V_0, XOR (A, B), Mask) llvm-svn: 204160
* Make DAGCombiner work on vector bitshifts with constant splat vectors.Matt Arsenault2014-03-171-137/+165
| | | | llvm-svn: 204071
* [C++11] Add 'override' keyword to virtual methods that override their base ↵Craig Topper2014-03-081-1/+1
| | | | | | class. llvm-svn: 203339
* [DAGCombiner] Distribute TRUNC through AND in rotation amountAdam Nemet2014-03-071-0/+16
| | | | | | | | | | | | | | | | This is already done for shifts. Allow it for rotations as well. E.g.: (rotl:i32 x, (trunc (and y, 31))) -> (rotl:i32 x, (and (trunc y), 31)) Use the newly factored-out distributeTruncateThroughAnd. With this patch and some X86.td tweaks we should be able to remove redundant masking of the rotation amount like in the example above. HW implicitly performs this masking. The testcase will be added as part of the X86 patch. llvm-svn: 203316
* [DAGCombiner] Recognize another rotation idiomAdam Nemet2014-03-071-0/+8
| | | | | | | | | | | | | | | | This is the new idiom: x<<(y&31) | x>>((0-y)&31) which is recognized as: x ROTL (y&31) The change refines matchRotateSub. In Neg & (OpSize - 1) == (OpSize - Pos) & (OpSize - 1), if Pos is Pos' & (OpSize - 1) we can just use Pos' instead of Pos. llvm-svn: 203315
* [DAGCombiner] Slightly improve readability of matchRotateSubAdam Nemet2014-03-071-8/+9
| | | | | | | | | | Slightly change the wording in the function comment. Originally, it can be misunderstood as we turned the input into two subsequent rotates. Better connect the comment which talks about Mask and the code which used LoBits. Renamed variable to MaskLoBits. llvm-svn: 203314
* [X86] Teach the DAGCombiner how to fold a OR of two shufflevector nodes.Andrea Di Biagio2014-03-061-0/+54
| | | | | | | | | | | | | | | | | | | | | | This patch teaches the DAGCombiner how to fold a binary OR between two shufflevector into a single shuffle vector when possible. The rules are: 1. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask1) 2. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf B, A, Mask2) The DAGCombiner can take advantage of the fact that OR is commutative and compute two possible shuffle masks (Mask1 and Mask2) for the resulting shuffle node. Before folding a dag according to either rule 1 or 2, DAGCombiner verifies that the resulting shuffle mask is legal for the target. DAGCombiner would firstly try to fold according to 1.; If not possible then it will try to fold according to 2. If both Mask1 and Mask2 are illegal then we conservatively don't fold the OR instruction. llvm-svn: 203156
* [DAGCombiner] Factor out distributeTruncateThroughAndAdam Nemet2014-03-041-47/+42
| | | | | | | | | Currently this code is duplicated across visitSHL, visitSRA and visitSRL. The plan is to add rotates as clients to this new function. There is no functional change intended here. llvm-svn: 202908
* [C++11] Replace llvm::tie with std::tie.Benjamin Kramer2014-03-021-6/+6
| | | | | | The old implementation is no longer needed in C++11. llvm-svn: 202644
* Now that we have C++11, turn simple functors into lambdas and remove a ton ↵Benjamin Kramer2014-03-011-21/+10
| | | | | | | | of boilerplate. No intended functionality change. llvm-svn: 202588
* Fix visitTRUNCATE for legal i1 valuesHal Finkel2014-02-281-1/+1
| | | | | | | | | | | This extract-and-trunc vector optimization cannot work for i1 values as currently implemented, and so I'm disabling this for now for i1 values. In the future, this can be fixed properly. Soon I'll commit support for i1 CR bit tracking in the PowerPC backend, and this will be covered by one of the existing regression tests. llvm-svn: 202449
* Trivial code simplificationMatt Arsenault2014-02-241-2/+1
| | | | llvm-svn: 202073
* [DAGCombiner] PCMP* sets its result to all ones or zeros so we can AND with theQuentin Colombet2014-02-211-0/+18
| | | | | | | | | | | | | | shifted mask rather than masking and shifting separately. The patch adds this transformation to the DAGCombiner:   (shl (and (setcc:i8v16 ...) N01C) N1C) -> (and (setcc:i8v16 ...) N01C<<N1C) <rdar://problem/16054492> Patch by Adam Nemet <anemet@apple.com> llvm-svn: 201906
* Teach the DAGCombiner how to fold concat_vector nodes when the input is twoRobert Lougher2014-02-111-0/+20
| | | | | | | | | | | | | BUILD_VECTOR nodes, e.g.: (concat_vectors (BUILD_VECTOR a1, a2, a3, a4), (BUILD_VECTOR b1, b2, b3, b4)) -> (BUILD_VECTOR a1, a2, a3, a4, b1, b2, b3, b4) This fixes an issue with AVX, where a sequence was not recognized as a 256-bit vbroadcast due to the concat_vectors. llvm-svn: 201158
OpenPOWER on IntegriCloud