path: root/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Commit message | Author | Date | Files | Lines
...
* DAGCombiner: Simplify code a bit, make more transforms work with vectors. (Benjamin Kramer, 2014-04-26; 1 file, -58/+37)
  llvm-svn: 207338
* Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>. (Craig Topper, 2014-04-26; 1 file, -40/+24)
  llvm-svn: 207327
* Rip out X86-specific vector SDIV lowering, make the corresponding DAGCombiner transform work on vectors. (Benjamin Kramer, 2014-04-26; 1 file, -13/+24)
  llvm-svn: 207316
* DAGCombiner: Turn divs of vector splats into vectorized multiplications. (Benjamin Kramer, 2014-04-26; 1 file, -5/+37)
  Otherwise the legalizer would just scalarize everything. Support for mulhi in the targets isn't that great yet, so on most targets we get exactly the same scalarized output. Add a test for x86 vector udiv.
  I had to disable the mulhi nodes on ARM because there aren't any patterns for it. As far as I know ARM has instructions for getting the high part of a multiply, so this should be fixed.
  llvm-svn: 207315
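  Illustration (not from the commit): the scalar identity this combine exploits per lane is that an unsigned divide by a splatted constant can be rewritten as a multiply-high plus a shift, which the DAG-level transform emits as MULHU nodes. A minimal C++ sketch for division by 3; the function name and the choice of divisor are illustrative, and 0xAAAAAAAB is ceil(2^33 / 3).

    #include <cstdint>

    // Illustrative only: x / 3 computed the way a udiv-by-splat lane is
    // lowered, i.e. as a multiply-high followed by a shift.
    uint32_t udiv_by_3(uint32_t x) {
      // 0xAAAAAAAB == ceil(2^33 / 3), so (x * m) >> 33 == x / 3 for all 32-bit x.
      return static_cast<uint32_t>((static_cast<uint64_t>(x) * 0xAAAAAAABu) >> 33);
    }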
* Fix an infinite loop bug in DAG Combine caused by repeatedly transferring between ANY_EXTEND and SIGN_EXTEND. (Hao Liu, 2014-04-22; 1 file, -9/+7)
  llvm-svn: 206873
* [Modules] Remove potential ODR violations by sinking the DEBUG_TYPE (Chandler Carruth, 2014-04-22; 1 file, -1/+2)
  define below all header includes in the lib/CodeGen/... tree. While the current modules implementation doesn't check for this kind of ODR violation yet, it is likely to grow support for it in the future. It also removes one layer of macro pollution across all the included headers.
  Other sub-trees will follow.
  llvm-svn: 206837
* DAGCombiner: don't optimise non-existent litpool load (Tim Northover, 2014-04-16; 1 file, -1/+3)
  This particular DAG combine is designed to kick in when both ConstantFPs will end up being loaded via a litpool; however, those nodes have a semi-legal status, dictated by isFPImmLegal, so in some cases there wouldn't have been a litpool in the first place. Don't try to be clever in those circumstances.
  Picked up while merging some AArch64 tests.
  llvm-svn: 206365
* Revert r191049/r191059 as it can produce wrong code (see PR17975). (Robert Lougher, 2014-04-15; 1 file, -21/+0)
  It has already been reverted on the 3.4 branch in r196521.
  llvm-svn: 206311
* Break PseudoSourceValue out of the Value hierarchy. It is now the root of its own tree containing FixedStackPseudoSourceValue (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't). (Nick Lewycky, 2014-04-15; 1 file, -110/+39)
  Anything that needs to use either a PseudoSourceValue* or a Value* is strongly encouraged to use a MachinePointerInfo instead.
  llvm-svn: 206255
* [C++11] More 'nullptr' conversion. In some cases just using a boolean check instead of comparing to nullptr. (Craig Topper, 2014-04-14; 1 file, -59/+58)
  llvm-svn: 206142
* Reenable use of TBAA during CodeGen (Hal Finkel, 2014-04-12; 1 file, -7/+1)
  We had disabled use of TBAA during CodeGen (even when otherwise using AA) because the ptrtoint/inttoptr used by CGP for address sinking caused BasicAA to miss basic type punning that it should catch (and, thus, we'd fail to override TBAA when we should).
  However, when AA is in use during CodeGen, CGP now uses normal GEPs and bitcasts, instead of ptrtoint/inttoptr, when doing address sinking. As a result, BasicAA should be able to make us do the right thing in the face of type-punning, and it seems safe to enable use of TBAA again. Self-hosting seems fine on PPC64/Linux on the P7, with TBAA enabled and -misched=shuffle.
  Note: We still don't update TBAA when merging stack slots, although because BasicAA should now catch all such cases, this is no longer a blocking issue. Nevertheless, I plan to commit code to deal with this properly in the near future.
  llvm-svn: 206093
* [c++11] Range'ify use list loops in DAGCombiner. (Jim Grosbach, 2014-04-11; 1 file, -18/+7)
  llvm-svn: 206014
* [DAGCombiner] DAG combine does not know how to combine indexed loads with sign/zero/any extensions. (Quentin Colombet, 2014-04-09; 1 file, -2/+5)
  However, a few places were not properly checking this property of the load and were turning an indexed load into a regular extended load. The indexed value was therefore lost during the process, and this was triggering an assertion.
  <rdar://problem/16389332>
  llvm-svn: 205923
* Bug 19348: Check for legal ExtLoad operation before folding (Matt Arsenault, 2014-04-08; 1 file, -9/+12)
  (aext (zextload x)) -> (aext (truncate (*extload x)))
  Patch by Stanislav Mekhanoshin!
  llvm-svn: 205805
* Make isSetCCEquivalent respect the TargetBooleanContents (Matt Arsenault, 2014-04-01; 1 file, -19/+22)
  llvm-svn: 205336
* Look at shuffles of build_vectors in DAGCombiner::visitEXTRACT_VECTOR_ELT (Hal Finkel, 2014-03-31; 1 file, -7/+24)
  When the loop vectorizer vectorizes code that uses the loop induction variable, we often end up with IR like this:
    %b1 = insertelement <2 x i32> undef, i32 %v, i32 0
    %b2 = shufflevector <2 x i32> %b1, <2 x i32> undef, <2 x i32> zeroinitializer
    %i = add <2 x i32> %b2, <i32 2, i32 3>
  If the add in this example is not legal (as is the case on PPC with VSX), it will be scalarized, and we'll end up with a number of extract_vector_elt nodes with the vector shuffle as the input operand, and that vector shuffle is fed by one or more build_vector nodes. By the time that vector operations are expanded, visitEXTRACT_VECTOR_ELT will not create new extract_vector_elt by looking through the vector shuffle (to make sure that no illegal operations are created), and so the extract_vector_elt -> vector shuffle -> build_vector is never simplified to an operand of the build vector.
  By looking at build_vectors through a shuffle we fix this particular situation, preventing a vector from being built, only to be deconstructed again (for the scalarized add) -- an expensive proposition when this all needs to be done via the stack. We probably want a more comprehensive fix here where we look back recursively through any shuffles to any build_vectors or scalar_to_vectors, etc., but that can come later.
  llvm-svn: 205179
* [DAG] Fix an assertion failure caused by an invalid cast in method 'BuildVectorSDNode::isConstantSplat' (Andrea Di Biagio, 2014-03-22; 1 file, -1/+1)
  This patch renames method 'isConstantSplat' as 'getConstantSplatValue' (mainly for consistency reasons), and rewrites its logic to ensure that we always perform a legal 'cast<ConstantSDNode>'.
  Added test shift-combine-crash.ll to verify that DAGCombiner no longer crashes with an assertion failure in the attempt to simplify a vector shift by a vector of all undef counts.
  llvm-svn: 204536
* [DAGCombiner] Teach how to simplify xor/and/or nodes according to the following rules: (Andrea Di Biagio, 2014-03-18; 1 file, -21/+52)
  1) (AND (shuf (A, C, Mask), shuf (B, C, Mask))) -> shuf (AND (A, B), C, Mask)
  2) (OR  (shuf (A, C, Mask), shuf (B, C, Mask))) -> shuf (OR  (A, B), C, Mask)
  3) (XOR (shuf (A, C, Mask), shuf (B, C, Mask))) -> shuf (XOR (A, B), V_0, Mask)
  4) (AND (shuf (C, A, Mask), shuf (C, B, Mask))) -> shuf (C, AND (A, B), Mask)
  5) (OR  (shuf (C, A, Mask), shuf (C, B, Mask))) -> shuf (C, OR  (A, B), Mask)
  6) (XOR (shuf (C, A, Mask), shuf (C, B, Mask))) -> shuf (V_0, XOR (A, B), Mask)
  llvm-svn: 204160
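  Illustration (not from the commit; values and lane width are made up): a scalar re-statement of why rule 1 holds. With the same mask on both shuffles and a shared operand C, each lane either comes from C on both sides (and C & C == C) or from A and B respectively, which is exactly a shuffle of (A AND B) with C.

    #include <array>
    #include <cassert>
    #include <cstdint>

    int main() {
      // Two 4-lane inputs per shuffle; mask indices 0-3 select from the first
      // operand, 4-7 from the second (shared) operand C.
      std::array<uint8_t, 4> A{1, 2, 3, 4}, B{5, 6, 7, 8}, C{9, 10, 11, 12};
      std::array<int, 4> Mask{0, 5, 2, 7};
      for (int i = 0; i < 4; ++i) {
        // Lane i of (AND (shuf A, C, Mask), (shuf B, C, Mask)).
        uint8_t lhs = (Mask[i] < 4 ? A[Mask[i]] : C[Mask[i] - 4]) &
                      (Mask[i] < 4 ? B[Mask[i]] : C[Mask[i] - 4]);
        // Lane i of (shuf (AND A, B), C, Mask).
        uint8_t rhs = Mask[i] < 4 ? uint8_t(A[Mask[i]] & B[Mask[i]]) : C[Mask[i] - 4];
        assert(lhs == rhs);
      }
      return 0;
    }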
* Make DAGCombiner work on vector bitshifts with constant splat vectors. (Matt Arsenault, 2014-03-17; 1 file, -137/+165)
  llvm-svn: 204071
* [C++11] Add 'override' keyword to virtual methods that override their base class. (Craig Topper, 2014-03-08; 1 file, -1/+1)
  llvm-svn: 203339
* [DAGCombiner] Distribute TRUNC through AND in rotation amount (Adam Nemet, 2014-03-07; 1 file, -0/+16)
  This is already done for shifts. Allow it for rotations as well. E.g.:
    (rotl:i32 x, (trunc (and y, 31))) -> (rotl:i32 x, (and (trunc y), 31))
  Use the newly factored-out distributeTruncateThroughAnd. With this patch and some X86.td tweaks we should be able to remove redundant masking of the rotation amount like in the example above. HW implicitly performs this masking.
  The testcase will be added as part of the X86 patch.
  llvm-svn: 203316
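  Illustration (assumed source shape, not taken from the patch): the rotate amount comes from a wider value, so the DAG sees (trunc (and y, 31)). Pushing the truncate inside the AND lets the target's implicit masking of the rotate amount absorb the AND.

    #include <cstdint>

    // Illustrative only: rotate a 32-bit value by an amount held in a 64-bit
    // variable. The masked-and-truncated amount is what the combine rewrites.
    uint32_t rotl_by_wide_amount(uint32_t x, uint64_t y) {
      uint32_t amt = static_cast<uint32_t>(y & 31);   // (trunc (and y, 31))
      return (x << amt) | (x >> ((32 - amt) & 31));   // plain rotate-left
    }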
* [DAGCombiner] Recognize another rotation idiom (Adam Nemet, 2014-03-07; 1 file, -0/+8)
  This is the new idiom:
    x<<(y&31) | x>>((0-y)&31)
  which is recognized as:
    x ROTL (y&31)
  The change refines matchRotateSub. In Neg & (OpSize - 1) == (OpSize - Pos) & (OpSize - 1), if Pos is Pos' & (OpSize - 1) we can just use Pos' instead of Pos.
  llvm-svn: 203315
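  Illustration (a minimal sketch; the function name is not from the patch): the idiom as it typically appears in C++ source. Both shift amounts stay in 0..31 because of the masks, so the expression is well defined even when y is a multiple of 32, and it is exactly the form matchRotateSub now turns into a single ROTL.

    #include <cstdint>

    uint32_t rotl32(uint32_t x, uint32_t y) {
      // (0 - y) & 31 == (32 - (y & 31)) & 31, so this is a rotate-left by y & 31.
      return (x << (y & 31)) | (x >> ((0u - y) & 31));
    }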
* [DAGCombiner] Slightly improve readability of matchRotateSub (Adam Nemet, 2014-03-07; 1 file, -8/+9)
  Slightly change the wording in the function comment: originally it could be misread as saying that we turned the input into two subsequent rotates. Better connect the comment, which talks about Mask, with the code, which uses LoBits; the variable is renamed to MaskLoBits.
  llvm-svn: 203314
* [X86] Teach the DAGCombiner how to fold an OR of two shufflevector nodes. (Andrea Di Biagio, 2014-03-06; 1 file, -0/+54)
  This patch teaches the DAGCombiner how to fold a binary OR between two shufflevectors into a single shuffle vector when possible. The rules are:
    1. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask1)
    2. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf B, A, Mask2)
  The DAGCombiner can take advantage of the fact that OR is commutative and compute two possible shuffle masks (Mask1 and Mask2) for the resulting shuffle node.
  Before folding a dag according to either rule 1 or 2, DAGCombiner verifies that the resulting shuffle mask is legal for the target. It first tries to fold according to rule 1; if that is not possible, it tries rule 2. If both Mask1 and Mask2 are illegal, we conservatively don't fold the OR instruction.
  llvm-svn: 203156
* [DAGCombiner] Factor out distributeTruncateThroughAnd (Adam Nemet, 2014-03-04; 1 file, -47/+42)
  Currently this code is duplicated across visitSHL, visitSRA and visitSRL. The plan is to add rotates as clients to this new function.
  There is no functional change intended here.
  llvm-svn: 202908
* [C++11] Replace llvm::tie with std::tie. (Benjamin Kramer, 2014-03-02; 1 file, -6/+6)
  The old implementation is no longer needed in C++11.
  llvm-svn: 202644
* Now that we have C++11, turn simple functors into lambdas and remove a ton of boilerplate. (Benjamin Kramer, 2014-03-01; 1 file, -21/+10)
  No intended functionality change.
  llvm-svn: 202588
* Fix visitTRUNCATE for legal i1 values (Hal Finkel, 2014-02-28; 1 file, -1/+1)
  This extract-and-trunc vector optimization cannot work for i1 values as currently implemented, and so I'm disabling this for now for i1 values. In the future, this can be fixed properly.
  Soon I'll commit support for i1 CR bit tracking in the PowerPC backend, and this will be covered by one of the existing regression tests.
  llvm-svn: 202449
* Trivial code simplification (Matt Arsenault, 2014-02-24; 1 file, -2/+1)
  llvm-svn: 202073
* [DAGCombiner] PCMP* sets its result to all ones or zeros so we can AND with the shifted mask rather than masking and shifting separately. (Quentin Colombet, 2014-02-21; 1 file, -0/+18)
  The patch adds this transformation to the DAGCombiner:
    (shl (and (setcc:i8v16 ...) N01C) N1C) -> (and (setcc:i8v16 ...) N01C<<N1C)
  <rdar://problem/16054492>
  Patch by Adam Nemet <anemet@apple.com>
  llvm-svn: 201906
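  Illustration (scalar stand-in for one vector lane; values are made up, not from the patch): a PCMP* lane is either all zeros or all ones, so shifting after the AND produces the same bits as ANDing with the pre-shifted mask.

    #include <cassert>
    #include <cstdint>
    #include <initializer_list>

    int main() {
      const int8_t c1 = 0x0f;  // stands in for N01C
      const int8_t c2 = 2;     // stands in for N1C
      for (int8_t cmp : {int8_t(0), int8_t(-1)}) {   // possible compare lane values
        // (shl (and cmp, c1), c2) == (and cmp, (shl c1, c2)) for these lanes.
        assert(int8_t((cmp & c1) << c2) == int8_t(cmp & int8_t(c1 << c2)));
      }
      return 0;
    }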
* Teach the DAGCombiner how to fold concat_vector nodes when the input is two BUILD_VECTOR nodes, e.g.: (Robert Lougher, 2014-02-11; 1 file, -0/+20)
  (concat_vectors (BUILD_VECTOR a1, a2, a3, a4), (BUILD_VECTOR b1, b2, b3, b4)) -> (BUILD_VECTOR a1, a2, a3, a4, b1, b2, b3, b4)
  This fixes an issue with AVX, where a sequence was not recognized as a 256-bit vbroadcast due to the concat_vectors.
  llvm-svn: 201158
* [DAG] Don't pull the binary operation through the shift if the operands have opaque constants. (Juergen Ributzka, 2014-02-06; 1 file, -2/+9)
  During DAGCombine, visitShiftByConstant assumes that certain binary operations with only constant operands can always be folded successfully. This is no longer true when the constant is opaque. This commit fixes visitShiftByConstant by not performing the optimization for opaque constants. Otherwise we would end up in an infinite DAGCombine loop.
  llvm-svn: 200900
* This patch teaches the DAGCombiner how to fold insert_subvector nodes (Manman Ren, 2014-01-31; 1 file, -0/+29)
  when the input is a concat_vectors and the insert replaces one of the concat halves:
    Lower half: fold (insert_subvector (concat_vectors X, Y), Z) -> (concat_vectors Z, Y)
    Upper half: fold (insert_subvector (concat_vectors X, Y), Z) -> (concat_vectors X, Z)
  This can be seen with the following IR:
    define <8 x float> @lower_half(<4 x float> %v1, <4 x float> %v2, <4 x float> %v3) {
      %1 = shufflevector <4 x float> %v1, <4 x float> %v2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
      %2 = tail call <8 x float> @llvm.x86.avx.vinsertf128.ps.256(<8 x float> %1, <4 x float> %v3, i8 0)
  The vinsertf128 intrinsic is converted into an insert_subvector node in SelectionDAGBuilder.cpp.
  Using AVX, without the patch this generates two vinsertf128 instructions:
    vinsertf128 $1, %xmm1, %ymm0, %ymm0
    vinsertf128 $0, %xmm2, %ymm0, %ymm0
  With the patch this is optimized into:
    vinsertf128 $1, %xmm1, %ymm2, %ymm0
  Patch by Robert Lougher.
  llvm-svn: 200506
* DAGCombine should not produce ISD::OR nodes after operation legalization if they're not legal. (Owen Anderson, 2014-01-31; 1 file, -2/+4)
  llvm-svn: 200503
* [DAGCombiner] Avoid introducing an illegal build_vector when folding a sign_extend. (Andrea Di Biagio, 2014-01-28; 1 file, -9/+15)
  Make sure that we don't introduce illegal build_vector dag nodes when trying to fold a sign_extend of a build_vector. This fixes a regression introduced by r200234.
  Added test CodeGen/X86/fold-vector-sext-crash.ll to verify that llc no longer crashes with an assertion failure due to an illegal build_vector of type MVT::v4i64.
  Thanks to Ilia Filippov for spotting this regression and for providing a reproducible test case.
  llvm-svn: 200313
* Fix sext(setcc) -> select_cc using wrong type for setcc. (Matt Arsenault, 2014-01-27; 1 file, -10/+16)
  Also update the comment, since it actually produces a select (setcc) instead of select_cc.
  It was checking and using the setcc result type for the type of the sext, instead of the type of the compared items. In my problem case, the sext was to i32 and was used as the setcc type, but the expected type was i64.
  No test since I haven't been able to hit the problem with this on any in-tree targets.
  llvm-svn: 200249
* [DAGCombiner] Teach how to fold sext/aext/zext of constant build vectors. (Andrea Di Biagio, 2014-01-27; 1 file, -9/+64)
  This patch teaches the DAGCombiner how to fold a sext/aext/zext dag node when the input operand is a build vector of constants (or UNDEFs). The inability to fold a sext/zext of a constant build_vector was the root cause of some pcg bugs affecting vselect expansion on x86-64 with AVX support.
  Before this change, the DAGCombiner only knew how to fold a sext/zext/aext of a ConstantSDNode.
  llvm-svn: 200234
* Fix for PR18102. (Stepan Dyatkovskiy, 2014-01-27; 1 file, -1/+4)
  The issue comes from DAGCombiner::MergeConsecutiveStores, more precisely from the sorting of the mem-op sequence. Consider how MergeConsecutiveStores works for the following example:
    store i8 1, a[0]
    store i8 2, a[1]
    store i8 3, a[1]   ; a[1] again
    return             ; DAG starts here
  1. The method collects all three stores.
  2. It sorts them by distance from the base pointer (farthest with highest index).
  3. It takes the first consecutive non-overlapping stores and (if possible) replaces them with a single store instruction.
  The point is, we can't determine here which 'store' instruction comes second after sorting ('store 2' or 'store 3'). It happens that 'store 3' ends up second and 'store 2' third, so after merging we get the following result:
    store i16 (1 | 3 << 8), base   ; a[0], but bit-cast to i16
    store i8 2, a[1]
  We have effectively swapped 'store 3' and 'store 2' and got the wrong contents in a[1].
  Fix: in the sort routine, also take the mem-op sequence number into account.
  llvm-svn: 200201
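  Illustration of the fix's idea (the type and field names here are made up, not the actual LLVM data structures): sort by offset from the base pointer, but break ties with the original sequence number so a later store to the same address can never be reordered in front of an earlier one.

    #include <algorithm>
    #include <vector>

    struct MemOpCandidate {
      long Offset;      // distance from the base pointer
      unsigned SeqNum;  // position in the original sequence of mem-ops
    };

    void sortCandidates(std::vector<MemOpCandidate> &Ops) {
      std::sort(Ops.begin(), Ops.end(),
                [](const MemOpCandidate &A, const MemOpCandidate &B) {
        if (A.Offset != B.Offset)
          return A.Offset < B.Offset;
        return A.SeqNum < B.SeqNum;  // the tie-break this fix adds
      });
    }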
* Disable the use of TBAA when using AA in CodeGen (Hal Finkel, 2014-01-25; 1 file, -2/+14)
  There are currently two issues, of which I currently know, that prevent TBAA from being correctly usable in CodeGen:
  1. Stack coloring does not update TBAA when merging allocas. This is easy enough to fix, but is not the largest problem.
  2. CGP inserts ptrtoint/inttoptr pairs when sinking address computations. Because BasicAA does not handle inttoptr, we'll often miss basic type punning idioms that we need to catch so we don't miscompile real-world code (like LLVM). I don't yet have a small test case for this, but this fixes self hosting a non-asserts build of LLVM on PPC64 when using -enable-aa-sched-mi and -misched=shuffle.
  llvm-svn: 200093
* Add combiner-aa-only-func (debug only) (Hal Finkel, 2014-01-25; 1 file, -0/+22)
  This option (which is !NDEBUG only) allows restricting the use of alias analysis in DAGCombiner to a specific function. This has proved extremely valuable for isolating bugs related to this feature, and mirrors the misched-only-func option provided by the new instruction scheduler.
  llvm-svn: 200088
* Improve descriptions of combiner-alias-analysis and combiner-global-alias-analysis (Hal Finkel, 2014-01-25; 1 file, -2/+2)
  llvm-svn: 200087
* Revert "Revert "Add Constant Hoisting Pass" (r200034)" (Juergen Ributzka, 2014-01-25; 1 file, -3/+6)
  This reverts commit r200058 and adds the using directive for ARMTargetTransformInfo to silence two g++ overload warnings.
  llvm-svn: 200062
* Revert "Add Constant Hoisting Pass" (r200034)Hans Wennborg2014-01-251-6/+3
| | | | | | | | | | | | | | | This commit caused -Woverloaded-virtual warnings. The two new TargetTransformInfo::getIntImmCost functions were only added to the superclass, and to the X86 subclass. The other targets were not updated, and the warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was hiding the two new getIntImmCost variants. We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost" to the various subclasses, or turning it off, but I suspect that it's wrong to leave the functions unimplemnted in those targets. The default implementations return TCC_Free, which I don't think is right e.g. for ARM. llvm-svn: 200058
* Add Constant Hoisting Pass (Juergen Ributzka, 2014-01-24; 1 file, -3/+6)
  Retry commit r200022 with a fix for the build bot errors. Constant expressions have (unlike instructions) module scope use lists and therefore may have users in different functions. The fix is to simply ignore these out-of-function uses.
  llvm-svn: 200034
* Fix DAGCombiner::GatherAllAliases to account for non-chain dependencies (Hal Finkel, 2014-01-24; 1 file, -1/+58)
  DAGCombiner::GatherAllAliases, which is only used when AA is enabled during DAGCombine, had a fundamentally incorrect assumption for which this change compensates. GatherAllAliases, which is used to find aliasing predecessor chain nodes (so that a better chain can be selected for a load or store to enable subsequent optimizations), assumed that walking up the chain would always catch all possibly-aliasing loads and stores. This is not true: to really find all aliases, we also need to search for aliases through the value operand of a store, etc.
  Consider the following situation:
    Token1 = ...
    L1 = load Token1, %52
    S1 = store Token1, L1, %51
    L2 = load Token1, %52+8
    S2 = store Token1, L2, %51+8
    Token2 = Token(S1, S2)
    L3 = load Token2, %53
    S3 = store Token2, L3, %52
    L4 = load Token2, %53+8
    S4 = store Token2, L4, %52+8
  If we search for aliases of S3 (which stores to address %52) and we look only through the chain, then we'll miss the trivial dependence on L1 (which loads from %52). We then might change all loads and stores to use Token1 as their chain operand, which could result in copying %53 into %52 before copying %52 into %51 (which should happen first).
  The problem is, however, that searching for such data dependencies can become expensive, and the cost is not directly related to the chain depth. Instead, we'll rule out such configurations by insisting that we've visited all chain users (except for users of the original chain, which is not necessary). When doing this, we need to look through nodes we don't care about (otherwise, things like register copies will interfere with trivial use cases).
  Unfortunately, I don't have a small test case for this problem. Creating the underlying situation is not hard (a pair of memcpys will do it), but arranging for the default instruction schedule to be incorrect is very fragile.
  This unbreaks self hosting on PPC64 when using -mllvm -combiner-global-alias-analysis -mllvm -combiner-alias-analysis.
  llvm-svn: 200033
* Revert "Add Constant Hoisting Pass"Juergen Ributzka2014-01-241-6/+3
| | | | | | This reverts commit r200022 to unbreak the build bots. llvm-svn: 200024
* Restrict FindBetterChain DAG combines to unindexed nodesHal Finkel2014-01-241-2/+2
| | | | | | | | | | | | These transformations obviously won't work for indexed (pre/post-inc) loads and stores. In practice, I'm not sure there is any benefit to enabling them for indexed nodes because other transformations that these might enable likely also won't handle indexed nodes. I don't have an in-tree test case that hits this problem, but an upcoming bug fix will make it much more likely. llvm-svn: 200023
* Add Constant Hoisting Pass (Juergen Ributzka, 2014-01-24; 1 file, -3/+6)
  This pass identifies expensive constants to hoist and coalesces them to better prepare them for SelectionDAG-based code generation. This works around the limitations of the basic-block-at-a-time approach.
  First it scans all instructions for integer constants and calculates their cost. If a constant can be folded into the instruction (the cost is TCC_Free) or the cost is just a simple operation (TCC_Basic), then we don't consider it expensive and leave it alone. This is the default behavior, and the default implementation of getIntImmCost will always return TCC_Free.
  If the cost is more than TCC_Basic, then the integer constant can't be folded into the instruction and it might be beneficial to hoist the constant. Similar constants are coalesced to reduce register pressure and materialization code. When a constant is hoisted, it is also hidden behind a bitcast to force it to be live-out of the basic block. Otherwise the constant would just be duplicated and each basic block would have its own copy in the SelectionDAG. The SelectionDAG recognizes such constants as opaque and doesn't perform certain transformations on them, which would create a new expensive constant.
  This optimization is only applied to integer constants in instructions and simple (that is, not nested) constant cast expressions. For example:
    %0 = load i64* inttoptr (i64 big_constant to i64*)
  Reviewed by Eric
  llvm-svn: 200022
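  Illustration (the function, names, and constant are made up, not from the commit): the kind of source the pass targets is a large immediate that is expensive to materialize and is used by several instructions in different basic blocks; hoisting lets those uses share one materialized value instead of re-materializing it per use.

    #include <cstdint>

    uint64_t use_big_constant(uint64_t a, uint64_t b, bool p) {
      // On some targets an immediate like this takes several instructions to
      // materialize; both uses below can share one hoisted definition.
      const uint64_t Big = 0x123456789ABCDEF0ULL;
      if (p)
        return a ^ Big;
      return b + Big;
    }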
* Fix known typos (Alp Toker, 2014-01-24; 1 file, -3/+3)
  Sweep the codebase for common typos. Includes some changes to visible function names that were misspelt.
  llvm-svn: 200018
* AVX512: combining setcc and zext is wrong on AVX512 (Elena Demikhovsky, 2014-01-22; 1 file, -1/+4)
  because the vector compare instruction puts its result in a mask register.
  llvm-svn: 199798