path: root/llvm/lib/CodeGen/SelectionDAG
Commit log (newest first); each entry lists the commit message, author, date, and files/lines changed.
...
* [AArch64] Fix an assertion failure in DAG Combiner about concatenating 2 build_vectors.
  Hao Liu, 2014-07-10; 1 file changed, -4/+18 lines
  llvm-svn: 212677
* Add a trunc (select c, a, b) -> select c, (trunc a), (trunc b) combine.
  Matt Arsenault, 2014-07-09; 1 file changed, -0/+14 lines
  Do this if the truncate is free and the select is legal.
  llvm-svn: 212640
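A minimal standalone C++ sketch (not LLVM's DAG combiner code; the names are invented for illustration) of why the fold above is sound for integers: truncating after the select and selecting between pre-truncated operands compute the same result.

    // Scalar model of: trunc (select c, a, b) -> select c, (trunc a), (trunc b)
    #include <cassert>
    #include <cstdint>
    #include <initializer_list>

    static uint16_t trunc_of_select(bool c, uint32_t a, uint32_t b) {
      return static_cast<uint16_t>(c ? a : b);     // select first, then truncate
    }

    static uint16_t select_of_trunc(bool c, uint32_t a, uint32_t b) {
      return c ? static_cast<uint16_t>(a)          // truncate both operands,
               : static_cast<uint16_t>(b);         // then select
    }

    int main() {
      for (uint32_t a : {0u, 1u, 0xFFFFu, 0x12345678u})
        for (uint32_t b : {0u, 0xABCDu, 0xFFFFFFFFu})
          for (bool c : {false, true})
            assert(trunc_of_select(c, a, b) == select_of_trunc(c, a, b));
      return 0;
    }

The combine is profitable when the narrower select is legal and the truncate is free, which is exactly the guard the commit describes.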
* [x86] Fix a bug in my new zext-vector-inreg DAG trickery where we were not widening the input type to the node sufficiently to let the ext take place in a register.
  Chandler Carruth, 2014-07-09; 2 files changed, -0/+36 lines
  This would in turn result in a mysterious bitcast assertion failure downstream.
  The first change here is to add back the helpful assert I had in an earlier version of the code to catch this immediately. The next change is to add support to the type legalization to detect when we have widened the operand either too little or too much (for whatever reason) and find a size-matched legal vector type to convert it to first. This can also fail, so we get a new fallback path, but that seems OK.
  With this, we no longer crash on vec_cast2.ll when using widening. I've also added the CHECK lines for the zero-extend cases here. We still need to support sign-extend and trunc (or something) to get plausible code for the other two thirds of this test, which is one of the regression tests that showed the most scalarization when widening was force-enabled.
  Slowly closing in on widening being a viable legalization strategy without it resorting to scalarization at every turn. =]
  llvm-svn: 212614
* Sink two variables only used in an assert into the assert itself.
  Chandler Carruth, 2014-07-09; 1 file changed, -3/+3 lines
  Should fix the release builds with -Werror.
  llvm-svn: 212612
* [x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when widening vector types to be legal and a ZERO_EXTEND node is encountered.
  Chandler Carruth, 2014-07-09; 5 files changed, -1/+72 lines
  When we use widening to legalize vector types, extend nodes are a real challenge. Either the input or output is likely to be legal, but in many cases not both. As a consequence, we don't really have any way to represent this situation and the prior code in the widening legalization framework would just scalarize the extend operation completely.
  This patch introduces a new DAG node to represent doing a zero extend of a vector "in register". The core of the idea is to allow legal but different vector types in the input and output. The output vector must have fewer lanes but wider elements. The operation is defined to zero extend the low elements of the input to the size of the output elements, and drop all of the high elements which don't have a corresponding lane in the output vector. It also includes generic expansion of this node in terms of blending a zero vector into the high elements of the vector and bitcasting across. This in turn yields extremely nice code for x86 SSE2 when we use the new widening legalization logic in conjunction with the new shuffle lowering logic.
  There is still more to do here. We need to support sign extension, any extension, and potentially int-to-float conversions. My current plan is to continue using similar synthetic nodes to model each of these transitions with generic lowering code for each one.
  However, with this patch LLVM already reaches performance parity with GCC for the core C loops of the x264 code (assuming you disable the hand-written assembly versions) when compiling for SSE2 and SSE3 architectures and enabling the new widening and lowering logic for vectors.
  Differential Revision: http://reviews.llvm.org/D4405
  llvm-svn: 212610
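As a rough scalar model of the node semantics described above (an illustration only, not LLVM's implementation; the types and names are chosen for the example): the result has fewer but wider lanes, each low input lane is zero-extended, and the high input lanes are dropped.

    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    // Model of zero_extend_vector_inreg: v8i16 input -> v4i32 output.
    static std::array<uint32_t, 4>
    zextVectorInReg(const std::array<uint16_t, 8> &In) {
      std::array<uint32_t, 4> Out{};
      for (std::size_t I = 0; I < Out.size(); ++I)
        Out[I] = In[I];   // zero-extend the low lanes; lanes 4..7 are dropped
      return Out;
    }

    int main() {
      std::array<uint16_t, 8> V = {1, 2, 3, 4, 5, 6, 7, 8};
      for (uint32_t X : zextVectorInReg(V))
        std::printf("%u ", static_cast<unsigned>(X));   // prints: 1 2 3 4
      std::printf("\n");
    }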
* [SDAG] At the suggestion of Hal, switch to an output parameter that tracks which elements of the build vector are in fact undef.
  Chandler Carruth, 2014-07-09; 3 files changed, -22/+27 lines
  This should make actually inspecting them (likely in my next patch) reasonably pretty. Also makes the output parameter optional, as it is clear now that *most* users are happy with undefs in their splats.
  llvm-svn: 212581
* [DAG] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.
  Andrea Di Biagio, 2014-07-08; 1 file changed, -3/+21 lines
  This patch teaches how to fold a shuffle according to the rule:
    shuffle (shuffle (x, undef, M0), undef, M1) -> shuffle (x, undef, M2)
  We do this only if the resulting mask M2 is legal; this avoids introducing illegal shuffles that are potentially expanded into a sub-optimal sequence of target-specific dag nodes. This patch has the advantage of being target independent, since it works on ISD nodes; therefore, all targets (not only x86) can take advantage of this rule.
  The idea behind this patch is that most shuffle pairs can be safely combined before we run the legalizer on vector operations. This allows us to combine/simplify dag nodes earlier in the process, and not only immediately before the instruction selection stage. That said, this patch is not meant to replace any existing target-specific combine rules; backends might still introduce new shuffles during the legalization stage. Also, this rule is very simple and avoids aggressively optimizing shuffles.
  llvm-svn: 212539
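The single-input fold above amounts to composing the two shuffle masks. Here is a small self-contained C++ sketch of that composition (illustrative only; the legality check on the resulting mask is target-specific and omitted here):

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // If inner = shuffle(x, undef, M0) and outer = shuffle(inner, undef, M1),
    // then outer[i] = x[M0[M1[i]]], so the combined mask is M2[i] = M0[M1[i]].
    // -1 denotes an undef lane and propagates through the composition.
    static std::vector<int> composeMasks(const std::vector<int> &M0,
                                         const std::vector<int> &M1) {
      std::vector<int> M2(M1.size(), -1);
      for (std::size_t I = 0; I < M1.size(); ++I)
        if (M1[I] >= 0)
          M2[I] = M0[M1[I]];   // may itself be -1 if M0 has an undef lane
      return M2;
    }

    int main() {
      std::vector<int> M0 = {3, 2, 1, 0};     // inner shuffle reverses 4 lanes
      std::vector<int> M1 = {0, 2, -1, -1};   // outer shuffle keeps lanes 0 and 2
      for (int M : composeMasks(M0, M1))
        std::printf("%d ", M);                // prints: 3 1 -1 -1
      std::printf("\n");
    }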
* [x86,SDAG] Sink the logic for folding shuffles of splats more aggressively from the x86 shuffle lowering to the generic SDAG vector shuffle formation code.
  Chandler Carruth, 2014-07-08; 1 file changed, -5/+35 lines
  This code already tried to fold away shuffles of splats! It just had lots of bugs and couldn't handle the case my new x86 shuffle lowering needed.
  First, it failed to correctly compute whether N2 was undef because it pre-computed this, then did transformations which could *make* N2 undef, then failed to ever re-consider the precomputed state. Second, it didn't look through bitcasts at all, even in the safe cases where they are just element-type bitcasts with no change to the number of elements. Third, it didn't handle all-zero bitcasts nicely the way my code on the x86 side of things did, which is essential to getting good zext-shuffle lowerings.
  But all of these are generic. I just ported the code down to this layer and fixed the surrounding bugs. Tests exercising this in the x86 backend still pass, and some silly code in widen_cast-6.ll gets better. I updated that test to be a bit more precise, but it's still pretty unclear what the value of the test is in this day and age.
  llvm-svn: 212517
* [SDAG] Actually check for a non-constant splat and clarify comments around the handling of UNDEF lanes in boolean vector content analysis.
  Chandler Carruth, 2014-07-08; 1 file changed, -4/+8 lines
  The code before my changes here also failed to check for non-constant splats in a buildvector. I have no idea how to trigger this; I just spotted it by inspection when trying to understand the code. It seems extremely unlikely to be worth the trouble to teach the only caller of this code (DAG combining of setcc patterns) how to cleverly handle undef lanes, so I've just commented more thoroughly that we're giving up there.
  llvm-svn: 212515
* [SDAG] Build up a richer set of APIs for querying build-vector SDAG nodes about whether they are splats.
  Chandler Carruth, 2014-07-08; 3 files changed, -13/+47 lines
  This is factored out and improved from r212324, which got reverted as it was far too aggressive. The new API should help handle buildvectors that are a mixture of splatted and undef values more conservatively. No functionality change at this point.
  The hope is to slowly re-introduce the undef-tolerant optimization of splats, but each time being forced to make a conscious decision about how to handle the undefs in a way that doesn't lead to contradicting assumptions about the collapsed value.
  Hal has pointed out in discussions that this may not end up being the desired API, and instead it may be more convenient to get a mask of the undef elements or something similar. I'm starting simple and will expand the API as I adapt actual callers and see exactly what they need.
  llvm-svn: 212514
* [x86] Revert r212324, which was too aggressive w.r.t. allowing undef lanes in vector splats.
  Chandler Carruth, 2014-07-07; 3 files changed, -44/+31 lines
  The core problem here is that undef lanes can't *unilaterally* be considered to contribute to splats. Their handling needs to be more cautious. There is also a reported failure of the nightly testers (thanks Tobias!) that may well stem from the same core issue. I'm going to fix this theoretical issue, factor the APIs a bit better, and then verify that I don't see anything bad with Tobias's reduction from the test suite before recommitting.
  Original commit message for r212324:
  [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for any constant, constant FP, or undef splat and to tolerate any undef lanes in a splat, then replace all uses of isSplatVector in X86's lowering with it. This fixes issues where undef lanes in an otherwise splat vector would prevent the splat logic from firing. It is a touch more awkward to use this interface, but it is much more accurate. Suggestions for better interface structuring welcome.
  With this fix, the code generated with the widening legalization strategy for widen_cast-4.ll is *dramatically* improved, as the special lowering strategies for a v16i8 SRA kick in even though the high lanes are undef. We also get a slightly different choice for broadcasting an aligned memory location, and use vpshufd instead of vbroadcastss. This looks like a minor win for pipelining and domain crossing, but a minor loss for the number of micro-ops. I suspect it's a wash, but folks can easily tweak the lowering if they want.
  llvm-svn: 212475
* [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for any constant, constant FP, or undef splat and to tolerate any undef lanes in a splat, then replace all uses of isSplatVector in X86's lowering with it.
  Chandler Carruth, 2014-07-04; 3 files changed, -31/+44 lines
  This fixes issues where undef lanes in an otherwise splat vector would prevent the splat logic from firing. It is a touch more awkward to use this interface, but it is much more accurate. Suggestions for better interface structuring welcome.
  With this fix, the code generated with the widening legalization strategy for widen_cast-4.ll is *dramatically* improved, as the special lowering strategies for a v16i8 SRA kick in even though the high lanes are undef. We also get a slightly different choice for broadcasting an aligned memory location, and use vpshufd instead of vbroadcastss. This looks like a minor win for pipelining and domain crossing, but a minor loss for the number of micro-ops. I suspect it's a wash, but folks can easily tweak the lowering if they want.
  llvm-svn: 212324
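For context on the splat queries discussed in the commits above, here is a deliberately simple, undef-tolerant splat check in standalone C++ (a hypothetical helper, not LLVM's BuildVectorSDNode API). It treats undef lanes as wildcards, which is exactly the behaviour that turned out to need more care: a caller must still decide what value each undef lane is assumed to collapse to.

    #include <cstdint>
    #include <cstdio>
    #include <optional>
    #include <vector>

    // A lane is either a concrete constant or undef (an empty optional).
    using Lane = std::optional<uint64_t>;

    static std::optional<uint64_t> getSplatValue(const std::vector<Lane> &Lanes) {
      std::optional<uint64_t> Splat;
      for (const Lane &L : Lanes) {
        if (!L)
          continue;              // ignore undef lanes
        if (!Splat)
          Splat = L;             // first defined lane fixes the candidate
        else if (*Splat != *L)
          return std::nullopt;   // two distinct defined values: not a splat
      }
      return Splat;              // empty if every lane was undef
    }

    int main() {
      std::vector<Lane> V = {42, std::nullopt, 42, 42};
      if (auto S = getSplatValue(V))
        std::printf("splat value: %llu\n", (unsigned long long)*S);
    }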
* Move function-dependent resetting of a subtarget variable out of the subtarget.
  Eric Christopher, 2014-07-04; 1 file changed, -0/+1 lines
  This involved having the movt predicate take the current function: since we care about size in instruction selection when deciding whether or not to use movw/movt, take the function so we can check its attributes. This required adding the current MachineFunction to FastISel and propagating it through.
  llvm-svn: 212309
* Fix ppcf128 component access on little-endian systems.
  Ulrich Weigand, 2014-07-03; 3 files changed, -10/+17 lines
  The PowerPC 128-bit long double data type (ppcf128 in LLVM) is in fact a pair of two doubles, where one is considered the "high" or more-significant part, and the other is considered the "low" or less-significant part. When a ppcf128 value is stored in memory or a register pair, the high part always comes first, i.e. at the lower memory address or in the lower-numbered register, and the low part always comes second. This is true both on big-endian and little-endian PowerPC systems. (Similar to how with a complex number, the real part always comes first and the imaginary part second, no matter the byte order of the system.)
  This was implemented incorrectly for little-endian systems in LLVM. This commit fixes three related issues:
  - When printing an immediate ppcf128 constant to assembler output in emitGlobalConstantFP, emit the high part first on both big- and little-endian systems.
  - When lowering a ppcf128 type to a pair of f64 types in SelectionDAG (which is used e.g. when generating code to load an argument into a register pair), use the correct low/high part ordering on little-endian systems.
  - In a related issue, because lowering ppcf128 into a pair of f64 must operate differently from lowering an int128 into a pair of i64, bitcasts between ppcf128 and int128 must not be optimized away by the DAG combiner on little-endian systems, but must effect a word-swap.
  Reviewed by Hal Finkel.
  llvm-svn: 212274
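A small C++ sketch of the "double-double" layout described above (illustrative only; the struct and field names are invented here). The value is the sum of a high and a low double, and the high part is stored first on both big- and little-endian PowerPC, which is the invariant the commit restores.

    #include <cstdio>

    struct PPCDoubleDouble {
      double Hi;   // more-significant part: lower address / lower-numbered register
      double Lo;   // less-significant part: always second
    };

    // The represented value is Hi + Lo, so the pair can carry more precision
    // than a single double. Summing in plain double precision only approximates
    // it, which is fine for illustrating the layout.
    static double approxValue(const PPCDoubleDouble &X) { return X.Hi + X.Lo; }

    int main() {
      PPCDoubleDouble X = {1.0, 1e-20};   // 1e-20 is below one double's precision at 1.0
      std::printf("%.17g\n", approxValue(X));
    }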
* [x86] Fix the completely broken vector widening legalization of bswap.
  Chandler Carruth, 2014-07-03; 1 file changed, -1/+1 lines
  This operation was classified as a binary operation in the widening logic for some reason (clearly, untested). It is in fact a unary operation. Add a RUN line to a test to exercise this for x86.
  Note that again the vector widening strategy doesn't regress anything, and in one case removes a totally unnecessary instruction that we couldn't avoid when promoting the element type.
  llvm-svn: 212257
* [cleanup] Hoist an if-else chain on ISD opcodes (really designed for switches) into a switch, and sink it into a dispatch function that can return the result rather than awkwardly setting a variable with breaks.
  Chandler Carruth, 2014-07-02; 1 file changed, -17/+28 lines
  llvm-svn: 212166
* [cleanup] Remove dead 'break;' statements that I meant to nuke in r212158 but missed.
  Chandler Carruth, 2014-07-02; 1 file changed, -2/+0 lines
  Thanks to Craig for spotting the goof!
  llvm-svn: 212159
* [cleanup] Hoist the promotion dispatch logic into the promote function so that we can use return to express it more cleanly and avoid so many nested switch statements.
  Chandler Carruth, 2014-07-02; 1 file changed, -23/+22 lines
  llvm-svn: 212158
* [cleanup] Nuke the 'VectorOp' bit of the promote method names.
  Chandler Carruth, 2014-07-02; 1 file changed, -9/+9 lines
  This doesn't add any information for methods in the VectorLegalizer class that clearly take SDAG operations to legalize.
  llvm-svn: 212157
* [x86] Clean up and modernize the doxygen and API comments for the vector operation legalization code.
  Chandler Carruth, 2014-07-02; 1 file changed, -24/+38 lines
  llvm-svn: 212155
* [FastISel] Factor out stackmap intrinsic selection code into a dedicated helper method. NFCI.
  Juergen Ributzka, 2014-07-01; 1 file changed, -73/+76 lines
  llvm-svn: 212140
* [DAG] Pass the argument list to the CallLoweringInfo via move semantics. NFCI.
  Juergen Ributzka, 2014-07-01; 6 files changed, -17/+18 lines
  The argument list vector is never used after it has been passed to the CallLoweringInfo, and moving it into the CallLoweringInfo is cleaner and pretty much as cheap as keeping a pointer to it.
  llvm-svn: 212135
* Fix 'platform-specific' hyphenations.
  Alp Toker, 2014-06-30; 2 files changed, -3/+3 lines
  llvm-svn: 212056
* Add ops() method to SDNode that returns an ArrayRef<SDUse>. Use it to simplify some code.
  Craig Topper, 2014-06-29; 1 file changed, -1/+1 lines
  llvm-svn: 211993
* [AArch64] Fix memset ICE when memset value is f128.
  Chad Rosier, 2014-06-27; 1 file changed, -1/+1 lines
  llvm-svn: 211960
* Revert "Introduce a string_ostream string builder facility".
  Alp Toker, 2014-06-26; 2 files changed, -2/+4 lines
  Temporarily back out commits r211749, r211752 and r211754.
  llvm-svn: 211814
* Introduce a string_ostream string builder facility.
  Alp Toker, 2014-06-26; 2 files changed, -4/+2 lines
  string_ostream is a safe and efficient string builder that combines opaque stack storage with a built-in ostream interface. small_string_ostream<bytes> additionally permits an explicit stack storage size other than the default 128 bytes to be provided. Beyond that, storage is transferred to the heap.
  This convenient class can be used in most places an std::string+raw_string_ostream pair or SmallString<>+raw_svector_ostream pair would previously have been used, in order to guarantee consistent access without byte truncation.
  The patch also converts much of LLVM to use the new facility. These changes include several probable bug fixes for truncated output, a programming error that's no longer possible with the new interface.
  llvm-svn: 211749
* [AArch64] Fix a build_vector pattern match failure caused by a defect in isBuildVectorAllZeros().
  Kevin Qin, 2014-06-24; 1 file changed, -24/+25 lines
  llvm-svn: 211567
* Legalizer: Add support for splitting insert_subvectors.
  Benjamin Kramer, 2014-06-21; 2 files changed, -0/+39 lines
  We handle this by spilling the whole thing to the stack and doing the insertion as a store. PR19492. This happens in real code because the vectorizer creates v2i128 when AVX is enabled.
  llvm-svn: 211435
* [ValueTracking] Extend range metadata to call/invoke.
  Jingyue Wu, 2014-06-19; 1 file changed, -1/+1 lines
  Summary: With this patch, range metadata can be added to call/invoke, including IntrinsicInst. Previously, it could only be added to load. Rename computeKnownBitsLoad to computeKnownBitsFromRangeMetadata because range metadata is not only used by load. Update the language reference to reflect this change.
  Test Plan: Add several tests in range-2.ll to confirm the verifier is happy with having range metadata on call/invoke. Add two tests in AddOverFlow.ll to confirm that annotating range metadata on call/invoke can benefit InstCombine.
  Reviewers: meheff, nlewycky, reames, hfinkel, eliben
  Reviewed By: eliben
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D4187
  llvm-svn: 211281
* DAG: move sret demotion into the most basic LowerCallTo implementation.
  Tim Northover, 2014-06-18; 1 file changed, -119/+124 lines
  It looks like there are two versions of LowerCallTo here: the SelectionDAGBuilder one is designed to operate on LLVM IR, and the TargetLowering one handles the case where everything is at the DAG level.
  Previously, only the SelectionDAGBuilder variant could handle demoting an impossible return to sret semantics (before delegating to the TargetLowering version), but this functionality is also useful for certain libcalls (e.g. 128-bit operations on 32-bit x86). So this commit moves the sret handling down a level.
  rdar://problem/17242889
  llvm-svn: 211155
* SelectionDAG: Expand i64 = FP_TO_SINT i32
  Tom Stellard, 2014-06-17; 1 file changed, -0/+59 lines
  llvm-svn: 211108
* LegalizeDAG: make sure the cast is unsigned before using FP_TO_UINT.
  Tim Northover, 2014-06-15; 1 file changed, -1/+4 lines
  It's valid to use FP_TO_SINT when asking for a smaller type (e.g. all "unsigned int16" values fit into a "signed int32"), but the reverse isn't true. Unfortunately, I'm not actually aware of any architecture with asymmetric FP_TO_SINT and FP_TO_UINT handling, and the logic happens to work in the symmetric case, so I can't actually write a test for this.
  llvm-svn: 210986
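A standalone C++ illustration of the asymmetry the commit relies on (not the legalizer code itself): every value of a narrower unsigned type fits in a wider signed type, so a float-to-unsigned conversion of a narrow type can be modelled as a signed conversion to the wider type followed by a truncate, while the reverse direction has no such guarantee.

    #include <cassert>
    #include <cstdint>

    // Model FP_TO_UINT to i16 using FP_TO_SINT to i32 followed by truncation.
    static uint16_t fpToUInt16ViaSInt32(float F) {
      int32_t S = static_cast<int32_t>(F);   // signed conversion to the wider type
      return static_cast<uint16_t>(S);       // truncate to the requested width
    }

    int main() {
      assert(fpToUInt16ViaSInt32(0.0f) == 0);
      assert(fpToUInt16ViaSInt32(12345.7f) == 12345);
      assert(fpToUInt16ViaSInt32(65535.0f) == 65535);   // UINT16_MAX still fits in int32
      return 0;
    }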
* The hazard recognizer only needs a subtarget, not a target machine, so make it take one. Fix up all users accordingly.
  Eric Christopher, 2014-06-13; 2 files changed, -2/+4 lines
  llvm-svn: 210948
* IR: add a "cmpxchg weak" variant to support permitted failure.
  Tim Northover, 2014-06-13; 6 files changed, -65/+131 lines
  This commit adds a weak variant of the cmpxchg operation, as described in C++11. A cmpxchg instruction with this modifier is permitted to fail to store, even if the comparison indicated it should. As a result, cmpxchg instructions must return a flag indicating success in addition to their original iN value loaded. Thus, for uniformity, *all* cmpxchg instructions now return "{ iN, i1 }". The second flag is 1 when the store succeeded.
  At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been added as the natural representation for the new cmpxchg instructions. It is a strong cmpxchg. By default this gets Expanded to the existing ATOMIC_CMP_SWAP during Legalization, so existing backends should see no change in behaviour. If they wish to deal with the enhanced node instead, they can call setOperationAction on it. Beware: as a node with 2 results, it cannot be selected from TableGen.
  Currently, no use is made of the extra information provided in this patch. Test updates are almost entirely adapting the input IR to the new scheme.
  Summary for out-of-tree users:
  + Legacy bitcode files are upgraded during read.
  + Legacy assembly IR files will be invalid.
  + Front-ends must adapt to the different type for "cmpxchg".
  + Backends should be unaffected by default.
  llvm-svn: 210903
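The C++11 operation this mirrors is std::atomic's compare_exchange_weak, shown below as a usage sketch. The boolean it returns plays the role of the new i1 success flag in the "{ iN, i1 }" result, and the loaded value comes back through the expected argument; because a weak exchange may fail spuriously, it is normally wrapped in a retry loop.

    #include <atomic>
    #include <cstdio>

    int main() {
      std::atomic<int> Counter{0};
      int Expected = Counter.load();
      // Classic CAS loop: retry until the weak compare-exchange succeeds.
      // On failure (spurious or real), Expected is updated with the value seen.
      while (!Counter.compare_exchange_weak(Expected, Expected + 1)) {
      }
      std::printf("Counter = %d\n", Counter.load());   // prints: Counter = 1
    }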
* [FastISel][X86] Add branch weights.
  Juergen Ributzka, 2014-06-13; 1 file changed, -2/+6 lines
  Add branch weights to branch instructions, so that later passes can optimize based on them (e.g. basic block ordering).
  llvm-svn: 210863
* [FastISel][X86] Add MachineMemOperands to load/store instructions.
  Juergen Ributzka, 2014-06-12; 1 file changed, -0/+44 lines
  This commit adds MachineMemOperands to load and store instructions. This allows the peephole optimizer to fold load instructions. Unfortunately, the peephole optimizer currently doesn't run at -O0.
  llvm-svn: 210858
* Revert "SelectionDAG: Enable (and (setcc x), (setcc y)) -> (setcc (and x, y)) for vectors".
  Tom Stellard, 2014-06-12; 1 file changed, -4/+4 lines
  This reverts commit r210540, adds a testcase for the regression it caused, and marks the R600 test it was supposed to fix as XFAIL.
  llvm-svn: 210792
* [FastISel] Add support for the stackmap intrinsic.
  Juergen Ributzka, 2014-06-12; 1 file changed, -0/+102 lines
  This implements target-independent FastISel lowering for the stackmap intrinsic.
  llvm-svn: 210742
* Have isInTailCallPosition take the DAG so that we can use the version of TargetLowering/Machine from there, on the way to avoiding TargetMachine in TargetLowering.
  Eric Christopher, 2014-06-10; 1 file changed, -1/+1 lines
  llvm-svn: 210579
* [FastISel] Collect statistics about failing intrinsic calls.
  Juergen Ributzka, 2014-06-10; 1 file changed, -1/+50 lines
  Add more instruction-specific statistics about failing intrinsic calls during FastISel.
  llvm-svn: 210556
* SelectionDAG: Don't use MVT::Other to determine legality of ISD::SELECT_CC.
  Tom Stellard, 2014-06-10; 1 file changed, -16/+5 lines
  The SelectionDAG had a special case for ISD::SELECT_CC, where it would allow targets to specify:
    setOperationAction(ISD::SELECT_CC, MVT::Other, Expand);
  to indicate that they wanted to expand ISD::SELECT_CC for all types. This wasn't applied correctly everywhere, and it makes writing new DAG patterns with ISD::SELECT_CC difficult.
  llvm-svn: 210541
* SelectionDAG: Enable (and (setcc x), (setcc y)) -> (setcc (and x, y)) for vectors.
  Tom Stellard, 2014-06-10; 1 file changed, -4/+4 lines
  This prevents a future commit from regressing test/CodeGen/R600/setcc-equivalent.ll.
  llvm-svn: 210540
* SelectionDAG: Expand SELECT_CC to SELECT + SETCC.
  Tom Stellard, 2014-06-10; 1 file changed, -1/+17 lines
  This consolidates code from the Hexagon, R600, and XCore targets. No functionality change intended.
  llvm-svn: 210539
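A scalar C++ sketch of what the expansion means (illustrative only, not the legalizer code): SELECT_CC bundles a comparison and a select, and it can always be rewritten as a SETCC feeding a plain SELECT.

    #include <cstdio>

    // select_cc(lhs, rhs, t, f, cond) == select(setcc(lhs, rhs, cond), t, f)
    template <typename T, typename Cond>
    static T selectCC(T LHS, T RHS, T TrueVal, T FalseVal, Cond C) {
      bool CC = C(LHS, RHS);           // the SETCC part
      return CC ? TrueVal : FalseVal;  // the SELECT part
    }

    int main() {
      auto Less = [](int A, int B) { return A < B; };
      std::printf("%d\n", selectCC(3, 7, 100, 200, Less));   // prints: 100
      std::printf("%d\n", selectCC(9, 7, 100, 200, Less));   // prints: 200
    }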
* [X86] Add target combine rules for horizontal add/sub.
  Andrea Di Biagio, 2014-06-09; 1 file changed, -0/+21 lines
  This patch adds new target-specific combine rules to identify horizontal add/sub idioms from BUILD_VECTOR dag nodes.
  This patch also teaches the DAGCombiner how to canonicalize sequences of insert_vector_elt dag nodes according to the following rule:
    (insert_vector_elt (insert_vector_elt A, I0), I1) -> (insert_vector_elt (insert_vector_elt A, I1), I0)
  This new canonicalization rule only triggers if the inner insert_vector_elt dag node has exactly one use; also, both indices must be known constants, and I1 < I0. This last rule made it possible to write a simpler algorithm to identify horizontal add/sub patterns, because now we don't have to worry about the ordering of insert_vector_elt dag nodes.
  llvm-svn: 210477
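For reference, the horizontal-add idiom the combine above looks for, modelled on 4-lane integer vectors in plain C++ (illustrative only): a BUILD_VECTOR of pairwise sums of adjacent lanes from two inputs is what maps onto the x86 horizontal add/sub instructions.

    #include <array>
    #include <cstdio>

    // hadd(A, B) = [A0+A1, A2+A3, B0+B1, B2+B3]
    static std::array<int, 4> horizontalAdd(const std::array<int, 4> &A,
                                            const std::array<int, 4> &B) {
      std::array<int, 4> R = {A[0] + A[1], A[2] + A[3], B[0] + B[1], B[2] + B[3]};
      return R;
    }

    int main() {
      std::array<int, 4> A = {1, 2, 3, 4};
      std::array<int, 4> B = {10, 20, 30, 40};
      for (int X : horizontalAdd(A, B))
        std::printf("%d ", X);   // prints: 3 7 30 70
      std::printf("\n");
    }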
* [DAG] Expose NoSignedWrap, NoUnsignedWrap and Exact flags to SelectionDAG.
  Andrea Di Biagio, 2014-06-09; 4 files changed, -17/+109 lines
  This patch modifies SelectionDAGBuilder to construct SDNodes with associated NoSignedWrap, NoUnsignedWrap and Exact flags coming from IR BinaryOperator instructions. Added a new SDNode type called 'BinaryWithFlagsSDNode' to allow accessing nsw/nuw/exact flags during codegen.
  Patch by Marcello Maggioni.
  llvm-svn: 210467
* Have TargetSelectionDAGInfo take a DataLayout initializer rather than a TargetMachine, since the only thing it wants is DataLayout.
  Eric Christopher, 2014-06-06; 1 file changed, -2/+2 lines
  llvm-svn: 210366
* [SelectionDAG] Force cycle detection in AssignTopologicalOrder before aborting.
  Adam Nemet, 2014-05-31; 1 file changed, -9/+18 lines
  DAG cycle detection is only enabled with ENABLE_EXPENSIVE_CHECKS. However, we can run it just before we would crash in order to provide more informative diagnostics. Now, in addition to the "Overran sorted position" message, we also get the node printed if a cycle was detected.
  Tested by building several configs: Debug+Assert, Debug+Assert+Check (this is ENABLE_EXPENSIVE_CHECKS), Release+Assert and Release. Also verified that the AssignTopologicalOrder assert produces the expected results.
  llvm-svn: 209977
* [SelectionDAG] Pass DAG to checkForCycles.
  Adam Nemet, 2014-05-31; 1 file changed, -10/+12 lines
  Pass the DAG down to checkForCycles from all callers where we have it. This allows target-specific nodes to be printed properly. Also print some missing newlines.
  llvm-svn: 209976
* [X86] Add two combine rules to simplify dag nodes introduced during type legalization when promoting nodes with illegal vector type.
  Andrea Di Biagio, 2014-05-30; 1 file changed, -0/+21 lines
  This patch teaches the backend how to simplify/canonicalize dag node sequences normally introduced by the backend when promoting certain dag nodes with illegal vector type. This patch adds two new combine rules:
  1) fold (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) -> (shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>)
  2) fold (BINOP (shuffle (A, Undef, <Mask>)), (shuffle (B, Undef, <Mask>))) -> (shuffle (BINOP A, B), Undef, <Mask>)
  Both rules are only triggered on the type-legalized DAG. In particular, rule 1 is a target-specific combine rule that attempts to sink a bitconvert into the operands of a binary operation. Rule 2 is a target-independent rule that attempts to move a shuffle immediately after a binary operation.
  llvm-svn: 209930