path: root/llvm/lib/CodeGen/SelectionDAG
...
* CodeGen: emit IR-level f16 conversion intrinsics as fptrunc/fpext (Tim Northover, 2014-07-21; 2 files changed, -4/+17)
  This makes the first stage DAG for @llvm.convert.to.fp16 an fptrunc, and correspondingly @llvm.convert.from.fp16 an fpext. The legalisation path is now uniform, regardless of the input IR:
      fptrunc -> FP_TO_FP16 (if f16 illegal) -> libcall
      fpext   -> FP16_TO_FP (if f16 illegal) -> libcall
  Each target should be able to select the version that best matches its operations and not be required to duplicate patterns for both fptrunc and FP_TO_FP16 (for example). As a result we can remove some redundant AArch64 patterns.
  llvm-svn: 213507
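  As a rough illustration, a minimal IR sketch of the two instruction forms the intrinsics now lower through; the function names and the i16 bitcasts are invented for the example, not taken from the commit:
      define i16 @trunc_to_half(float %x) {
        %h = fptrunc float %x to half        ; legalized to FP_TO_FP16, then a libcall, if half is illegal
        %bits = bitcast half %h to i16
        ret i16 %bits
      }
      define float @ext_from_half(i16 %bits) {
        %h = bitcast i16 %bits to half
        %f = fpext half %h to float          ; legalized to FP16_TO_FP, then a libcall, if half is illegal
        ret float %f
      }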
* [SDAG,cleanup] Switch the DAG combiner over to use the spelling (Chandler Carruth, 2014-07-21; 1 file changed, -179/+179)
  'Worklist' consistently rather than a deeply confusing mixture of 'WorkList' and 'Worklist'. Notably, the very 'WorkList' of the DAG combiner was exposed to target specific DAG combines under an interface 'AddToWorklist' which was implemented by in turn calling 'AddToWorkList' in the combiner. This has sent me circling with the wrong case in grep one too many times.
  I chose to normalize on 'Worklist' because that one won the grep-vote for llvm/lib/... by a hundred hits or so, and it is used in places relatively "canonical" such as InstCombine's Worklist. Let's all just pick this casing, whether "correct", "good", or "bad" and be consistent...
  llvm-svn: 213506
* [SDAG] Rather than using a narrow test against the one dummy node on the (Chandler Carruth, 2014-07-21; 1 file changed, -1/+6)
  stack, filter all handle nodes from the DAG combiner worklist.
  This will also handle cases where other handle nodes might be (erroneously) added to the worklist and then cause bugs and explosions when deleted. For example, when running the legalizer within the DAG combiner, there are times when other handle nodes are used and can end up here.
  llvm-svn: 213505
* [DAGCombiner] Improve the shuffle-vector folding logic. (Andrea Di Biagio, 2014-07-21; 1 file changed, -0/+22)
  Canonicalize shuffles according to rules (see the sketch after this entry):
    * shuffle(A, shuffle(A, B)) -> shuffle(shuffle(A,B), A)
    * shuffle(B, shuffle(A, B)) -> shuffle(shuffle(A,B), B)
    * shuffle(B, shuffle(A, Undef)) -> shuffle(shuffle(A, Undef), B)
  This patch helps identify more shuffle pairs that could be combined by reusing the already existing rules in the DAGCombiner.
  Added new test 'combine-vec-shuffle-5.ll' to verify that the canonicalized shuffles are now folded into a single shuffle node by the DAGCombiner. Added more test cases to 'combine-vec-shuffle-4.ll'.
  llvm-svn: 213504
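  As a rough illustration of the first rule's input shape, a hypothetical IR fragment with arbitrary masks; after canonicalization the inner shuffle becomes the first operand, which is the shape the existing shuffle(shuffle(A,B), A) folding rules expect:
      define <4 x float> @canon_example(<4 x float> %A, <4 x float> %B) {
        ; %outer is shuffle(A, shuffle(A, B)): the inner shuffle is the *second* operand.
        %inner = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
        %outer = shufflevector <4 x float> %A, <4 x float> %inner, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
        ret <4 x float> %outer
      }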
* [DAG] Refactor some logic. No functional change. (Andrea Di Biagio, 2014-07-21; 1 file changed, -0/+21)
  This patch removes function 'CommuteVectorShuffle' from X86ISelLowering.cpp and moves its logic into SelectionDAG.cpp as method 'getCommutedVectorShuffles'. This refactoring is in preparation for an upcoming change to the DAGCombiner.
  llvm-svn: 213503
* [C++11] Add predecessors(BasicBlock *) / successors(BasicBlock *) iterator ranges. (Manuel Jacob, 2014-07-20; 1 file changed, -3/+2)
  Summary: This patch introduces two new iterator ranges and updates existing code to use them. No functional change intended.
  Test Plan: All tests (make check-all) still pass.
  Reviewers: dblaikie
  Reviewed By: dblaikie
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D4481
  llvm-svn: 213474
* ARM: support legalisation of "fptrunc ... to half" operations. (Tim Northover, 2014-07-18; 2 files changed, -0/+24)
  llvm-svn: 213373
* AArch64: Constant fold converting vector setcc results to float. (Jim Grosbach, 2014-07-18; 1 file changed, -0/+26)
  Since the result of a SETCC for AArch64 is 0 or -1 in each lane, we can move unary operations, in this case [su]int_to_fp, through the mask operation and constant fold the operation away. Generally speaking:
      UNARYOP(AND(VECTOR_CMP(x,y), constant)) --> AND(VECTOR_CMP(x,y), constant2)
  where constant2 is UNARYOP(constant). This implements the transform where UNARYOP is [su]int_to_fp.
  For example, consider the simple function:
      define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
        %cmp = fcmp oeq <4 x float> %val, %test
        %ext = zext <4 x i1> %cmp to <4 x i32>
        %result = sitofp <4 x i32> %ext to <4 x float>
        ret <4 x float> %result
      }
  Before this change, the code is generated as:
      fcmeq.4s  v0, v0, v1
      movi.4s   v1, #0x1         // Integer splat value.
      and.16b   v0, v0, v1       // Mask lanes based on the comparison.
      scvtf.4s  v0, v0           // Convert each lane to f32.
      ret
  After, the code is improved to:
      fcmeq.4s  v0, v0, v1
      fmov.4s   v1, #1.00000000  // f32 splat value.
      and.16b   v0, v0, v1       // Mask lanes based on the comparison.
      ret
  The scvtf.4s has been constant folded away and the floating point 1.0f vector lanes are materialized directly via fmov.4s.
  Rather than do the folding manually in the target code, teach getNode() in the generic SelectionDAG to handle folding constant operands of vector [su]int_to_fp nodes. It is reasonable (as noted in a FIXME) to do additional constant folding there as well, but I don't have test cases for those operations, so leaving them for another time when it becomes appropriate.
  rdar://17693791
  llvm-svn: 213341
* Revert "[x86] Fold extract_vector_elt of a load into the Load's address ↵Michael J. Spencer2014-07-181-124/+90
| | | | | | | | | computation." There's a bug where this can create cycles in the DAG. It will take a bit to fix, so I'm backing it out for now. llvm-svn: 213339
* CodeGen: generate single libcall for fptrunc -> f16 operations. (Tim Northover, 2014-07-17; 3 files changed, -18/+13)
  Previously we asserted on this code. Currently compiler-rt doesn't actually implement any of these new libcalls, but external help is pretty much the only viable option for LLVM.
  I've followed the much more generic "__truncST2" naming, as opposed to the odd name for f32 -> f16 truncation. This can obviously be changed later, or overridden by any targets that need to.
  llvm-svn: 213252
* CodeGen: extend f16 conversions to permit types > float. (Tim Northover, 2014-07-17; 6 files changed, -21/+48)
  This makes the two intrinsics @llvm.convert.from.f16 and @llvm.convert.to.f16 accept types other than simple "float". This is only strictly needed for the truncate operation, since otherwise double rounding occurs and there's no way to represent the strict IEEE conversion. However, for symmetry we allow larger types in the extend too.
  During legalization, we can expand an "fp16_to_double" operation into two extends for convenience, but abort when the truncate isn't legal. A new libcall is probably needed here.
  Even after this commit, various target tweaks are needed to actually use the extended intrinsics. I've put these into separate commits for clarity, so there are no actual tests of f64 conversion here.
  llvm-svn: 213248
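  A sketch of what the extended intrinsics might look like with a double operand, assuming the @llvm.convert.to.fp16 spelling used in the later commit above and LLVM's usual overloaded-intrinsic name mangling; the function name is invented for the example:
      declare i16 @llvm.convert.to.fp16.f64(double)
      declare double @llvm.convert.from.fp16.f64(i16)

      define i16 @trunc_f64_to_f16(double %x) {
        ; Truncating the double in one step avoids the double rounding that a
        ; double -> float -> half sequence would introduce.
        %h = call i16 @llvm.convert.to.fp16.f64(double %x)
        ret i16 %h
      }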
* [FastISel] Local values shouldn't be alive across an inline asm call with side effects. (Juergen Ributzka, 2014-07-16; 1 file changed, -0/+5)
  This fixes an issue where a local value is defined before and used after an inline asm call with side effects. This fix simply flushes the local value map, which updates the insertion point for the inline asm call to be above any previously defined local values.
  This fixes <rdar://problem/17694203>
  llvm-svn: 213203
* CodeGen: don't form illegal EXTLOAD operations. (Tim Northover, 2014-07-16; 1 file changed, -4/+2)
  It turns out that in most cases (the main exception being i1-related types) once these operations are formed we cannot separate them and the targets end up having to deal with them whether they want to or not.
  This is not a good situation, and a more reasonable default can be formed by acknowledging this and having targets leave them as Legal. Only x86 seems to be affected (other targets don't even try marking the operation Expand).
  Mostly there's no visible change here yet, but it will be useful to have truly expanded EXTLOADs for MVT::f16 softening support.
  llvm-svn: 213162
* Remove TLI from isInTailCallPosition's arguments. NFC. (Juergen Ributzka, 2014-07-16; 2 files changed, -2/+2)
  There is no need to pass on TLI separately to the function. As Eric pointed out, the Target Machine already provides everything we need.
  llvm-svn: 213108
* [DAGCombiner] Add more rules to fold shuffles. (Andrea Di Biagio, 2014-07-15; 1 file changed, -7/+17)
  This patch adds two new rules to the DAGCombiner:
    1. shuffle (shuffle A, Undef, M0), B, M1 -> shuffle A, B, M2
    2. shuffle (shuffle A, Undef, M0), A, M1 -> shuffle A, Undef, M2
  We only do this if the combined shuffle is legal for the target.
  Example:
      define <4 x float> @test(<4 x float> %a, <4 x float> %b) {
        %1 = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32> <i32 6, i32 0, i32 1, i32 7>
        %2 = shufflevector <4 x float> %1, <4 x float> %b, <4 x i32> <i32 1, i32 2, i32 4, i32 5>
        ret <4 x float> %2
      }
  (using llc -mcpu=corei7 -march=x86-64)
  Before, the x86 backend generated:
      pshufd $120, %xmm0, %xmm0
      shufps $-108, %xmm0, %xmm1
      movaps %xmm1, %xmm0
  Now the x86 backend generates:
      movsd %xmm1, %xmm0
  llvm-svn: 213069
* [FastISel] Insert patchpoint instruction before the target generated call instruction. (Juergen Ributzka, 2014-07-15; 1 file changed, -1/+2)
  The patchpoint instruction should have been inserted before the target generated call instruction to be inside the ADJSTACKDOWN/ADJSTACKUP call sequence window.
  llvm-svn: 213034
* [FastISel] Fix patchpoint lowering to set the result register. (Juergen Ributzka, 2014-07-15; 1 file changed, -5/+6)
  Always update the value map with the result register (if there is one), for the patchpoint instruction we created to replace the target-specific call instruction.
  llvm-svn: 213033
* [DAGCombiner] Avoid calling method 'isShuffleMaskLegal' on illegal vector types. (Andrea Di Biagio, 2014-07-15; 1 file changed, -0/+2)
  This patch fixes a crasher in method 'DAGCombiner::visitOR' due to an invalid call to method 'isShuffleMaskLegal'.
  On x86, method 'isShuffleMaskLegal' always expects a legal vector value type in input. With this patch, we immediately check if the input OR dag node has a legal vector type; we only try to fold an OR dag node into a single shufflevector if we know that the resulting shuffle will have a legal type. This is to avoid calling method 'isShuffleMaskLegal' on a potentially illegal vector value type.
  Added a new test-case to file 'CodeGen/X86/combine-or.ll' to verify that DAGCombiner doesn't crash in the attempt to check/combine an OR between shuffles with illegal types.
  llvm-svn: 213020
* [DAGCombiner] Add more rules to combine shuffle vector dag nodes. (Andrea Di Biagio, 2014-07-14; 1 file changed, -0/+44)
  This patch teaches the DAGCombiner how to fold a pair of shuffles according to the following rules (see the sketch after this entry):
    1. shuffle(shuffle(A, B, M0), B, M1) -> shuffle(A, B, M2)
    2. shuffle(shuffle(A, B, M0), A, M1) -> shuffle(A, B, M3)
  The new rules would only trigger if the resulting shuffle has legal type and legal mask.
  Added test 'combine-vec-shuffle-3.ll' to verify that DAGCombiner correctly folds shuffles on x86 when the resulting mask is legal. Also added some negative cases to verify that we avoid introducing illegal shuffles.
  llvm-svn: 213001
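  As a rough illustration of rule 1, a hypothetical IR fragment; the masks are arbitrary, and whether the fold actually fires depends on the target considering the combined mask legal:
      define <4 x float> @fold_pair(<4 x float> %A, <4 x float> %B) {
        ; %s1 reads only lanes of %s0 and of %B, so the pair is equivalent to the
        ; single shuffle: shufflevector %A, %B, <i32 0, i32 4, i32 4, i32 6>
        %s0 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
        %s1 = shufflevector <4 x float> %s0, <4 x float> %B, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
        ret <4 x float> %s1
      }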
* fixed typo (Sanjay Patel, 2014-07-14; 1 file changed, -1/+1)
  llvm-svn: 212966
* [DAGCombiner] Fix a crash caused by a missing check for legal type when trying to fold shuffles. (Andrea Di Biagio, 2014-07-13; 1 file changed, -1/+1)
  Verify that DAGCombiner does not crash when trying to fold a pair of shuffles according to rule (added at r212539):
      (shuffle (shuffle A, Undef, M0), Undef, M1) -> (shuffle A, Undef, M2)
  The DAGCombiner avoids folding shuffles if the resulting shuffle dag node is not legal for the target. That means, the resulting shuffle must have legal type and legal mask.
  Before, the DAGCombiner only called method 'TargetLowering::isShuffleMaskLegal' to check if it was "safe" to fold according to the above-mentioned rule. However, this caused a crash in the x86 backend since method 'isShuffleMaskLegal' always expects to be called on a legal vector type.
  llvm-svn: 212915
* Avoid a warning from MSVC on "*/" in this code by inserting a space (Reid Kleckner, 2014-07-12; 1 file changed, -1/+1)
  llvm-svn: 212862
* [FastISel] Add target-independent patchpoint intrinsic support. WIP. (Juergen Ributzka, 2014-07-11; 1 file changed, -0/+169)
  This implements the target-independent lowering for the patchpoint intrinsic. Targets have to implement the FastLowerCall hook to support this intrinsic.
  Related to <rdar://problem/17427052>
  llvm-svn: 212849
* [FastISel] Add basic infrastructure to support a target-independent call lowering hook in FastISel. WIP (Juergen Ributzka, 2014-07-11; 1 file changed, -2/+208)
  The infrastructure mimics the call lowering we have already in place for SelectionDAG, but with limitations. For example structure return demotion and non-simple types are not supported (yet).
  Currently every backend has its own implementation and duplicated code for call lowering. There is also no specified interface that could be called from target-independent code. The target-hook is opt-in and doesn't affect current implementations.
  llvm-svn: 212848
* [FastISel] Make isInTailCallPosition independent of SelectionDAG. (Juergen Ributzka, 2014-07-11; 1 file changed, -1/+1)
  Break out the arguments required from SelectionDAG, so that this function can also be used by FastISel.
  llvm-svn: 212844
* [FastISel] Break out intrinsic lowering into a separate function and add a target-hook. (Juergen Ributzka, 2014-07-11; 1 file changed, -34/+39)
  Create a separate helper function for target-independent intrinsic lowering. Also add a target-hook that allows directly calling into a target-specific intrinsic lowering method.
  Currently the implementation is opt-in and doesn't affect existing target implementations.
  llvm-svn: 212843
* ARM: Allow __fp16 as a function arg or return type for AArch64 (Oliver Stannard, 2014-07-11; 1 file changed, -1/+1)
  ACLE 2.0 allows __fp16 to be used as a function argument or return type. This enables this for AArch64.
  llvm-svn: 212812
* SelectionDAG: Factor FP_TO_SINT lower code out of DAGLegalizer (Jan Vesely, 2014-07-10; 2 files changed, -58/+65)
  Move the code to a helper function to allow calls from TypeLegalizer. No functionality change intended.
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  Reviewed-by: Tom Stellard <tom@stellard.net>
  Reviewed-by: Owen Anderson <resistor@mac.com>
  llvm-svn: 212772
* Revert "Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), ↵Matt Arsenault2014-07-101-0/+13
| | | | | | | | (trunc b) combine."" Don't try to convert the select condition type. llvm-svn: 212750
* [DAG] Further improve the logic in DAGCombiner that folds a pair of shuffles into a single shuffle if the resulting mask is legal. (Andrea Di Biagio, 2014-07-10; 1 file changed, -14/+51)
  This patch teaches the DAGCombiner how to fold shuffles according to the following new rules (see the sketch after this entry):
    1. shuffle(shuffle(x, y), undef) -> x
    2. shuffle(shuffle(x, y), undef) -> y
    3. shuffle(shuffle(x, y), undef) -> shuffle(x, undef)
    4. shuffle(shuffle(x, y), undef) -> shuffle(y, undef)
  The backend avoids combining shuffles according to rules 3 and 4 if the resulting shuffle does not have a legal mask. This is to avoid introducing illegal shuffles that are potentially expanded into a sub-optimal sequence of target specific dag nodes during vector legalization.
  Added test case combine-vec-shuffle-2.ll to verify that we correctly trigger the new rules when combining shuffles.
  llvm-svn: 212748
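  As a rough illustration of rule 3, a hypothetical IR fragment; the masks are arbitrary and chosen only so that the outer shuffle reads lanes that all originate from %x:
      define <4 x i32> @fold_to_shuffle_x(<4 x i32> %x, <4 x i32> %y) {
        ; The outer shuffle only reads the %x-derived lanes of %s0, so the pair can
        ; fold to the single shuffle: shufflevector %x, undef, <i32 1, i32 0, i32 1, i32 0>
        %s0 = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
        %s1 = shufflevector <4 x i32> %s0, <4 x i32> undef, <4 x i32> <i32 2, i32 0, i32 2, i32 0>
        ret <4 x i32> %s1
      }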
* [x86,SDAG] Introduce any- and sign-extend-vector-inreg nodes analogous (Chandler Carruth, 2014-07-10; 5 files changed, -9/+109)
  to the zero-extend-vector-inreg node introduced previously for the same purpose: manage the type legalization of widened extend operations, especially to support the experimental widening mode for x86.
  I'm adding both because sign-extend is expanded in terms of any-extend with shifts to propagate the sign bit. This removes the last fundamental scalarization from vec_cast2.ll (a test case that hit many really bad edge cases for widening legalization), although the trunc tests in that file still appear scalarized because the shuffle legalization is scalarizing. Funny thing, I've been working on that.
  Some initial experiments with this and SSE2 scenarios are showing moderately good behavior already for sign extension. Still some work to do on the shuffle combining on X86 before we're generating optimal sequences, but avoiding scalarization is a huge step forward.
  llvm-svn: 212714
* Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine." (NAKAMURA Takumi, 2014-07-10; 1 file changed, -14/+0)
  This caused miscompilation on, at least, x86-64. SExt(i1 cond) confused other optimizations.
  llvm-svn: 212708
* Make it possible for ints/floats to return different values from getBooleanContents() (Daniel Sanders, 2014-07-10; 9 files changed, -50/+94)
  Summary: On MIPS32r6/MIPS64r6, floating point comparisons return 0 or -1 but integer comparisons return 0 or 1.
  Updated the various uses of getBooleanContents. Two simplifications had to be disabled when float and int boolean contents differ:
    - ScalarizeVecRes_VSELECT except when the kind of boolean contents is trivially discoverable (i.e. when the condition of the VSELECT is a SETCC node).
    - visitVSELECT (select C, 0, 1) -> (xor C, 1). Come to think of it, this one could test for the common case of 'C' being a SETCC too.
  Preserved existing behaviour for all other targets and updated the affected MIPS32r6/MIPS64r6 tests. This also fixes the pi benchmark where the 'low' variable was counting in the wrong direction because it thought it could simply add the result of the comparison.
  Reviewers: hfinkel
  Reviewed By: hfinkel
  Subscribers: hfinkel, jholewinski, mcrosier, llvm-commits
  Differential Revision: http://reviews.llvm.org/D4389
  llvm-svn: 212697
* [AArch64] Fix an assertion failure in DAG Combiner about concatenating 2 build_vectors. (Hao Liu, 2014-07-10; 1 file changed, -4/+18)
  llvm-svn: 212677
* Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine. (Matt Arsenault, 2014-07-09; 1 file changed, -0/+14)
  Do this if the truncate is free and the select is legal.
  llvm-svn: 212640
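  A minimal sketch of the IR shape this combine targets (function name invented for the example); the narrowing moves from the select's result to its operands:
      define i32 @trunc_select(i1 %c, i64 %a, i64 %b) {
        %sel = select i1 %c, i64 %a, i64 %b
        %t = trunc i64 %sel to i32
        ; After the combine, at the DAG level this is equivalent to:
        ;   %ta = trunc i64 %a to i32
        ;   %tb = trunc i64 %b to i32
        ;   %t  = select i1 %c, i32 %ta, i32 %tb
        ret i32 %t
      }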
* [x86] Fix a bug in my new zext-vector-inreg DAG trickery where we were (Chandler Carruth, 2014-07-09; 2 files changed, -0/+36)
  not widening the input type to the node sufficiently to let the ext take place in a register. This would in turn result in a mysterious bitcast assertion failure downstream.
  First change here is to add back the helpful assert I had in an earlier version of the code to catch this immediately.
  Next change is to add support to the type legalization to detect when we have widened the operand either too little or too much (for whatever reason) and find a size-matched legal vector type to convert it to first. This can also fail so we get a new fallback path, but that seems OK.
  With this, we no longer crash on vec_cast2.ll when using widening. I've also added the CHECK lines for the zero-extend cases here. We still need to support sign-extend and trunc (or something) to get plausible code for the other two thirds of this test which is one of the regression tests that showed the most scalarization when widening was force-enabled.
  Slowly closing in on widening being a viable legalization strategy without it resorting to scalarization at every turn. =]
  llvm-svn: 212614
* Sink two variables only used in an assert into the assert itself. Should (Chandler Carruth, 2014-07-09; 1 file changed, -3/+3)
  fix the release builds with Werror.
  llvm-svn: 212612
* [x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when widening (Chandler Carruth, 2014-07-09; 5 files changed, -1/+72)
  vector types to be legal and a ZERO_EXTEND node is encountered.
  When we use widening to legalize vector types, extend nodes are a real challenge. Either the input or output is likely to be legal, but in many cases not both. As a consequence, we don't really have any way to represent this situation and the prior code in the widening legalization framework would just scalarize the extend operation completely.
  This patch introduces a new DAG node to represent doing a zero extend of a vector "in register". The core of the idea is to allow legal but different vector types in the input and output. The output vector must have fewer lanes but wider elements. The operation is defined to zero extend the low elements of the input to the size of the output elements, and drop all of the high elements which don't have a corresponding lane in the output vector.
  It also includes generic expansion of this node in terms of blending a zero vector into the high elements of the vector and bitcasting across. This in turn yields extremely nice code for x86 SSE2 when we use the new widening legalization logic in conjunction with the new shuffle lowering logic.
  There is still more to do here. We need to support sign extension, any extension, and potentially int-to-float conversions. My current plan is to continue using similar synthetic nodes to model each of these transitions with generic lowering code for each one.
  However, with this patch LLVM already reaches performance parity with GCC for the core C loops of the x264 code (assuming you disable the hand-written assembly versions) when compiling for SSE2 and SSE3 architectures and enabling the new widening and lowering logic for vectors.
  Differential Revision: http://reviews.llvm.org/D4405
  llvm-svn: 212610
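  A small sketch, assuming widening legalization is enabled, of the kind of IR whose legalization this node models: the <4 x i8> input gets widened to a full vector register, and ZERO_EXTEND_VECTOR_INREG describes zero-extending just its low four lanes to i32:
      define <4 x i32> @zext_v4i8(<4 x i8> %v) {
        %e = zext <4 x i8> %v to <4 x i32>
        ret <4 x i32> %e
      }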
* [SDAG] At the suggestion of Hal, switch to an output parameter that (Chandler Carruth, 2014-07-09; 3 files changed, -22/+27)
  tracks which elements of the build vector are in fact undef. This should make actually inspecting them (likely in my next patch) reasonably pretty. Also makes the output parameter optional as it is clear now that *most* users are happy with undefs in their splats.
  llvm-svn: 212581
* [DAG] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal. (Andrea Di Biagio, 2014-07-08; 1 file changed, -3/+21)
  This patch teaches how to fold a shuffle according to rule:
      shuffle (shuffle (x, undef, M0), undef, M1) -> shuffle(x, undef, M2)
  We do this only if the resulting mask M2 is legal; this is to avoid introducing illegal shuffles that are potentially expanded into a sub-optimal sequence of target specific dag nodes.
  This patch has the advantage of being target independent, since it works on ISD nodes. Therefore, all targets (not only x86) can take advantage of this rule.
  The idea behind this patch is that most shuffle pairs can be safely combined before we run the legalizer on vector operations. This allows us to combine/simplify dag nodes earlier in the process and not only immediately before the instruction selection stage.
  That said, this patch is not meant to replace any existing target specific combine rules; backends might still introduce new shuffles during the legalization stage. Also, this rule is very simple and avoids aggressively optimizing shuffles.
  llvm-svn: 212539
* [x86,SDAG] Sink the logic for folding shuffles of splats more (Chandler Carruth, 2014-07-08; 1 file changed, -5/+35)
  aggressively from the x86 shuffle lowering to the generic SDAG vector shuffle formation code.
  This code already tried to fold away shuffles of splats! It just had lots of bugs and couldn't handle the case my new x86 shuffle lowering needed.
  First, it failed to correctly compute whether N2 was undef because it pre-computed this, then did transformations which could *make* N2 undef, then failed to ever re-consider the precomputed state.
  Second, it didn't look through bitcasts at all, even in the safe cases where they are just element-type bitcasts with no change to the number of elements.
  Third, it didn't handle all-zero bit casts nicely the way my code in the x86 side of things did, which is essential to getting good zext-shuffle lowerings.
  But all of these are generic. I just ported the code down to this layer and fixed the surrounding bugs. Tests exercising this in the x86 backend still pass and some silly code in widen_cast-6.ll gets better. I updated that test to be a bit more precise but it's still pretty unclear what the value of the test is in this day and age.
  llvm-svn: 212517
* [SDAG] Actually check for a non-constant splat and clarify comments (Chandler Carruth, 2014-07-08; 1 file changed, -4/+8)
  around the handling of UNDEF lanes in boolean vector content analysis.
  The code before my changes here also failed to check for non-constant splats in a buildvector. I have no idea how to trigger this, I just spotted it by inspection when trying to understand the code. It seems extremely unlikely to be worth the trouble to teach the only caller of this code (DAG combining setcc patterns) how to cleverly handle undef lanes, so I've just commented more thoroughly that we're giving up there.
  llvm-svn: 212515
* [SDAG] Build up a more rich set of APIs for querying build-vector SDAG (Chandler Carruth, 2014-07-08; 3 files changed, -13/+47)
  nodes about whether they are splats. This is factored out and improved from r212324 which got reverted as it was far too aggressive. The new API should help more conservatively handle buildvectors that are a mixture of splatted and undef values.
  No functionality change at this point. The hope is to slowly re-introduce the undef-tolerant optimization of splats, but each time being forced to make a conscious decision about how to handle the undefs in a way that doesn't lead to contradicting assumptions about the collapsed value.
  Hal has pointed out in discussions that this may not end up being the desired API and instead it may be more convenient to get a mask of the undef elements or something similar. I'm starting simple and will expand the API as I adapt actual callers and see exactly what they need.
  llvm-svn: 212514
* [x86] Revert r212324 which was too aggressive w.r.t. allowing undef (Chandler Carruth, 2014-07-07; 3 files changed, -44/+31)
  lanes in vector splats.
  The core problem here is that undef lanes can't *unilaterally* be considered to contribute to splats. Their handling needs to be more cautious. There is also a reported failure of the nightly testers (thanks Tobias!) that may well stem from the same core issue. I'm going to fix this theoretical issue, factor the APIs a bit better, and then verify that I don't see anything bad with Tobias's reduction from the test suite before recommitting.
  Original commit message for r212324:
    [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for any constant, constant FP, or undef splat and to tolerate any undef lanes in a splat, then replace all uses of isSplatVector in X86's lowering with it.
    This fixes issues where undef lanes in an otherwise splat vector would prevent the splat logic from firing. It is a touch more awkward to use this interface, but it is much more accurate. Suggestions for better interface structuring welcome.
    With this fix, the code generated with the widening legalization strategy for widen_cast-4.ll is *dramatically* improved as the special lowering strategies for a v16i8 SRA kick in even though the high lanes are undef.
    We also get a slightly different choice for broadcasting an aligned memory location, and use vpshufd instead of vbroadcastss. This looks like a minor win for pipelining and domain crossing, but a minor loss for the number of micro-ops. I suspect it's a wash, but folks can easily tweak the lowering if they want.
  llvm-svn: 212475
* [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for (Chandler Carruth, 2014-07-04; 3 files changed, -31/+44)
  any constant, constant FP, or undef splat and to tolerate any undef lanes in a splat, then replace all uses of isSplatVector in X86's lowering with it.
  This fixes issues where undef lanes in an otherwise splat vector would prevent the splat logic from firing. It is a touch more awkward to use this interface, but it is much more accurate. Suggestions for better interface structuring welcome.
  With this fix, the code generated with the widening legalization strategy for widen_cast-4.ll is *dramatically* improved as the special lowering strategies for a v16i8 SRA kick in even though the high lanes are undef.
  We also get a slightly different choice for broadcasting an aligned memory location, and use vpshufd instead of vbroadcastss. This looks like a minor win for pipelining and domain crossing, but a minor loss for the number of micro-ops. I suspect it's a wash, but folks can easily tweak the lowering if they want.
  llvm-svn: 212324
* Move function dependent resetting of a subtarget variable out of the (Eric Christopher, 2014-07-04; 1 file changed, -0/+1)
  subtarget. This involved having the movt predicate take the current function - since we care about size in instruction selection for whether or not to use movw/movt, take the function so we can check the attributes. This required adding the current MachineFunction to FastISel and propagating it through.
  llvm-svn: 212309
* Fix ppcf128 component access on little-endian systems (Ulrich Weigand, 2014-07-03; 3 files changed, -10/+17)
  The PowerPC 128-bit long double data type (ppcf128 in LLVM) is in fact a pair of two doubles, where one is considered the "high" or more-significant part, and the other is considered the "low" or less-significant part. When a ppcf128 value is stored in memory or a register pair, the high part always comes first, i.e. at the lower memory address or in the lower-numbered register, and the low part always comes second. This is true both on big-endian and little-endian PowerPC systems. (Similar to how with a complex number, the real part always comes first and the imaginary part second, no matter the byte order of the system.)
  This was implemented incorrectly for little-endian systems in LLVM. This commit fixes three related issues:
    - When printing an immediate ppcf128 constant to assembler output in emitGlobalConstantFP, emit the high part first on both big- and little-endian systems.
    - When lowering a ppcf128 type to a pair of f64 types in SelectionDAG (which is used e.g. when generating code to load an argument into a register pair), use correct low/high part ordering on little-endian systems.
    - In a related issue, because lowering ppcf128 into a pair of f64 must operate differently from lowering an int128 into a pair of i64, bitcasts between ppcf128 and int128 must not be optimized away by the DAG combiner on little-endian systems, but must effect a word-swap.
  Reviewed by Hal Finkel.
  llvm-svn: 212274
* [x86] Fix the completely broken vector widening legalization of bswap. (Chandler Carruth, 2014-07-03; 1 file changed, -1/+1)
  This operation was classified as a binary operation in the widening logic for some reason (clearly, untested). It is in fact a unary operation. Add a RUN line to a test to exercise this for x86.
  Note that again the vector widening strategy doesn't regress anything and in one case removes a totally unnecessary instruction that we couldn't avoid when promoting the element type.
  llvm-svn: 212257
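  A small sketch of a vector bswap in IR (the function name is invented); because bswap is unary, widening only has to pad the input vector with undef lanes and apply the same operation at the wider type:
      declare <4 x i16> @llvm.bswap.v4i16(<4 x i16>)

      define <4 x i16> @bswap_v4i16(<4 x i16> %v) {
        %r = call <4 x i16> @llvm.bswap.v4i16(<4 x i16> %v)
        ret <4 x i16> %r
      }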
* [cleanup] Hoist an if-else chain on ISD opcodes (really designed for (Chandler Carruth, 2014-07-02; 1 file changed, -17/+28)
  switches) into a switch, and sink them into a dispatch function that can return the result rather than awkward variable setting with breaks.
  llvm-svn: 212166
* [cleanup] Remove dead 'break;' statements that I meant to nuke in (Chandler Carruth, 2014-07-02; 1 file changed, -2/+0)
  r212158 but missed. Thanks to Craig for spotting the goof!
  llvm-svn: 212159