path: root/llvm/lib/CodeGen
Commit message (Author, Age, Files, Lines)
...
* trivial fix for PR20314 (Sanjay Patel, 2014-07-16, 1 file, -1/+4)
| | | | | | Make sure that the AddrInst is an Instruction. llvm-svn: 213197
* [RegisterCoalescer] Moving the RegisterCoalescer subtarget hook onto the ↵ (Chris Bieneman, 2014-07-16, 1 file, -2/+1)
| | | | | | TargetRegisterInfo instead of the TargetSubtargetInfo. llvm-svn: 213188
* CodeGen: don't form illegal EXTLOAD operations. (Tim Northover, 2014-07-16, 1 file, -4/+2)
| | | | | | | | | | | | | | | | | It turns out that in most cases (the main exception being i1-related types) once these operations are formed we cannot separate them and the targets end up having to deal with them whether they want to or not. This is not a good situation, and a more reasonable default can be formed by acknowledging this and having targets leave them as Legal. Only x86 seems to be affected (other targets don't even try marking the operation Expand). Mostly there's no visible change here yet, but it will be useful to have truly expanded EXTLOADS for MVT::f16 softening support. llvm-svn: 213162
* Remove TLI from isInTailCallPosition's arguments. NFC. (Juergen Ributzka, 2014-07-16, 3 files, -5/+5)
| | | | | | | There is no need to pass on TLI separately to the function. As Eric pointed out the Target Machine already provides everything we need. llvm-svn: 213108
* Move Post RA Scheduling flag bit into SchedMachineModel (Sanjay Patel, 2014-07-15, 1 file, -3/+20)
| Refactoring; no functional changes intended.
| Removed PostRAScheduler bits from subtargets (X86, ARM).
| Added PostRAScheduler bit to MCSchedModel class. This bit is set by a CPU's scheduling model (if it exists).
| Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses.
| Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!).
| Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling.
| Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values.
| Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86:
|   a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget.
|   b. MIPS overrides the CPU's postRA settings by enabling postRA for everything.
|   c. PPC overrides the CPU's postRA settings by enabling postRA for everything.
|   d. X86 is the only target that actually has postRA specified via sched model info.
| Differential Revision: http://reviews.llvm.org/D4217
| llvm-svn: 213101
* [RegisterCoalescer] Add new subtarget hook allowing targets to opt-out of ↵ (Chris Bieneman, 2014-07-15, 1 file, -0/+17)
| | | | | | | | | | coalescing. The coalescer is very aggressive at propagating constraints on the register classes, and the register allocator doesn’t know how to split sub-registers later to recover. This patch provides an escape valve for targets that encounter this problem to limit coalescing. This patch also implements such for ARM to lower register pressure when using lots of large register classes. This works around PR18825. llvm-svn: 213078
* [DAGCombiner] Add more rules to fold shuffles. (Andrea Di Biagio, 2014-07-15, 1 file, -7/+17)
| This patch adds two new rules to the DAGCombiner:
|   1. shuffle (shuffle A, Undef, M0), B, M1 -> shuffle A, B, M2
|   2. shuffle (shuffle A, Undef, M0), A, M1 -> shuffle A, Undef, M2
| We only do this if the combined shuffle is legal for the target.
| Example:
|   define <4 x float> @test(<4 x float> %a, <4 x float> %b) {
|     %1 = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32> <i32 6, i32 0, i32 1, i32 7>
|     %2 = shufflevector <4 x float> %1, <4 x float> %b, <4 x i32> <i32 1, i32 2, i32 4, i32 5>
|     ret <4 x float> %2
|   }
| (using llc -mcpu=corei7 -march=x86-64)
| Before, the x86 backend generated:
|   pshufd $120, %xmm0, %xmm0
|   shufps $-108, %xmm0, %xmm1
|   movaps %xmm1, %xmm0
| Now the x86 backend generates:
|   movsd %xmm1, %xmm0
| llvm-svn: 213069
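The two rules in the commit above amount to composing shuffle masks. The following is a minimal Python sketch of that composition, not the DAGCombiner code itself: the function name, the `UNDEF = -1` convention, and the `outer_rhs_is_a` flag are assumptions made for illustration, and the target legality check mentioned in the commit is deliberately left out.

```python
UNDEF = -1  # assumed marker for an undefined lane in a shuffle mask


def fold_inner_undef_shuffle(m0, m1, outer_rhs_is_a=False):
    """Compose masks for shuffle(shuffle(A, undef, M0), X, M1).

    X is B by default (rule 1) or A again when outer_rhs_is_a is True (rule 2).
    Mask indices 0..n-1 select from the first operand, n..2n-1 from the second.
    """
    n = len(m0)
    m2 = []
    for idx in m1:
        if idx == UNDEF:
            m2.append(UNDEF)
        elif idx < n:
            # Lane comes from the inner shuffle: look through M0.
            inner = m0[idx]
            # Inner indices >= n pointed at the undef operand.
            m2.append(inner if 0 <= inner < n else UNDEF)
        elif outer_rhs_is_a:
            # Rule 2: the second outer operand is A, so rebase onto A's lanes.
            m2.append(idx - n)
        else:
            # Rule 1: the second outer operand is B and keeps its index.
            m2.append(idx)
    return m2


# Masks taken from the IR example in the commit message.
print(fold_inner_undef_shuffle([6, 0, 1, 7], [1, 2, 4, 5]))
```

For the masks in the commit's example, the composed mask selects the two low lanes of `%a` followed by the two low lanes of `%b`.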
* [FastISel] Insert patchpoint instruction before the target generated call ↵ (Juergen Ributzka, 2014-07-15, 1 file, -1/+2)
| | | | | | | | | | instruction. The patchpoint instruction should have been inserted before the target generated call instruction to be inside the ADJSTACKDOWN/ADJSTACKUP call sequence window. llvm-svn: 213034
* [FastISel] Fix patchpoint lowering to set the result register. (Juergen Ributzka, 2014-07-15, 1 file, -5/+6)
| | | | | | | | Always update the value map with the result register (if there is one), for the patchpoint instruction we created to replace the target-specific call instruction. llvm-svn: 213033
* [DAGCombiner] Avoid calling method 'isShuffleMaskLegal' on illegal vector types. (Andrea Di Biagio, 2014-07-15, 1 file, -0/+2)
| | | | | | | | | | | | | | | | This patch fixes a crasher in method 'DAGCombiner::visitOR' due to an invalid call to method 'isShuffleMaskLegal'. On x86, method 'isShuffleMaskLegal' always expects a legal vector value type in input. With this patch, we immediately check if the input OR dag node has a legal vector type; we only try to fold an OR dag node into a single shufflevector if we know that the resulting shuffle will have a legal type. This is to avoid calling method 'isShuffleMaskLegal' on a potentially illegal vector value type. Added a new test-case to file 'CodeGen/X86/combine-or.ll' to verify that DAGCombiner doesn't crash in the attempt to check/combine an OR between shuffles with illegal types. llvm-svn: 213020
* CodeGen: Stick constant pool entries in COMDAT sections for WinCOFF (David Majnemer, 2014-07-14, 2 files, -10/+26)
| | | | | | | | | | | | | | COFF lacks a feature that other object file formats support: mergeable sections. To work around this, MSVC sticks constant pool entries in special COMDAT sections so that each constant is in its own section. This permits unused constants to be dropped and it also allows duplicate constants in different translation units to get merged together. This fixes PR20262. Differential Revision: http://reviews.llvm.org/D4482 llvm-svn: 213006
* [DAGCombiner] Add more rules to combine shuffle vector dag nodes. (Andrea Di Biagio, 2014-07-14, 1 file, -0/+44)
| This patch teaches the DAGCombiner how to fold a pair of shuffles according to rules:
|   1. shuffle(shuffle(A, B, M0), B, M1) -> shuffle(A, B, M2)
|   2. shuffle(shuffle(A, B, M0), A, M1) -> shuffle(A, B, M3)
| The new rules would only trigger if the resulting shuffle has legal type and legal mask.
| Added test 'combine-vec-shuffle-3.ll' to verify that DAGCombiner correctly folds shuffles on x86 when the resulting mask is legal. Also added some negative cases to verify that we avoid introducing illegal shuffles.
| llvm-svn: 213001
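To see why a pair of shuffles sharing an operand can be collapsed, it helps to evaluate both forms on concrete vectors. The sketch below is a hypothetical Python model (the helper names and the `-1` undef convention are assumptions, and the legality checks from the commit are omitted); it computes the merged mask and compares it against applying the two shuffles in sequence.

```python
def shuffle(lhs, rhs, mask):
    """Model of a two-operand vector shuffle; index -1 yields None (undef)."""
    both = list(lhs) + list(rhs)
    return [None if i < 0 else both[i] for i in mask]


def fold_shuffle_pair(m0, m1, outer_rhs_is_a=False):
    """Merged mask for shuffle(shuffle(A, B, M0), X, M1), with X = B or A.

    The result is expressed as a mask over (A, B).
    """
    n = len(m0)
    merged = []
    for i in m1:
        if i < 0:
            merged.append(-1)
        elif i < n:
            merged.append(m0[i])   # look through the inner shuffle
        elif outer_rhs_is_a:
            merged.append(i - n)   # lane of A: rebase to A's index range
        else:
            merged.append(i)       # lane of B: index is already correct
    return merged


A, B = [10, 11, 12, 13], [20, 21, 22, 23]
M0, M1 = [0, 5, 2, 7], [1, 0, 6, 3]

# Rule 1: outer shuffle takes B as its second operand.
two_step = shuffle(shuffle(A, B, M0), B, M1)
one_step = shuffle(A, B, fold_shuffle_pair(M0, M1))
print(two_step == one_step)
```

A real combine would additionally reject the merged mask when the target reports it as illegal, as the commit message describes.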
* CodeGen: Add a getSectionKind method to MachineConstantPoolEntry (David Majnemer, 2014-07-14, 2 files, -15/+32)
| | | | | | This is just a helper routine, no functionality has changed. llvm-svn: 212993
* Unify the lowering of arguments during SjLj prepare. (Bill Wendling, 2014-07-14, 1 file, -28/+10)
| | | | | | | The 'select true, %arg, undef' instruction can be used for both aggregate and non-aggregate arguments. llvm-svn: 212967
* fixed typo (Sanjay Patel, 2014-07-14, 1 file, -1/+1)
| | | | llvm-svn: 212966
* CodeGen: add missing include (Saleem Abdulrasool, 2014-07-14, 1 file, -0/+1)
| | | | | | | Found during windows unwinding work. This header is indirectly included through a chain leading through Support/Win64EH.h. Explicitly include the header. NFC. llvm-svn: 212955
* Support lowering of empty aggregates. (Bill Wendling, 2014-07-14, 1 file, -11/+11)
| | | | | | | | | | | This crash was pretty common while compiling Rust for iOS (armv7). The reason: the SjLj preparation step was lowering aggregate arguments as ExtractValue + InsertValue. ExtractValue has an assertion which checks that there is some data in the value, which is not true for empty (no fields) structures. Rust uses them quite extensively, so this patch uses a 'select true, %val, undef' instruction to lower the argument. Patch by Valerii Hiora. llvm-svn: 212922
* [DAGCombiner] Fix a crash caused by a missing check for legal type when ↵ (Andrea Di Biagio, 2014-07-13, 1 file, -1/+1)
| | | | | | | | | | | | | | | | | | | | trying to fold shuffles. Verify that DAGCombiner does not crash when trying to fold a pair of shuffles according to rule (added at r212539): (shuffle (shuffle A, Undef, M0), Undef, M1) -> (shuffle A, Undef, M2) The DAGCombiner avoids folding shuffles if the resulting shuffle dag node is not legal for the target. That means, the resulting shuffle must have legal type and legal mask. Before, the DAGCombiner only called method 'TargetLowering::isShuffleMaskLegal' to check if it was "safe" to fold according to the above-mentioned rule. However, this caused a crash in the x86 backend since method 'isShuffleMaskLegal' always expects to be called on a legal vector type. llvm-svn: 212915
* Templatify DominanceFrontier. (Matt Arsenault, 2014-07-12, 2 files, -0/+55)
| | | | | | Theoretically this should now work for MachineBasicBlocks. llvm-svn: 212885
* Avoid a warning from MSVC on "*/" in this code by inserting a space (Reid Kleckner, 2014-07-12, 1 file, -1/+1)
| | | | llvm-svn: 212862
* [FastISel] Add target-independent patchpoint intrinsic support. WIP. (Juergen Ributzka, 2014-07-11, 1 file, -0/+169)
| | | | | | | | | | This implements the target-independent lowering for the patchpoint intrinsic. Targets have to implement the FastLowerCall hook to support this intrinsic. Related to <rdar://problem/17427052> llvm-svn: 212849
* [FastISel] Add basic infrastructure to support a target-independent call ↵ (Juergen Ributzka, 2014-07-11, 1 file, -2/+208)
| | | | | | | | | | | | | | | lowering hook in FastISel. WIP The infrastructure mimics the call lowering we have already in place for SelectionDAG, but with limitations. For example structure return demotion and non-simple types are not supported (yet). Currently every backend has its own implementation and duplicated code for call lowering. There is also no specified interface that could be called from target-independent code. The target-hook is opt-in and doesn't affect current implementations. llvm-svn: 212848
* [FastISel] Make isInTailCallPosition independent of SelectionDAG. (Juergen Ributzka, 2014-07-11, 2 files, -6/+5)
| | | | | | | Break out the arguments required from SelectionDAG, so that this function can also be used by FastISel. llvm-svn: 212844
* [FastISel] Breakout intrinsic lowering into a separate function and add a ↵ (Juergen Ributzka, 2014-07-11, 1 file, -34/+39)
| | | | | | | | | | | target-hook. Create a separate helper function for target-independent intrinsic lowering. Also add a target-hook that allows calling directly into a target-specific intrinsic lowering method. Currently the implementation is opt-in and doesn't affect existing target implementations. llvm-svn: 212843
* ARM: Allow __fp16 as a function arg or return type for AArch64 (Oliver Stannard, 2014-07-11, 1 file, -1/+1)
| | | | | | | ACLE 2.0 allows __fp16 to be used as a function argument or return type. This enables this for AArch64. llvm-svn: 212812
* Revert "Reapply "DebugInfo: Ensure that all debug location scope chains from ↵ (David Blaikie, 2014-07-11, 2 files, -8/+4)
| | | | | | | | | | | | instructions within a function, lead to the function itself."" This reverts commit r212776. Nope, still seems to be failing on the sanitizer bots... but hey, not the msan self-host anymore, it's failing in asan now. I'll start looking there next. llvm-svn: 212793
* Reapply "DebugInfo: Ensure that all debug location scope chains from ↵ (David Blaikie, 2014-07-10, 2 files, -4/+8)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | instructions within a function, lead to the function itself." Committed in r212205 and reverted in r212226 due to msan self-hosting failure, I believe I've got that fixed by r212761 to Clang. Original commit message: "Originally committed in r211723, reverted in r211724 due to failure cases found and fixed (ArgumentPromotion: r211872, Inlining: r212065), committed again in r212085 and reverted again in r212089 after fixing some other cases, such as debug info subprogram lists not keeping track of the function they represent (r212128) and then short-circuiting things like LiveDebugVariables that build LexicalScopes for functions that might not have full debug info. And again, I believe the invariant actually holds for some reasonable amount of code (but I'll keep an eye on the buildbots and see what happens... ). Original commit message: PR20038: DebugInfo: Inlined call sites where the caller has debug info but the call itself has no debug location. This situation does bad things when inlined, so I've fixed Clang not to produce inlinable call sites without locations when the caller has debug info (in the one case where I could find that this occurred). This updates the PR20038 test case to be what clang now produces, and readds the assertion that had to be removed due to this bug. I've also beefed up the debug info verifier to help diagnose these issues in the future, and I hope to add checks to the inliner to just assert-fail if it encounters this situation. If, in the future, we decide we have to cope with this situation, the right thing to do is probably to just remove all the DebugLocs from the inlined instructions." llvm-svn: 212776
* SelectionDAG: Factor FP_TO_SINT lower code out of DAGLegalizer (Jan Vesely, 2014-07-10, 2 files, -58/+65)
| | | | | | | | | | | Move the code to a helper function to allow calls from TypeLegalizer. No functionality change intended Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> Reviewed-by: Owen Anderson <resistor@mac.com> llvm-svn: 212772
* Revert "Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), ↵ (Matt Arsenault, 2014-07-10, 1 file, -0/+13)
| | | | | | | | (trunc b) combine."" Don't try to convert the select condition type. llvm-svn: 212750
* [DAG] Further improve the logic in DAGCombiner that folds a pair of shuffles ↵ (Andrea Di Biagio, 2014-07-10, 1 file, -14/+51)
| into a single shuffle if the resulting mask is legal.
| This patch teaches the DAGCombiner how to fold shuffles according to the following new rules:
|   1. shuffle(shuffle(x, y), undef) -> x
|   2. shuffle(shuffle(x, y), undef) -> y
|   3. shuffle(shuffle(x, y), undef) -> shuffle(x, undef)
|   4. shuffle(shuffle(x, y), undef) -> shuffle(y, undef)
| The backend avoids combining shuffles according to rules 3 and 4 if the resulting shuffle does not have a legal mask. This is to avoid introducing illegal shuffles that are potentially expanded into a sub-optimal sequence of target specific dag nodes during vector legalization.
| Added test case combine-vec-shuffle-2.ll to verify that we correctly trigger the new rules when combining shuffles.
| llvm-svn: 212748
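The four rules above share the same left-hand side and differ only in what the merged mask ends up selecting. As a rough illustration, the Python sketch below classifies a shuffle pair by which rule would apply. The function name, the return format, and the choice to treat only an exact identity mask as rules 1 and 2 are assumptions, not details taken from the patch; the legal-mask check for rules 3 and 4 is not modelled.

```python
def classify_outer_undef_fold(m0, m1):
    """Classify shuffle(shuffle(x, y, M0), undef, M1).

    Returns (kind, mask) where kind is "x", "y", "shuffle(x, undef)",
    "shuffle(y, undef)", or None when no rule applies.
    """
    n = len(m0)
    # Merge the masks; outer lanes >= n read the undef operand.
    merged = [m0[i] if 0 <= i < n else -1 for i in m1]
    defined = [i for i in merged if i >= 0]
    if not defined:
        return (None, None)
    if all(i < n for i in defined):
        if merged == list(range(n)):
            return ("x", None)                 # rule 1
        return ("shuffle(x, undef)", merged)   # rule 3
    if all(i >= n for i in defined):
        rebased = [i - n if i >= 0 else -1 for i in merged]
        if rebased == list(range(n)):
            return ("y", None)                 # rule 2
        return ("shuffle(y, undef)", rebased)  # rule 4
    return (None, None)  # lanes from both x and y: none of the four rules


print(classify_outer_undef_fold([3, 2, 1, 0], [3, 2, 1, 0]))
print(classify_outer_undef_fold([0, 5, 2, 7], [1, 3, 1, 3]))
```

The first call reverses a reversal and so reduces to `x`; the second only reads lanes that came from `y`, so it becomes a single shuffle of `y`.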
* [x86,SDAG] Introduce any- and sign-extend-vector-inreg nodes analogous (Chandler Carruth, 2014-07-10, 6 files, -9/+113)
| | | | | | | | | | | | | | | | | | | | to the zero-extend-vector-inreg node introduced previously for the same purpose: manage the type legalization of widened extend operations, especially to support the experimental widening mode for x86. I'm adding both because sign-extend is expanded in terms of any-extend with shifts to propagate the sign bit. This removes the last fundamental scalarization from vec_cast2.ll (a test case that hit many really bad edge cases for widening legalization), although the trunc tests in that file still appear scalarized because the shuffle legalization is scalarizing. Funny thing, I've been working on that. Some initial experiments with this and SSE2 scenarios are showing moderately good behavior already for sign extension. Still some work to do on the shuffle combining on X86 before we're generating optimal sequences, but avoiding scalarization is a huge step forward. llvm-svn: 212714
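The remark that sign-extend is expanded as any-extend plus shifts can be checked on a single lane. The following Python sketch is an illustrative scalar model under assumed names (`sign_extend_via_shifts`, `garbage_high_bits`), not the legalizer code: any-extend leaves the high bits unspecified, a left shift discards them, and an arithmetic right shift replicates the sign bit.

```python
def sign_extend_via_shifts(value, from_bits, to_bits, garbage_high_bits=0):
    """Sign-extend one lane using any-extend + shl + arithmetic shr.

    garbage_high_bits models the unspecified high bits an any-extend may leave.
    The result is returned as an unsigned to_bits-wide integer.
    """
    to_mask = (1 << to_bits) - 1
    shift = to_bits - from_bits
    # any-extend: low bits are the value, high bits are whatever was there.
    widened = ((garbage_high_bits << from_bits) | (value & ((1 << from_bits) - 1))) & to_mask
    # shl pushes the original sign bit into the top position.
    shifted = (widened << shift) & to_mask
    # Reinterpret as signed so that Python's >> behaves as an arithmetic shift.
    if shifted >> (to_bits - 1):
        shifted -= 1 << to_bits
    return (shifted >> shift) & to_mask


print(hex(sign_extend_via_shifts(0x80, 8, 16)))
print(hex(sign_extend_via_shifts(0x7F, 8, 32, garbage_high_bits=0xABCDEF)))
```

The second call shows that the result does not depend on the unspecified high bits, which is what makes any-extend a valid starting point.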
* Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), (trunc b) ↵ (NAKAMURA Takumi, 2014-07-10, 1 file, -14/+0)
| | | | | | | | combine." This caused miscompilation on, at least, x86-64. SExt(i1 cond) confused other optimizations. llvm-svn: 212708
* Make it possible for ints/floats to return different values from ↵ (Daniel Sanders, 2014-07-10, 10 files, -50/+95)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | getBooleanContents() Summary: On MIPS32r6/MIPS64r6, floating point comparisons return 0 or -1 but integer comparisons return 0 or 1. Updated the various uses of getBooleanContents. Two simplifications had to be disabled when float and int boolean contents differ: - ScalarizeVecRes_VSELECT except when the kind of boolean contents is trivially discoverable (i.e. when the condition of the VSELECT is a SETCC node). - visitVSELECT (select C, 0, 1) -> (xor C, 1). Come to think of it, this one could test for the common case of 'C' being a SETCC too. Preserved existing behaviour for all other targets and updated the affected MIPS32r6/MIPS64r6 tests. This also fixes the pi benchmark where the 'low' variable was counting in the wrong direction because it thought it could simply add the result of the comparison. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, jholewinski, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D4389 llvm-svn: 212697
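The 'low' counter bug mentioned above comes from assuming every comparison produces 0 or 1. A small Python model of the problem follows; the function name and parameters are hypothetical and the code only mimics the arithmetic, not the SelectionDAG logic. Adding the raw comparison result is only correct when "true" is encoded as 1; with the 0 or -1 encoding the counter runs backwards.

```python
def count_below(values, threshold, true_value):
    """Count elements below threshold in two ways.

    true_value is how the comparison encodes "true": 1 (zero-or-one)
    or -1 (zero-or-negative-one, i.e. all bits set).
    Returns (naive, correct): naive adds the raw comparison result,
    correct first normalizes it to 0 or 1.
    """
    naive = 0
    correct = 0
    for v in values:
        cmp_result = true_value if v < threshold else 0
        naive += cmp_result                 # only valid for the 0/1 encoding
        correct += 1 if cmp_result != 0 else 0
    return naive, correct


print(count_below([1, 5, 2, 9], 4, true_value=1))
print(count_below([1, 5, 2, 9], 4, true_value=-1))
```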
* [AArch64] Fix an assertion failure in DAG Combiner about concatenating 2 ↵ (Hao Liu, 2014-07-10, 1 file, -4/+18)
| | | | | | build_vector. llvm-svn: 212677
* [SDAG] Make the new zext-vector-inreg node default to expand so targets (Chandler Carruth, 2014-07-09, 1 file, -1/+4)
| | | | | | | | | | | don't need to set it manually. This is based on feedback from Tom who pointed out that if every target needs to handle this we need to reach out to those maintainers. In fact, it doesn't make sense to duplicate everything when anything other than expand seems unlikely at this stage. llvm-svn: 212661
* Recommit r212203: Don't try to construct debug LexicalScopes hierarchy for ↵ (David Blaikie, 2014-07-09, 4 files, -4/+33)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | functions that do not have top level debug information. Reverted by Eric Christopher (Thanks!) in r212203 after Bob Wilson reported LTO issues. Duncan Exon Smith and Aditya Nandakumar helped provide a reduced reproduction, though the failure wasn't too hard to guess, and even easier with the example to confirm. The assertion that the subprogram metadata associated with an llvm::Function matches the scope data referenced by the DbgLocs on the instructions in that function is not valid under LTO. In LTO, a C++ inline function might exist in multiple CUs and the subprogram metadata nodes will refer to the same llvm::Function. In this case, depending on the order of the CUs, the first instance of the subprogram metadata may not be the one referenced by the instructions in that function and the assertion will fail. A test case (test/DebugInfo/cross-cu-linkonce-distinct.ll) is added, the assertion removed and a comment added to explain this situation. Original commit message: If a function isn't actually in a CU's subprogram list in the debug info metadata, ignore all the DebugLocs and don't try to build scopes, track variables, etc. While this is possibly a minor optimization, it's also a correctness fix for an incoming patch that will add assertions to LexicalScopes and the debug info verifier to ensure that all scope chains lead to debug info for the current function. Fix up a few test cases that had broken/incomplete debug info that could violate this constraint. Add a test case where this occurs by design (inlining a debug-info-having function in an attribute nodebug function - we want this to work because /if/ the nodebug function is then inlined into a debug-info-having function, it should be fine (and will work fine - we just stitch the scopes up as usual), but should the inlining not happen we need to not assert fail either). llvm-svn: 212649
* Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine. (Matt Arsenault, 2014-07-09, 1 file, -0/+14)
| | | | | | Do this if the truncate is free and the select is legal. llvm-svn: 212640
* [x86] Fix a bug in my new zext-vector-inreg DAG trickery where we were (Chandler Carruth, 2014-07-09, 2 files, -0/+36)
| | | | | | | | | | | | | | | | | | | | | | | | | not widening the input type to the node sufficiently to let the ext take place in a register. This would in turn result in a mysterious bitcast assertion failure downstream. First change here is to add back the helpful assert I had in an earlier version of the code to catch this immediately. Next change is to add support to the type legalization to detect when we have widened the operand either too little or too much (for whatever reason) and find a size-matched legal vector type to convert it to first. This can also fail so we get a new fallback path, but that seems OK. With this, we no longer crash on vec_cast2.ll when using widening. I've also added the CHECK lines for the zero-extend cases here. We still need to support sign-extend and trunc (or something) to get plausible code for the other two thirds of this test which is one of the regression tests that showed the most scalarization when widening was force-enabled. Slowly closing in on widening being a viable legalization strategy without it resorting to scalarization at every turn. =] llvm-svn: 212614
* Sink two variables only used in an assert into the assert itself. Should (Chandler Carruth, 2014-07-09, 1 file, -3/+3)
| | | | | | fix the release builds with Werror. llvm-svn: 212612
* [x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when widening (Chandler Carruth, 2014-07-09, 5 files, -1/+72)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vector types to be legal and a ZERO_EXTEND node is encountered. When we use widening to legalize vector types, extend nodes are a real challenge. Either the input or output is likely to be legal, but in many cases not both. As a consequence, we don't really have any way to represent this situation and the prior code in the widening legalization framework would just scalarize the extend operation completely. This patch introduces a new DAG node to represent doing a zero extend of a vector "in register". The core of the idea is to allow legal but different vector types in the input and output. The output vector must have fewer lanes but wider elements. The operation is defined to zero extend the low elements of the input to the size of the output elements, and drop all of the high elements which don't have a corresponding lane in the output vector. It also includes generic expansion of this node in terms of blending a zero vector into the high elements of the vector and bitcasting across. This in turn yields extremely nice code for x86 SSE2 when we use the new widening legalization logic in conjunction with the new shuffle lowering logic. There is still more to do here. We need to support sign extension, any extension, and potentially int-to-float conversions. My current plan is to continue using similar synthetic nodes to model each of these transitions with generic lowering code for each one. However, with this patch LLVM already reaches performance parity with GCC for the core C loops of the x264 code (assuming you disable the hand-written assembly versions) when compiling for SSE2 and SSE3 architectures and enabling the new widening and lowering logic for vectors. Differential Revision: http://reviews.llvm.org/D4405 llvm-svn: 212610
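The semantics described above (zero-extend the low input lanes, drop the high ones) and the generic expansion (blend in zeros, then bitcast) can be modelled on plain lists. This Python sketch is an approximation under stated assumptions: the function names are invented, input and output vectors are assumed to have the same total bit width, and the bitcast is assumed to be little-endian.

```python
def zero_extend_vector_inreg(lanes, in_bits, out_bits):
    """Zero-extend the low lanes of a vector into wider lanes of an equally sized vector."""
    assert out_bits > in_bits and (len(lanes) * in_bits) % out_bits == 0
    out_lanes = len(lanes) * in_bits // out_bits
    in_mask = (1 << in_bits) - 1
    # Only the low out_lanes input elements survive; the rest are dropped.
    return [lanes[i] & in_mask for i in range(out_lanes)]


def expand_via_zero_blend(lanes, in_bits, out_bits):
    """Same result via 'blend in zeros, then bitcast' (little-endian assumed)."""
    ratio = out_bits // in_bits
    out_lanes = len(lanes) // ratio
    in_mask = (1 << in_bits) - 1
    # Shuffle step: each kept input lane is followed by ratio - 1 zero lanes.
    blended = []
    for i in range(out_lanes):
        blended.append(lanes[i] & in_mask)
        blended.extend([0] * (ratio - 1))
    # Bitcast step: regroup ratio narrow lanes into one wide lane.
    return [
        sum(blended[i * ratio + j] << (j * in_bits) for j in range(ratio))
        for i in range(out_lanes)
    ]


v16i8 = [0xFF, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
print(zero_extend_vector_inreg(v16i8, 8, 32))
print(expand_via_zero_blend(v16i8, 8, 32))
```

Both functions give the same four 32-bit lanes for the 16 x i8 input, which is the equivalence the generic expansion relies on.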
* [SDAG] At the suggestion of Hal, switch to an output parameter that (Chandler Carruth, 2014-07-09, 3 files, -22/+27)
| | | | | | | | | | tracks which elements of the build vector are in fact undef. This should make actually inspecting them (likely in my next patch) reasonably pretty. Also makes the output parameter optional as it is clear now that *most* users are happy with undefs in their splats. llvm-svn: 212581
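The idea of reporting undef lanes alongside the splat value can be sketched as follows. This is a hypothetical Python model, not the SelectionDAG API: `get_splat_value`, the `None`-as-undef convention, and the three-element return tuple (standing in for an optional output parameter) are all assumptions.

```python
UNDEF = None  # assumed representation of an undef build-vector element


def get_splat_value(elements):
    """Return (is_splat, value, undef_lanes) for a build vector.

    Undef lanes are tolerated; undef_lanes records which lanes were undef so
    the caller can decide whether they are acceptable.
    """
    undef_lanes = [e is UNDEF for e in elements]
    value = UNDEF
    for e in elements:
        if e is UNDEF:
            continue
        if value is UNDEF:
            value = e                      # first defined lane sets the candidate
        elif e != value:
            return (False, None, undef_lanes)
    return (True, value, undef_lanes)


print(get_splat_value([7, UNDEF, 7, UNDEF]))
print(get_splat_value([7, 8, 7, 7]))
```

A caller that cannot tolerate undef lanes would check `any(undef_lanes)` before using the splat value.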
* [DAG] Teach how to combine a pair of shuffles into a single shuffle if the ↵ (Andrea Di Biagio, 2014-07-08, 1 file, -3/+21)
| | | | | | | | | | | | | | | | | | | | | | | resulting mask is legal. This patch teaches how to fold a shuffle according to rule: shuffle (shuffle (x, undef, M0), undef, M1) -> shuffle(x, undef, M2) We do this only if the resulting mask M2 is legal; this is to avoid introducing illegal shuffles that are potentially expanded into a sub-optimal sequence of target specific dag nodes. This patch has the advantage of being target independent, since it works on ISD nodes. Therefore, all targets (not only x86) can take advantage of this rule. The idea behind this patch is that most shuffle pairs can be safely combined before we run the legalizer on vector operations. This allows us to combine/simplify dag nodes earlier in the process and not only immediately before the instruction selection stage. That said, this patch is not meant to replace any existing target specific combine rules; backends might still introduce new shuffles during the legalization stage. Also, this rule is very simple and avoids aggressively optimizing shuffles. llvm-svn: 212539
* Fix some Twine locals. (Benjamin Kramer, 2014-07-08, 1 file, -9/+7)
| | | | | | Two of those are use after frees. Found by clang-tidy, fixed by me. llvm-svn: 212537
* [x86,SDAG] Sink the logic for folding shuffles of splats more (Chandler Carruth, 2014-07-08, 1 file, -5/+35)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | aggressively from the x86 shuffle lowering to the generic SDAG vector shuffle formation code. This code already tried to fold away shuffles of splats! It just had lots of bugs and couldn't handle the case my new x86 shuffle lowering needed. First, it failed to correctly compute whether N2 was undef because it pre-computed this, then did transformations which could *make* N2 undef, then failed to ever re-consider the precomputed state. Second, it didn't look through bitcasts at all, even in the safe cases where they are just element-type bitcasts with no change to the number of elements. Third, it didn't handle all-zero bit casts nicely the way my code in the x86 side of things did, which is essential to getting good zext-shuffle lowerings. But all of these are generic. I just ported the code down to this layer and fixed the surrounding bugs. Tests exercising this in the x86 backend still pass and some silly code in widen_cast-6.ll gets better. I updated that test to be a bit more precise but it's still pretty unclear what the value of the test is in this day and age. llvm-svn: 212517
* [SDAG] Actually check for a non-constant splat and clarify comments (Chandler Carruth, 2014-07-08, 1 file, -4/+8)
| | | | | | | | | | | | | | around the handling of UNDEF lanes in boolean vector content analysis. The code before my changes here also failed to check for non-constant splats in a buildvector. I have no idea how to trigger this, I just spotted it by inspection when trying to understand the code. It seems extremely unlikely to be worth the trouble to teach the only caller of this code (DAG combining setcc patterns) how to cleverly handle undef lanes, so I've just commented more thoroughly that we're giving up there. llvm-svn: 212515
* [SDAG] Build up a more rich set of APIs for querying build-vector SDAG (Chandler Carruth, 2014-07-08, 3 files, -13/+47)
| | | | | | | | | | | | | | | | | | | | nodes about whether they are splats. This is factored out and improved from r212324 which got reverted as it was far too aggressive. The new API should help more conservatively handle buildvectors that are a mixture of splatted and undef values. No functionality change at this point. The hope is to slowly re-introduce the undef-tolerant optimization of splats, but each time being forced to make a conscious decision about how to handle the undefs in a way that doesn't lead to contradicting assumptions about the collapsed value. Hal has pointed out in discussions that this may not end up being the desired API and instead it may be more convenient to get a mask of the undef elements or something similar. I'm starting simple and will expand the API as I adapt actual callers and see exactly what they need. llvm-svn: 212514
* [x86] Revert r212324 which was too aggressive w.r.t. allowing undef (Chandler Carruth, 2014-07-07, 3 files, -44/+31)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lanes in vector splats. The core problem here is that undef lanes can't *unilaterally* be considered to contribute to splats. Their handling needs to be more cautious. There is also a reported failure of the nightly testers (thanks Tobias!) that may well stem from the same core issue. I'm going to fix this theoretical issue, factor the APIs a bit better, and then verify that I don't see anything bad with Tobias's reduction from the test suite before recommitting. Original commit message for r212324: [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for any constant, constant FP, or undef splat and to tolerate any undef lanes in a splat, then replace all uses of isSplatVector in X86's lowering with it. This fixes issues where undef lanes in an otherwise splat vector would prevent the splat logic from firing. It is a touch more awkward to use this interface, but it is much more accurate. Suggestions for better interface structuring welcome. With this fix, the code generated with the widening legalization strategy for widen_cast-4.ll is *dramatically* improved as the special lowering strategies for a v16i8 SRA kick in even though the high lanes are undef. We also get a slightly different choice for broadcasting an aligned memory location, and use vpshufd instead of vbroadcastss. This looks like a minor win for pipelining and domain crossing, but a minor loss for the number of micro-ops. I suspect it's a wash, but folks can easily tweak the lowering if they want. llvm-svn: 212475
* Make helper functions static. (Benjamin Kramer, 2014-07-07, 1 file, -1/+1)
| | | | llvm-svn: 212460
* CodeGen: it turns out that NAND is not the same thing as BIC. At all. (Tim Northover, 2014-07-07, 1 file, -1/+1)
| | | | | | | | | | | We've been performing the wrong operation on ARM for "atomicrmw nand" for years, since "a NAND b" is "~(a & b)" rather than ARM's very tempting "a & ~b". This bled over into the generic expansion pass. So I assume no-one has ever actually tried to do an atomic nand in the real world. Oh well. llvm-svn: 212443
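The difference between the two operations is easy to see on a few bits. The Python sketch below uses hypothetical helper names and an explicit bit width (an assumption needed because Python integers are unbounded); it implements both formulas exactly as quoted in the commit message.

```python
def nand(a, b, bits=32):
    """a NAND b = ~(a & b), truncated to the given bit width."""
    return ~(a & b) & ((1 << bits) - 1)


def bic(a, b, bits=32):
    """ARM-style bit clear: a & ~b, truncated to the given bit width."""
    return a & ~b & ((1 << bits) - 1)


a, b = 0b1100, 0b1010
print(format(nand(a, b, bits=4), "04b"))
print(format(bic(a, b, bits=4), "04b"))
```

With these inputs NAND sets every bit that is not set in both operands, while BIC only keeps the bits of `a` that are clear in `b`, so the two results differ.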
* [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for (Chandler Carruth, 2014-07-04, 3 files, -31/+44)
| | | | | | | | | | | | | | | | | | | | | | | | any constant, constant FP, or undef splat and to tolerate any undef lanes in a splat, then replace all uses of isSplatVector in X86's lowering with it. This fixes issues where undef lanes in an otherwise splat vector would prevent the splat logic from firing. It is a touch more awkward to use this interface, but it is much more accurate. Suggestions for better interface structuring welcome. With this fix, the code generated with the widening legalization strategy for widen_cast-4.ll is *dramatically* improved as the special lowering strategies for a v16i8 SRA kick in even though the high lanes are undef. We also get a slightly different choice for broadcasting an aligned memory location, and use vpshufd instead of vbroadcastss. This looks like a minor win for pipelining and domain crossing, but a minor loss for the number of micro-ops. I suspect it's a wash, but folks can easily tweak the lowering if they want. llvm-svn: 212324