path: root/llvm/lib/Target/X86
Commit message | Author | Date | Files | Lines
...
* [x86] Fix a bug in the VZEXT DAG combine that I just made more powerful. | Chandler Carruth | 2014-10-04 | 1 | -3/+23
It turns out this combine was always somewhat flawed -- there are cases where nested VZEXT nodes *can't* be combined: if their types have a mismatch that can be observed in the result. While none of these show up currently, once I switch to the new vector shuffle lowering a few test cases actually form such nested VZEXT nodes. I've not come up with any IR pattern that I can sensibly write to exercise this, but it will be covered by tests once I flip the switch.

llvm-svn: 219044
* [x86] Sink a generic combine of VZEXT nodes from the lowering to VZEXT | Chandler Carruth | 2014-10-04 | 1 | -40/+39
| | | | | | | | | | | nodes to the DAG combining of them. This will allow the combine to fire on both old vector shuffle lowering and the new vector shuffle lowering and generally seems like a cleaner design. I've trimmed down the code a bit and tried to make it and the surrounding combine fairly clean while moving it around. llvm-svn: 219042
* [x86] Add a really preposterous number of patterns for matching all of | Chandler Carruth | 2014-10-03 | 2 | -5/+194
the various ways in which blends can be used to do vector element insertion for lowering with the scalar math instruction forms that effectively re-blend with the high elements after performing the operation.

This then allows me to bail on the element insertion lowering path when we have SSE4.1 and are going to be doing a normal blend, which in turn restores the last of the blends lost from the new vector shuffle lowering when I got it to prioritize insertion in other cases (for example when we don't *have* a blend instruction).

Without the patterns, using blends here would have regressed sse-scalar-fp-arith.ll *completely* with the new vector shuffle lowering. For completeness, I've added RUN-lines with the new lowering here. This is somewhat superfluous as I'm about to flip the default, but hey, it shows that this actually significantly changed behavior.

The patterns I've added are just ridiculously repetitive. Suggestions on making them better are very much welcome. In particular, handling the commuted form of the v2f64 patterns is somewhat obnoxious.

llvm-svn: 219033
* [x86] Adjust the patterns for lowering X86vzmovl nodes which don't | Chandler Carruth | 2014-10-03 | 2 | -47/+54
| | | | | | | | | | | | | | perform a load to use blendps rather than movss when it is available. For non-loads, blendps is *much* faster. It can execute on two ports in Sandy Bridge and Ivy Bridge, and *three* ports on Haswell. This fixes one of the "regressions" from aggressively taking the "insertion" path in the new vector shuffle lowering. This does highlight one problem with blendps -- it isn't commuted as heavily as it should be. That's future work though. llvm-svn: 219022
* [ISel] Keep matching state consistent when folding during X86 address match | Adam Nemet | 2014-10-03 | 1 | -0/+7
In the X86 backend, matching an address is initiated by the 'addr' complex pattern and its friends. During this process we may reassociate and-of-shift into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the shift into the scale of the address. However, as demonstrated by the testcase, this can trigger CSE of not only the shift and the AND which the code is prepared for but also the underlying load node. In the testcase this node is sitting in the RecordedNode and MatchScope data structures of the matcher and becomes a deleted node upon CSE. Returning from the complex pattern function, we try to access it again, hitting an assert because the node is no longer a load even though this was checked before.

Now obviously changing the DAG this late is bending the rules, but I think it makes sense somewhat. Outside of addresses we prefer and-of-shift because it may lead to smaller immediates (FoldMaskAndShiftToScale is an even better example because it creates a non-canonical node). We currently don't recognize addresses during DAGCombiner, where arguably this canonicalization should be performed. On the other hand, having this in the matcher allows us to cover all the cases where an address can be used in an instruction.

I've also talked a little bit to Dan Gohman on llvm-dev, who added the RAUW for the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible for initiating the recursive CSE on users (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it is not strictly necessary since the shift is hooked into the visited user. Of course it's safer to keep the DAG consistent at all times (e.g. for accurate number of uses, etc.).

So rather than changing the fundamentals, I've decided to continue along the previous patches and detect the CSE. This patch installs a very targeted DAGUpdateListener for the duration of a complex-pattern match and updates the matching state accordingly. (Previous patches used HandleSDNode to detect the CSE, but that's not practical here.) The listener is only installed on X86.

I tested that there is no measurable overhead due to this while running through the spec2k BC files with llc. The only thing we pay for is the creation of the listener. The callback never ever triggers in spec2k since this is a corner case.

Fixes rdar://problem/18206171

llvm-svn: 219009
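A minimal sketch of the listener approach described above, using the SelectionDAG::DAGUpdateListener interface; the container holding the recorded matcher state and the exact update policy are illustrative assumptions, not the code that actually landed.

    // Hedged sketch: scrub the matcher's recorded nodes when CSE deletes one
    // of them mid-match. Container shape and update policy are assumptions.
    #include "llvm/ADT/SmallVector.h"
    #include "llvm/CodeGen/SelectionDAG.h"
    using namespace llvm;

    class MatchStateUpdater : public SelectionDAG::DAGUpdateListener {
      SmallVectorImpl<SDValue> &RecordedNodes; // matcher state (assumed shape)

    public:
      MatchStateUpdater(SelectionDAG &DAG, SmallVectorImpl<SDValue> &RN)
          : SelectionDAG::DAGUpdateListener(DAG), RecordedNodes(RN) {}

      // CSE replaced N with E: repoint any recorded value so the matcher never
      // dereferences a deleted node after the complex pattern returns.
      void NodeDeleted(SDNode *N, SDNode *E) override {
        for (SDValue &V : RecordedNodes)
          if (V.getNode() == N)
            V = SDValue(E, V.getResNo());
      }
    };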
* [x86] Teach the new vector shuffle lowering to aggressively form MOVSS | Chandler Carruth | 2014-10-03 | 2 | -5/+37
| | | | | | | | | | | | | | | | | | | | | | | | and MOVSD nodes for single element vector inserts. This is particularly important because a number of patterns in the backend detect these patterns and leverage them to simplify things. It also fixes quite a few of the insertion bad code examples. However, it regresses a specific area: when available, blendps and blendpd are *dramatically* faster than movss and movsd respectively. But it doesn't really work to form the blend logic first because the blends *aren't* as crazy efficient when the data is coming from memory anyways, and thus will have a movss or movsd regardless. Also, doing that would block a bunch of the patterns that this is designed to hit. So my plan is to go into the patterns for lowering MOVSS and MOVSD and lower them via blends when available. However that's a pretty invasive restructuring so it will need to be a follow-up patch. I have already gone into the patterns to lower MOVSS and MOVSD from memory using MOVLPD, etc. Without that, several of the test cases I already have regress. llvm-svn: 218985
* [x86] Refactor the element insertion logic in the new vector shuffle | Chandler Carruth | 2014-10-03 | 1 | -19/+21
| | | | | | | | | | | lowering to handle the potential mirroring of 2-element vectors (because we can't reliably sort them one way) in the caller rather than in the insertion logic. This will simplify things considerably as more ways to fail to match the insertion are added because now we have a nice try and retry point. llvm-svn: 218980
* [x86] Significantly improve the ability of the new vector shuffle | Chandler Carruth | 2014-10-03 | 1 | -26/+30
| | | | | | | | | | | | | | | | lowering to match VZEXT_MOVL patterns. I hadn't realized that these had sufficient pattern smarts in the backend to lower zext-ing from the low element of a vector without it being a scalar_to_vector node. They do, and this is how to match a bunch of patterns for movq, movss, etc. There is a weird propensity to end up using pshufd to place the element afterward even though it means domain crossing (or rather, to use xorps+movss to zext the element rather than movq) but that's an orthogonal problem with VZEXT_MOVL that someone should probably look at. llvm-svn: 218977
* [x86] Unbreak SSE1 with the new vector shuffle lowering. We can't widen | Chandler Carruth | 2014-10-03 | 1 | -4/+8
| | | | | | | | | element types to form illegal vector types. I've added a special SSE1 test case here that makes sure we don't break this going forward. llvm-svn: 218974
* [AVX512] Pull pattern for subvector insert into the instruction definition | Adam Nemet | 2014-10-02 | 1 | -8/+4
| | | | | | | | | | No functional change intended. Very similar to the change I made for subvector extract in r218480. test/CodeGen/X86/avx512-insert-extract.ll covers this. llvm-svn: 218928
* [AVX512] Refactor subvector inserts | Adam Nemet | 2014-10-02 | 1 | -102/+55
| | | | | | | | | | No functional change. Very similar to the extract refactoring I did in r218478. Compared X86.td.expanded before and after. llvm-svn: 218927
* [AVX512] Fix i256mem->f256mem typo in VINSERTF64x4rm | Adam Nemet | 2014-10-02 | 1 | -1/+1
| | | | | | | Just like in the case of extracts, the refactoring is uncovering some typos in the code. llvm-svn: 218926
* [Stackmaps] Make the frame pointer required for stackmaps. | Juergen Ributzka | 2014-10-02 | 1 | -1/+2
| | | | | | | | | Do not eliminate the frame pointer if there is a stackmap or patchpoint in the function. All stackmap references should be FP relative. This fixes PR21107. llvm-svn: 218920
* [x86] Teach the new vector shuffle lowering to widen floating point | Chandler Carruth | 2014-10-02 | 2 | -8/+19
| | | | | | | | | | | | | | | | | | | elements as well as integer elements in order to form simpler shuffle patterns. This is the primary reason why we were failing to match some of the 2-and-2 floating point shuffles such as PR21140. Even after fixing this we need to support some extra patterns in the backend in order to match the resulting X86ISD::UNPCKL nodes into the correct instructions. This commit should fix PR21140 and includes more comprehensive testing of insertion patterns in v4 shuffles. Not all of the added tests are beautiful. For example, we don't have clever instructions to insert-via-load in the integer domain. There are also some places where we aren't sufficiently cunning with our use of movq and movd, but that's future work. llvm-svn: 218911
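A small self-contained illustration of the widening idea (example values and masks are mine, not from the commit): the v4f32 shuffle mask <0, 1, 4, 5> rearranges the same bits as the v2f64 mask <0, 2>, which is exactly an UNPCKLPD-style pattern.

    // Demonstrates that widening a v4f32 shuffle <0,1,4,5> to a v2f64 shuffle
    // <0,2> (an UNPCKLPD-like pattern) produces the same 128 bits.
    #include <array>
    #include <cassert>
    #include <cstring>

    int main() {
      std::array<float, 4> A = {1, 2, 3, 4}, B = {5, 6, 7, 8};

      // Narrow form: low two floats of A followed by low two floats of B.
      std::array<float, 4> Narrow = {A[0], A[1], B[0], B[1]};

      // Wide form: the same bytes viewed as two 64-bit lanes per source,
      // shuffled with mask <0, 2>.
      std::array<double, 2> A64, B64;
      std::memcpy(A64.data(), A.data(), sizeof(A64));
      std::memcpy(B64.data(), B.data(), sizeof(B64));
      std::array<double, 2> Wide = {A64[0], B64[0]};

      assert(std::memcmp(Narrow.data(), Wide.data(), sizeof(Narrow)) == 0);
      return 0;
    }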
* [x86] Improve and correct how the new vector shuffle lowering was | Chandler Carruth | 2014-10-01 | 1 | -8/+32
| | | | | | | | | | | | | | | | | | | matching and lowering 64-bit insertions. The first problem was that we weren't looking through bitcasts to discover that we *could* lower as insertions. Once fixed, we in turn weren't looking through bitcasts to discover that we could fold a load into the lowering. Once fixed, we weren't forming a SCALAR_TO_VECTOR node around the inserted element and instead were passing a scalar to a DAG node that expected a vector. It turns out there are some patterns that will "lower" this into the correct asm, but the rest of the X86 backend is very unhappy with such antics. This should fix a few more edge case regressions I've spotted going through the regression test suite to enable the new vector shuffle lowering. llvm-svn: 218839
* Lower FNEG ( FABS (x) ) -> FNABS (x) [X86 codegen] PR20578 | Sanjay Patel | 2014-10-01 | 1 | -6/+22
| | | | | | | | | | | | Negative FABS of either a scalar or vector should be handled the same way on x86 with SSE/AVX: a single OR instruction of the FP operand with a constant to light up the sign bit(s). http://llvm.org/bugs/show_bug.cgi?id=20578 Differential Revision: http://reviews.llvm.org/D5201 llvm-svn: 218822
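A self-contained scalar model of the idea (illustration only, not the codegen change itself): negating an absolute value amounts to OR-ing the sign bit into the operand's bit pattern.

    // -(fabs(x)) has the same bits as x with the sign bit forced on, which is
    // why a single OR with a sign-bit constant is enough.
    #include <cassert>
    #include <cmath>
    #include <cstdint>
    #include <cstring>

    double fnabs_via_or(double x) {
      std::uint64_t bits;
      std::memcpy(&bits, &x, sizeof(bits));
      bits |= 0x8000000000000000ULL;   // light up the sign bit
      std::memcpy(&x, &bits, sizeof(x));
      return x;
    }

    int main() {
      for (double v : {1.5, -2.25, 0.0, -0.0})
        assert(fnabs_via_or(v) == -std::fabs(v));
      return 0;
    }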
* constify TargetMachine parameter for X86TargetLowering. | Eric Christopher | 2014-10-01 | 4 | -5/+5
| | | | llvm-svn: 218804
* Don't repeat function/variable name in comment. NFC. | Sanjay Patel | 2014-10-01 | 2 | -99/+84
| | | | llvm-svn: 218791
* Move the complex address expression out of DIVariable and into an extra | Adrian Prantl | 2014-10-01 | 1 | -2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787
* Revert r218778 while investigating buildbot breakage. | Adrian Prantl | 2014-10-01 | 1 | -4/+2
| | | | | | "Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782
* Move the complex address expression out of DIVariable and into an extra | Adrian Prantl | 2014-10-01 | 1 | -2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778
* [x86] Fix a few more tiny patterns with the new vector shuffle lowering | Chandler Carruth | 2014-10-01 | 1 | -5/+23
| | | | | | | | | | | | | | | | | | | that keep cropping up in the regression test suite. This also addresses one of the issues raised on the mailing list with failing to form 'movsd' in as many cases as we realistically should. There will be corresponding patches forthcoming for v4f32 at least. This was a lot of fuss for a relatively small gain, but all the fuss was on my end trying different ways of holding the pieces of the x86 fragment patterns *just right*. Now that it works, the code is reasonably simple. In the new test cases I'm adding here, v2i64 sticks out as just plain horrible. I've not come up with any great ideas here other than that it would be nice to recognize when we're *going* to take a domain crossing hit and cross earlier to get the decent instructions. At least with AVX it is slightly less silly.... llvm-svn: 218756
* [x86] Delete some extraneous logic from the new vector shuffle lowering. | Chandler Carruth | 2014-10-01 | 1 | -7/+0
| | | | | | | | Nothing was relying on this and there are potentially some edge cases that it would not be correct under. Removing it seems better than trying to "fix" it as nothing was relying on it. llvm-svn: 218755
* Fix typo in comment from r218733 | Nick Lewycky | 2014-10-01 | 1 | -1/+1
| | | | llvm-svn: 218739
* [x86] Teach the new vector shuffle lowering to be even more aggressive | Chandler Carruth | 2014-10-01 | 1 | -17/+11
in exposing the scalar value to the broadcast DAG fragment so that we can catch even reloads and fold them into the broadcast. This is somewhat magical, I'm afraid, but it seems to work. It is also what the old lowering did, and I've switched an old test to run both lowerings, demonstrating that we get the same result.

Unlike the old code, I'm not lowering f32 or f64 scalars through this path when we only have AVX1. The target patterns include pretty heinous code to re-cast those as shuffles when the scalar happens to not be spilled, because AVX1 provides no broadcast mechanism from registers whatsoever. This is terribly brittle. I'd much rather go through our generic lowering code to get this. If needed, we can add a peephole to get even more opportunities to broadcast-from-spill-slots that are exposed post-RA, but my suspicion is this just doesn't matter that much.

llvm-svn: 218734
* [x86] Hoist the zext-lowering up in the v4i32 lowering routine -- it is | Chandler Carruth | 2014-10-01 | 1 | -6/+7
| | | | | | | | | | the same speed as pshufd but we can fold loads into the pmovzx instructions. This fixes some regressions that came up in the regression test suite for the new vector shuffle lowering. llvm-svn: 218733
* [AVX512] Remove space before \t in AsmStrings. | Adam Nemet | 2014-10-01 | 1 | -6/+6
| | | | llvm-svn: 218725
* [x86] Teach the new vector shuffle lowering about VBROADCAST and | Chandler Carruth | 2014-10-01 | 2 | -0/+127
| | | | | | | | | | VPBROADCAST. This has the somewhat expected pervasive impact. I don't know why I forgot about this. Everything seems good with lots of significant improvements in the tests. llvm-svn: 218724
* [AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VCMPGT{BWDQ}. | Robert Khasanov | 2014-09-30 | 1 | -0/+12
| | | | | | Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com> llvm-svn: 218670
* [AVX512] Added intrinsics for 128- and 256-bit versions of VCMPEQ{BWDQ} | Robert Khasanov | 2014-09-30 | 3 | -7/+47
| | | | | | | | | | | | | | Fixed lowering of this intrinsics in case when mask is v2i1 and v4i1. Now cmp intrinsics lower in the following way: (i8 (int_x86_avx512_mask_pcmpeq_q_128 (v2i64 %a), (v2i64 %b), (i8 %mask))) -> (i8 (bitcast (v8i1 (insert_subvector undef, (v2i1 (and (PCMPEQM %a, %b), (extract_subvector (v8i1 (bitcast %mask)), 0))), 0)))) llvm-svn: 218669
* [AVX512] Added intrinsics for VPCMPEQB and VPCMPEQW. | Robert Khasanov | 2014-09-30 | 1 | -0/+2
| | | | | | Added new operand type for intrinsics (IIT_V64) llvm-svn: 218668
* [AVX512] Enabled intrinsics for VPCMPEQD and VPCMPEQQ. | Robert Khasanov | 2014-09-30 | 2 | -3/+30
| | | | | | Added CMP_MASK intrinsic type llvm-svn: 218667
* [x86] Revert r218588, r218589, and r218600. These patches were pursuing | Chandler Carruth | 2014-09-30 | 1 | -149/+261
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a flawed direction and causing miscompiles. Read on for details. Fundamentally, the premise of this patch series was to map VECTOR_SHUFFLE DAG nodes into VSELECT DAG nodes for all blends because we are going to *have* to lower to VSELECT nodes for some blends to trigger the instruction selection patterns of variable blend instructions. This doesn't actually work out so well. In order to match performance with the existing VECTOR_SHUFFLE lowering code, we would need to re-slice the blend in order to fit it into either the integer or floating point blends available on the ISA. When coming from VECTOR_SHUFFLE (or other vNi1 style VSELECT sources) this works well because the X86 backend ensures that these types of operands to VSELECT get sign extended into '-1' and '0' for true and false, allowing us to re-slice the bits in whatever granularity without changing semantics. However, if the VSELECT condition comes from some other source, for example code lowering vector comparisons, it will likely only have the required bit set -- the high bit. We can't blindly slice up this style of VSELECT. Reid found some code using Halide that triggers this and I'm hopeful to eventually get a test case, but I don't need it to understand why this is A Bad Idea. There is another aspect that makes this approach flawed. When in VECTOR_SHUFFLE form, we have very distilled information that represents the *constant* blend mask. Converting back to a VSELECT form actually can lose this information, and so I think now that it is better to treat this as VECTOR_SHUFFLE until the very last moment and only use VSELECT nodes for instruction selection purposes. My plan is to: 1) Clean up and formalize the target pre-legalization DAG combine that converts a VSELECT with a constant condition operand into a VECTOR_SHUFFLE. 2) Remove any fancy lowering from VSELECT during *legalization* relying entirely on the DAG combine to catch cases where we can match to an immediate-controlled blend instruction. One additional step that I'm not planning on but would be interested in others' opinions on: we could add an X86ISD::VSELECT or X86ISD::BLENDV which encodes a fully legalized VSELECT node. Then it would be easy to write isel patterns only in terms of this to ensure VECTOR_SHUFFLE legalization only ever forms the fully legalized construct and we can't cycle between it and VSELECT combining. llvm-svn: 218658
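A self-contained illustration of the re-slicing hazard described above (example values are mine): a 32-bit condition lane with only its high bit set selects the whole lane, but once those same bits are re-sliced into four 8-bit lanes, only the top byte still selects.

    // Models blend selection where only the high bit of each lane matters.
    #include <cstdint>
    #include <cstdio>

    // Returns true if this lane of the condition selects the "true" operand.
    static bool selects(std::uint8_t laneMask) { return laneMask & 0x80; }

    int main() {
      const std::uint32_t highBitOnly = 0x80000000u;  // e.g. from a compare lowering
      const std::uint32_t signExtended = 0xFFFFFFFFu; // canonical '-1' true value

      // Re-slice each 32-bit condition lane into four 8-bit lanes.
      for (int byte = 3; byte >= 0; --byte) {
        std::uint8_t a = (highBitOnly >> (byte * 8)) & 0xFF;
        std::uint8_t b = (signExtended >> (byte * 8)) & 0xFF;
        std::printf("byte %d: high-bit-only -> %d, all-ones -> %d\n",
                    byte, selects(a), selects(b));
      }
      // The all-ones mask keeps selecting in every byte; the high-bit-only mask
      // selects in just one of the four, silently mixing the two blend operands
      // within what was a single selected lane.
      return 0;
    }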
* [AVX512] Use X86VectorVTInfo in the masking helper classes and the FMAs | Adam Nemet | 2014-09-29 | 1 | -155/+135
No functionality change. Makes the code more compact (see the FMA part).

This needs a new type attribute MemOpFrag in X86VectorVTInfo. For now I only defined this in the simple cases. See the comment before the attribute.

Diff of X86.td.expanded before and after is empty except for the appearance of the new attribute.

llvm-svn: 218637
* [x86] Make the new vector shuffle lowering lower blends as VSELECT | Chandler Carruth | 2014-09-29 | 1 | -196/+149
nodes, and rely exclusively on its logic. This removes a ton of duplication from the blend lowering and centralizes it in one place.

One downside is that it requires a bunch of hacks to make this work with the current legalization framework. We have to manually speculate one aspect of legalizing VSELECT nodes to get everything to work nicely because the existing legalization framework isn't *actually* bottom-up. The other grossness is that we somewhat duplicate the analysis of constant blends. I'm on the fence here. If reviewers think this would look better with VSELECT, when it has constant operands, dumping over to VECTOR_SHUFFLE, we could go that way. But it would be a substantial change because currently all of the actual blend instructions are matched via patterns in the TD files based around VSELECT nodes (despite them not being perfect fits for that).

Suggestions welcome, but at least this removes the rampant duplication in the backend.

llvm-svn: 218600
* [x86] Delete a bunch of really bad and totally unnecessary code in the | Chandler Carruth | 2014-09-29 | 1 | -114/+10
X86 target-specific DAG combining that tried to convert VSELECT nodes into VECTOR_SHUFFLE nodes that it "knew" would lower into immediate-controlled blend nodes.

Turns out, we have perfectly good lowering of all these VSELECT nodes, and indeed that lowering already knows how to handle lowering through BLENDI to immediate-controlled blend nodes. The code just wasn't getting used much because this thing forced the world to go through the vector shuffle lowering. Yuck.

This also exposes that I was too aggressive in avoiding domain crossing in r218588 with that lowering -- when the other option is to expand into two 128-bit vectors, it is worth domain crossing. Restore that behavior now that we have nice tests covering it.

The test updates here fall into two camps. One is where previously we ended up with an unsigned encoding of the blend operand and now we get a signed encoding. In most of those places there were elaborate comments explaining exactly what these operands really mean. Rather than that, just switch these tests to use the nicely decoded comments that make it obvious that the final shuffle matches. The other updates are just removing pointless domain crossing by blending integers with PBLENDW rather than BLENDPS.

llvm-svn: 218589
* [x86] Refactor all of the VSELECT-as-blend lowering code to avoid domain | Chandler Carruth | 2014-09-29 | 1 | -18/+57
| | | | | | | | | | | | | | | | | crossing and generally work more like the blend emission code in the new vector shuffle lowering. My goal is to have the new vector shuffle lowering just produce VSELECT nodes that are either matched here to BLENDI or are legal and matched in the .td files to specific blend instructions. That seems much cleaner as there are other ways to produce a VSELECT anyways. =] No *observable* functionality changed yet, mostly because this code appears to be near-dead. The behavior of this lowering routine did change though. This code being mostly dead and untestable will change with my next commit which will also point some new tests at it. llvm-svn: 218588
* [x86] Improve naming and comments for VSELECT lowering. | Chandler Carruth | 2014-09-29 | 1 | -5/+6
| | | | | | No functionality changed. llvm-svn: 218586
* [x86] Add the dispatch skeleton to the new vector shuffle lowering for | Chandler Carruth | 2014-09-29 | 1 | -1/+143
| | | | | | | | | | | | | | | | | AVX-512. There is no interesting logic yet. Everything ends up eventually delegating to the generic code to split the vector and shuffle the halves. Interestingly, that logic does a significantly better job of lowering all of these types than the generic vector expansion code does. Mostly, it lets most of the cases fall back to nice AVX2 code rather than all the way back to SSE code paths. Step 2 of basic AVX-512 support in the new vector shuffle lowering. Next up will be to incrementally add direct support for the basic instruction set to each type (adding tests first). llvm-svn: 218585
* [x86] Make the split-and-lower routine fully generic by relaxing the | Chandler Carruth | 2014-09-29 | 1 | -18/+18
| | | | | | | | | assertion, making the name generic, and improving the documentation. Step 1 in adding very primitive support for AVX-512. No functionality changed yet. llvm-svn: 218584
* [x86] Teach the new vector shuffle lowering to fall back on AVX-512 | Chandler Carruth | 2014-09-28 | 1 | -0/+5
| | | | | | | | | | | | | | vectors. Someone will need to build the AVX512 lowering, which should follow AVX1 and AVX2 *very* closely for AVX512F and AVX512BW resp. I've added a dummy test which is a port of the v8f32 and v8i32 tests from AVX and AVX2 to v8f64 and v8i64 tests for AVX512F and AVX512BW. Hopefully this is enough information for someone to implement proper lowering here. If not, I'll be happy to help, but right now the AVX-512 support isn't a priority for me. llvm-svn: 218583
* [x86] Fix the new vector shuffle lowering's use of VSELECT for AVX2 | Chandler Carruth | 2014-09-28 | 1 | -4/+16
| | | | | | | | | | | | | | | | | lowerings. This was hopelessly broken. First, the x86 backend wants '-1' to be the element value representing true in a boolean vector, and second the operand order for VSELECT is backwards from the actual x86 instructions. To make matters worse, the backend is just using '-1' as the true value to get the high bit to be set. It doesn't actually symbolically map the '-1' to anything. But on x86 this isn't quite how it works: there *only* the high bit is relevant. As a consequence weird non-'-1' values like 0x80 actually "work" once you flip the operands to be backwards. Anyways, thanks to Hal for helping me sort out what these *should* be. llvm-svn: 218582
* [x86] Fix a really silly bug that I introduced fixing another bug in the | Chandler Carruth | 2014-09-28 | 1 | -1/+1
| | | | | | | | | | new vector shuffle target DAG combines -- it helps to actually test for the value you want rather than just using an integer in a boolean context. Have I mentioned that I loathe implicit conversions recently? :: sigh :: llvm-svn: 218576
* [x86] Fix yet another bug in the new vector shuffle lowering's handling | Chandler Carruth | 2014-09-28 | 1 | -7/+16
of widening masks.

We can't widen a zeroing mask unless both elements that would be merged are either zeroed or undef. This is the only way to widen a mask if it has a zeroed element.

Also clean up the code here by ordering the checks in a more logical way and by using the symbolic values for undef and zero. I'm actually torn on using the symbolic values because the existing code is littered with the assumption that -1 is undef, and moreover that entries '< 0' are the special entries. While that works with the values given to these constants, using the symbolic constants actually makes it a bit more opaque why this is the case.

llvm-svn: 218575
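A hedged sketch of the widening rule spelled out above; the sentinel constants and the helper are illustrative assumptions rather than the exact LLVM code.

    // Sentinel values are assumptions for this sketch (undef = -1, zero = -2).
    constexpr int SentinelUndef = -1;
    constexpr int SentinelZero = -2;

    // Can the adjacent narrow mask entries (Lo, Hi) merge into one wide entry?
    bool canWidenPair(int Lo, int Hi) {
      // A zeroed entry may only be widened if its partner is zeroed or undef;
      // otherwise the zeroing would be dropped on the floor.
      if (Lo == SentinelZero || Hi == SentinelZero)
        return (Lo == SentinelZero || Lo == SentinelUndef) &&
               (Hi == SentinelZero || Hi == SentinelUndef);
      // Undef halves are free, but a lone concrete entry must sit in the right
      // half of an aligned pair for the widened index to be well defined.
      if (Lo == SentinelUndef && Hi == SentinelUndef)
        return true;
      if (Lo == SentinelUndef)
        return Hi >= 0 && (Hi % 2) == 1;
      if (Hi == SentinelUndef)
        return Lo >= 0 && (Lo % 2) == 0;
      // Two concrete entries must be consecutive and start on an even index.
      return (Lo % 2) == 0 && Hi == Lo + 1;
    }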
* [x86] Fix yet another issue with widening vector shuffle elements. | Chandler Carruth | 2014-09-27 | 1 | -2/+2
I spotted this by inspection when debugging something else, so I have no test case whatsoever, and am not even sure it is possible to realistically trigger the bug. But this is what was intended here.

llvm-svn: 218565
* [x86] Fix terrible bugs everywhere in the new vector shuffle lowering | Chandler Carruth | 2014-09-27 | 1 | -23/+54
and in the target shuffle combining when trying to widen vector elements.

Previously only one of these was correct, and we didn't correctly propagate zeroing target shuffle masks (which have a different sentinel value from undef in non-target shuffle masks now). This isn't just a missed optimization; it caused us to drop zeroing shuffles on the floor and miscompile code. The added test case is one example of that.

There are other fixes to the test suite as a consequence of this, as well as restoring the undef elements in some of the masks that were lost when I brought sanity to the actual *value* of the undef and zero sentinels.

I've also just cleaned up some of the PSHUFD and PSHUFLW and PSHUFHW combining code, but that code really needs to go. It was a nice initial attempt, but it isn't very principled and the recursive shuffle combiner is much more powerful.

llvm-svn: 218562
* [x86] Flip the sentinel values used in the target shuffle mask decoding | Chandler Carruth | 2014-09-27 | 1 | -1/+1
to significantly more sane sentinels. Notably, everywhere else in the backend's representation of shuffles, '-1' is used to represent undef. The target shuffle masks really shouldn't diverge from that, especially as in a few places they are manipulated by shared code.

This causes us to lose some undef lanes in various test masks. I want to get these back, but technically it isn't invalid and there are a *lot* of bugs here, so I want to try to establish a saner baseline for fixing some of the bugs by aligning the specific sentinel values used.

llvm-svn: 218561
* [x86] Fix a moderately terrifying bug in the new 128-bit shuffle logic | Chandler Carruth | 2014-09-26 | 1 | -5/+10
| | | | | | | | | | | that managed to elude all of my fuzz testing historically. =/ Something changed to allow this code path to actually be exercised and it was doing bad things. It is especially heavily exercised by the patterns that emerge when doing AVX shuffles that end up lowered through the 128-bit code path. llvm-svn: 218540
* [x86] The mnemonic is SHUFPS not SHUPFS. =[ I'm very bad at spelling | Chandler Carruth | 2014-09-26 | 1 | -3/+3
| | | | | | sadly. llvm-svn: 218524
* [x86] In the new vector shuffle lowering, when trying to do another | Chandler Carruth | 2014-09-26 | 1 | -10/+11
| | | | | | | | layer of tie-breaking sorting, it really helps to check that you're in a tie first. =] Otherwise the whole thing cycles infinitely. Test case added, another one found through fuzz testing. llvm-svn: 218523