bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	Add missing natural vector cast.	Asiri Rathnayake	2014-10-01	1	-0/+65
\| \| \| \| \| \| \| \| \|	Summary: The natural vector cast node (similar to bitcast) AArch64ISD::NVCAST was introduced in r217159 and r217138. This patch adds a missing cast from v2f32 to v1i64 whose absence was causing some compilation failures. Also added test cases to cover various modimm types and BUILD_VECTORs with i64 elements. llvm-svn: 218751
*	[ARM] Add support for Cortex-M7, FPv5-SP and FPv5-DP (LLVM)	Oliver Stannard	2014-10-01	6	-17/+55
\| \| \| \| \| \| \| \| \|	The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be modelled using the same target feature, and all double-precision operations are already disabled by the fp-only-sp target features. llvm-svn: 218747
*	[mips] Fix disassembly of [ls][wd]c[23], cache, and pref ↵	Daniel Sanders	2014-10-01	3	-0/+32
\| \| \| \| \| \| \| \|	Fixes PR21015, and PR20993. Patch by Jun Koi llvm-svn: 218745
*	[mips] For indirect calls we don't need $gp to point to .got. Mips linker	Sasa Stankovic	2014-10-01	2	-4/+15
\| \| \| \| \| \| \| \| \|	doesn't generate lazy binding stub for a function whose address is taken in the program. Differential Revision: http://reviews.llvm.org/D5067 llvm-svn: 218744
*	test: XFAIL the non-darwin gmlt test on darwin	Justin Bogner	2014-10-01	1	-0/+3
\| \| \| \| \| \| \|	r218702 disabled a -gmlt optimization for darwin, but this means the non-darwin test isn't working there anymore. llvm-svn: 218742
*	[x86] Teach the new vector shuffle lowering to be even more aggressive	Chandler Carruth	2014-10-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in exposing the scalar value to the broadcast DAG fragment so that we can catch even reloads and fold them into the broadcast. This is somewhat magical I'm afraid but seems to work. It is also what the old lowering did, and I've switched an old test to run both lowerings demonstrating that we get the same result. Unlike the old code, I'm not lowering f32 or f64 scalars through this path when we only have AVX1. The target patterns include pretty heinous code to re-cast those as shuffles when the scalar happens to not be spilled because AVX1 provides no broadcast mechanism from registers what-so-ever. This is terribly brittle. I'd much rather go through our generic lowering code to get this. If needed, we can add a peephole to get even more opportunities to broadcast-from-spill-slots that are exposed post-RA, but my suspicion is this just doesn't matter that much. llvm-svn: 218734
*	[x86] Hoist the zext-lowering up in the v4i32 lowering routine -- it is	Chandler Carruth	2014-10-01	1	-5/+20
\| \| \| \| \| \| \| \| \| \|	the same speed as pshufd but we can fold loads into the pmovzx instructions. This fixes some regressions that came up in the regression test suite for the new vector shuffle lowering. llvm-svn: 218733
*	Implement DW_TAG_subrange_type with DW_AT_count rather than DW_AT_upper_bound	David Blaikie	2014-10-01	3	-8/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows proper disambiguation of unbounded arrays and arrays of zero bound ("struct foo { int x[]; };" and "struct foo { int x[0]; }"). GCC instead produces an upper bound of -1 in the latter situation, but count seems tidier. This way lower_bound is provided if it's not the language default and count is provided if the count is known, otherwise it's omitted. Simple. If someone wants to look at rdar://problem/12566646 and see if this change is acceptable to that bug/fix, that might be helpful (see the empty-and-one-elem-array.ll test case which cites that radar). llvm-svn: 218726
*	[x86] Teach the new vector shuffle lowering about VBROADCAST and	Chandler Carruth	2014-10-01	8	-263/+310
\| \| \| \| \| \| \| \| \| \|	VPBROADCAST. This has the somewhat expected pervasive impact. I don't know why I forgot about this. Everything seems good with lots of significant improvements in the tests. llvm-svn: 218724
*	llvm/test/DebugInfo/X86/gmlt.test: Get rid of %llc_dwarf. It should not be ↵	NAKAMURA Takumi	2014-10-01	1	-2/+1
\| \| \| \| \| \| \| \|	used with -mtriple. Also, remove object-emission. test/DebugInfo/X86 doesn't require it. llvm-svn: 218722
*	[InstCombine] Optimize icmp-select-icmp	Gerolf Hoflehner	2014-10-01	2	-1/+128
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In special cases select instructions can be eliminated by replacing them with a cheaper bitwise operation even when the select result is used outside its home block. The instances implemented are patterns like %x=icmp.eq %y=select %x,%r, null %z=icmp.eq\|neq %y, null br %z,true, false ==> %x=icmp.ne %y=icmp.eq %r,null %z=or %x,%y br %z,true,false The optimization is integrated into the instruction combiner and performed only when all uses of the select result can be replaced by the select operand proper. For this dominator information is used and dominance is now a required analysis pass in the combiner. The optimization itself is iterative. The critical step is to replace the select result with the non-constant select operand. So the select becomes local and the combiner iteratively works out simpler code pattern and eventually eliminates the select. rdar://17853760 llvm-svn: 218721
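The rewrite described above can be sanity-checked on a small model. The following is only a sketch, not the InstCombine code: it assumes the first compare is an equality of two values `a` and `b`, models pointers as integers with 0 standing in for null, and uses hypothetical function names.

```python
from itertools import product

NULL = 0  # assumption: a null pointer is modeled as the integer 0


def original(a, b, r):
    x = a == b              # %x = icmp eq
    y = r if x else NULL    # %y = select %x, %r, null
    return y == NULL        # %z = icmp eq %y, null


def rewritten(a, b, r):
    x = a != b              # %x = icmp ne
    y = r == NULL           # %y = icmp eq %r, null
    return x or y           # %z = or %x, %y


def equivalent_everywhere(values=range(3)):
    # Exhaustively compare both forms over a small value domain.
    return all(original(a, b, r) == rewritten(a, b, r)
               for a, b, r in product(values, repeat=3))
```

Both forms produce the same branch condition for every input in the model, which is why the select can be dropped once all of its uses are replaceable.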
*	Omit DW_AT_inline under -gmlt to save a little more space.	David Blaikie	2014-09-30	1	-1/+0
\| \| \| \|	llvm-svn: 218719
*	[BasicAA] Make better use of zext and sign information	Hal Finkel	2014-09-30	2	-0/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Two related things: 1. Fixes a bug when calculating the offset in GetLinearExpression. The code previously used zext to extend the offset, so negative offsets were converted to large positive ones. 2. Enhance aliasGEP to deduce that, if the difference between two GEP allocations is positive and all the variables that govern the offset are also positive (i.e. the offset is strictly after the higher base pointer), then locations that fit in the gap between the two base pointers are NoAlias. Patch by Nick White! llvm-svn: 218714
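The first point can be illustrated with plain integer arithmetic. This is a minimal sketch with hypothetical helper names, assuming a 32-bit offset widened to 64 bits; it is not the GetLinearExpression code.

```python
def mask(bits):
    return (1 << bits) - 1


def zext(pattern, from_bits):
    # Zero extension keeps the low bits and fills the high bits with 0.
    return pattern & mask(from_bits)


def sext(pattern, from_bits, to_bits):
    # Sign extension replicates the sign bit into the new high bits.
    pattern &= mask(from_bits)
    sign = 1 << (from_bits - 1)
    return ((pattern ^ sign) - sign) & mask(to_bits)


def as_signed(pattern, bits):
    # Interpret a bit pattern of the given width as a two's complement value.
    sign = 1 << (bits - 1)
    return ((pattern & mask(bits)) ^ sign) - sign


minus_four_i32 = -4 & mask(32)  # bit pattern of the 32-bit value -4
wrong = as_signed(zext(minus_four_i32, 32), 64)      # large positive offset
right = as_signed(sext(minus_four_i32, 32, 64), 64)  # still -4
```

With zero extension the offset -4 turns into 4294967292, which is the kind of large positive value the fix avoids.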
*	[SimplifyCFG] threshold for folding branches with common destination	Jingyue Wu	2014-09-30	1	-0/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch adds a threshold that controls the number of bonus instructions allowed for folding branches with common destination. The original code allows at most one bonus instruction. With this patch, users can customize the threshold to allow multiple bonus instructions. The default threshold is still 1, so that the code behaves the same as before when users do not specify this threshold. The motivation of this change is that tuning this threshold significantly (up to 25%) improves the performance of some CUDA programs in our internal code base. In general, branch instructions are very expensive for GPU programs. Therefore, it is sometimes worth trading more arithmetic computation for a more straightened control flow. Here's a reduced example: __global__ void foo(int a, int b, int c, int d, int e, int n, const int *input, int *output) { int sum = 0; for (int i = 0; i < n; ++i) sum += (((i ^ a) > b) && (((i | c) ^ d) > e)) ? 0 : input[i]; *output = sum; } The select statement in the loop body translates to two branch instructions "if ((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination. With the default threshold, SimplifyCFG is unable to fold them, because computing the condition of the second branch "(i | c) ^ d > e" requires two bonus instructions. With the threshold increased, SimplifyCFG can fold the two branches so that the loop body contains only one branch, making the code conceptually look like: sum += (((i ^ a) > b) & (((i | c) ^ d) > e)) ? 0 : input[i]; Increasing the threshold significantly improves the performance of this particular example. In the configuration where both conditions are guaranteed to be true, increasing the threshold from 1 to 2 improves the performance by 18.24%. 
Even in the configuration where the first condition is false and the second condition is true, which favors shortcuts, increasing the threshold from 1 to 2 still improves the performance by 4.35%. We are still looking for a good threshold and maybe a better cost model than just counting the number of bonus instructions. However, according to the above numbers, we think it is at least worth adding a threshold to enable more experiments and tuning. Let me know what you think. Thanks! Test Plan: Added one test case to check the threshold is in effect Reviewers: nadav, eliben, meheff, resistor, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, llvm-commits Differential Revision: http://reviews.llvm.org/D5529 llvm-svn: 218711
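The effect of the folding can be modeled outside the compiler. The sketch below mirrors the reduced example in Python with hypothetical function names: the first version keeps two branches with a common destination, the second computes both conditions unconditionally and branches once. It illustrates the equivalence only and says nothing about the measured speedups.

```python
def sum_two_branches(a, b, c, d, e, inputs):
    total = 0
    for i, value in enumerate(inputs):
        if (i ^ a) > b:                # first branch
            if ((i | c) ^ d) > e:      # second branch, same destination
                continue               # both true: contributes 0
        total += value
    return total


def sum_folded(a, b, c, d, e, inputs):
    total = 0
    for i, value in enumerate(inputs):
        # Both conditions are always evaluated (the "bonus instructions"),
        # so a single branch remains in the loop body.
        skip = ((i ^ a) > b) & (((i | c) ^ d) > e)
        if not skip:
            total += value
    return total
```

The folded form always pays for the second comparison, which is the trade of extra arithmetic for fewer branches that the threshold controls.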
*	[x86] Add AVX1 and AVX2 testing to all of the 128-bit shuffle test	Chandler Carruth	2014-09-30	4	-375/+855
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	cases. While clearly we don't need the AVX vector width, these ISA extensions often cause us to select different instructions and we should cover them even with the narrow vector width. Also, while here, nuke the stress_test2 contents. There is no reason to try to FileCheck this entire body when it is mostly a test for successfully surviving the code generator. llvm-svn: 218710
*	[x86] Update the exact FileCheck syntax of the 256-bit and 512-bit	Chandler Carruth	2014-09-30	5	-1961/+1962
\| \| \| \| \| \| \| \| \| \| \|	shuffle tests to match that used in the script I posted and now used consistently in 128-bit tests. Nothing interesting changing here, just using the label name as the FileCheck label and a slightly more general comment marker consumption strategy. llvm-svn: 218709
*	Adjust test case addition in r218702 so as not to fail when the X86 target ↵	David Blaikie	2014-09-30	3	-2/+5
\| \| \| \| \| \|	isn't built. llvm-svn: 218708
*	[x86] Rework all of the 128-bit vector shuffle tests with my handy test	Chandler Carruth	2014-09-30	4	-1222/+2541
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	updating script so that they are more thorough and consistent. Specific fixes here include: - Actually test VEX-encoded AVX mnemonics. - Actually use an SSE 4.1 run to test SSE 4.1 features! - Correctly check instruction sequences from the start of the function. - Elide the shuffle operands and comment designator in a consistent way. - Test all of the architectures instead of just the ones I was motivated to manually author. I've gone back through and fixed up any egregious issues I spotted. Let me know if I missed something you really dislike. One downside to this is that we're now not as diligently using FileCheck variables for registers. I would be much more concerned with this if we had larger register usage, but there just aren't that interesting of register choices here and most of the registers are constrained by the ABI. Ultimately, I don't think this is likely to be the maintenance burden for these tests and updating them again should be straightforward. llvm-svn: 218707
*	Disable the -gmlt optimization implemented in r218129 under Darwin due to ↵	David Blaikie	2014-09-30	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	issues with dsymutil. r218129 omits DW_TAG_subprograms which have no inlined subroutines when emitting -gmlt data. This makes -gmlt very low cost for -O0 builds. Darwin's dsymutil reasonably considers a CU empty if it has no subprograms (which occurs with the above optimization in -O0 programs without any force_inline function calls) and drops the line table, CU, and everything in this situation, making backtraces impossible. Until dsymutil is modified to account for this, disable this optimization on Darwin to preserve the desired functionality. (see r218545, which should be reverted after this patch, for other discussion/details) Footnote: In the long term, it doesn't look like this scheme (of simplified debug info to describe inlining to enable backtracing) is tenable, it is far too size inefficient for optimized code (the DW_TAG_inlined_subprograms, even once compressed, are nearly twice as large as the line table itself (also compressed)) and we'll be considering things like Cary's two level line table proposal to encode all this information directly in the line table. llvm-svn: 218702
*	Recommit r218010 [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ.	Juergen Ributzka	2014-09-30	1	-0/+125
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Note: This version fixed an issue with the TBZ/TBNZ instructions that were generated in FastISel. The issue was that the 64bit version of TBZ (TBZX) automagically sets the upper bit of the immediate field that is used to specify the bit we want to test. To test for any of the lower 32bits we have to first extract the subregister and use the 32bit version of the TBZ instruction (TBZW). Original commit message: Teach selectBranch to fold bit test and branch into a single instruction (TBZ or TBNZ). llvm-svn: 218693
*	R600/SI: Fix printing of clamp and omod	Matt Arsenault	2014-09-30	5	-15/+15
\| \| \| \| \| \| \| \|	No tests for omod since nothing uses it yet, but this should get rid of the remaining annoying trailing zeros after some instructions. llvm-svn: 218692
*	Extend C disassembler API to allow specifying target features	Bradley Smith	2014-09-30	1	-6/+20
\| \| \| \|	llvm-svn: 218682
*	Add numeric extend, truncate to mips fast-isel	Reed Kotler	2014-09-30	2	-0/+200
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add numeric extend, truncate to mips fast-isel Reactivates D4827 Test Plan: fpext.ll loadstoreconv.ll Reviewers: dsanders Subscribers: mcrosier Differential Revision: http://reviews.llvm.org/D5251 llvm-svn: 218681
*	Revert r218673 'llvm-cov: add test for report's function & file association.'	Alex Lorenz	2014-09-30	4	-32/+0
\| \| \| \| \| \|	Test causes buildbot failures. llvm-svn: 218676
*	llvm-cov: add test for report's function & file association.	Alex Lorenz	2014-09-30	4	-0/+32
\| \| \| \| \| \| \| \|	This commit adds a test which checks that the functions defined in header files will get associated with the header files rather than the source files in the reports. Differential Revision: http://reviews.llvm.org/D5489 llvm-svn: 218673
*	llvm-cov: Use the number of executed functions for the function coverage metric.	Alex Lorenz	2014-09-30	3	-0/+24
\| \| \| \| \| \| \| \|	This commit fixes llvm-cov's function coverage metric by using the number of executed functions instead of the number of fully covered functions. Differential Revision: http://reviews.llvm.org/D5196 llvm-svn: 218672
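The difference between the two metrics is easy to see on toy data. This is a sketch under an assumed data model (a mapping from function name to per-region execution counts); the names are hypothetical and are not llvm-cov's API.

```python
def function_coverage(functions):
    """Return (executed_ratio, fully_covered_ratio) for a mapping of
    function name -> list of per-region execution counts."""
    total = len(functions)
    if total == 0:
        return 0.0, 0.0
    # A function counts as executed if any of its regions ran.
    executed = sum(1 for counts in functions.values()
                   if any(c > 0 for c in counts))
    # A function is fully covered only if every region ran.
    fully_covered = sum(1 for counts in functions.values()
                        if counts and all(c > 0 for c in counts))
    return executed / total, fully_covered / total


sample = {"f": [1, 0], "g": [1, 1], "h": [0, 0], "k": [2, 3]}
executed_ratio, fully_covered_ratio = function_coverage(sample)
```

In this sample `f` ran but has an unexecuted region, so it counts toward the executed metric and not toward the fully covered one.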
*	Introduce support for custom wrappers for vararg functions.	Lorenzo Martignoni	2014-09-30	1	-4/+10
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D5412 llvm-svn: 218671
*	[AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VCMPGT{BWDQ}.	Robert Khasanov	2014-09-30	4	-0/+191
\| \| \| \| \| \|	Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com> llvm-svn: 218670
*	[AVX512] Added intrinsics for 128- and 256-bit versions of VCMPEQ{BWDQ}	Robert Khasanov	2014-09-30	2	-0/+139
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixed lowering of these intrinsics when the mask is v2i1 or v4i1. The cmp intrinsics now lower in the following way: (i8 (int_x86_avx512_mask_pcmpeq_q_128 (v2i64 %a), (v2i64 %b), (i8 %mask))) -> (i8 (bitcast (v8i1 (insert_subvector undef, (v2i1 (and (PCMPEQM %a, %b), (extract_subvector (v8i1 (bitcast %mask)), 0))), 0)))) llvm-svn: 218669
*	[AVX512] Added intrinsics for VPCMPEQB and VPCMPEQW.	Robert Khasanov	2014-09-30	2	-1/+34
\| \| \| \| \| \|	Added new operand type for intrinsics (IIT_V64) llvm-svn: 218668
*	[AVX512] Enabled intrinsics for VPCMPEQD and VPCMPEQQ.	Robert Khasanov	2014-09-30	1	-1/+33
\| \| \| \| \| \|	Added CMP_MASK intrinsic type llvm-svn: 218667
*	[IndVarSimplify] Widen loop unsigned compares.	Chad Rosier	2014-09-30	1	-0/+28
\| \| \| \| \| \| \|	This patch extends r217953 to handle unsigned comparison. Phabricator revision: http://reviews.llvm.org/D5526 llvm-svn: 218659
*	[x86] Revert r218588, r218589, and r218600. These patches were pursuing	Chandler Carruth	2014-09-30	5	-20/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	a flawed direction and causing miscompiles. Read on for details. Fundamentally, the premise of this patch series was to map VECTOR_SHUFFLE DAG nodes into VSELECT DAG nodes for all blends because we are going to have to lower to VSELECT nodes for some blends to trigger the instruction selection patterns of variable blend instructions. This doesn't actually work out so well. In order to match performance with the existing VECTOR_SHUFFLE lowering code, we would need to re-slice the blend in order to fit it into either the integer or floating point blends available on the ISA. When coming from VECTOR_SHUFFLE (or other vNi1 style VSELECT sources) this works well because the X86 backend ensures that these types of operands to VSELECT get sign extended into '-1' and '0' for true and false, allowing us to re-slice the bits in whatever granularity without changing semantics. However, if the VSELECT condition comes from some other source, for example code lowering vector comparisons, it will likely only have the required bit set -- the high bit. We can't blindly slice up this style of VSELECT. Reid found some code using Halide that triggers this and I'm hopeful to eventually get a test case, but I don't need it to understand why this is A Bad Idea. There is another aspect that makes this approach flawed. When in VECTOR_SHUFFLE form, we have very distilled information that represents the constant blend mask. Converting back to a VSELECT form actually can lose this information, and so I think now that it is better to treat this as VECTOR_SHUFFLE until the very last moment and only use VSELECT nodes for instruction selection purposes. My plan is to: 1) Clean up and formalize the target pre-legalization DAG combine that converts a VSELECT with a constant condition operand into a VECTOR_SHUFFLE. 
2) Remove any fancy lowering from VSELECT during legalization relying entirely on the DAG combine to catch cases where we can match to an immediate-controlled blend instruction. One additional step that I'm not planning on but would be interested in others' opinions on: we could add an X86ISD::VSELECT or X86ISD::BLENDV which encodes a fully legalized VSELECT node. Then it would be easy to write isel patterns only in terms of this to ensure VECTOR_SHUFFLE legalization only ever forms the fully legalized construct and we can't cycle between it and VSELECT combining. llvm-svn: 218658
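The re-slicing problem described above can be shown on raw bytes. This is a sketch with hypothetical helper names, assuming little-endian 32-bit mask lanes and a blend that looks only at the high bit of each lane; it is not backend code.

```python
import struct


def lanes_to_bytes(lanes32):
    # Pack unsigned 32-bit mask lanes into little-endian bytes.
    return struct.pack("<%dI" % len(lanes32), *lanes32)


def high_bits(data, lane_bytes):
    # The select decision per lane: the most significant bit of each lane.
    decisions = []
    for off in range(0, len(data), lane_bytes):
        lane = int.from_bytes(data[off:off + lane_bytes], "little")
        decisions.append(bool(lane >> (8 * lane_bytes - 1)))
    return decisions


def reslice_is_safe(lanes32):
    # Safe if viewing the mask as bytes gives the same decision for all
    # four bytes of a lane as the 32-bit view gives for that lane.
    data = lanes_to_bytes(lanes32)
    wide = high_bits(data, 4)
    expected = [w for w in wide for _ in range(4)]
    return high_bits(data, 1) == expected
```

A sign-extended mask such as `[0xFFFFFFFF, 0]` survives the change of granularity, while a mask that only sets the high bit, such as `[0x80000000, 0]`, does not.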
*	[x86] Add some vector-register broadcast operations to the 256-bit v4	Chandler Carruth	2014-09-30	1	-0/+30
\| \| \| \| \| \|	tests which were missing them. llvm-svn: 218657
*	R600: Fix broken check lines, missing scalar case.	Matt Arsenault	2014-09-30	1	-21/+31
\| \| \| \|	llvm-svn: 218655
*	[FastISel][AArch64] Fold sign-/zero-extends into the load instruction.	Juergen Ributzka	2014-09-30	2	-11/+193
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The sign-/zero-extension of the loaded value can be performed by the memory instruction for free. If the result of the load has only one use and the use is a sign-/zero-extend, then we emit the proper load instruction. The extend is only a register copy and will be optimized away later on. Other instructions that consume the sign-/zero-extended value are also made aware of this fact, so they don't fold the extend too. This fixes rdar://problem/18495928. llvm-svn: 218653
*	WinCOFFObjectWriter: optimize the string table for common suffixes	Hans Wennborg	2014-09-29	1	-11/+16
\| \| \| \| \| \| \| \|	This is a follow-up from r207670 which did the same for ELF. Differential Revision: http://reviews.llvm.org/D5530 llvm-svn: 218636
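The idea of sharing common suffixes in a string table can be sketched as follows. This is a simple quadratic illustration with hypothetical names, not the LLVM implementation, and it leaves out the size field that a COFF string table carries.

```python
def build_string_table(names):
    """Lay out NUL-terminated strings so that a name which is a suffix of
    an already placed name reuses the tail of that entry."""
    table = bytearray()
    offsets = {}
    placed = []  # (encoded name, offset) for entries physically stored
    encoded = sorted({n.encode("ascii") for n in names}, key=len, reverse=True)
    for name in encoded:
        for other, off in placed:
            if other.endswith(name):
                # Point into the tail of the longer string.
                offsets[name.decode()] = off + len(other) - len(name)
                break
        else:
            offsets[name.decode()] = len(table)
            placed.append((name, len(table)))
            table += name + b"\0"
    return bytes(table), offsets


def read_string(table, offset):
    end = table.index(b"\0", offset)
    return table[offset:end].decode()


table, offsets = build_string_table(["printf", "scanf", "foo", "f"])
```

Here `f` is stored as the last byte of `printf`, so the table needs 17 bytes instead of the 19 a naive layout would use.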
*	Add soft-float to the key for the subtarget lookup in the TargetMachine	Eric Christopher	2014-09-29	2	-6/+51
\| \| \| \| \| \| \| \| \| \| \|	map, this makes sure that we can compile the same code for two different ABIs (hard and soft float) in the same module. Update one testcase accordingly (and fix some confusing naming) and add a new testcase as well with the ordering swapped which would highlight the problem. llvm-svn: 218632
*	R600/SI: Also fix fsub + fadd a, a to mad combines	Matt Arsenault	2014-09-29	2	-0/+64
\| \| \| \|	llvm-svn: 218609
*	R600/SI: Fix using mad with multiplies by 2	Matt Arsenault	2014-09-29	1	-6/+152
\| \| \| \| \| \| \| \| \|	These turn into fadds, so combine them into the target mad node. fadd (fadd (a, a), b) -> mad 2.0, a, b llvm-svn: 218608
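The combine relies on `a + a` and `2.0 * a` giving the same floating-point result. The sketch below checks this on a few doubles, assuming an unfused multiply-add and using hypothetical function names; it does not model the hardware mad instruction.

```python
def fadd_chain(a, b):
    # fadd (fadd (a, a), b)
    return (a + a) + b


def mad(m, a, b):
    # Unfused multiply-add: m * a + b
    return m * a + b


samples_a = [0.1, -3.5, 1e308, 5e-324, 0.0]
samples_b = [1.0, -0.1, 1e-20]
all_equal = all(fadd_chain(a, b) == mad(2.0, a, b)
                for a in samples_a for b in samples_b)
```

Doubling is exact in binary floating point (or overflows to infinity in both forms), so the two expressions agree on these inputs.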
*	[AArch64] Improve cost model to handle sdiv by a pow-of-two.	Chad Rosier	2014-09-29	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \|	This patch improves the target-specific cost model to better handle signed division by a power of two. The immediate result is that this enables the SLP vectorizer to do a better job. http://reviews.llvm.org/D5469 PR20714 llvm-svn: 218607
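Signed division by a power of two is cheap because it can be expanded into a few shifts and an add. The following sketch shows the usual expansion for 32-bit values with a hypothetical function name; it illustrates why a low cost is reasonable and is not taken from the cost model itself.

```python
def sdiv_pow2_i32(x, k):
    """Divide a signed 32-bit x by 2**k (1 <= k <= 31), rounding toward zero."""
    assert -(1 << 31) <= x < (1 << 31) and 1 <= k <= 31
    sign = x >> 31                          # arithmetic shift: -1 or 0
    bias = (sign & 0xFFFFFFFF) >> (32 - k)  # logical shift: 2**k - 1 or 0
    return (x + bias) >> k                  # arithmetic shift by k


def truncating_div(x, d):
    # Reference: integer division that rounds toward zero.
    q = abs(x) // d
    return -q if x < 0 else q
```

The bias is only added for negative inputs, which turns the rounding of the final arithmetic shift from toward negative infinity into toward zero.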
*	Use a loop to simplify the runtime unrolling prologue.	Kevin Qin	2014-09-29	4	-19/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Runtime unrolling will create a prologue to execute the extra iterations that are left over when the trip count can't be divided by the unroll factor. It generates an if-then-else sequence to jump into a (factor - 1) times unrolled loop body, like extraiters = tripcount % loopfactor if (extraiters == 0) jump Loop: if (extraiters == loopfactor) jump L1 if (extraiters == loopfactor-1) jump L2 ... L1: LoopBody; L2: LoopBody; ... if tripcount < loopfactor jump End Loop: ... End: It means if the unroll factor is 4, the loop body will be unrolled 7 times: 3 in the loop prologue and 4 in the loop. This commit uses a loop to execute the extra iterations in the prologue, like extraiters = tripcount % loopfactor if (extraiters == 0) jump Loop: else jump Prol Prol: LoopBody; extraiters -= 1 // Omitted if unroll factor is 2. if (extraiters != 0) jump Prol // Omitted if unroll factor is 2. if (tripcount < loopfactor) jump End Loop: ... End: Then when the unroll factor is 4, the loop body will be copied only 5 times: 1 in the prologue loop and 4 in the original loop. And if the unroll factor is 2, no new loop is created, just as in the original solution. llvm-svn: 218604
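The control flow of the new scheme can be simulated directly. This is a sketch with hypothetical names that mirrors the pseudocode in the message (a prologue loop followed by the unrolled main loop) and counts the body copies for both schemes; it is not the loop unroller's code.

```python
def run_unrolled(tripcount, factor, body):
    extraiters = tripcount % factor
    i = 0
    # Prol: one copy of the body, executed in a loop for the extra iterations.
    while extraiters != 0:
        body(i)
        i += 1
        extraiters -= 1
    # Loop: the body unrolled `factor` times per trip.
    while i < tripcount:
        for _ in range(factor):
            body(i)
            i += 1


def body_copies_old(factor):
    # factor - 1 copies in the if-then-else prologue plus the unrolled loop.
    return (factor - 1) + factor


def body_copies_new(factor):
    # a single copy in the prologue plus the unrolled loop.
    return 1 + factor
```

For a factor of 4 this gives 7 copies before the change and 5 after, and for a factor of 2 both schemes need 3 copies, matching the numbers in the message.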
*	[Thumb2] ldrexd and strexd are not defined on v7M	Oliver Stannard	2014-09-29	1	-0/+14
\| \| \| \| \| \| \|	The Thumb2 ldrexd and strexd instructions are not defined for M-class architectures. llvm-svn: 218603
*	[x86] Make the new vector shuffle lowering lower blends as VSELECT	Chandler Carruth	2014-09-29	2	-13/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	nodes, and rely exclusively on its logic. This removes a ton of duplication from the blend lowering and centralizes it in one place. One downside is that it requires a bunch of hacks to make this work with the current legalization framework. We have to manually speculate one aspect of legalizing VSELECT nodes to get everything to work nicely because the existing legalization framework isn't actually bottom-up. The other grossness is that we somewhat duplicate the analysis of constant blends. I'm on the fence here. If reviewers think this would look better with VSELECT when it has constant operands dumping over to VECTOR_SHUFFLE, we could go that way. But it would be a substantial change because currently all of the actual blend instructions are matched via patterns in the TD files based around VSELECT nodes (despite them not being perfect fits for that). Suggestions welcome, but at least this removes the rampant duplication in the backend. llvm-svn: 218600
*	[x86] Delete a bunch of really bad and totally unnecessary code in the	Chandler Carruth	2014-09-29	3	-23/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	X86 target-specific DAG combining that tried to convert VSELECT nodes into VECTOR_SHUFFLE nodes that it "knew" would lower into immediate-controlled blend nodes. Turns out, we have perfectly good lowering of all these VSELECT nodes, and indeed that lowering already knows how to handle lowering through BLENDI to immediate-controlled blend nodes. The code just wasn't getting used much because this thing forced the world to go through the vector shuffle lowering. Yuck. This also exposes that I was too aggressive in avoiding domain crossing in r218588 with that lowering -- when the other option is to expand into two 128-bit vectors, it is worth domain crossing. Restore that behavior now that we have nice tests covering it. The test updates here fall into two camps. One is where previously we ended up with an unsigned encoding of the blend operand and now we get a signed encoding. In most of those places there were elaborate comments explaining exactly what these operands really mean. Rather than that, just switch these tests to use the nicely decoded comments that make it obvious that the final shuffle matches. The other updates are just removing pointless domain crossing by blending integers with PBLENDW rather than BLENDPS. llvm-svn: 218589
*	[x86] Add the dispatch skeleton to the new vector shuffle lowering for	Chandler Carruth	2014-09-29	1	-1247/+469
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	AVX-512. There is no interesting logic yet. Everything ends up eventually delegating to the generic code to split the vector and shuffle the halves. Interestingly, that logic does a significantly better job of lowering all of these types than the generic vector expansion code does. Mostly, it lets most of the cases fall back to nice AVX2 code rather than all the way back to SSE code paths. Step 2 of basic AVX-512 support in the new vector shuffle lowering. Next up will be to incrementally add direct support for the basic instruction set to each type (adding tests first). llvm-svn: 218585
*	[x86] Teach the new vector shuffle lowering to fall back on AVX-512	Chandler Carruth	2014-09-28	1	-0/+2217
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	vectors. Someone will need to build the AVX512 lowering, which should follow AVX1 and AVX2 very closely for AVX512F and AVX512BW resp. I've added a dummy test which is a port of the v8f32 and v8i32 tests from AVX and AVX2 to v8f64 and v8i64 tests for AVX512F and AVX512BW. Hopefully this is enough information for someone to implement proper lowering here. If not, I'll be happy to help, but right now the AVX-512 support isn't a priority for me. llvm-svn: 218583
*	[x86] Fix the new vector shuffle lowering's use of VSELECT for AVX2	Chandler Carruth	2014-09-28	2	-35/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	lowerings. This was hopelessly broken. First, the x86 backend wants '-1' to be the element value representing true in a boolean vector, and second the operand order for VSELECT is backwards from the actual x86 instructions. To make matters worse, the backend is just using '-1' as the true value to get the high bit to be set. It doesn't actually symbolically map the '-1' to anything. But on x86 this isn't quite how it works: there only the high bit is relevant. As a consequence weird non-'-1' values like 0x80 actually "work" once you flip the operands to be backwards. Anyways, thanks to Hal for helping me sort out what these should be. llvm-svn: 218582
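The two points above (the flipped operand order and the fact that only the high bit matters) can be modeled on byte lanes. This is a simplified sketch with hypothetical function and parameter names; it does not reproduce the exact operand encoding of any x86 blend instruction.

```python
def vselect(cond, if_true, if_false):
    # Model of the generic node: condition lanes are -1 (true) or 0 (false).
    return [t if c == -1 else f for c, t, f in zip(cond, if_true, if_false)]


def blendv_bytes(mask, src_if_clear, src_if_set):
    # Model of a variable byte blend: only bit 7 of each mask byte is read.
    return [s if (m & 0x80) else c
            for m, c, s in zip(mask, src_if_clear, src_if_set)]


def lower_vselect(cond, if_true, if_false):
    # Lowering flips the operand order: the "false" value goes first here.
    mask = [c & 0xFF for c in cond]
    return blendv_bytes(mask, if_false, if_true)
```

Because the blend model reads only the high bit, a mask byte of 0x80 selects exactly like 0xFF, which is why such values appear to work once the operands are in the right order.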
*	[x86] Fix a really silly bug that I introduced fixing another bug in the	Chandler Carruth	2014-09-28	1	-0/+26
\| \| \| \| \| \| \| \| \| \|	new vector shuffle target DAG combines -- it helps to actually test for the value you want rather than just using an integer in a boolean context. Have I mentioned that I loathe implicit conversions recently? :: sigh :: llvm-svn: 218576
*	[x86] Fix yet another bug in the new vector shuffle lowering's handling	Chandler Carruth	2014-09-28	1	-0/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	of widening masks. We can't widen a zeroing mask unless both elements that would be merged are either zeroed or undef. This is the only way to widen a mask if it has a zeroed element. Also clean up the code here by ordering the checks in a more logical way and by using the symbolic values for undef and zero. I'm actually torn on using the symbolic values because the existing code is littered with the assumption that -1 is undef, and moreover that entries '< 0' are the special entries. While that works with the values given to these constants, using the symbolic constants actually makes it a bit more opaque why this is the case. llvm-svn: 218575
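The widening rule can be written down as a small function. This is a sketch of the rule as described in the message, with hypothetical names and with -1 and -2 assumed as stand-ins for the undef and zero sentinels; it is not the backend's implementation.

```python
UNDEF = -1  # assumed sentinel for an undefined mask element
ZERO = -2   # assumed sentinel for a zeroed mask element


def widen_mask(mask):
    """Merge adjacent pairs of mask elements into one element of twice the
    width. Returns the widened mask, or None if widening is not possible."""
    assert len(mask) % 2 == 0
    widened = []
    for lo, hi in zip(mask[0::2], mask[1::2]):
        if lo == ZERO or hi == ZERO:
            # A zeroed element can only be merged with zero or undef.
            if lo in (ZERO, UNDEF) and hi in (ZERO, UNDEF):
                widened.append(ZERO)
            else:
                return None
        elif lo == UNDEF and hi == UNDEF:
            widened.append(UNDEF)
        elif lo == UNDEF:
            # The defined high half must be the odd element of a pair.
            if hi % 2 != 1:
                return None
            widened.append(hi // 2)
        elif hi == UNDEF:
            # The defined low half must be the even element of a pair.
            if lo % 2 != 0:
                return None
            widened.append(lo // 2)
        elif lo % 2 == 0 and hi == lo + 1:
            widened.append(lo // 2)
        else:
            return None
    return widened
```

The zero check comes first so that a pair mixing a zeroed element with a real index is rejected rather than silently treated like an undef pair.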