bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86][SSE] Add custom execution domain fixing for BLENDPD/BLENDPS/PBLENDD/PBLENDW (PR34873)	Simon Pilgrim	2018-01-15	1	-20/+20
	Add support for custom execution domain fixing and implement support for BLENDPD/BLENDPS/PBLENDD/PBLENDW. Differential Revision: https://reviews.llvm.org/D42042 llvm-svn: 322524
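The commit message does not spell out how a blend moves between domains. As a rough illustration (plain Python, not LLVM code; the helper names are hypothetical): for 128-bit registers BLENDPD selects 2 qword lanes, BLENDPS and PBLENDD select 4 dword lanes, and PBLENDW selects 8 word lanes. An immediate can therefore always be widened to a finer lane size by repeating each bit, but can only be narrowed when every group of bits agrees.

```python
def scale_blend_mask(imm, num_elts, scale):
    """Widen a blend immediate over num_elts lanes to num_elts * scale
    narrower lanes by repeating each selector bit `scale` times."""
    out = 0
    for i in range(num_elts):
        if (imm >> i) & 1:
            out |= ((1 << scale) - 1) << (i * scale)
    return out


def narrow_blend_mask(imm, num_elts, scale):
    """Convert a blend immediate over num_elts lanes to num_elts // scale
    wider lanes; return None if any group of `scale` bits is mixed."""
    group = (1 << scale) - 1
    out = 0
    for i in range(num_elts // scale):
        bits = (imm >> (i * scale)) & group
        if bits == group:
            out |= 1 << i
        elif bits != 0:
            return None  # the blend splits a wider lane, so no conversion
    return out
```

For example, `blendpd $1` corresponds to `blendps $3` (`scale_blend_mask(0b01, 2, 2)`), while a PBLENDW immediate such as `0b0110` has no BLENDPS equivalent.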
*	[DAGCombiner] Generalize (or (and X, c1), c2) -> (and (or X, c2), c1|c2) combine to work on non-splat vectors	Simon Pilgrim	2017-12-21	1	-2/+2
	The knownbits_mask_or_shuffle_uitofp change is interesting - shuffle combines manage to kick in, removing the AND constant mask load. For targets with fast-variable-shuffle this should reduce further to VPOR+VPSHUFB+VCVTDQ2PS. llvm-svn: 321279
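You can check that the combine is sound by verifying the identity lane by lane: (X & c1) | c2 equals (X | c2) & (c1 | c2) for every X. Because the check is per lane, it holds equally for non-splat constant vectors. A small Python sketch (function names are illustrative only):

```python
from itertools import product


def or_of_and(x, c1, c2):
    # (or (and X, c1), c2), evaluated lane by lane
    return [(xi & a) | b for xi, a, b in zip(x, c1, c2)]


def and_of_or(x, c1, c2):
    # (and (or X, c2), c1|c2); the constant c1|c2 is folded per lane
    c1_or_c2 = [a | b for a, b in zip(c1, c2)]
    return [(xi | b) & c for xi, b, c in zip(x, c2, c1_or_c2)]


def identity_holds(bits):
    # Exhaustively compare both forms for every single-lane value of this width
    values = range(1 << bits)
    return all(
        or_of_and([x], [a], [b]) == and_of_or([x], [a], [b])
        for x, a, b in product(values, repeat=3)
    )
```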
*	[X86] Add (or (and X, c1), c2) -> (and (or X, c2), c1|c2) non-splat vector test	Simon Pilgrim	2017-12-21	1	-4/+14
	llvm-svn: 321278
*	[CodeGen] Unify MBB reference format in both MIR and debug output	Francis Visoiu Mistrih	2017-12-04	1	-37/+37
	As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of an MBB only for block definitions.
	* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(\1)/g'
	* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
	* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
	* grep -nr 'BB#' and fix
	Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665
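For reference, here is roughly how two of the sed rewrites above translate to Python's `re` module. This is an illustrative translation, not the script that was actually run. As the message notes, the remaining `BB#` uses still had to be fixed by hand.

```python
import re

# Translations of the first and third sed expressions quoted above
CPP_PATTERN = re.compile(r'BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)')
TEXT_PATTERN = re.compile(r"BB#([0-9]+)")


def rewrite_cpp(line):
    # "BB#" << MBB->getNumber()  becomes  "" << printMBBReference(MBB)
    return CPP_PATTERN.sub(r'" << printMBBReference(\1)', line)


def rewrite_text(line):
    # BB#5 becomes %bb.5 in tests, MIR and assembly comments
    return TEXT_PATTERN.sub(r"%bb.\1", line)
```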
*	[X86] Teach the execution domain fixing tables to use movlhps in place of unpcklpd for the packed single domain.	Craig Topper	2017-09-18	1	-2/+2
	MOVLHPS has a smaller encoding than UNPCKLPD in the legacy encodings. With VEX and EVEX encodings it doesn't matter. llvm-svn: 313509
*	[X86] Teach execution domain fixing to convert between FP and int unpack instructions.	Craig Topper	2017-09-18	1	-3/+3
	llvm-svn: 313508
*	[DAG] add splat vector support for 'or' in SimplifyDemandedBits	Sanjay Patel	2017-04-19	1	-9/+7
	I've changed one of the tests to not fold away, but we didn't and still don't do the transform that the comment claims we do (and I don't know why we'd want to do that). Follow-up to: https://reviews.llvm.org/rL300725 https://reviews.llvm.org/rL300763 llvm-svn: 300772
*	[x86] use a single shufps when it can save instructions	Sanjay Patel	2016-12-15	1	-5/+3
	This is a tiny patch with a big pile of test changes. This partially fixes PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 My motivating case looks like this:
	- vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2]
	- vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
	- vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
	+ vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]
	And this happens several times in the diffs. For chips with domain-crossing penalties, the instruction count and size reduction should usually overcome any potential domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so using shufps is a pure win. So the test case diffs all appear to be improvements except one test in vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate zero elements and one test in combine-sra.ll where multiple uses prevent the expected shuffle combining. Differential Revision: https://reviews.llvm.org/D27692 llvm-svn: 289837
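The motivating case can be checked with a small model of the instructions (Python, dword lanes, values rather than bits; the helper names are hypothetical). `pshufd` and `shufps` take four 2-bit lane selectors in their immediate. `pblendw` is modelled per dword, which is exact here only because its 0xF0 immediate never splits a dword.

```python
def imm8(lanes):
    # Pack four 2-bit lane selectors into a pshufd/shufps immediate
    return sum(sel << (2 * i) for i, sel in enumerate(lanes))


def pshufd(src, imm):
    # result[i] = src[selector i]
    return [src[(imm >> (2 * i)) & 3] for i in range(4)]


def shufps(a, b, imm):
    # The low two result lanes are selected from a, the high two from b
    return [a[imm & 3], a[(imm >> 2) & 3], b[(imm >> 4) & 3], b[(imm >> 6) & 3]]


def pblendw_as_dwords(a, b, imm):
    # pblendw selects 16-bit words; modelled per dword, which is exact only
    # when both selector bits of each dword agree
    out = []
    for i in range(4):
        pair = (imm >> (2 * i)) & 3
        if pair not in (0, 3):
            raise ValueError("blend splits a dword")
        out.append(b[i] if pair == 3 else a[i])
    return out


def old_sequence(xmm0, xmm1):
    xmm1 = pshufd(xmm1, imm8([0, 1, 0, 2]))
    xmm0 = pshufd(xmm0, imm8([0, 2, 2, 3]))
    return pblendw_as_dwords(xmm0, xmm1, 0xF0)


def new_sequence(xmm0, xmm1):
    return shufps(xmm0, xmm1, imm8([0, 2, 0, 2]))
```

Both sequences produce `[xmm0[0], xmm0[2], xmm1[0], xmm1[2]]`, and the single `shufps` uses one instruction instead of three.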
*	[DAG] x | x --> x	Sanjay Patel	2016-10-30	1	-2/+0
	llvm-svn: 285522
*	[x86] add tests for basic logic op folds	Sanjay Patel	2016-10-30	1	-0/+18
	llvm-svn: 285520
*	Fix typo in test - it should be masking bits 0-15, not bit 16	Simon Pilgrim	2016-09-07	1	-1/+1
	llvm-svn: 280816
*	[X86][SSE] Added or combine tests for known bits of vectors	Simon Pilgrim	2016-09-07	1	-0/+51
	Part of the yak shaving for D24253 llvm-svn: 280813
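These tests exercise known-bits reasoning through OR. The usual transfer rules are as follows. An OR bit is known one if either input bit is known one, and known zero only if both input bits are known zero. For a vector, only the facts that hold in every lane survive. Below is a standalone Python sketch of those rules, assuming 32-bit lanes. It is not LLVM's `KnownBits` API, and it does not model demanded elements.

```python
from dataclasses import dataclass

WIDTH = 32
ALL = (1 << WIDTH) - 1


@dataclass(frozen=True)
class Known:
    zero: int  # bits known to be 0
    one: int   # bits known to be 1


def const(c):
    return Known(zero=~c & ALL, one=c & ALL)


def known_and(a, b):
    # Known 0 if either side is known 0; known 1 only if both are known 1
    return Known(zero=a.zero | b.zero, one=a.one & b.one)


def known_or(a, b):
    # Known 1 if either side is known 1; known 0 only if both are known 0
    return Known(zero=a.zero & b.zero, one=a.one | b.one)


def common_to_lanes(lanes):
    # For a whole vector, keep only the facts that hold in every lane
    zero, one = ALL, ALL
    for k in lanes:
        zero &= k.zero
        one &= k.one
    return Known(zero, one)
```

For an unknown X, `(X & 0xFFFF) | 0x10000` has bit 16 known one and bits 17-31 known zero.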
*	[CodeGen] Teach OR combine of shuffles involving zero vectors to better handle undef indices.	Craig Topper	2016-07-03	1	-6/+2
	Undef indices can now be treated as zeros. Or, if it's undef ORed with zero, we will keep the undef. llvm-svn: 274472
*	[X86] Add tests to show that the DAG combine for OR of shuffles with zero vectors doesn't handle undefs as well as it could.	Craig Topper	2016-07-03	1	-0/+28
	Fix coming in another commit. llvm-svn: 274471
*	[DAGCombine] Teach DAG combine to handle ORs of shuffles involving zero vectors where the zero vector is the first operand to the shuffle instead of the second.	Craig Topper	2016-06-29	1	-12/+3
	llvm-svn: 274097
*	[DAGCombine] Add test cases to show that DAG combining an OR of two shuffles with zero vectors doesn't work if the zero vector is the first operand of the shuffle.	Craig Topper	2016-06-29	1	-0/+44
	Fix coming in a follow-up patch. llvm-svn: 274096
*	Make utils/update_llc_test_checks.py note that the assertions are autogenerated.	James Y Knight	2015-11-23	1	-0/+1
	Also update existing test cases which appear to be generated by it and weren't modified (other than addition of the header) by rerunning it. llvm-svn: 253917
*	[DAGCombiner] Remove extra bitcasts surrounding vector shuffles	Simon Pilgrim	2015-04-23	1	-0/+64
	Patch to remove extra bitcasts from shuffles; this is often a legacy of XformToShuffleWithZero being used to combine bitmaskings (of float vectors bitcast to integer vectors) into shuffles: bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1) Differential Revision: http://reviews.llvm.org/D9097 llvm-svn: 235578
*	[X86][SSE] Add SSE MOVQ instructions to SSEPackedInt domain	Simon Pilgrim	2015-02-16	1	-1/+1
	Patch to explicitly add the SSE MOVQ (rr,mr,rm) instructions to SSEPackedInt domain - prevents a number of costly domain switches. Differential Revision: http://reviews.llvm.org/D7600 llvm-svn: 229439
*	[x86] Teach the 128-bit vector shuffle lowering routines to take advantage of the existence of a reasonable blend instruction.	Chandler Carruth	2015-02-16	1	-9/+11
	The 256-bit vector shuffle lowering has leveraged the general technique of decomposed shuffles and blends for quite some time, but this never made it back into the 128-bit code, and there are a large number of patterns where this is substantially better. For example, this removes almost all domain crossing in vector shuffles that involve some blend and some permutation with SSE4.1 and later. See the massive reduction in 'shufps' for integer test cases in this commit. This isn't perfect yet for a few reasons:
	1) The v8i16 shuffle lowering continues to plague me. We don't always form an unpack-based blend when that would be better. But the wins pretty drastically outstrip the losses here.
	2) The v16i8 shuffle lowering is just a disaster here. I never went and implemented blend support here for some terrible reason. I'll do that next probably. I've not updated it for now.
	More variations on this technique are coming as well -- we don't shuffle-into-unpack or shuffle-into-palignr, both of which would also be profitable. Note that some test cases grow significantly in the number of instructions, but I expect them to actually be faster. We use pshufd+pshufd+blendw instead of a single shufps, but the pshufd's are very likely to pipeline well (two ports on most modern Intel chips) and the blend is a very fast instruction. The domain switch penalty will essentially always be more than a blend instruction, which is the only increase in tree height. llvm-svn: 229350
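The decomposition described above can be sketched in Python. Each lane of a two-input mask is served either by a single-input shuffle of V1 or by one of V2, and a blend then picks between the two results per lane. This is an illustrative model only: undef lanes are represented as -1 and produce `None`, and it says nothing about which single-input shuffles or blends a given ISA level can actually encode.

```python
UNDEF = -1


def decompose_shuffle(mask, n):
    """Split a two-input mask into a V1-only mask, a V2-only mask and a
    per-lane blend selector (0 = lane from shuffled V1, 1 = from shuffled V2)."""
    v1_mask, v2_mask, blend_sel = [UNDEF] * n, [UNDEF] * n, [0] * n
    for i, m in enumerate(mask):
        if m == UNDEF:
            continue
        if m < n:
            v1_mask[i] = m
        else:
            v2_mask[i] = m - n
            blend_sel[i] = 1
    return v1_mask, v2_mask, blend_sel


def shuffle(inputs, mask):
    # Generic shuffle over a (possibly concatenated) input; undef lanes become None
    return [inputs[m] if m != UNDEF else None for m in mask]


def blend(a, b, sel):
    return [y if s else x for x, y, s in zip(a, b, sel)]


def lower_decomposed(v1, v2, mask):
    m1, m2, sel = decompose_shuffle(mask, len(v1))
    return blend(shuffle(v1, m1), shuffle(v2, m2), sel)
```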
*	[SDAG] Teach the SelectionDAG to canonicalize vector shuffles of splats directly into blends of the splats.	Chandler Carruth	2015-02-15	1	-1/+1
	These patterns show up even very late in the vector shuffle lowering where we don't have any chance for DAG combining to kick in, and blending is a tremendously simpler operation to model. By coercing the shuffle into a blend we can much more easily match and lower shuffles of splats. Immediately with this change there are significantly more blends being matched in the x86 vector shuffle lowering. llvm-svn: 229308
*	[x86] Update some tests with the latest version of my script and llc.	Chandler Carruth	2015-02-15	1	-1/+1
	This mostly adds some shuffle decode comments and cleans up indentation. llvm-svn: 229296
*	[X86][SSE] psrl(w/d/q) and psll(w/d/q) bit shifts for SSE2	Simon Pilgrim	2015-02-03	1	-10/+8
	Patch to match cases where shuffle masks can be reduced to bit shifts. Similar to byte shift shuffle matching from D5699. Differential Revision: http://reviews.llvm.org/D6649 llvm-svn: 228047
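Roughly, a mask matches an element shift when, inside every wider 16/32/64-bit lane, the elements move up (or down) by a fixed count and the vacated slots are zeroable. The Python sketch below is a simplified model of that idea, not the D6649 implementation. In it, `Z` marks a lane that must be zero, -1 marks undef, and the function returns an AT&T-style mnemonic string for illustration.

```python
ZERO = "Z"   # lane that must be zero
UNDEF = -1   # lane whose value does not matter


def match_bit_shift(mask, elt_bits):
    """Return e.g. "pslld $16" if the single-input mask is a per-lane bit shift."""
    n = len(mask)
    scale = 2
    while scale <= n and elt_bits * scale <= 64:
        for shift in range(1, scale):
            for left in (True, False):
                ok = True
                for i, m in enumerate(mask):
                    j, base = i % scale, i - i % scale
                    if left:   # elements move up; low slots become zero
                        want = ZERO if j < shift else base + j - shift
                    else:      # elements move down; high slots become zero
                        want = base + j + shift if j + shift < scale else ZERO
                    if m != UNDEF and m != want:
                        ok = False
                        break
                if ok:
                    op = "psll" if left else "psrl"
                    suffix = {16: "w", 32: "d", 64: "q"}[elt_bits * scale]
                    return f"{op}{suffix} ${shift * elt_bits}"
        scale *= 2
    return None
```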
*	[X86][SSE] Added general integer shuffle matching for MOVQ instruction	Simon Pilgrim	2015-02-03	1	-11/+10
	This patch adds general shuffle pattern matching for the MOVQ zero-extend instruction (copy lower 64 bits, zero upper) for all 128-bit integer vectors; it is added as a fallback test in lowerVectorShuffleAsZeroOrAnyExtend. llvm-svn: 228022
*	[X86][SSE] Improved (v)insertps shuffle matching	Simon Pilgrim	2015-01-10	1	-6/+4
	In the current code we only attempt to match against insertps if we have exactly one element from the second input vector, irrespective of how much of the shuffle result is zeroable. This patch checks to see if there is a single non-zeroable element from either input that requires insertion. It also supports matching of cases where only one of the inputs needs to be referenced. We also split insertps shuffle matching off into a new lowerVectorShuffleAsInsertPS function. Differential Revision: http://reviews.llvm.org/D6879 llvm-svn: 225589
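The INSERTPS immediate encodes the source element in bits 7:6, the destination lane in bits 5:4 and a zero mask in bits 3:0. That layout makes "one non-zeroable element to insert, everything else already in place or zero" the natural matching condition. Below is a simplified Python sketch of that matching for 4-lane masks. In it, `Z` marks a zeroable lane and indices 4-7 refer to V2. Undef lanes are not modelled, and the function is hypothetical rather than lowerVectorShuffleAsInsertPS itself.

```python
ZERO = "Z"  # lane that may be zeroed


def match_insertps(mask):
    """Return (dst_input, src_input, imm8) for a 4-lane mask, or None."""
    for dst, base in (("V1", 0), ("V2", 4)):
        zmask, insert, ok = 0, None, True
        for lane, m in enumerate(mask):
            if m == ZERO:
                zmask |= 1 << lane        # ZMASK bits 3:0
            elif m == base + lane:
                continue                  # already in place in the destination
            elif insert is None:
                insert = (lane, m)        # the single element to insert
            else:
                ok = False                # more than one element out of place
                break
        if ok and insert is not None:
            lane, m = insert
            src = "V1" if m < 4 else "V2"
            # COUNT_S in bits 7:6, COUNT_D in bits 5:4, ZMASK in bits 3:0
            return dst, src, ((m % 4) << 6) | (lane << 4) | zmask
    return None
```

Trying V2 as the destination corresponds to the "either input" case described above.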
*	[X86][SSE] Keep 4i32 vector insertions in integer domain on SSE4.1 targets	Simon Pilgrim	2014-12-02	1	-3/+3
	4i32 shuffles for single insertions into zero vectors lower to X86vzmovl, which was using (v)blendps - causing domain switch stalls. This patch fixes this by using (v)pblendw instead. The updated tests in test/CodeGen/X86/sse41.ll still contain a domain stall due to the use of insertps - I'm looking at fixing this in a future patch. Differential Revision: http://reviews.llvm.org/D6458 llvm-svn: 223165
*	[X86][SSE] Improvements to byte shift shuffle matching	Simon Pilgrim	2014-11-25	1	-3/+2
	Since (v)pslldq / (v)psrldq instructions resolve to a single input argument it is useful to match it much earlier than we currently do - this prevents more complicated shuffles (notably insertion into a zero vector) matching before it. Differential Revision: http://reviews.llvm.org/D6409 llvm-svn: 222796
*	[X86][SSE] Enable commutation for SSE immediate blend instructions	Simon Pilgrim	2014-11-04	1	-20/+11
	Patch to allow (v)blendps, (v)blendpd, (v)pblendw and vpblendd instructions to be commuted - swaps the src registers and inverts the blend mask. This is primarily to improve memory folding (see new tests), but it also improves the quality of shuffles (see modified tests). Differential Revision: http://reviews.llvm.org/D6015 llvm-svn: 221313
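Commuting an immediate blend is cheap because inverting the lane mask exactly undoes the swap of the sources. It helps memory folding because, in the SSE/AVX forms, only the second source operand can come from memory. A minimal Python check of the mask inversion (helper names are hypothetical):

```python
def blend(a, b, imm):
    # blendps-style select: lane i comes from b when bit i of imm is set
    return [b[i] if (imm >> i) & 1 else a[i] for i in range(len(a))]


def commute_blend_imm(imm, num_elts):
    # Swapping the sources is undone by inverting the lane mask
    return ~imm & ((1 << num_elts) - 1)
```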
*	[x86] Enable the new vector shuffle lowering by default.	Chandler Carruth	2014-10-04	1	-31/+37
	Update the entire regression test suite for the new shuffles. Remove most of the old testing which was devoted to the old shuffle lowering path and is no longer relevant really. Also remove a few other random tests that only really exercised shuffles and only incidentally or without any interesting aspects to them. Benchmarking that I have done shows a few small regressions with this on LNT, zero measurable regressions on real, large applications, and for several benchmarks where the loop vectorizer fires in the hot path it shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy Bridge machines. Running on AMD machines shows even more dramatic improvements. When using newer ISA vector extensions the gains are much more modest, but the code is still better on the whole. There are a few regressions being tracked (PR21137, PR21138, PR21139) but by and large this is expected to be a win for x86 generated code performance. It is also more correct than the code it replaces. I have fuzz tested this extensively with ISA extensions up through AVX2 and found no crashes or miscompiles (yet...). The old lowering had a few miscompiles and crashers after a somewhat smaller amount of fuzz testing. There is one significant area where the new code path lags behind and that is in AVX-512 support. However, there was extremely little support for that already and so this isn't a significant step backwards and the new framework will probably make it easier to implement lowering that uses the full power of AVX-512's table-based shuffle+blend (IMO). Many thanks to Quentin, Andrea, Robert, and others for benchmarking assistance. Thanks to Adam and others for help with AVX-512. Thanks to Hal, Eric, and many others for answering my incessant questions about how the backend actually works. =] I will leave the old code path in the tree until the 3 PRs above are at least resolved to folks' satisfaction. Then I will rip it (and 1000s of lines of code) out. =] I don't expect this flag to stay around for very long. It may not survive next week. llvm-svn: 219046
*	[x86] Adjust the patterns for lowering X86vzmovl nodes which don't perform a load to use blendps rather than movss when it is available.	Chandler Carruth	2014-10-03	1	-2/+2
	For non-loads, blendps is much faster. It can execute on two ports in Sandy Bridge and Ivy Bridge, and three ports on Haswell. This fixes one of the "regressions" from aggressively taking the "insertion" path in the new vector shuffle lowering. This does highlight one problem with blendps -- it isn't commuted as heavily as it should be. That's future work though. llvm-svn: 219022
*	[x86] Regenerate a number of FileCheck assertions with my script for test cases that will change with the new vector shuffle lowering.	Chandler Carruth	2014-10-03	1	-93/+112
	This gives us a nice baseline for deltas against. I've checked and removed the cases where there was weird register usage being pinned down, and all of these are extremely pinpointed tests so fully checking them seems very appropriate. llvm-svn: 218941
*	[DAGCombiner] Avoid calling method 'isShuffleMaskLegal' on illegal vector types.	Andrea Di Biagio	2014-07-15	1	-0/+12
	This patch fixes a crasher in method 'DAGCombiner::visitOR' due to an invalid call to method 'isShuffleMaskLegal'. On x86, method 'isShuffleMaskLegal' always expects a legal vector value type as input. With this patch, we immediately check if the input OR DAG node has a legal vector type; we only try to fold an OR DAG node into a single shufflevector if we know that the resulting shuffle will have a legal type. This is to avoid calling method 'isShuffleMaskLegal' on a potentially illegal vector value type. Added a new test case to file 'CodeGen/X86/combine-or.ll' to verify that DAGCombiner doesn't crash in the attempt to check/combine an OR between shuffles with illegal types. llvm-svn: 213020
*	[X86] Always prefer to lower a VECTOR_SHUFFLE into a BLENDI instead of SHUFP (or VPERM2X128).	Andrea Di Biagio	2014-06-25	1	-2/+2
	This patch teaches method 'LowerVECTOR_SHUFFLE' to give higher precedence to the check for 'isBlendMask'; the idea is that, when possible, we should first check if a shuffle performs a blend and, if so, try to lower it into a BLENDI instead of selecting a SHUFP or (worse) a VPERM2X128. In general:
	- AVX VBLENDPS/D always have better latency and throughput than VPERM2F128;
	- BLENDPS/D instructions tend to always have better 'reciprocal throughput' than the equivalent SHUFPS/D;
	- Both BLENDPS/D and SHUFPS/D are often decoded into the same number of m-ops; however, a m-op obtained from a BLENDPS/D can be scheduled to more than one execution port.
	This patch:
	- Moves the check for 'isBlendMask' immediately before the check for 'isSHUFPMask' within method 'LowerVECTOR_SHUFFLE';
	- Updates existing tests for sse/avx shuffle/blend instructions to verify that we select (v)blendps/d when possible (instead of (v)shufps/d or vperm2f128).
	llvm-svn: 211720
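In a blend mask, every lane either keeps its position in the first input (index i), takes the same position from the second input (index i + n), or is undef. Below is a Python sketch of such a check together with the BLENDPS-style immediate it implies. It is illustrative only, not the code of `isBlendMask`.

```python
def is_blend_mask(mask):
    """True if each lane is undef (-1), lane i of input 1, or lane i of input 2."""
    n = len(mask)
    return all(m < 0 or m == i or m == i + n for i, m in enumerate(mask))


def blend_immediate(mask):
    # Bit i set means lane i comes from the second input; undef lanes use input 1
    n = len(mask)
    return sum(1 << i for i, m in enumerate(mask) if m == i + n)
```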
*	Separate the check for blend shuffle_vector masks	Filipe Cabecinhas	2014-05-30	1	-2/+2
	Summary: Separate the check for blend shuffle_vector masks into isBlendMask. This function will also be used to check if a vector shuffle is legal. No change in functionality was intended, but we ended up improving codegen on two tests, which were being (more) optimized only if the resulting shuffle was legal. Reviewers: nadav, delena, andreadb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3964 llvm-svn: 209923
*	[DAGCombiner] teach how to simplify xor/and/or nodes according to the following rules:	Andrea Di Biagio	2014-03-18	1	-0/+2
	1) (AND (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (AND (A, B), C, Mask)
	2) (OR (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (OR (A, B), C, Mask)
	3) (XOR (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (XOR (A, B), V_0, Mask)
	4) (AND (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (C, AND (A, B), Mask)
	5) (OR (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (C, OR (A, B), Mask)
	6) (XOR (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (V_0, XOR (A, B), Mask)
	llvm-svn: 204160
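All six rules follow from the fact that both shuffles use the same mask, so the logic op can be applied before shuffling. Lanes taken from the shared operand C give C & C = C and C | C = C, but C ^ C = 0, which is why the XOR rules substitute the zero vector V_0. A brute-force Python check of that reasoning (helper names are hypothetical):

```python
import operator


def shuf(v1, v2, mask):
    # Two-input shuffle: indices 0..n-1 pick from v1, n..2n-1 from v2
    both = v1 + v2
    return [both[m] for m in mask]


def lanewise(op, x, y):
    return [op(a, b) for a, b in zip(x, y)]


def check_rules(a, b, c, mask):
    zero = [0] * len(c)
    # For AND/OR the shared operand survives (C op C == C); for XOR it becomes V_0
    for op, shared in ((operator.and_, c), (operator.or_, c), (operator.xor, zero)):
        ab = lanewise(op, a, b)
        # Rules 1-3: C is the second shuffle operand
        if lanewise(op, shuf(a, c, mask), shuf(b, c, mask)) != shuf(ab, shared, mask):
            return False
        # Rules 4-6: C is the first shuffle operand
        if lanewise(op, shuf(c, a, mask), shuf(c, b, mask)) != shuf(shared, ab, mask):
            return False
    return True
```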
*	[X86] Teach the DAGCombiner how to fold an OR of two shufflevector nodes.	Andrea Di Biagio	2014-03-06	1	-0/+267
	This patch teaches the DAGCombiner how to fold a binary OR between two shufflevector nodes into a single shufflevector when possible. The rules are:
	1. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask1)
	2. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf B, A, Mask2)
	The DAGCombiner can take advantage of the fact that OR is commutative and compute two possible shuffle masks (Mask1 and Mask2) for the resulting shuffle node. Before folding a DAG according to either rule 1 or 2, the DAGCombiner verifies that the resulting shuffle mask is legal for the target. It first tries to fold according to rule 1; if that is not possible, it tries rule 2. If both Mask1 and Mask2 are illegal, we conservatively don't fold the OR instruction. llvm-svn: 203156
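The mask construction can be sketched as follows. This is Python, not the DAGCombiner code, and `is_legal` is a hypothetical stand-in for the target's legality hook. The sketch is conservative in a way the message does not specify: a lane that is zero on both sides, or undef on only one side, prevents the fold.

```python
def fold_or_of_zero_shuffles(ma, mb, n, is_legal):
    """Try to fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) into one shuffle.

    Indices 0..n-1 select from A (or B), n..2n-1 select from the zero vector
    V_0, and -1 is undef.
    """
    mask1, mask2 = [], []  # masks for (shuf A, B) and (shuf B, A)
    for m0, m1 in zip(ma, mb):
        if m0 < 0 and m1 < 0:
            mask1.append(-1)   # undef on both sides stays undef
            mask2.append(-1)
            continue
        a_real, b_real = 0 <= m0 < n, 0 <= m1 < n
        a_zero, b_zero = m0 >= n, m1 >= n
        # OR with zero is a no-op, so each lane must be real on exactly one side
        if not ((a_real and b_zero) or (a_zero and b_real)):
            return None
        mask1.append(m0 if a_real else m1 + n)
        mask2.append(m1 if b_real else m0 + n)
    if is_legal(mask1):
        return ("shuf A, B", mask1)
    if is_legal(mask2):
        return ("shuf B, A", mask2)
    return None  # both masks illegal: conservatively keep the OR
```

For `MA = [0, 4, 1, 4]` and `MB = [4, 0, 4, 2]` (lanes A0, 0, A1, 0 and 0, B0, 0, B2), Mask1 is `[0, 4, 1, 6]` and Mask2 is `[4, 0, 5, 2]`.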