| Commit message | Author | Age | Files | Lines |
| |
Use demanded extract index to set most of the shuffle mask to undef, making it easier to widen and peek through.
llvm-svn: 351013
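
As a reduced illustration (hypothetical IR, not taken from the commit's tests): when only
one element of a shuffle result is extracted, every other mask element can be treated as
undef, which makes the shuffle easier to widen and to peek through.

define i32 @extract_from_shuffle(<8 x i32> %x, <8 x i32> %y) {
  ; Only element 2 of the shuffle result is demanded, so the remaining mask
  ; elements can be replaced with undef before further shuffle combining.
  %s = shufflevector <8 x i32> %x, <8 x i32> %y, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>
  %e = extractelement <8 x i32> %s, i32 2
  ret i32 %e
}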
|
| |
This pattern:
t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0>
t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0>
...shows up in PR33758:
https://bugs.llvm.org/show_bug.cgi?id=33758
...although this patch doesn't make any difference to the final result on that yet.
In the affected tests here, it looks like it just makes RA wiggle. But we might
as well squash this to prevent it interfering with other pattern-matching.
Differential Revision: https://reviews.llvm.org/D56604
llvm-svn: 351008
|
| |
Add additional vXi32 and vXi64 tests.
llvm-svn: 351003
|
| |
Make use of vblendvpd to select on the signbit
Differential Revision: https://reviews.llvm.org/D56544
llvm-svn: 350999
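
A sketch of the kind of pattern this covers (hypothetical IR, not the commit's test): the
compare below only tests the sign bit of each 64-bit element, which is exactly what
vblendvpd uses to choose between its two inputs, so the compare does not need to be
materialized separately.

define <4 x double> @select_on_signbit(<4 x i64> %mask, <4 x double> %a, <4 x double> %b) {
  ; Each lane of the select condition is just the sign bit of %mask.
  %neg = icmp slt <4 x i64> %mask, zeroinitializer
  %sel = select <4 x i1> %neg, <4 x double> %a, <4 x double> %b
  ret <4 x double> %sel
}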
|
| |
This patch takes some of the code from D49837 to enable ISD::ABS support for all SSE vector types.
Differential Revision: https://reviews.llvm.org/D56544
llvm-svn: 350998
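
For reference, one common IR idiom for absolute value that can be recognized as ISD::ABS
(hypothetical reduced example, assuming the usual select-of-negate form):

define <8 x i16> @abs_v8i16(<8 x i16> %x) {
  ; abs(x) = x < 0 ? 0 - x : x
  %neg = sub <8 x i16> zeroinitializer, %x
  %cmp = icmp slt <8 x i16> %x, zeroinitializer
  %abs = select <8 x i1> %cmp, <8 x i16> %neg, <8 x i16> %x
  ret <8 x i16> %abs
}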
|
| |
The 128-bit input produces 64 bits of output and fills the upper 64 bits with 0. The mask only applies to the lower elements. But we can't represent this with a vselect like we normally do.
This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled since vselect v8i16 isn't legal there.
Fixes another instruction for PR34877.
llvm-svn: 350994
|
| |
avx512dq, use v16i1 as the intermediate mask type instead of v8i1.
We still use i8 for the load/store type, so we need to convert to/from i16 around the mask type.
By doing this we get an i8->i16 extload which we can then pattern match to a KMOVW if the access is aligned.
llvm-svn: 350989
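
A possible reduced case (hypothetical IR; assuming a v8i1 mask load is one of the types
that reaches this lowering path): the mask lives in memory as an i8, and with this change
an aligned load can become an i8->i16 extload that matches KMOVW.

define <8 x i1> @load_mask8(<8 x i1>* %p) {
  %m = load <8 x i1>, <8 x i1>* %p
  ret <8 x i1> %m
}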
|
| |
the output has more elements than the input due to needing to be 128 bits.
We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask.
This also reuses the new nodes even when the result type fits in 128 bits if the input is q/d and the result is w/b, since a vselect of w/b elements with a k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that.
This fixes some of PR34877.
llvm-svn: 350985
|
| |
Summary:
When legalizing the result of a SELECT_CC node by promoting the
floating-point type, use the promoted-to type rather than the original
type.
Fix PR40273.
Reviewers: efriedma, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56566
llvm-svn: 350951
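
A hypothetical reduced shape (not the PR40273 reproducer; it assumes a target that forms
a SELECT_CC here and promotes half to float): once the fcmp/select pair becomes a
SELECT_CC whose result type is promoted, the legalized node has to use the promoted-to
type, which is what this change fixes.

define half @select_cc_half(half %a, half %b) {
  %cmp = fcmp olt half %a, %b
  %sel = select i1 %cmp, half %a, half %b
  ret half %sel
}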
|
| |
Previously, we limited this transform to cases where the
extraction into the build vector happens from vectors of
the same type as the build vector, but that's not required.
There's a slight potential regression seen in the AVX512
result for phadd -- we're using the 256-bit flavor of the
instruction now even though the 128-bit subset is sufficient.
The same problem could already be seen in the AVX2 result.
Follow-up patches will attempt to narrow that back down.
llvm-svn: 350928
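
A sketch of the newly allowed shape (hypothetical IR): the extracts feeding the build
vector come from a <8 x i32> source even though the build vector itself is <4 x i32>, and
the adjacent-element adds can still be matched to a horizontal add.

define <4 x i32> @hadd_from_wider_source(<8 x i32> %x) {
  %e0 = extractelement <8 x i32> %x, i32 0
  %e1 = extractelement <8 x i32> %x, i32 1
  %e2 = extractelement <8 x i32> %x, i32 2
  %e3 = extractelement <8 x i32> %x, i32 3
  %s01 = add i32 %e0, %e1
  %s23 = add i32 %e2, %e3
  ; The build vector is narrower than the vectors the elements were extracted from.
  %v0 = insertelement <4 x i32> undef, i32 %s01, i32 0
  %v1 = insertelement <4 x i32> %v0, i32 %s23, i32 1
  ret <4 x i32> %v1
}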
|
| |
Add DAG combine to turn scalar_to_vector+extract_vector_elt into extract_subvector.
We were lowering the last step extract_vector_elt to a bitcast+truncate. Change it to use an extract_vector_elt of index 0 instead. Add isel patterns to do the equivalent of what the bitcast would have done. Plus an isel pattern for an any_extend+extract to prevent some regressions.
Finally add a DAG combine to turn v1i1 scalar_to_vector+extract_vector_elt of 0 into an extract_subvector.
This fixes some of the regressions from r350800.
llvm-svn: 350918
|
| |
This extends combineVSelectToShrunkBlend so that it can resimplify SHRUNKBLEND nodes that have already been created.
This should help some of the regressions from D56387.
Differential Revision: https://reviews.llvm.org/D56421
llvm-svn: 350875
|
| |
When we use the partial-matching function on a 128-bit chunk, we must
account for the possibility that we've matched undef halves of the
original source vectors, so the outputs may need to be reset.
This should allow closing PR40243:
https://bugs.llvm.org/show_bug.cgi?id=40243
llvm-svn: 350830
|
| |
This is a partial fix for:
https://bugs.llvm.org/show_bug.cgi?id=40243
...as seen in the integer test, we still need to correct the result when using the
existing (old) horizontal op matching function because it does not model the way
x86 256-bit horizontal ops return results (each 128-bit half is its own horizontal-op).
A potential follow-up change for that is discussed in the bug report - see also D56490.
This generally duplicates a lot of the existing matching code, but we can't just remove
that without introducing regressions, so the existing code is renamed and used less often.
Follow-ups may try to reduce that overlap.
Differential Revision: https://reviews.llvm.org/D56450
llvm-svn: 350826
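
For reference, the lane behavior being modeled (my paraphrase of the VPHADDD semantics,
not text from the commit): a 256-bit horizontal add treats each 128-bit half as an
independent horizontal op.

; vphaddd ymm0, ymm1, ymm2   (elements numbered 0..7 within each register)
;   ymm0[0] = ymm1[0] + ymm1[1]    ymm0[4] = ymm1[4] + ymm1[5]
;   ymm0[1] = ymm1[2] + ymm1[3]    ymm0[5] = ymm1[6] + ymm1[7]
;   ymm0[2] = ymm2[0] + ymm2[1]    ymm0[6] = ymm2[4] + ymm2[5]
;   ymm0[3] = ymm2[2] + ymm2[3]    ymm0[7] = ymm2[6] + ymm2[7]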
|
| |
llvm-svn: 350822
|
| |
injecting VK32/VK64 references into the MachineIR
Summary:
This pass replaces GR8/GR16/GR32/GR64 with their equivalently sized mask register classes. But VK32/VK64 aren't legal without AVX512BW. This mostly appears to work if the register coalescer is able to remove the VK32/VK64 register class reference, or if we never spill it, but there's no guarantee of that.
Another Intel employee managed to trigger a crash due to this with ISPC. Unfortunately, I've lost the test case he sent me at the time. I'm trying to get him to reproduce it for me. I'd like to get this in before 8.0 branches since it's a little scary.
The regressions here are unfortunate, but I think we can make some improvements to DAG combine, load folding, etc. to fix them. Just not sure if we can get that done for 8.0.
Fixes PR39741
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56460
llvm-svn: 350800
|
| |
llvm-svn: 350745
|
| |
Share prefixes whenever possible, use X86 instead of X32.
llvm-svn: 350722
|
| |
llvm-svn: 350716
|
| |
llvm-svn: 350707
|
| |
llvm-svn: 350646
|
| |
merging a src register in ToBeUpdated set.
This is to fix PR40061, related to https://reviews.llvm.org/rL339035.
In https://reviews.llvm.org/rL339035, the live interval of the source pseudo register
in a rematerialized copy may be saved in the ToBeUpdated set and its update postponed.
In PR40061, %t2 = %t1 is rematerialized and %t1 is added to the ToBeUpdated set
to postpone its live interval update. After the rematerialization, the live
interval of %t1 is larger than necessary. Then %t1 is merged into %t3 and %t1
gets removed. After the merge, %t3 contains a live interval larger than necessary.
Because %t3 is not in the ToBeUpdated set, its live interval is not updated after
register coalescing and this breaks assumptions in regalloc.
The patch requires the live interval of the destination register in a merge to be
updated if the source register is in ToBeUpdated.
Differential Revision: https://reviews.llvm.org/D55867
llvm-svn: 350586
|
| |
Replace with target independent funnel shift intrinsics."
The MSVC limit we hit on AutoUpgrade.cpp has been worked around for now.
llvm-svn: 350567
|
| |
Replace with target independent funnel shift intrinsics."
The AutoUpgrade.cpp if/else cascade hit an MSVC limit again.
llvm-svn: 350562
|
| |
independent funnel shift intrinsics.
Differential Revision: https://reviews.llvm.org/D56377
llvm-svn: 350554
|
| |
Based off work for D55935
llvm-svn: 350548
|
| |
These tests show missed optimizations and a miscompile
similar to PR40243 - https://bugs.llvm.org/show_bug.cgi?id=40243
llvm-svn: 350533
|
| |
require a modulo.
Planning to replace these with funnel shift intrinsics which would mask out the extra bits. This will help minimize test diffs.
llvm-svn: 350504
|
| |
Summary: AVX512VBMI2 supports a funnel shift by immediate and a funnel shift by a variable vector.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56361
llvm-svn: 350498
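
A minimal IR example of a variable vector funnel shift that these patterns are meant to
cover (hypothetical function name; with avx512vbmi2 this kind of shift can be selected to
the VPSHLDV family):

declare <8 x i64> @llvm.fshl.v8i64(<8 x i64>, <8 x i64>, <8 x i64>)

define <8 x i64> @funnel_shift_var(<8 x i64> %a, <8 x i64> %b, <8 x i64> %amt) {
  ; Per element: concatenate a:b and shift left by a per-element amount.
  %r = call <8 x i64> @llvm.fshl.v8i64(<8 x i64> %a, <8 x i64> %b, <8 x i64> %amt)
  ret <8 x i64> %r
}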
|
| |
(truncate (v64i8)))) on KNL.
llvm-svn: 350481
|
| |
input is a truncate from v16i8/v32i8.
This is especially helpful on targets without avx512bw since we don't have a good way to convert from v16i8/v32i8 to v16i1/v32i1 for the truncate anyway. If we're just going to convert it to a GPR we might as well use pmovmskb to accomplish both.
llvm-svn: 350480
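
A reduced IR shape for the v16i8 case (hypothetical example): the i1 truncate plus
bitcast-to-scalar below can be handled by moving the interesting bit into the sign bit
and using pmovmskb, rather than materializing a v16i1 value first.

define i16 @mask_from_trunc(<16 x i8> %x) {
  ; Each input byte contributes one bit of the scalar result.
  %t = trunc <16 x i8> %x to <16 x i1>
  %m = bitcast <16 x i1> %t to i16
  ret i16 %m
}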
|
| |
llvm-svn: 350474
|
| |
when -mprefer-vector-width-256 is in effect and BWI is not available.
llvm-svn: 350473
|
|
| |
Patch by Xiang Zhang.
Differential Revision: https://reviews.llvm.org/D56080
llvm-svn: 350436
|
| |
This adds support for calculating sign bits of insert_subvector. I based it on the existing computeKnownBits handling.
My motivating case is propagating sign bit information across basic blocks on AVX targets, where concatenating using insert_subvector is common.
Differential Revision: https://reviews.llvm.org/D56283
llvm-svn: 350432
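
A sketch of the motivating shape (hypothetical IR): on AVX targets the concatenation
below is lowered with insert_subvector, and this change lets ComputeNumSignBits report
that every element of the concatenated value is all sign bits.

define <8 x i32> @concat_bool_vectors(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c, <4 x i32> %d) {
  %cmp0 = icmp sgt <4 x i32> %a, %b
  %cmp1 = icmp sgt <4 x i32> %c, %d
  %lo = sext <4 x i1> %cmp0 to <4 x i32>
  %hi = sext <4 x i1> %cmp1 to <4 x i32>
  ; Concatenation of the two halves; becomes insert_subvector nodes during lowering.
  %cat = shufflevector <4 x i32> %lo, <4 x i32> %hi, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  ret <8 x i32> %cat
}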
|
| |
These are modified versions of the FP tests from rL349923.
llvm-svn: 350430
|
| |
The 1st try for this was at rL350369, but it caused IR-level diffs because
our cost models differentiate custom vs. legal/promote lowering. So that was
reverted at rL350373. The cost models were fixed independently at rL350403,
so this is effectively the same patch as last time.
Original commit message:
This would show up if we fix horizontal reductions to narrow as they go along,
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.
We need to do this late to not interfere with other pattern matching of larger
horizontal sequences.
We can extend this to integer ops in a follow-up patch.
Differential Revision: https://reviews.llvm.org/D56011
llvm-svn: 350421
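
A reduced example of the shape this matches (hypothetical IR, not the commit's test): the
scalar result is the sum of two adjacent lanes, which can become a single haddps when
optimizing for size or on fast-hop targets such as Jaguar.

define float @extracted_fadd(<4 x float> %x) {
  ; Computes x[0] + x[1] and extracts it as a scalar.
  %shift = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %sum = fadd <4 x float> %x, %shift
  %r = extractelement <4 x float> %sum, i32 0
  ret float %r
}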
|
| |
Repeat of the generic SimplifyDemandedBits shift combine
llvm-svn: 350399
|
| |
A future patch will combine logical shifts more aggressively.
llvm-svn: 350396
|
| |
zero flag is used.
Doing this late so we will prefer to fold the AND into a masked comparison first. That can be better for the live range of the mask register.
Differential Revision: https://reviews.llvm.org/D56246
llvm-svn: 350374
|
| |
There are non-codegen tests that need to be updated with this code change.
llvm-svn: 350373
|
| |
This would show up if we fix horizontal reductions to narrow as they go along,
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.
We need to do this late to not interfere with other pattern matching of larger
horizontal sequences.
We can extend this to integer ops in a follow-up patch.
Differential Revision: https://reviews.llvm.org/D56011
llvm-svn: 350369
|
| |
llvm-svn: 350364
|
| |
llvm-svn: 350362
|
| |
This tests a case where we need to be able to compute sign bits for two insert_subvectors that are live out of a basic block. The result is then used as a boolean vector in another basic block.
llvm-svn: 350359
|
| |
llvm-svn: 350358
|
| |
These are similar patterns, but when you throw AVX512 onto the pile,
the number of variations explodes. We really don't care about
AVX1 vs. AVX2 for FP ops. There may be some superficial shuffle diffs,
but that's not what we're testing for here, so I removed those RUNs.
Separating by type also lets us specify 'sse3' for the FP file vs. 'ssse3'
for the integer file...because x86.
llvm-svn: 350357
|
| |
llvm-svn: 350356
|
| |
As noted in PR39973 and D55558:
https://bugs.llvm.org/show_bug.cgi?id=39973
...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine:
// extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)
We want to have this in the DAG too because as we can see in some of the test diffs (reductions),
the pattern may not be visible in IR.
Given that this is already an IR canonicalization, any backend that would prefer a vector op over
a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's
a realistic expectation though). The transform is limited with a TLI hook because there's an
existing transform in CodeGenPrepare that tries to do the opposite transform.
Differential Revision: https://reviews.llvm.org/D55722
llvm-svn: 350354
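
A minimal sketch of the fold in IR terms (hypothetical function name):

define i32 @extract_of_add(<4 x i32> %x, <4 x i32> %y) {
  %vadd = add <4 x i32> %x, %y
  %r = extractelement <4 x i32> %vadd, i32 1
  ret i32 %r
}

; With the combine, the DAG performs the operation on scalars instead:
;   %x1 = extractelement <4 x i32> %x, i32 1
;   %y1 = extractelement <4 x i32> %y, i32 1
;   %r  = add i32 %x1, %y1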
|
| |
llvm-svn: 350338
|