bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86] Add -x86-experimental-vector-widening-legalization check to ↵	Craig Topper	2018-11-18	1	-2/+5
	combineSelect and combineSetCC to cover vXi16/vXi8 promotion without BWI. I don't yet have any test cases for this, but it's the right thing to do based on log file inspection. llvm-svn: 347151
*	[X86] Rename WidenMaskArithmetic->PromoteMaskArithmetic since we usually use ↵	Craig Topper	2018-11-18	1	-4/+4
	widen to refer to adding elements, not making elements larger. NFC llvm-svn: 347150
*	[X86] Don't use a pmaddwd for vXi32 multiply if the inputs are zero extends ↵	Craig Topper	2018-11-18	1	-0/+10
	from i8 or smaller without SSE4.1. Prefer to shrink the mul instead. The zero extend will require two stages of unpacks to implement, so it's better to shrink the multiply using pmullw and then extend that result back to v4i32 using a single unpack. llvm-svn: 347149
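A minimal sketch of why the shrink is safe, written in Python rather than LLVM C++ and using hypothetical helper names (`mul_v4i32`, `mul_shrunk`). Two values zero-extended from i8 are at most 255, so their product is at most 65025 and fits in an unsigned 16-bit lane; a 16-bit low multiply (the role pmullw plays here) followed by a zero extend should therefore reproduce the 32-bit product.

```python
# Illustrative model only; these lane helpers are hypothetical, not LLVM APIs.

def mul_v4i32(a, b):
    """Reference: per-lane 32-bit multiply of inputs zero-extended from i8."""
    return [(x * y) & 0xFFFFFFFF for x, y in zip(a, b)]


def mul_shrunk(a, b):
    """Shrunk form: keep the low 16 bits of each product, then zero-extend."""
    low16 = [(x * y) & 0xFFFF for x, y in zip(a, b)]  # what a 16-bit low multiply keeps
    return [v & 0xFFFFFFFF for v in low16]            # zero extend back to 32 bits


a = [0, 1, 200, 255]
b = [255, 255, 200, 255]
assert mul_v4i32(a, b) == mul_shrunk(a, b)
```

The equivalence depends on the inputs being zero extends from i8 or smaller; with wider inputs the product can overflow 16 bits and the two forms diverge.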
*	[X86] Add support for matching PACKUSWB from a v64i8 shuffle.	Craig Topper	2018-11-17	1	-0/+5
	llvm-svn: 347143
*	[X86] Don't extend v32i8 multiplies to v32i16 with avx512bw and ↵	Craig Topper	2018-11-17	1	-1/+1
	prefer-vector-width=256. llvm-svn: 347131
*	[X86] Use getUnpackl/getUnpackh instead of hardcoding a shuffle mask.	Craig Topper	2018-11-17	1	-8/+5
	llvm-svn: 347127
*	Use llvm::copy. NFC	Fangrui Song	2018-11-17	1	-2/+2
	llvm-svn: 347126
*	[X86] Add custom promotion of narrow fp_to_uint/fp_to_sint operations under ↵	Craig Topper	2018-11-16	1	-3/+48
	-x86-experimental-vector-widening-legalization. This tries to force the result type to vXi32 followed by a truncate. This can help avoid scalarization that would otherwise occur. There are some annoying examples of an avx512 truncate instruction followed by a packus where we should really be able to just use one truncate. But overall this is still a net improvement. llvm-svn: 347105
*	[X86] Qualify part of the masked gather handling in ReplaceNodeResults with ↵	Craig Topper	2018-11-16	1	-21/+23
	a getTypeAction call to know if we can use default legalization. If we managed to switch to -x86-experimental-vector-widening-legalization this block can be removed. llvm-svn: 347100
*	[X86] Remove a branch on SSE4.1 from LowerLoad	Craig Topper	2018-11-16	1	-14/+2
	We should be able to use getExtendInVec with or without sse4.1 to produce a SIGN_EXTEND_VECTOR_INREG. llvm-svn: 347095
*	[X86] In LowerLoad, fix assert messages and rename a variable that use Zize ↵	Craig Topper	2018-11-16	1	-8/+8
	instead of Size. NFC llvm-svn: 347093
*	[X86] Disable Condbr_merge pass	Rong Xu	2018-11-16	1	-1/+1
	Disable Condbr_merge pass for now due to PR39658. Will reenable the pass once the bug is fixed. llvm-svn: 347079
*	[SelectionDAG] Move (repeated) SDTIntShiftDOp double shift node def to ↵	Simon Pilgrim	2018-11-16	1	-4/+0
	common code. NFCI. Prep work for PR39467. llvm-svn: 347067
*	[X86][SSE] Move number of input limit out of resolveTargetShuffleInputs.	Simon Pilgrim	2018-11-16	1	-3/+5
	Only combineX86ShufflesRecursively needs this limit. llvm-svn: 347054
*	[X86] X86DAGToDAGISel::matchBitExtract(): extract 'lshr' from `X`	Roman Lebedev	2018-11-16	1	-1/+19
	Summary: As discussed in previous review, and noted in the FIXME, if `X` is actually an `lshr Y, Z` (logical!), we can fold the `Z` into `control`, and let the `BEXTR` do this too. We could just insert those 8 bits of shift amount into control, but it is better to instead zero-extend them, and 'or' them in place.
	We can only do this for `lshr`, not `ashr`, because we do not know that the mask covers only the bits of `Y`, and not any of the sign-extended bits.
	The obvious question is, is this actually legal to do? I believe it is. Relevant quotes, from `Intel® 64 and IA-32 Architectures Software Developer’s Manual`, `BEXTR — Bit Field Extract`:
	* `Bit 7:0 of the second source operand specifies the starting bit position of bit extraction.`
	* `A START value exceeding the operand size will not extract any bits from the second source operand.`
	* `Only bit positions up to (OperandSize -1) of the first source operand are extracted.`
	* `All higher order bits in the destination operand (starting at bit position LENGTH) are zeroed.`
	* `The destination register is cleared if no bits are extracted.`
	FIXME: if we can do this, I wonder if we should prefer `BEXTR` over `BZHI` in such cases.
	Reviewers: RKSimon, craig.topper, spatel, andreadb
	Reviewed By: RKSimon, craig.topper, andreadb
	Subscribers: llvm-commits
	Differential Revision: https://reviews.llvm.org/D54095
	llvm-svn: 347048
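The fold described above can be checked with a small model. This is a sketch in Python, not the LLVM implementation: `bextr`, `extract_unfolded` and `extract_folded` are hypothetical names, and the model assumes the documented control layout (START in bits 7:0, LENGTH in bits 15:8) together with the behaviours quoted from the manual.

```python
# Simplified model of BEXTR; helper names are hypothetical, not LLVM APIs.

def bextr(src, control, opsize=32):
    start = control & 0xFF           # bits 7:0: starting bit position
    length = (control >> 8) & 0xFF   # bits 15:8: number of bits to extract
    if start >= opsize:
        return 0                     # START beyond the operand extracts nothing
    src &= (1 << opsize) - 1         # only bits up to OperandSize-1 are visible
    return (src >> start) & ((1 << length) - 1)


def extract_unfolded(y, z, nbits, opsize=32):
    """Before the change: materialize X = lshr Y, Z, then BEXTR with START = 0."""
    x = (y & ((1 << opsize) - 1)) >> z
    return bextr(x, nbits << 8, opsize)


def extract_folded(y, z, nbits, opsize=32):
    """After the change: zero-extend Z and 'or' it into the low byte of control."""
    control = (nbits << 8) | (z & 0xFF)
    return bextr(y, control, opsize)


y = 0xDEADBEEF
assert all(extract_unfolded(y, z, n) == extract_folded(y, z, n)
           for z in range(32) for n in range(33))
```

The model only covers shift amounts below the operand size, since a larger `lshr` amount has no defined result in LLVM IR. It also shows why `ashr` would not fold the same way: BEXTR shifts in zeros, not copies of the sign bit.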
*	[X86] Add custom type legalization for v2i8/v4i8/v8i8 mul under ↵	Craig Topper	2018-11-16	1	-2/+20
	-x86-experimental-vector-widening. By early promoting the multiply to use an i16 element type we can avoid having op legalization emit a second multiply for the 8 upper elements of the v16i8 type we would otherwise get. llvm-svn: 347032
*	[X86] Use ANY_EXTEND instead of SIGN_EXTEND in the AVX2 and later path for ↵	Craig Topper	2018-11-16	1	-2/+2
	legalizing vXi8 multiply. We aren't going to use the upper bits of the multiply result that the extend would affect. So we don't need a specific type of extend. This makes some reduction test cases shorter because we were previously trying to sign_extend a truncate which we can't eliminate. llvm-svn: 347011
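A small Python sketch of the arithmetic behind this change, with hypothetical helper names. The low 8 bits of a product depend only on the low 8 bits of each operand, so whatever the extend places in the upper bits cannot reach the truncated i8 result.

```python
# Illustrative model only; helper names are hypothetical, not LLVM APIs.

def sign_extend_i8_to_i16(v):
    """Copy bit 7 of an i8 value into bits 15:8."""
    return (v | 0xFF00) if (v & 0x80) else v


def any_extend_i8_to_i16(v, upper):
    """any_extend leaves the upper bits unspecified; `upper` stands in for them."""
    return ((upper & 0xFF) << 8) | v


def mul_low8(a16, b16):
    """16-bit multiply, then truncate the result back to i8."""
    return ((a16 * b16) & 0xFFFF) & 0xFF


a, b = 0x9C, 0x37
via_sext = mul_low8(sign_extend_i8_to_i16(a), sign_extend_i8_to_i16(b))
via_anyext = mul_low8(any_extend_i8_to_i16(a, 0xA5), any_extend_i8_to_i16(b, 0x5A))
assert via_sext == via_anyext == (a * b) & 0xFF
```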
*	[X86] Update a couple comments to remove a mention of a sign extending that ↵	Craig Topper	2018-11-16	1	-3/+3
	no longer happens. NFC llvm-svn: 347010
*	[X86] Remove ANY_EXTEND special case from canReduceVMulWidth	Craig Topper	2018-11-15	1	-18/+2
	Removing this code doesn't affect any lit tests so it doesn't appear to be tested anymore. I assume it was when it was added, but I guess something else changed? Code coverage report also says it's unused. I mostly didn't like that it seemed to count the sign bits as if it was a sign_extend, but then set isPositive as if it was a zero_extend. It feels like we should have picked one interpretation? Differential Revision: https://reviews.llvm.org/D54596 llvm-svn: 346995
*	[X86] Minor cleanup to getExtendInVec. NFCI	Craig Topper	2018-11-15	1	-4/+7
	Use unsigned to calculate the subvector index to avoid a cast. Remove an unnecessary condition and replace it with a stronger assert. Use the InVT variable we updated when we extracted instead of grabbing it from the In SDValue. llvm-svn: 346983
*	[X86] Add -x86-experimental-vector-widening support to reduceVMULWidth and ↵	Craig Topper	2018-11-15	1	-14/+22
	combineMulToPMADDWD In reduceVMULWidth, we no longer need to worry about extending the vector to 128 bits first. Regular widening of extends, muls and shuffles will take care of that for us. In combineMulToPMADDWD, we can handle v2i32 multiplies and allow the VPMADDWD to be widened to v4i32 during type legalization by adding custom widening like we have for AVG/ADDUS/SUBUS. I had to modify that code a little to allow different input and output VTs. Differential Revision: https://reviews.llvm.org/D54512 llvm-svn: 346980
*	[X86] Fix MCNullStreamer support for modules with a CodeView flag	Simon Pilgrim	2018-11-15	1	-8/+8
	This fixes -filetype=null support when compiling for a Win32 target and the module has a CodeView flag. The only places changed are the uses of getTargetStreamer function - this patch guards both of them with null checks. Committed on behalf of @eush (Eugene Sharygin) Differential Revision: https://reviews.llvm.org/D54008 llvm-svn: 346962
*	[X86] Add some custom type legalization rules for truncate with ↵	Craig Topper	2018-11-15	1	-0/+64
	-x86-experimental-vector-widening-legalization. This avoids some nasty shuffles when we have avx512. It will also prevent using zmm truncate instructions when a ymm instruction that zeroes part of an xmm register will do. Also avoid using avx512 truncate instructions when the input is 128 bits or less. These instructions are 2 uops on skx so we can probably find a better single uop shuffle like pshufb. llvm-svn: 346936
*	[X86] Don't mark SEXTLOADS with narrow types as Custom with ↵	Craig Topper	2018-11-15	1	-8/+27
	-x86-experimental-vector-widening-legalization. The narrow types end up requesting widening, but generic legalization will end up scalarizing and using a build_vector to do the widening. llvm-svn: 346916
*	[X86] Remove unused variable	Benjamin Kramer	2018-11-14	1	-1/+0
	llvm-svn: 346909
*	[X86] Support v2i32/v4i16/v8i8 load/store using f64 on 32-bit targets under ↵	Craig Topper	2018-11-14	1	-15/+38
	-x86-experimental-vector-widening-legalization. On 64-bit targets the type legalizer will use i64 to legalize these. But when i64 isn't legal, the type legalizer won't try an FP type. So do it manually instead. There are a few regressions in here due to some v2i32 operations like mul and div now being reassembled into a full vector just to store instead of storing the pieces. But this was already occurring in 64-bit mode so it's not a new issue. llvm-svn: 346908
*	Bias physical register immediate assignments	Nirav Dave	2018-11-14	2	-4/+4
	The machine scheduler currently biases register copies to/from physical registers to be closer to their point of use / def to minimize their live ranges. This change extends this to also cover physical register assignments from immediate values. This causes a reduction in overall register pressure and a minor reduction in spills, and indirectly fixes an out-of-registers assertion (PR39391). Most test changes are from minor instruction reorderings and register name selection changes and direct consequences of that. Reviewers: MatzeB, qcolombet, myatsina, pcc Subscribers: nemanjai, jvesely, nhaehnle, eraman, hiraditya, javed.absar, arphaman, jfb, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D54218 llvm-svn: 346894
*	[X86] Allow pmulh to be formed from narrow vXi16 vectors under ↵	Craig Topper	2018-11-14	1	-2/+4
	-x86-experimental-vector-widening-legalization Narrower vectors will be widened to 128 bits without changing the element size. And generic type legalization can already handle widening mulhu/mulhs. Differential Revision: https://reviews.llvm.org/D54513 llvm-svn: 346879
*	[CostModel] Add generic expansion funnel shift cost support	Simon Pilgrim	2018-11-14	1	-13/+11
	Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost. This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs. llvm-svn: 346854
*	[X86][AVX512] Remove constant pool shuffle decoding from SelectionDAG	Simon Pilgrim	2018-11-14	1	-3/+3
	This patch removes the last use of the constant pool shuffle decode helper and consistently uses the 'getTargetShuffleMaskIndices' versions instead. The constant pool versions are now purely used for assembly comments. The avx512vbmi intrinsic upgrades had to be altered as they were being decoded as broadcasts, similar to what I fixed in rL346032. I don't think the change is critical - although it's annoying that we lose the {k}{z} instruction test coverage as they are tricky to generate.... Differential Revision: https://reviews.llvm.org/D54083 llvm-svn: 346850
*	[SelectionDAG][X86] Relax restriction on the width of an input to ↵	Craig Topper	2018-11-13	5	-210/+245
	_EXTEND_VECTOR_INREG. Use them and regular _EXTEND to replace the X86 specific VSEXT/VZEXT opcodes
	Previously, the extend_vector_inreg opcodes required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output.
	This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes.
	X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction.
	We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code.
	I believe DAG combine works correctly with the relaxed restriction because it doesn't check the number of input elements.
	The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits.
	Differential Revision: https://reviews.llvm.org/D54346
	llvm-svn: 346784
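A rough Python model of the relaxed node semantics, offered as a sketch under assumptions rather than the SelectionDAG implementation. The function names mirror the opcode names but are hypothetical helpers, vectors are plain lists of lane values, and the check `len(inp) >= out_elts` follows the "at least enough elements to cover the output" wording above.

```python
# Illustrative model only; these functions are hypothetical, not SelectionDAG APIs.

def _mask(bits):
    return (1 << bits) - 1


def zero_extend_vector_inreg(inp, in_bits, out_elts):
    """Zero-extend the lowest `out_elts` lanes of `inp`; the remaining lanes are ignored."""
    assert len(inp) >= out_elts, "input must cover every output lane"
    return [v & _mask(in_bits) for v in inp[:out_elts]]


def sign_extend_vector_inreg(inp, in_bits, out_bits, out_elts):
    """Sign-extend the lowest `out_elts` lanes of `inp` to `out_bits` each."""
    assert len(inp) >= out_elts, "input must cover every output lane"
    out = []
    for v in inp[:out_elts]:
        v &= _mask(in_bits)
        if v & (1 << (in_bits - 1)):   # negative lane: subtract 2**in_bits
            v -= 1 << in_bits
        out.append(v & _mask(out_bits))
    return out


v16i8 = [0x00, 0x7F, 0x80, 0xFF] + list(range(1, 13))

# Same total width (128 bits in, 128 bits out): allowed before and after the change.
v4i32 = sign_extend_vector_inreg(v16i8, 8, 32, 4)

# 128-bit input, 256-bit output: only valid under the relaxed rule.
v8i32 = zero_extend_vector_inreg(v16i8, 8, 8)
```

In both calls only the low lanes of the v16i8 input are read, which is why a 128-bit input can feed outputs of different total widths.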
*	[CostModel][X86] Fix constant vector XOP right shifts	Simon Pilgrim	2018-11-13	1	-2/+11
	We'll constant fold these cases so they are as cheap as vector left shift cases. Noticed while improving funnel shift costs. llvm-svn: 346760
*	Fix comment for XOP rotates. NFCI.	Simon Pilgrim	2018-11-13	1	-1/+1
	llvm-svn: 346753
*	[X86][SSE] Add lowerVectorShuffleAsByteRotateAndPermute (PR39387)	Simon Pilgrim	2018-11-12	1	-8/+115
	This patch adds the ability to use a PALIGNR to rotate a pair of inputs to select a range containing all the referenced elements, followed by a single input permute to put them in the right location. Differential Revision: https://reviews.llvm.org/D54267 llvm-svn: 346706
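The idea can be sketched in Python on plain lists. This is a deliberately simplified model, not the LLVM routine: `byte_rotate_and_permute` is a hypothetical name, undef mask entries and per-lane handling are ignored, and the rotate is modelled as taking a contiguous window out of the concatenation of the two inputs.

```python
# Illustrative model only; the function name and mask convention are assumptions.

def byte_rotate_and_permute(v1, v2, mask):
    """Return the shuffle result via rotate + single-input permute, or None.

    `mask` indexes the concatenation v1 + v2 (0..2n-1), with no undef entries.
    """
    n = len(v1)
    lo, hi = min(mask), max(mask)
    if hi - lo >= n:
        return None                    # referenced elements do not fit in one window
    rot = min(lo, n)                   # rotation amount; the window is [rot, rot + n)
    concat = v1 + v2
    rotated = concat[rot:rot + n]      # the PALIGNR-like step: one contiguous window
    permute = [m - rot for m in mask]  # remaining single-input permute
    return [rotated[p] for p in permute]


v1 = list(range(0, 16))
v2 = list(range(16, 32))
mask = [10, 20, 11, 19, 12, 18, 13, 17, 14, 16, 15, 15, 25, 24, 23, 22]

result = byte_rotate_and_permute(v1, v2, mask)
reference = [(v1 + v2)[m] for m in mask]  # direct two-input shuffle
assert result == reference
```

When the referenced elements span more than one vector width, the sketch returns `None`, standing in for falling back to another lowering strategy.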
*	[X86] In LowerMULH, use generic truncate and vector shuffle nodes instead of ↵	Craig Topper	2018-11-12	1	-13/+18
	directly emitting PACKUS. Truncate and shuffle lowering are already capable of matching to PACKUS using known bits analysis. This features one test change where we now prefer to extend v16i16->v16i32 then trunc v16i32->v16i8 over extract_subvector+packus when avx512f is available, but avx512bw is not. llvm-svn: 346697
*	[CostModel][X86] Add funnel shift rotation special case costs	Simon Pilgrim	2018-11-12	1	-1/+82
	When we repeat the 2 shifting operands then this is a bit rotation - annoyingly this has to be done in the other getIntrinsicInstrCost than most intrinsics as we need to check the operands are the same. llvm-svn: 346688
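A short Python check of the identity this special case relies on, using hypothetical helper names. The funnel shifts follow the LLVM LangRef description (concatenate the two operands, shift by the amount modulo the bit width, keep one half); with both operands equal they reduce to rotates.

```python
# Illustrative model only; helper names are hypothetical.

def fshl(a, b, s, bits=32):
    """Funnel shift left: high half of (a:b) << (s mod bits)."""
    s %= bits
    concat = (a << bits) | b
    return ((concat << s) >> bits) & ((1 << bits) - 1)


def fshr(a, b, s, bits=32):
    """Funnel shift right: low half of (a:b) >> (s mod bits)."""
    s %= bits
    concat = (a << bits) | b
    return (concat >> s) & ((1 << bits) - 1)


def rotl(x, s, bits=32):
    s %= bits
    return ((x << s) | (x >> (bits - s))) & ((1 << bits) - 1)


def rotr(x, s, bits=32):
    s %= bits
    return ((x >> s) | (x << (bits - s))) & ((1 << bits) - 1)


x = 0x80000001
assert all(fshl(x, x, s) == rotl(x, s) for s in range(64))
assert all(fshr(x, x, s) == rotr(x, s) for s in range(64))
```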
*	[CostModel][X86] Add SHLD/SHRD scalar funnel shift costs	Simon Pilgrim	2018-11-12	1	-2/+11
	The costs match the typical reg-reg cases - the RMW case can be a lot slower but we don't model that at this level llvm-svn: 346683
*	[CostModel][X86] SK_ExtractSubvector is cheap if the (legal) subvector is ↵	Simon Pilgrim	2018-11-12	1	-5/+13
	aligned within the source vector llvm-svn: 346664
*	[X86] Use DAG.getConstant instead of getZeroVector.	Craig Topper	2018-11-11	1	-1/+1
	llvm-svn: 346605
*	[X86] Replace calls to getOnesVector/getZeroVector with getConstant.	Craig Topper	2018-11-11	1	-2/+2
	getConstant will create a BUILD_VECTOR for us and use a legal type if necessary. So just create the simple node and let BUILD_VECTOR legalization do the canonicalization. llvm-svn: 346603
*	[X86] Remove unused variable	Benjamin Kramer	2018-11-10	1	-1/+0
	llvm-svn: 346592
*	[X86] Remove apparently unneeded code from combineVSZext.	Craig Topper	2018-11-10	1	-50/+0
	No lit tests fail with this code removed. This is a pre-commit for D54346. llvm-svn: 346590
*	[CostModel][X86] SK_ExtractSubvector costs must only be tested for vector ↵	Simon Pilgrim	2018-11-10	1	-1/+1
	types (PR39615) llvm-svn: 346589
*	[X86][BdVer2] Fix loads/stores throughput for Piledriver (PR39465)	Roman Lebedev	2018-11-10	1	-4/+4
	There are two AGU units, and per 1cy, there can be either two loads, or a load and a store; but not two stores, or two loads and a store. Additionally, loads shouldn't affect the store scheduler and vice versa. (but should affect the PdEX scheduler.) Required rL346545. Fixes https://bugs.llvm.org/show_bug.cgi?id=39465 llvm-svn: 346587
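The per-cycle constraint stated above implies a simple throughput bound, sketched here in Python. This is a back-of-the-envelope model with a hypothetical function name, not the scheduler model itself; it ignores latencies, dependencies and every other resource. At most two memory operations issue per cycle and at most one of them may be a store.

```python
# Illustrative bound only; not derived from the actual scheduling model file.

def min_issue_cycles(loads, stores):
    """Fewest cycles to issue the given loads and stores under the AGU rule."""
    total_bound = -(-(loads + stores) // 2)  # ceil((loads + stores) / 2): two ops per cycle
    return max(stores, total_bound)          # and no more than one store per cycle


assert min_issue_cycles(2, 0) == 1  # two loads in one cycle
assert min_issue_cycles(1, 1) == 1  # a load and a store in one cycle
assert min_issue_cycles(0, 2) == 2  # two stores need two cycles
assert min_issue_cycles(2, 1) == 2  # two loads and a store do not fit in one cycle
```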
*	[X86] Use a MOVSX instruction instead of a MOVZX instruction in isel for an ↵	Craig Topper	2018-11-10	1	-0/+9
	any_extend of the remainder from an 8-bit sdivrem. The sdivrem will emit its own MOVSX to move %ah to the low byte of a register. By using a MOVSX for an any_extend this allows a post-isel peephole to merge them. llvm-svn: 346581
*	[X86] In LowerHorizontalByteSum, emit vector_shuffle nodes instead of ↵	Craig Topper	2018-11-10	1	-5/+5
	directly using X86ISD::UNPCKL/X86ISD::UNPCKH. This gives shuffle lowering the freedom to use zero_extend_vector_inreg for the unpckl shuffle. Shuffle combining usually makes this swap later, but not when AVX512 is enabled, it seems. While there, also use DAG.getConstant to create a 0 vector instead of using the helper that forces a specific BUILD_VECTOR. I don't think that helper is usually needed. We're basically free to create a constant build_vector anytime and it will be legalized on its own. llvm-svn: 346574
*	[X86] Move the promotion of v16i16->v16i8 for avx512f but not avx512bw from ↵	Craig Topper	2018-11-09	2	-8/+18
	lowering to isel. Change to use vpmovzx instead of vpmovsx. With avx512f but not avx512bw we need to extend to v16i32 then truncate that to v16i8. Previously we emitted both nodes during lowering, but I'm trying to switch to using target independent nodes and with that switched the extend+truncate wou This patch changes the implementation to what will be necessary with that patch which helps minimize test diffs. llvm-svn: 346552
*	[X86] Turn X86ISD::VSEXT into X86ISD::VZEXT if the upper bits aren't demanded.	Craig Topper	2018-11-09	1	-0/+12
	This makes X86ISD::VSEXT more similar to ISD::SIGN_EXTEND and ISD::ZERO_EXTEND. I'm hoping to replace X86ISD::VSEXT/VZEXT with target independent nodes. Making the target specific nodes similar to the target independent nodes helps minimize test diffs in that patch. llvm-svn: 346539
*	[CostModel][X86] SK_ExtractSubvector is free if the subvector is at the ↵	Simon Pilgrim	2018-11-09	1	-181/+187
	start of the source vector llvm-svn: 346538
*	[x86] try to form broadcast before widening shuffle elements	Sanjay Patel	2018-11-09	1	-0/+8
	I noticed that we weren't generating broadcasts as much as I thought we would with D54271, and this is part of the problem. Widening the shuffle elements means adding bitcasts and hiding the relationship between a splatted scalar and the vector. If we can form a broadcast, do that before going through the rest of the shuffle lowering because broadcasts should be cheap and can often be load-folded. Differential Revision: https://reviews.llvm.org/D54280 llvm-svn: 346498