bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86][SSE] Match PSHUFLW/PSHUFHW + PSHUFD vXi16 shuffle patterns (PR34686)	Simon Pilgrim	2017-12-29	20	-504/+756
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	As noted in PR34686, we are relying on a PSHUFD+PSHUFLW+PSHUFHW shuffle chain for most general vXi16 unary shuffles. This patch checks for simpler PSHUFLW+PSHUFD and PSHUFHW+PSHUFD cases beforehand, building on some existing code that just handled splat shuffles. By doing so we also prevent premature use of PSHUFB shuffles which can be slower and require the creation/loading of constant shuffle masks. We now have the 'fast-variable-shuffle' option for hardware that prefers combining 2 or more shuffles to VPSHUFB etc. Differential Revision: https://reviews.llvm.org/D38318 llvm-svn: 321553
*	[PowerPC] Fix for PR35688 - handle out-of-range values for r+r to r+i conversion	Nemanja Ivanovic	2017-12-29	3	-0/+1483
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Revision 320791 introduced a pass that transforms reg+reg instructions to reg+imm if they're fed by "load immediate". However, it didn't handle out-of-range shifts correctly as reported in PR35688. This patch fixes that and therefore the PR. Furthermore, there was undefined behaviour in the patch where the RHS of an initialization expression was 32 bits and constant `1` was shifted left 32 bits. This was fixed by ensuring the RHS is 64 bits just like the LHS. Differential Revision: https://reviews.llvm.org/D41369 llvm-svn: 321551
*	[x86] add tests for potential memcmp expansion (PR33325); NFC	Sanjay Patel	2017-12-28	1	-0/+252
\| \| \| \|	llvm-svn: 321542
*	Unbreak test relying on debug output after r321540.	Benjamin Kramer	2017-12-28	1	-0/+2
\| \| \| \|	llvm-svn: 321541
*	[X86] When lowering extending loads from v2i1/v4i1, if we have VLX, use a ↵	Craig Topper	2017-12-28	3	-24/+13
\| \| \| \| \| \| \| \| \| \| \| \|	narrower extend. Previously we used an extend from v8i1 to v8i32/v8i64. Then extracted to the final width. But if we have VLX we should extract first. This way we don't end up with an overly large extend. This allows us to use vcmpeq to make all ones for the sign extend when DQI isn't available. Otherwise we get a VPTERNLOG. If we make v2i1/v4i1 legal like proposed in D41560, we could always do this and rely on the lowering of the extend to widen when necessary. llvm-svn: 321538
*	[WinEH] Don't emit state stores or EH thunks for available_externally functions	Reid Kleckner	2017-12-28	1	-0/+28
\| \| \| \| \| \| \| \| \| \|	The exception handler thunk needs to reference the LSDA of the parent function, which won't be emitted if it's available_externally. Fixes PR35736. ThinLTO ends up producing available_externally functions that use _CxxFrameHandler3. llvm-svn: 321532
*	Fix tests after move to utohexstr.	Benjamin Kramer	2017-12-28	1	-1/+1
\| \| \| \|	llvm-svn: 321527
*	[X86][SSE] Use PMADDWD for v4i32 multiplies with 17 or more leading zeros	Simon Pilgrim	2017-12-28	2	-40/+22
\| \| \| \| \| \| \| \| \| \|	If there are 17 or more leading zeros to the v4i32 elements, then we can use PMADDWD for the integer multiply when PMULLD is unavailable or slow. The 17 bits need to be zero because PMADDWD performs a v8i16 signed-mul-extend + pairwise-add: the upper 16 bits must be zero so that we're adding a zero pair, and the 17th bit must be zero so that we don't incorrectly sign extend. Differential Revision: https://reviews.llvm.org/D41484 llvm-svn: 321516
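The 17-leading-zeros condition above can be checked with a small model. This is only an illustrative sketch, not LLVM code: `to_i16`, `pmaddwd_lane` and `mul32` are hypothetical helper names, and each 32-bit lane is modelled as a plain Python integer.

```python
MASK32 = 0xFFFFFFFF

def to_i16(x):
    """Interpret the low 16 bits of x as a signed 16-bit integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def pmaddwd_lane(a, b):
    """Model one 32-bit result lane of PMADDWD (hypothetical helper).

    The low and high 16-bit halves are multiplied as signed values and
    the two products are added, wrapping to 32 bits.
    """
    lo = to_i16(a) * to_i16(b)
    hi = to_i16(a >> 16) * to_i16(b >> 16)
    return (lo + hi) & MASK32

def mul32(a, b):
    """Reference 32-bit multiply, keeping the low 32 bits."""
    return (a * b) & MASK32

# Every element is below 0x8000, i.e. has at least 17 leading zero bits.
a = [0x7FFF, 0x1234, 0x0001, 0x0000]
b = [0x7FFF, 0x0042, 0x7FFF, 0x5555]
matches = [pmaddwd_lane(x, y) == mul32(x, y) for x, y in zip(a, b)]

# With only 16 leading zeros, bit 15 is read as a sign bit and the result differs.
bad = pmaddwd_lane(0x8000, 2) != mul32(0x8000, 2)
```

In this model the high halves contribute a zero product, and the low halves are non-negative as signed 16-bit values, which is why the two results agree.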
*	[X86] Reimplement r321437 using custom lowering instead of as a DAG combine.	Craig Topper	2017-12-27	1	-2/+2
\| \| \| \| \| \| \| \|	My original implementation ran as a DAG combine post type legalization, but it turns out we don't run that DAG combine step if type legalization didn't change anything. Attempts to make the combine run before type legalization as well hit other issues. So just do it in LowerMUL where we can catch more cases. llvm-svn: 321496
*	[AArch64] Change order of candidate FMLS patterns	Matthew Simpson	2017-12-27	1	-4/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	r319980 added new patterns to the machine combiner for transforming (fsub (fmul x y) z) into (fmla (fneg z) x y). That is, fsub's where the first source operand is an fmul are transformed. We previously only matched the case where the second source operand of an fsub was an fmul, transforming (fsub z (fmul x y)) into (fmls z x y). Now, if we have an fsub where both source operands are fmuls, both of the above patterns are applicable. However, the order in which we add the patterns to the list of candidates determines the transformation that takes place, since only the first pattern that matches will be used. This patch changes the order these two patterns are added to the list of candidates such that we prefer the case where the second source operand is an fmul (the fmls case), rather than the other one (the fmla/fneg case). When both source operands are fmuls, this ordering results in fewer instructions. Differential Revision: https://reviews.llvm.org/D41587 llvm-svn: 321491
*	[X86] Fix vmul combine for AVX1 targets.	Benjamin Kramer	2017-12-27	1	-0/+44
\| \| \| \| \| \|	v8i32 is legal on AVX1, but it doesn't have pmuludq for it. llvm-svn: 321490
*	[DAGCombine] foldBinOpIntoSelect can fail to constant fold in some cases.	Simon Pilgrim	2017-12-27	1	-0/+35
\| \| \| \| \| \| \| \|	For example, float operations may fail to constant fold under certain circumstances (inf/nan/denormal creation etc.) Reduced from oss-fuzz #4802 test case llvm-svn: 321488
*	[DAGCombine] visitANDLike - ensure APInt is in range for ↵	Simon Pilgrim	2017-12-26	1	-0/+13
\| \| \| \| \| \| \| \|	getSExtValue/getZExtValue Reduced from oss-fuzz #4782 test case llvm-svn: 321464
*	[X86] Regenerate test using update_llc_test_checks.py.	Craig Topper	2017-12-26	1	-34/+128
\| \| \| \|	llvm-svn: 321462
*	[DAGCombine] Don't combine (and (setne X, 0), (setne X, -1)) --> (setuge ↵	Simon Pilgrim	2017-12-26	1	-0/+24
\| \| \| \| \| \| \| \|	(add X, 1), 2) for i1 Reduced from oss-fuzz #4773 test case llvm-svn: 321455
*	[X86] Pass itins.rr/itins.rm through properly for some instructions.	Craig Topper	2017-12-26	1	-26/+6
\| \| \| \|	llvm-svn: 321452
*	[X86] Use SSE_INTMUL_ITINS_P for the AVX-512 MUL instructions to match their ↵	Craig Topper	2017-12-26	1	-4/+4
\| \| \| \| \| \|	SSE/AVX counterparts. llvm-svn: 321451
*	[X86] Add a DAG combine to turn vXi64 muls into VPMULDQ/VPMULUDQ if the ↵	Craig Topper	2017-12-25	2	-56/+20
\| \| \| \| \| \| \| \| \| \|	upper bits are all sign bits or zeros. Normally we catch this during lowering, but vXi64 mul is considered legal when we have AVX512DQ. This DAG combine allows us to avoid PMULLQ with AVX512DQ if we can prove it's unnecessary. PMULLQ is 3 uops that take 4 cycles each, while pmuldq/pmuludq is only one 4-cycle uop. llvm-svn: 321437
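The condition on the upper bits can be illustrated with a per-lane model. This is a sketch under stated assumptions rather than the actual combine: `pmuludq_lane`, `pmuldq_lane`, `sext32` and `mul64` are hypothetical helpers, and each 64-bit lane is a Python integer.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def sext32(x):
    """Interpret the low 32 bits of x as a signed 32-bit integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def pmuludq_lane(a, b):
    """Unsigned multiply of the low 32 bits of each operand, 64-bit result."""
    return ((a & MASK32) * (b & MASK32)) & MASK64

def pmuldq_lane(a, b):
    """Signed multiply of the low 32 bits of each operand, 64-bit result."""
    return (sext32(a) * sext32(b)) & MASK64

def mul64(a, b):
    """Reference 64-bit multiply, keeping the low 64 bits."""
    return (a * b) & MASK64

# Upper 32 bits are zero in both operands: the unsigned form is exact.
zero_upper_ok = pmuludq_lane(0xFFFFFFFF, 0x12345678) == mul64(0xFFFFFFFF, 0x12345678)

# Both operands are sign-extended 32-bit values: the signed form is exact.
minus_five = -5 & MASK64
sign_upper_ok = pmuldq_lane(minus_five, 7) == mul64(minus_five, 7)

# A set bit above bit 31 is ignored by pmuludq, so the results differ.
needs_full_mul = pmuludq_lane(1 << 32, 3) != mul64(1 << 32, 3)
```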
*	[X86] Add avx512vl and avx512dq command lines to combine-pmuldq.ll to ↵	Craig Topper	2017-12-25	1	-31/+101
\| \| \| \| \| \| \| \|	demonstrate where we fail to use pmuldq/pmuludq and use pmullq instead. It's nice that pmullq exists, but it has higher latency and probably lower throughput than pmuldq/pmuludq. We should prefer those if we can. llvm-svn: 321436
*	[X86][AVX] Add AVX1/AVX2 vmul tests	Simon Pilgrim	2017-12-24	1	-939/+2056
\| \| \| \|	llvm-svn: 321426
*	[X86][X87] Mark pseudo memory fold instructions as load/sideeffects ↵	Simon Pilgrim	2017-12-24	1	-1/+1
\| \| \| \| \| \| \| \|	(PR21160, PR34080, PR34454). Match regular x87 memory fold instructions with load/sideeffects tags, to prevent the schedulers from re-ordering them across the fnstcw/fldcw sequences for truncating stores while they are still pseudo during the stack conversion pass. llvm-svn: 321424
*	[X86][X87] Renamed CHECK prefix, it's not actually broken anymore just ↵	Simon Pilgrim	2017-12-24	1	-39/+39
\| \| \| \| \| \|	scheduled differently llvm-svn: 321423
*	[X86][X87] Add another test case mentioned on PR34080	Simon Pilgrim	2017-12-24	1	-0/+136
\| \| \| \| \| \|	Did my best to reduce this, but the X87 scheduling bug is hard to hit at the best of times... llvm-svn: 321422
*	[X86] Fix (v2f64 (s/uint_to_fp (v2i1))) to avoid scalarization without AVX512DQ.	Craig Topper	2017-12-24	2	-38/+26
\| \| \| \| \| \|	Previously we extended v2i1 to v2f64 and then tried to use cvtuqq2pd/cvtqq2pd, but that only works with avx512dq. So we ended up scalarizing it. Now we widen to v4i1 first and extend to v4i32. llvm-svn: 321420
*	[DAGCombiners] Don't turn ANDs to shuffles with zero so early. Give some ↵	Craig Topper	2017-12-24	1	-2/+0
\| \| \| \| \| \| \| \|	other combines a chance to run. This moves the combine for turning ANDs into shuffle with zero out of SimplifyVBinOps and places it only in visitAND below the reassociate handling. This fixes the specific case I noticed where we failed to combine two ands with constants. llvm-svn: 321417
*	[X86] Teach WidenMaskArithmetic to handle any constant buildvector on the ↵	Craig Topper	2017-12-24	2	-21/+112
\| \| \| \| \| \|	RHS not just all zeros/ones. llvm-svn: 321415
*	[SelectionDAG] Teach SelectionDAG::getNode to constant fold zext/aext/sext ↵	Craig Topper	2017-12-23	4	-30/+22
\| \| \| \| \| \|	of constant build vectors. llvm-svn: 321414
*	[X86] Remove type restrictions from WidenMaskArithmetic.	Craig Topper	2017-12-23	3	-87/+34
\| \| \| \| \| \|	This can help AVX-512 code where mask types are legal allowing us to remove extends and truncates to/from mask types. llvm-svn: 321408
*	[SelectionDAG] Reverse the order of operands in the ISD::ADD created by ↵	Craig Topper	2017-12-22	8	-626/+525
\| \| \| \| \| \| \| \| \| \|	TargetLowering::getVectorElementPointer so that the FrameIndex is on the left. This seems to improve X86's ability to match this into an address computation. Otherwise the other operand gets assigned to the base register and the stack pointer + frame index ends up in the index register. But index registers can't encode ESP/RSP so we end up having to move it into another register to meet the constraint. I could try to improve the address matcher in X86, but swapping the producer seemed easier. Several other places already have the operands in this order so this is at least consistent. llvm-svn: 321370
*	[X86] When lowering insert_vector_elt/extract_vector_elt of vXi1 with a ↵	Craig Topper	2017-12-22	1	-62/+30
\| \| \| \| \| \| \| \|	non-constant index just use either a 128-bit type or the vXi8 type with the correct number of elements. Despite what the comment said there isn't better codegen for 512-bit vectors. The 128/256/512 bit implementation just stores to memory and loads an element. There's no advantage to doing that with a larger size. In fact in many cases it causes a stack realignment and generates worse code. llvm-svn: 321369
*	[mips] Add test case to check that calls to mcount follow long calls / short ↵	Simon Atanasyan	2017-12-22	1	-0/+19
\| \| \| \| \| \|	calls options. NFC llvm-svn: 321357
*	[ARM GlobalISel] Support G_INTTOPTR and G_PTRTOINT for s32	Diana Picus	2017-12-22	3	-0/+139
\| \| \| \| \| \| \|	Mark conversions between pointers and 32-bit scalars as legal, map them to the GPR and select to a simple COPY. llvm-svn: 321356
*	[ARM GlobalISel] Support pointer constants	Diana Picus	2017-12-22	2	-0/+23
\| \| \| \| \| \| \| \| \| \|	Pointer constants are pretty rare, since we usually represent them as integer constants and then cast to pointer. One notable exception is the null pointer constant, which is represented directly as a G_CONSTANT 0 with pointer type. Mark it as legal and make sure it is selected like any other integer constant. llvm-svn: 321354
*	[DAGCombine] Revert r321259	Sam Parker	2017-12-22	2	-114/+22
\| \| \| \| \| \| \|	Improve ReduceLoadWidth for SRL. The patch is causing an issue on the PPC64 BE sanitizer. llvm-svn: 321349
*	[X86] Add prefetchwt1 instruction and overhaul priorities and isel enabling ↵	Craig Topper	2017-12-22	1	-14/+88
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	for prefetch instructions. Previously prefetch was only considered legal if sse was enabled, but it should be supported with 3dnow as well. The prfchw flag now implies at least some form of prefetch without the write hint is available, either the sse or 3dnow version. This is true even if 3dnow and sse are explicitly disabled. Similarly prefetchwt1 feature implies availability of prefetchw and the prefetcht0/1/2/nta instructions. This way we can support _MM_HINT_ET0 using prefetchw and _MM_HINT_ET1 with prefetchwt1. And it's assumed that if we have levels for the write hint we would have levels for the non-write hint, which is why we enable the sse prefetch instructions. I believe this behavior is consistent with gcc. I've updated the prefetch.ll to test all of these combinations. llvm-svn: 321335
*	[X86] Use SIGN_EXTEND to implement ANY_EXTEND from vXi1.	Craig Topper	2017-12-22	5	-155/+141
\| \| \| \|	llvm-svn: 321334
*	[X86] Use SIGN_EXTEND rather than ZERO_EXTEND for lowering ↵	Craig Topper	2017-12-21	1	-4/+4
\| \| \| \| \| \| \| \|	extract_vector_elt from vXi1 with a non-const index. We have a better range of instructions we can use if we can fill with the i1 value rather than zeroing. llvm-svn: 321315
*	[X86] When lowering truncates to vXi1, don't sign extend i16/i8 types to ↵	Craig Topper	2017-12-21	3	-21/+21
\| \| \| \| \| \| \| \|	512-bit if we have VLX. This should only affect what we do for v8i16. Previously we went to v8i64, but if we have VLX we only need v8i32. This prevents an unnecessary zmm usage. llvm-svn: 321303
*	[X86] Promote v8i1 shuffles to v8i32 instead of v8i64 if we have VLX.	Craig Topper	2017-12-21	3	-144/+160
\| \| \| \| \| \| \| \|	We should have equally good shuffle options for v8i32 with VLX. This was spotted during my attempts to remove 512-bit vectors from SKX. We still use 512-bits for v16i1, v32i1, and v64i1. I'm less sure we can handle those well with narrower vectors. i32 and i64 element sizes get the best shuffle support. llvm-svn: 321291
*	[X86][SSE] Split large PAVGB/PAVGW vectors to legal widths	Simon Pilgrim	2017-12-21	2	-2308/+287
\| \| \| \| \| \| \| \|	Patch to allow detectAVGPattern to handle vectors larger than the legal size (128 SSE2, 256 AVX2, 512 AVX512BW), splitting the vectors accordingly. Differential Revision: https://reviews.llvm.org/D41440 llvm-svn: 321288
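The idea of splitting an oversized average into legal-width pieces can be sketched as follows. This is a toy model only: `pavgb` and `split_avg` are hypothetical names, vectors are Python lists of unsigned bytes, and the sketch assumes the vector length is a multiple of the legal lane count.

```python
def pavgb(a, b, legal_lanes):
    """Model a legal-width PAVGB: rounded average of unsigned bytes, (x + y + 1) >> 1."""
    assert len(a) == len(b) <= legal_lanes, "operand wider than the legal vector"
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]

def split_avg(a, b, legal_lanes):
    """Average two wide vectors by splitting them into legal-width chunks."""
    assert len(a) == len(b) and len(a) % legal_lanes == 0
    out = []
    for i in range(0, len(a), legal_lanes):
        out.extend(pavgb(a[i:i + legal_lanes], b[i:i + legal_lanes], legal_lanes))
    return out

# A 64-lane byte vector averaged with 16-lane (128-bit) operations.
a = [(7 * i) & 0xFF for i in range(64)]
b = [(255 - 3 * i) & 0xFF for i in range(64)]
split_result = split_avg(a, b, 16)
reference = [(x + y + 1) >> 1 for x, y in zip(a, b)]
```

Because the average is computed lane by lane, the concatenated chunk results equal the result of one hypothetical 64-lane operation.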
*	[DAGCombiner] Generalize (or (and X, c1), c2) -> (and (or X, c2), c1\|c2) ↵	Simon Pilgrim	2017-12-21	2	-8/+10
\| \| \| \| \| \| \| \|	combine to work on non-splat vectors The knownbits_mask_or_shuffle_uitofp change is interesting - shuffle combines manage to kick in, removing the AND constant mask load. For targets with fast-variable-shuffle this should reduce further to VPOR+VPSHUFB+VCVTDQ2PS. llvm-svn: 321279
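The identity behind this combine holds per lane, which is why non-splat constants are fine. A minimal check, assuming 8-bit lanes and hypothetical helper names `before` and `after`:

```python
def before(x, c1, c2):
    """(or (and X, c1), c2), applied lane by lane."""
    return [(xi & a) | b for xi, a, b in zip(x, c1, c2)]

def after(x, c1, c2):
    """(and (or X, c2), c1|c2), applied lane by lane."""
    return [(xi | b) & (a | b) for xi, a, b in zip(x, c1, c2)]

# Non-splat constant vectors: every lane carries a different constant.
c1 = [0x0F, 0xF0, 0x3C, 0xFF]
c2 = [0x01, 0x80, 0x00, 0x55]

# Lanes are independent, so trying every byte value in every lane is exhaustive.
identity_holds = all(
    before([v] * 4, c1, c2) == after([v] * 4, c1, c2) for v in range(256)
)
```

This is the distributive law of AND over OR; the sketch does not model when the combine is profitable, only that it preserves the value.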
*	[X86] Add (or (and X, c1), c2) -> (and (or X, c2), c1\|c2) non-splat vector test	Simon Pilgrim	2017-12-21	1	-4/+14
\| \| \| \|	llvm-svn: 321278
*	[PowerPC] Fix parest build failure in SPEC2017.	Tony Jiang	2017-12-21	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The build failure was caused by an assertion in pre-legalization DAGCombine: Combining: t6: ppcf128 = uint_to_fp t5 ... into: t20: f32 = PPCISD::FCFIDUS t19 which is clearly wrong, since ppcf128 is a different type from f32 and we cannot change the node's value type during DAGCombine. The fix is to not handle ppc_fp128 or i1 conversions in PPCTargetLowering::combineFPToIntToFP and to leave it to later stages to legalize and expand them to smaller legal types. Differential Revision: https://reviews.llvm.org/D41411 llvm-svn: 321276
*	[DAGCombiner] Generalize (and (or x, C), D) -> D iff (C & D) == D combine to ↵	Simon Pilgrim	2017-12-21	1	-2/+1
\| \| \| \| \| \|	work on non-splat vectors llvm-svn: 321275
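For non-splat vectors, the `(C & D) == D` precondition has to hold in every lane. A small sketch, assuming 8-bit lanes; `precondition` and `and_of_or` are hypothetical names:

```python
def precondition(c, d):
    """(C & D) == D must hold in every lane for the fold to apply."""
    return all((ci & di) == di for ci, di in zip(c, d))

def and_of_or(x, c, d):
    """(and (or x, C), D), applied lane by lane."""
    return [(xi | ci) & di for xi, ci, di in zip(x, c, d)]

# Non-splat constants where each lane of D is a subset of the matching lane of C.
C = [0xFF, 0x0F, 0x81, 0x70]
D = [0x0F, 0x0C, 0x80, 0x70]
folds_to_d = precondition(C, D) and all(
    and_of_or([v] * 4, C, D) == D for v in range(256)
)

# One lane violating the precondition is enough to block the fold.
blocked = not precondition([0xFF, 0x0F], [0x0F, 0x1F])
```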
*	[X86] Add (and (or x, C), D) -> D iff (C & D) == D non-splat vector test	Simon Pilgrim	2017-12-21	1	-0/+11
\| \| \| \|	llvm-svn: 321268
*	[X86] Add v48i8 AVG test case, based on discussion on D41440	Simon Pilgrim	2017-12-21	1	-0/+404
\| \| \| \|	llvm-svn: 321261
*	[DAGCombine] Improve ReduceLoadWidth for SRL	Sam Parker	2017-12-21	2	-22/+114
\| \| \| \| \| \| \| \| \| \| \|	If the SRL node is only used by an AND, we may be able to set the ExtVT to the width of the mask, making the AND redundant. To support this, another check has been added in isLegalNarrowLoad which queries whether the load is valid. Differential Revision: https://reviews.llvm.org/D41350 llvm-svn: 321259
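The effect described above can be illustrated with a byte-level model. This is only a sketch: it assumes a little-endian target, a shift amount that is a multiple of 8 and a mask that covers whole bytes, and `wide_load_shift_mask` and `narrow_zext_load` are hypothetical names rather than LLVM functions.

```python
import struct

def wide_load_shift_mask(mem, addr, shift, mask):
    """(and (srl (load i32 addr), shift), mask) on little-endian memory."""
    (word,) = struct.unpack_from("<I", mem, addr)
    return (word >> shift) & mask

def narrow_zext_load(mem, addr, shift, width_bits):
    """Zero-extending load of width_bits starting shift/8 bytes past addr."""
    offset = shift // 8
    nbytes = width_bits // 8
    return int.from_bytes(mem[addr + offset:addr + offset + nbytes], "little")

mem = bytes([0x11, 0x22, 0x33, 0x44, 0x55])

# Shift by 8 and mask to 8 bits: same value as loading the second byte.
byte_wide = wide_load_shift_mask(mem, 0, 8, 0xFF)
byte_narrow = narrow_zext_load(mem, 0, 8, 8)

# Shift by 16 and mask to 16 bits: same value as loading the upper halfword.
half_wide = wide_load_shift_mask(mem, 0, 16, 0xFFFF)
half_narrow = narrow_zext_load(mem, 0, 16, 16)
```

Once the narrow load zero-extends to the mask width, the AND no longer changes the value, which is what makes it redundant in this model.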
*	[X86] Use PSHUFB for v32i16 shuffles before falling back to VPERMW/VPERMI2W.	Craig Topper	2017-12-21	1	-0/+15
\| \| \| \| \| \|	PSHUFB has the ability to implicitly 0 elements which VPERMI2W can't do. So give a chance to use it first. llvm-svn: 321251
*	[X86] Use VPERMI2B for v16i8 shuffles if we have VBMI+VLX and would have ↵	Craig Topper	2017-12-21	1	-7/+21
\| \| \| \| \| \|	otherwise used two PSHUFBs ORed together. llvm-svn: 321249
*	[X86] Use VPERMB/VPERMI2B for v32i8 shuffle lowering if VBMI and VLX are ↵	Craig Topper	2017-12-21	1	-143/+248
\| \| \| \| \| \|	supported. llvm-svn: 321248