The custom lowering was just doing the same thing promotion would do.
llvm-svn: 321630
when using setOperationAction Promote for INT_TO_FP and FP_TO_INT
Currently the promotion for these ignores the normal getTypeToPromoteTo and instead just tries to double the element width. This is because the default behavior of getTypeToPromoteTo is to just add 1 to the SimpleVT, which has the effect of increasing the element count while keeping the scalar size the same.
If multiple steps are required to get to a legal operation type, int_to_fp will be promoted multiple times, and fp_to_int will keep trying wider types in a loop until it finds one that works.
getTypeToPromoteTo does have the ability to query a promotion map to get the type and not do the increasing behavior. It seems better to just let the target specify the promotion type in the map explicitly instead of letting the legalizer iterate via widening.
FWIW, I think for any other vector operations that need to be promoted, we'll have to specify the type explicitly, because the default behavior of getTypeToPromoteTo isn't useful for vectors. The other types of promotion already require either that the element count stays constant or that the total vector width stays constant, but neither happens just by incrementing the SimpleVT enum.
Differential Revision: https://reviews.llvm.org/D40664
llvm-svn: 321629
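To illustrate the "just adds 1 to the SimpleVT" behaviour described above, here is a toy C++ model; the SimpleVT enum below is invented for the example (it is not LLVM's real MVT list), but it mirrors the way same-width vector types sit next to each other, so +1 grows the lane count rather than the element width.

    #include <cassert>

    // Toy model of the default promotion step: "add 1 to the enum value".
    // The enum here is invented for illustration, not LLVM's real MVT list.
    enum class SimpleVT { v2i32, v4i32, v8i32, v2i64, v4i64 };

    SimpleVT defaultPromote(SimpleVT VT) {
      return static_cast<SimpleVT>(static_cast<int>(VT) + 1);
    }

    int main() {
      // Stepping from v2i32 lands on v4i32 (more lanes, same 32-bit elements)
      // rather than the element-widening v2i64 that int_to_fp/fp_to_int want,
      // which is why an explicit promotion map entry is preferable.
      assert(defaultPromote(SimpleVT::v2i32) == SimpleVT::v4i32);
      return 0;
    }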
sign bits.
If the input is all sign bits then the LSB through MSB are all the same so we don't need to move the LSB to the MSB.
llvm-svn: 321617
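A scalar sketch of the reasoning above; the 32-bit lane width and the shift-based extraction are assumptions for the example, not taken from the patch.

    #include <cassert>
    #include <cstdint>

    // If a lane is all sign bits (0 or -1), its LSB already equals its MSB,
    // so shifting the LSB up into the sign-bit position changes nothing for
    // anything that only reads the sign bit afterwards.
    int main() {
      for (int32_t lane : {0, -1}) {
        uint32_t u = static_cast<uint32_t>(lane);
        uint32_t msbAsIs = u >> 31;              // read the sign bit directly
        uint32_t msbFromLsb = (u << 31) >> 31;   // move LSB to MSB, then read
        assert(msbAsIs == msbFromLsb);
      }
      return 0;
    }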
registers to implement 128/256-bit operations without VLX.
llvm-svn: 321613
false input being zero.
We can use zmm move with zero masking for this. We already had patterns for using a masked move, but we didn't check for the zero masking case separately.
llvm-svn: 321612
vector to v8i1 pre-legalize.
The CONCAT_VECTORS will be lowered to INSERT_SUBVECTOR later. In the modified cases this seems to be enough to trick a later DAG combine into running in a different order that allows the ANDs to be removed.
I'll admit this is a bit of a hack that happens to work, but using CONCAT_VECTORS is more consistent with other legalization code anyway.
llvm-svn: 321611
As it has a scalar source we don't treat it as a target shuffle, so it needs special handling.
llvm-svn: 321610
Don't combine buildvector(binop(),binop(),binop(),binop()) -> binop(buildvector(), buildvector()) if it's a splat - keep the binop scalar and just splat the result to avoid large vector constants.
llvm-svn: 321607
legalization sees the i4 and changes to load/store.
Same for v2i1 and i2.
llvm-svn: 321602
legalization sees the i4 and changes to load/store.
Same for i2 and v2i1.
llvm-svn: 321601
don't have DQI.
We end up using an i8 load via an isel pattern from v8i1 anyway. This just makes it more explicit. This seems to improve codegen in some cases and I'd like to kill off some of the load patterns.
llvm-svn: 321598
This is better handled by a DAG combine if it's not already being done. No lit tests fail from the removal of these patterns.
llvm-svn: 321597
I don't think anything would actually expect the other bits to be zero.
llvm-svn: 321596
llvm-svn: 321595
Use getMemBasePlusOffset and calculate proper pointer info and alignment for the second store.
llvm-svn: 321594
llvm-svn: 321585
If the callee and caller use different calling conventions, we cannot apply TCO if the callee requires arguments on the stack; e.g. the C calling convention and Fast CC use the same registers for parameter passing, but the stack offset is not necessarily the same.
This patch also recommits r319218 "[PowerPC] Allow tail calls of fastcc functions from C CallingConv functions." by @sfertile, since the problem reported in r320106 should be fixed.
Differential Revision: https://reviews.llvm.org/D40893
llvm-svn: 321579
natively.
We should only be creating natively supported kshifts now.
llvm-svn: 321577
This allows us to remove some isel patterns.
This is mostly NFC, but we now use KSHIFTB instead of KSHIFTW with DQI.
llvm-svn: 321576
llvm-svn: 321566
Previously, if the `c` constraint was applied to the wrong data type, LLVM would assert.
This commit replaces the assert with an error message.
llvm-svn: 321565
Also fixes using the wrong memory type for some
intrinsics when custom lowering them.
llvm-svn: 321557
Atomics still have hasSideEffects set on them because
of the mess that is the memory properties.
llvm-svn: 321556
Currently all images are lowered to have a single
image PseudoSourceValue. Image stores happen to have
overly strict mayLoad/mayStore/hasSideEffects flags
set on them, so this happens to work. When these
are fixed to be correct, the scheduler breaks
this because the identical PSVs are assumed to
be the same address. These need to be unique
to the image resource value.
llvm-svn: 321555
As noted in PR34686, we are relying on a PSHUFD+PSHUFLW+PSHUFHW shuffle chain for most general vXi16 unary shuffles.
This patch checks for simpler PSHUFLW+PSHUFD and PSHUFHW+PSHUFD cases beforehand, building on some existing code that just handled splat shuffles.
By doing so we also prevent premature use of PSHUFB shuffles which can be slower and require the creation/loading of constant shuffle masks.
We now have the 'fast-variable-shuffle' option for hardware that prefers combining 2 or more shuffles to VPSHUFB etc.
Differential Revision: https://reviews.llvm.org/D38318
llvm-svn: 321553
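A toy, index-only model of the decomposition (plain arrays rather than intrinsics; the masks are made up for the example): a unary v8i16 shuffle whose high half only moves at dword granularity can be expressed as PSHUFLW followed by PSHUFD.

    #include <array>
    #include <cassert>
    #include <cstdio>

    using V8 = std::array<int, 8>;

    // PSHUFLW: permute the four low words, keep the high four unchanged.
    V8 pshuflw(V8 v, std::array<int, 4> m) {
      return {v[m[0]], v[m[1]], v[m[2]], v[m[3]], v[4], v[5], v[6], v[7]};
    }

    // PSHUFD: permute 32-bit dwords (adjacent word pairs).
    V8 pshufd(V8 v, std::array<int, 4> m) {
      V8 r{};
      for (int i = 0; i < 4; ++i) {
        r[2 * i] = v[2 * m[i]];
        r[2 * i + 1] = v[2 * m[i] + 1];
      }
      return r;
    }

    int main() {
      V8 identity = {0, 1, 2, 3, 4, 5, 6, 7};
      V8 want = {4, 5, 1, 0, 6, 7, 2, 3};   // example unary word shuffle
      V8 got = pshufd(pshuflw(identity, {1, 0, 2, 3}), {2, 0, 3, 1});
      assert(got == want);
      std::puts("PSHUFLW{1,0,2,3} + PSHUFD{2,0,3,1} covers the mask");
      return 0;
    }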
See bug 35730: https://bugs.llvm.org/show_bug.cgi?id=35730
Differential Revision: https://reviews.llvm.org/D41598
Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 321552
Revision 320791 introduced a pass that transforms reg+reg instructions to
reg+imm if they're fed by "load immediate". However, it didn't
handle out-of-range shifts correctly as reported in PR35688.
This patch fixes that and therefore the PR.
Furthermore, there was undefined behaviour in the patch where the RHS of an
initialization expression was 32 bits and constant `1` was shifted left 32
bits. This was fixed by ensuring the RHS is 64 bits just like the LHS.
Differential Revision: https://reviews.llvm.org/D41369
llvm-svn: 321551
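A minimal, self-contained illustration of the shift bug class described above (not the actual pass code; lowBitMask is a hypothetical helper): with a 32-bit constant `1`, a shift by 32 is undefined behaviour, so the left-hand side must be widened to 64 bits.

    #include <cstdint>
    #include <iostream>

    // Hypothetical helper: build a mask of the low `width` bits (width <= 63).
    // Writing `1 << width` with a 32-bit literal would be undefined behaviour
    // once width reaches 32; the 64-bit LHS keeps the shift in range.
    uint64_t lowBitMask(unsigned width) {
      return (uint64_t{1} << width) - 1;
    }

    int main() {
      std::cout << std::hex << lowBitMask(32) << ' ' << lowBitMask(16) << '\n';
      return 0;
    }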
and punpckldq.
Differential Revision: https://reviews.llvm.org/D41595
llvm-svn: 321549
narrower extend.
Previously we used an extend from v8i1 to v8i32/v8i64. Then extracted to the final width. But if we have VLX we should extract first. This way we don't end up with an overly large extend.
This allows us to use vcmpeq to make all ones for the sign extend when DQI isn't available. Otherwise we get a VPTERNLOG.
If we make v2i1/v4i1 legal like proposed in D41560, we could always do this and rely on the lowering of the extend to widen when necessary.
llvm-svn: 321538
llvm-svn: 321537
-Use MinAlign instead of std::min.
-Use SelectionDAG::getMemBasePlusOffset.
-Apply offset to the pointer info for the second load/store created.
llvm-svn: 321536
LowerZERO_EXTEND_Mask/LowerSIGN_EXTEND_Mask.
The truncate will be lowered to X86ISD::VTRUNC later.
llvm-svn: 321534
The custom lowering already widens the result type to 512-bits if VLX isn't supported.
llvm-svn: 321533
The exception handler thunk needs to reference the LSDA of the parent
function, which won't be emitted if it's available_externally.
Fixes PR35736. ThinLTO ends up producing available_externally functions
that use _CxxFrameHandler3.
llvm-svn: 321532
Some output changes from uppercase hex to lowercase hex, no other functionality change intended.
llvm-svn: 321526
If there are 17 or more leading zeros to the v4i32 elements, then we can use PMADD for the integer multiply when PMULLD is unavailable or slow.
The 17 bits need to be zero because PMADDWD performs a v8i16 signed mul-extend + pairwise-add - the upper 16 bits must be zero so we're adding a zero pair, and the 17th bit so we don't incorrectly sign extend.
Differential Revision: https://reviews.llvm.org/D41484
llvm-svn: 321516
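A single-lane sketch of the argument above, assuming both operands have at least 17 leading zeros (so they fit in 15 bits); it models PMADDWD's signed 16x16 multiplies and pairwise add with plain scalar arithmetic.

    #include <cassert>
    #include <cstdint>
    #include <cstdio>

    // With >= 17 leading zeros the low 16 bits read back identically as a
    // signed 16-bit value, and the high 16-bit halves contribute a 0*0 pair,
    // so the mul-extend + pairwise-add reproduces the full 32-bit product.
    uint32_t mulViaPmaddwdLane(uint32_t a, uint32_t b) {
      int32_t lo = int16_t(a & 0xFFFF) * int16_t(b & 0xFFFF); // low word pair
      int32_t hi = int16_t(a >> 16) * int16_t(b >> 16);       // high pair: 0*0
      return uint32_t(lo + hi);
    }

    int main() {
      for (uint32_t a = 0; a < (1u << 15); a += 977)
        for (uint32_t b = 0; b < (1u << 15); b += 761)
          assert(mulViaPmaddwdLane(a, b) == a * b);
      std::puts("ok");
      return 0;
    }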
Per Table 1-1 in October 2017 edition of Intel® Architecture Instruction Set Extensions and Future Features
llvm-svn: 321501
My original implementation ran as a DAG combine post type legalization, but it turns out we don't run that DAG combine step if type legalization didn't change anything. Attempts to make the combine run before type legalization as well hit other issues.
So just do it in LowerMUL where we can catch more cases.
llvm-svn: 321496
r319980 added new patterns to the machine combiner for transforming (fsub (fmul
x y) z) into (fmla (fneg z) x y). That is, fsub's where the first source
operand is an fmul are transformed. We previously only matched the case where
the second source operand of an fsub was an fmul, transforming (fsub z (fmul x
y)) into (fmls z x y). Now, if we have an fsub where both source operands are
fmuls, both of the above patterns are applicable.
However, the order in which we add the patterns to the list of candidates
determines the transformation that takes place, since only the first pattern
that matches will be used. This patch changes the order these two patterns are
added to the list of candidates such that we prefer the case where the second
source operand is an fmul (the fmls case), rather than the other one (the
fmla/fneg case). When both source operands are fmuls, this ordering results in
fewer instructions.
Differential Revision: https://reviews.llvm.org/D41587
llvm-svn: 321491
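A scalar sketch of the two rewrites, with std::fma standing in for the fused AArch64 operations (the instruction mapping in the comments follows the description above):

    #include <cmath>
    #include <cstdio>

    //   fsub z, (fmul x, y)  -> fmls z, x, y        == fma(-x, y, z)
    //   fsub (fmul x, y), z  -> fmla (fneg z), x, y == fma(x, y, -z)
    // When both fsub operands are fmuls, preferring the fmls pattern costs
    // fmul + fmls (two instructions) instead of fmul + fneg + fmla (three).
    int main() {
      double x = 1.25, y = -3.5, z = 0.875;
      std::printf("fmls form: %g\n", std::fma(-x, y, z)); // z - x*y, fused
      std::printf("fmla form: %g\n", std::fma(x, y, -z)); // x*y - z, fused
      return 0;
    }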
v8i32 is legal on AVX1, but it doesn't have pmuludq for it.
llvm-svn: 321490
Returning SDValue() means nothing changed, SDValue(N,0) means there was a change but the worklist management was taken care of.
I don't know if this has a real effect other than making sure the combine counter in the DAG combiner gets updated, but it is the correct thing to do.
llvm-svn: 321463
Differential Revision: https://reviews.llvm.org/D41579
llvm-svn: 321459
llvm-svn: 321452
SSE/AVX counterparts.
llvm-svn: 321451
llvm-svn: 321450
upper bits are all sign bits or zeros.
Normally we catch this during lowering, but vXi64 mul is considered legal when we have AVX512DQ.
This DAG combine allows us to avoid PMULLQ with AVX512DQ if we can prove it's unnecessary. PMULLQ is 3 uops that take 4 cycles each, while pmuldq/pmuludq is only one 4-cycle uop.
llvm-svn: 321437
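A single-lane sketch of the pmuldq half of the claim (operands whose upper 33 bits are sign bits are just sign-extended 32-bit values, so a 32x32->64 signed multiply already gives the full product); the pmuludq/zero-extended case is analogous with unsigned types.

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    // 32x32->64 signed multiply, as PMULDQ performs per 64-bit lane.
    int64_t mulViaPmuldqLane(int64_t a, int64_t b) {
      return int64_t(int32_t(a)) * int64_t(int32_t(b));
    }

    int main() {
      const int64_t samples[] = {0, 1, -1, 123456, -98765, INT32_MAX, INT32_MIN};
      for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); ++i)
        for (size_t j = 0; j < sizeof(samples) / sizeof(samples[0]); ++j)
          assert(mulViaPmuldqLane(samples[i], samples[j]) ==
                 samples[i] * samples[j]);
      std::puts("ok");
      return 0;
    }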
llvm-svn: 321433
llvm-svn: 321432
llvm-svn: 321425
(PR21160, PR34080, PR34454).
Match regular x87 memory fold instructions with load/sideeffects tags, to prevent the schedulers from re-ordering them across the fnstcw/fldcw sequences for truncating stores while they are still pseudo during the stack conversion pass.
llvm-svn: 321424