This was broken if the original operand was killed: the kill flag would appear on both instructions and fail the verifier. Keep the kill flag, but remove the operands from the old instruction. This has the added benefit of actually reducing the use count for future folds.
Ideally the pass would be structured more like what PeepholeOptimizer does, avoiding the need for this hack to keep instruction iterators from being invalidated.
llvm-svn: 359891
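A minimal sketch of the fix described above; the names (OldMI, NewMI, the operand indices) are illustrative, not the actual pass code:

    // Move the kill state onto the folded instruction and drop the register
    // operand from the old one, so the flag is not duplicated.
    MachineOperand &Old = OldMI->getOperand(OldIdx);
    if (Old.isKill()) {
      NewMI->getOperand(NewIdx).setIsKill(true); // keep the kill flag here
      OldMI->RemoveOperand(OldIdx); // also shrinks the use count for later folds
    }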
llvm-svn: 359889

Prefer ((X & Y) ? A : B) to (X & Y ? A : B).
llvm-svn: 359884

When a fold of an immediate into a sub/subrev required shrinking the instruction, the wrong VOP2 opcode was used: the VOP2 equivalent of the original instruction, not of the commuted instruction with the inverted opcode.
llvm-svn: 359883

Leftover from before we had the extract128BitVector helpers.
llvm-svn: 359871

llvm-svn: 359869

r358358.
This covers the patterns we use for widening 128/256-bit comparisons to 512-bit when AVX512VL isn't supported.
llvm-svn: 359863

Looks like just a minor oversight in the parsing code.
Fixes https://bugs.llvm.org/show_bug.cgi?id=41504.
Differential Revision: https://reviews.llvm.org/D60840
llvm-svn: 359855

As far as I know, we never emit LEA16r.
llvm-svn: 359840

getRegForInlineAsmConstraint when the VT is a scalar type
The default implementation of TargetLowering::getRegForInlineAsmConstraint in the base class doesn't work for mask registers when the VT is a scalar integer type, since the only legal mask types are vXi1. So we end up getting whatever register class happens to contain the register first. Currently this appears to be VK1, but it's really dependent on the order in which tablegen outputs the register classes.
Some code in the caller ends up looking up the type for this register class, finds v1i1, and then generates a CopyFromReg from the physical k-register with the v1i1 type. It then generates an any_extend from v1i1 to the scalar VT, which isn't legal. This bad any_extend sticks around until isel, where it selects a MOVZX32rr8 with a v1i1 input (or maybe an i8 input; either way, we eventually pick up a copy from VK1 to GR8 in MachineIR, which isn't supported). This leads to a failure in physical register copying.
This patch uses the scalar type to find a VK class of the right size. In the attached test case this will be VK16. A bitcast from VK16 to i16 is generated instead of an any_extend, and it is properly selected to a VK16-to-GR32 copy and a GR32->GR16 extract_subreg.
Fixes PR41678
Differential Revision: https://reviews.llvm.org/D61453
llvm-svn: 359837
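A reproducer in the spirit of the fix (hedged: the asm body and the function are illustrative, not the test case attached to the patch):

    // Inline asm with an AVX512 mask-register ("k") constraint on a scalar
    // integer type. Previously the i16 operand was matched to VK1 and an
    // illegal any_extend from v1i1 was generated; with this patch a VK class
    // of the right size (VK16) is chosen and a bitcast to i16 is used.
    unsigned short knot16(unsigned short x) {
      unsigned short r;
      __asm__("knotw %1, %0" : "=k"(r) : "k"(x));
      return r;
    }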
|
| |
|
|
|
|
| |
Fix the forwarding of multiplication results for Exynos M4.
llvm-svn: 359834
|
| |
|
|
|
|
|
|
| |
This if used to be an assert that got refactored into an if, but left the string literal behind.
Fixes PR41718
llvm-svn: 359833
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops),
so it does not make sense to have them here in the DAG either. Nothing else in the backend tries
to preserve exceptions (again outside of strict ops), so I don't see how this could have ever
worked for real code that cares about FP exceptions.
There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least
partially) to preserve exceptions without even asking if the target supports FP exceptions. Those
should be corrected in subsequent patches.
Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.
Differential Revision: https://reviews.llvm.org/D61331
llvm-svn: 359791
|
| |
|
|
|
|
|
|
|
|
|
| |
Limiting scalar hadd/hsub generation to the lowest xmm looks to be unnecessary - we will be extracting one upper xmm whatever, and we can remove a shuffle by using the hop which is inline with what shouldUseHorizontalOp expects to happen anyway.
Testing on btver2 (the main target for fast-hops) shows this is beneficial even for float ops where we have a 'shuffle' to extract the float result:
https://godbolt.org/z/0R-U-K
Differential Revision: https://reviews.llvm.org/D61426
llvm-svn: 359786
|
| |
|
|
|
|
| |
Matches what we do for lowerAddSubToHorizontalOp and will make it easier to peek through subvectors to help fix PR39921
llvm-svn: 359782
|
| |
|
|
|
|
| |
Get rid of local variable used only in assertion.
llvm-svn: 359772
|
| |
|
|
|
|
|
|
|
| |
Select G_SEXT and G_ZEXT with destination types smaller than 32 bits in
the exact same way as 32 bits. This overwrites the higher bits, but that
should be ok since all legal users of types smaller than 32 bits ignore
those bits anyway.
llvm-svn: 359768
|
| |
|
|
|
|
| |
Make it legal to extend from e.g. s1 to s8 or s16.
llvm-svn: 359766
|
| |
|
|
|
|
|
|
|
|
|
|
| |
combineBVOfConsecutiveLoads
Summary:
Based on the Eli Friedman's comments in https://reviews.llvm.org/D60811 , we'd better return early if the element type is not byte-sized in `combineBVOfConsecutiveLoads`.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D61076
llvm-svn: 359764
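A minimal sketch of the early return being described (assumed shape, not the verbatim PPC source):

    // Combining consecutive loads computes offsets in bytes, which only
    // makes sense when the build vector's element type is byte-sized.
    EVT ElemVT = N->getValueType(0).getVectorElementType();
    if (!ElemVT.isByteSized())
      return SDValue();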
|
| |
|
|
|
|
|
|
| |
Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32.
Differential Revision:
llvm-svn: 359757
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61413
llvm-svn: 359756
|
| |
|
|
|
|
|
|
| |
Constant bus limit has increased to 2 with GFX10.
Differential Revision: https://reviews.llvm.org/D61404
llvm-svn: 359754
|
| |
|
|
|
|
|
|
|
|
| |
The broadcasting variant for instruction vfpclassp[d,s] shouldn't use suffix q/l. So remove them from the template.
Patch by Pengfei Wang
Differential Revision: https://reviews.llvm.org/D61295
llvm-svn: 359753
|
| |
|
|
|
|
|
|
|
|
| |
This adds support for using fmov rather than a standard mov to materialize
G_FCONSTANT when it's safe to do so.
Update arm64-fast-isel-materialize.ll and select-constant.mir to show that the
selection is correct.
llvm-svn: 359734
|
| |
|
|
|
|
|
|
| |
We already perform horizontal add/sub if we extract from elements 0 and 1, this patch extends it to non-0/1 element extraction indices (as long as they are from the lowest 128-bit vector).
Differential Revision: https://reviews.llvm.org/D61263
llvm-svn: 359707
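A user-level illustration of the extended pattern (hedged: the function and the lane choice are made up for this example):

    #include <immintrin.h>

    // Adding two adjacent elements extracted from lanes other than 0 and 1
    // (here lanes 2 and 3 of the low 128 bits) may now lower to a horizontal
    // add plus an extract instead of two separate extracts and a scalar add.
    float add_lanes_2_3(__m128 v) {
      float a = _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2)));
      float b = _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3)));
      return a + b;
    }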
|
| |
|
|
|
|
|
|
| |
Reassign registers to reduce register bank conflicts.
Differential Revision: https://reviews.llvm.org/D61344
llvm-svn: 359704
|
| |
|
|
|
|
|
|
| |
Convert NSA into non-NSA images.
Differential Revision: https://reviews.llvm.org/D61341
llvm-svn: 359700
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61339
llvm-svn: 359698
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61332
llvm-svn: 359696
|
| |
|
|
| |
llvm-svn: 359694
|
| |
|
|
|
|
| |
Add to SimplifyDemandedVectorEltsForTargetNode and SimplifyDemandedBitsForTargetNode
llvm-svn: 359686
|
| |
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359680
|
| |
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode
llvm-svn: 359678
|
| |
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359677
|
| |
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359673
|
| |
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359670
|
| |
|
|
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode
This lets us hit more cases than combineExtractSubvector and allows us reuse more code.
llvm-svn: 359669
|
| |
|
|
|
|
|
|
| |
code. NFCI.
Pull out the HADD/HSUB code to halve vector widths if the upper half isn't used - prep work to adding support for other opcodes.
llvm-svn: 359667
|
| |
|
|
|
|
|
|
| |
This is an alternative to D59669 which more aggressively extracts i1 elements from vXi1 bool vectors using a MOVMSK.
Differential Revision: https://reviews.llvm.org/D61189
llvm-svn: 359666
|
| |
|
|
|
|
|
|
| |
put it in the basic block instruction loop. NFC
Now need to check it 3 different times. Just do it once at the top of the loop.
llvm-svn: 359658
|
| |
|
|
|
|
|
|
| |
element"
This causes segfaults during optimized builds. More details, including a reproducer, are on the llvm-commits thread for r359313.
llvm-svn: 359648
|
| |
|
|
|
|
|
|
|
|
| |
This is needed to make the wasm waterfall green again
after we land the update to WASI:
https://github.com/WebAssembly/waterfall/pull/492
Differential Revision: https://reviews.llvm.org/D61351
llvm-svn: 359634
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61330
llvm-svn: 359621
|
| |
|
|
|
|
|
|
| |
This adds any extend support - folding to zero_extend_vector_inreg (PMOVZX) for legality
Minor improvement for PR39709
llvm-svn: 359608
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Add support for f16 libcalls in WebAssembly. This entails adding signatures
for the remaining F16 libcalls, and renaming gnu_f2h_ieee/gnu_h2f_ieee to
truncsfhf2/extendhfsf2 for consistency between f32 and f64/f128 (compiler-rt
already supports this).
Differential Revision: https://reviews.llvm.org/D61287
Reviewer: dschuff
llvm-svn: 359600
|
| |
|
|
|
|
|
|
| |
It's been like this since it was added in a refactor of this code.
Fixes PR41659
llvm-svn: 359597
|
| |
|
|
|
|
|
|
|
|
|
|
| |
remove dead nodes from the graph
The reordering can leave at least a dead TokenFactor in the graph. This cause the linearize scheduler to fail with something like the assert seen in PR22614. This is only one of many ways we can break the linearize scheduler today so I can't say for sure that any of the other failures in that bug were caused by this issue.
This takes the heavy hammer approach of just running RemoveDeadNodes unconditionally at the end of the PreprocessISelDAG. If this turns out to be a compile time hit, we can try to refine it.
Differential Revision: https://reviews.llvm.org/D61164
llvm-svn: 359582
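The change as described amounts to something like the following sketch (the surrounding X86 preprocessing code is elided):

    void X86DAGToDAGISel::PreprocessISelDAG() {
      // ... existing reordering transforms that can orphan a TokenFactor ...

      // Unconditionally prune dead nodes so the linearize scheduler never
      // sees them; revisit if this shows up as a compile-time hit.
      CurDAG->RemoveDeadNodes();
    }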
from other LEA optimizations.
This removes some of the class variables. Basic block processing is merged into runOnMachineFunction to keep the flags local.
Pass MachineBasicBlock around instead of an iterator; we can get the iterator in the few places that need it. This allows a range-based outer for loop.
Separate the Atom optimization from the rest of the optimizations. This allows fixupIncDec to create INC/DEC and still allows Atom to turn it back into an LEA when its heuristics consider that profitable.
I'd like to improve fixupIncDec to turn LEAs into ADD any time the base or index register is equal to the destination register. This is profitable regardless of the various slow flags, but again we would want Atom to be able to undo it.
Differential Revision: https://reviews.llvm.org/D60993
llvm-svn: 359581

This implements the TargetTransformInfo method getMemcpyCost, which estimates the number of instructions to which a memcpy instruction expands.
Differential Revision: https://reviews.llvm.org/D59787
llvm-svn: 359547
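A sketch of what such a hook can look like (the target class and the cost numbers are assumptions, not the patch's actual heuristics):

    // Estimate how many instructions a memcpy expands to; fall back to an
    // assumed libcall cost when it won't be inlined.
    int ARMTTIImpl::getMemcpyCost(const Instruction *I) {
      const auto *MC = dyn_cast<MemCpyInst>(I);
      if (!MC)
        return 1;
      if (const auto *Len = dyn_cast<ConstantInt>(MC->getLength()))
        if (Len->getZExtValue() <= 16)
          return 1 + Len->getZExtValue() / 4; // roughly one op per word copied
      return 16; // illustrative libcall cost
    }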
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SCALAR_TO_VECTOR(Elt) for all SSE flavors
Current LLVM uses pxor+pinsrb on SSE4+ for INSERT_VECTOR_ELT(ZeroVec, 0, Elt) insead of much simpler movd.
INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is idiomatic construct which is used e.g. for _mm_cvtsi32_si128(Elt) and for lowest element initialization in _mm_set_epi32.
So such inefficient lowering leads to significant performance digradations in ceratin cases switching from SSSE3 to SSE4.
https://bugs.llvm.org/show_bug.cgi?id=41512
Here INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is simply converted to SCALAR_TO_VECTOR(Elt) when applicable since latter is closer match to desired behavior and always efficiently lowered to movd and alike.
Committed on behalf of @Serge_Preis (Serge Preis)
Differential Revision: https://reviews.llvm.org/D60852
llvm-svn: 359545
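For reference, the idiomatic construct named above (standard SSE2 intrinsics; the wrapper function is just for illustration):

    #include <immintrin.h>

    // _mm_cvtsi32_si128 builds a vector whose low element is x and whose
    // upper elements are zero. With this change it lowers to a single movd
    // on SSE4+ instead of a pxor plus an element insert.
    __m128i widen_to_vector(int x) {
      return _mm_cvtsi32_si128(x);
    }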