bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[AMDGPU] gfx1010 loop alignment	Stanislav Mekhanoshin	2019-05-03	2	-0/+78
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D61529 llvm-svn: 359935
*	[COFF, ARM64] Fix ABI implementation of struct returns	Mandeep Singh Grang	2019-05-03	3	-2/+76
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Refer the ABI doc at: https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#return-values Related clang patch: D60349 Reviewers: rnk, efriedma, TomTan, ssijaric Reviewed By: rnk, efriedma Subscribers: mstorsjo, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60348 llvm-svn: 359934
*	[hexagon] change AsmParser assertion to error	Brian Cain	2019-05-03	1	-10/+10
\| \| \| \| \| \| \|	For immediates that can't be evaluated in assembler-mapped instructions, we should return 'invalid operand' instead of assert. llvm-svn: 359905
*	[X86] Allow assembly parser to accept x/y/z suffixes on non-memory vfpclassps/pd and on memory forms in intel syntax	Craig Topper	2019-05-03	1	-5/+26
\| \| \| \| \| \| \| \| \| \| \| \|	The x/y/z suffix is needed to disambiguate the memory form in at&t syntax since no xmm/ymm/zmm register is mentioned. But we should also allow it for the register and broadcast forms, where it's not needed, for consistency. This matches gas. The printing code will still only use the suffix for the memory form where it is needed. llvm-svn: 359903
*	[X86] LowerToHorizontalOp - Tidyup calls to getHopForBuildVector. NFCI.	Simon Pilgrim	2019-05-03	1	-15/+7
\| \| \| \| \| \|	Merge the if() tests for the various HADD/SUB + Subtarget tests llvm-svn: 359901
*	AMDGPU: Select VOP3 form of sub	Matt Arsenault	2019-05-03	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	The VOP3 form should always be the preferred selection form to be shrunk later. The r600 sub test needs to be split out because it asserts on the arguments in the new test during the calling convention lowering. llvm-svn: 359899
*	AMDGPU: Support shrinking add with FI in SIFoldOperands	Matt Arsenault	2019-05-03	1	-35/+37
\| \| \| \| \| \|	Avoids test regression in a future patch llvm-svn: 359898
*	AMDGPU: Remove redundant patterns for shifts	Matt Arsenault	2019-05-03	1	-9/+4
\| \| \| \|	llvm-svn: 359895
*	AMDGPU: Remove redundant patterns for sub	Matt Arsenault	2019-05-03	1	-4/+0
\| \| \| \| \| \| \|	There were 2 patterns for sub, one selecting to sub and one to subrev. Only one of these will succeed, so remove the reversed one. llvm-svn: 359894
*	AMDGPU: Replace shrunk instruction with dummy implicit_def	Matt Arsenault	2019-05-03	1	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \|	This was broken if the original operand was killed. The kill flag would appear on both instructions, and fail the verifier. Keep the kill flag, but remove the operands from the old instruction. This has an added benefit of really reducing the use count for future folds. Ideally the pass would be structured more like what PeepholeOptimizer does to avoid this hack to avoid breaking instruction iterators. llvm-svn: 359891
*	[X86] Remove repeated variables. NFCI.	Simon Pilgrim	2019-05-03	1	-2/+0
\| \| \| \|	llvm-svn: 359889
*	Avoid cppcheck operator precedence warnings. NFCI.	Simon Pilgrim	2019-05-03	4	-5/+5
\| \| \| \| \| \|	Prefer ((X & Y) ? A : B) to (X & Y ? A : B) llvm-svn: 359884
*	AMDGPU: Fix incorrect commute with sub when folding immediates	Matt Arsenault	2019-05-03	1	-1/+4
\| \| \| \| \| \| \| \| \|	When a fold of an immediate into a sub/subrev required shrinking the instruction, the wrong VOP2 opcode was used. This was using the VOP2 equivalent of the original instruction, not the commuted instruction with the inverted opcode. llvm-svn: 359883
*	[X86] LowerMULH - remove unused Lo/Hi vector indices. NFCI.	Simon Pilgrim	2019-05-03	1	-5/+2
\| \| \| \| \| \|	Leftover from before we had the extract128BitVector helpers. llvm-svn: 359871
*	Reduce variable scope to just the if() block it's actually used in. NFCI.	Simon Pilgrim	2019-05-03	1	-2/+1
\| \| \| \|	llvm-svn: 359869
*	[X86] Add more one checks to masked compare patterns that were missed in r358358.	Craig Topper	2019-05-03	1	-46/+48
\| \| \| \| \| \| \| \| \|	This covers the patterns we use for widening 128/256 comparisons to 512-bit when AVX512VL isn't supported. llvm-svn: 359863
*	[AArch64][MC] Reject "add x0, x1, w2, lsl #1" etc.	Eli Friedman	2019-05-03	1	-3/+5
\| \| \| \| \| \| \| \| \| \|	Looks like just a minor oversight in the parsing code. Fixes https://bugs.llvm.org/show_bug.cgi?id=41504. Differential Revision: https://reviews.llvm.org/D60840 llvm-svn: 359855
*	[X86] Remove LEA16r references from X86FixupLEAs. NFCI	Craig Topper	2019-05-02	1	-9/+2
\| \| \| \| \| \|	As far as I know, we never emit LEA16r llvm-svn: 359840
*	[X86] Correct the register class for specific mask register constraints in getRegForInlineAsmConstraint when the VT is a scalar type	Craig Topper	2019-05-02	1	-0/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The default implementation in the base class for TargetLowering::getRegForInlineAsmConstraint doesn't work for mask registers when the VT is a scalar integer type, since the only legal mask types are vXi1. So we end up just getting the first register class that contains the register. Currently this appears to be VK1, but it's really dependent on the order tablegen outputs the register classes. Some code in the caller ends up looking up the type for this register class, finds v1i1, and then generates a copyfromreg from the physical k-register with the v1i1 type. Then it generates an any_extend from v1i1 to the scalar VT, which isn't legal. This bad any_extend sticks around until isel, where it selects a MOVZX32rr8 with a v1i1 input or maybe an i8 input. Not sure, but eventually we pick up a copy from VK1 to GR8 in MachineIR, which isn't supported. This leads to a failure in physical register copying. This patch uses the scalar type to find a VK class of the right size. In the attached test case this will be VK16. This causes a bitcast from vk16 to i16 to be generated instead of an any_extend. This will be properly iseled to a VK16 to GR32 copy and a GR32->GR16 extract_subreg. Fixes PR41678 Differential Revision: https://reviews.llvm.org/D61453 llvm-svn: 359837
*	[AArch64] Update for Exynos	Evandro Menezes	2019-05-02	3	-82/+18
\| \| \| \| \| \|	Fix the forwarding of multiplication results for Exynos M4. llvm-svn: 359834
*	[X86] Remove string literal from an if. NFC	Craig Topper	2019-05-02	1	-2/+1
\| \| \| \| \| \| \| \|	This if used to be an assert that got refactored into an if, but left the string literal behind. Fixes PR41718 llvm-svn: 359833
*	[SelectionDAG] remove constant folding limitations based on FP exceptions	Sanjay Patel	2019-05-02	2	-8/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops), so it does not make sense to have them here in the DAG either. Nothing else in the backend tries to preserve exceptions (again outside of strict ops), so I don't see how this could have ever worked for real code that cares about FP exceptions. There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least partially) to preserve exceptions without even asking if the target supports FP exceptions. Those should be corrected in subsequent patches. Real support for FP exceptions requires several changes to handle the constrained/strict FP ops. Differential Revision: https://reviews.llvm.org/D61331 llvm-svn: 359791
*	[X86][SSE] lowerAddSubToHorizontalOp - enable ymm extraction+fold	Simon Pilgrim	2019-05-02	1	-6/+5
\| \| \| \| \| \| \| \| \| \| \|	Limiting scalar hadd/hsub generation to the lowest xmm looks to be unnecessary - we will be extracting one upper xmm whatever, and we can remove a shuffle by using the hop, which is in line with what shouldUseHorizontalOp expects to happen anyway. Testing on btver2 (the main target for fast-hops) shows this is beneficial even for float ops where we have a 'shuffle' to extract the float result: https://godbolt.org/z/0R-U-K Differential Revision: https://reviews.llvm.org/D61426 llvm-svn: 359786
*	[X86][SSE] Move shouldUseHorizontalOp inside isHorizontalBinOp. NFCI.	Simon Pilgrim	2019-05-02	1	-13/+15
\| \| \| \| \| \|	Matches what we do for lowerAddSubToHorizontalOp and will make it easier to peek through subvectors to help fix PR39921 llvm-svn: 359782
*	[ARM GlobalISel] Fixup r359768	Diana Picus	2019-05-02	1	-2/+1
\| \| \| \| \| \|	Get rid of local variable used only in assertion. llvm-svn: 359772
*	[ARM GlobalISel] Select extensions to < 32 bits	Diana Picus	2019-05-02	1	-5/+2
\| \| \| \| \| \| \| \| \|	Select G_SEXT and G_ZEXT with destination types smaller than 32 bits in the exact same way as 32 bits. This overwrites the higher bits, but that should be ok since all legal users of types smaller than 32 bits ignore those bits anyway. llvm-svn: 359768
*	[ARM GlobalISel] Legalize extensions to < 32 bits	Diana Picus	2019-05-02	1	-1/+1
\| \| \| \| \| \|	Make it legal to extend from e.g. s1 to s8 or s16. llvm-svn: 359766
*	[NFC][PowerPC] Return early if the element type is not byte-sized in combineBVOfConsecutiveLoads	Kang Zhang	2019-05-02	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Based on Eli Friedman's comments in https://reviews.llvm.org/D60811, we'd better return early if the element type is not byte-sized in `combineBVOfConsecutiveLoads`. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D61076 llvm-svn: 359764
*	[AMDGPU] gfx1010 lost VOP2 forms of some add/sub	Stanislav Mekhanoshin	2019-05-02	1	-0/+27
\| \| \| \| \| \| \| \|	Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32. Differential Revision: llvm-svn: 359757
*	[AMDGPU] gfx1010 allows VOP3 to have a literal	Stanislav Mekhanoshin	2019-05-02	7	-60/+133
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D61413 llvm-svn: 359756
*	[AMDGPU] gfx1010 constant bus limit	Stanislav Mekhanoshin	2019-05-02	4	-24/+136
\| \| \| \| \| \| \| \|	Constant bus limit has increased to 2 with GFX10. Differential Revision: https://reviews.llvm.org/D61404 llvm-svn: 359754
*	[X86] Remove the redundant suffix in vfpclassp[d,s]'s broadcasting variant	Craig Topper	2019-05-02	1	-9/+9
\| \| \| \| \| \| \| \| \| \|	The broadcasting variant for instruction vfpclassp[d,s] shouldn't use suffix q/l. So remove them from the template. Patch by Pengfei Wang Differential Revision: https://reviews.llvm.org/D61295 llvm-svn: 359753
*	[GlobalISel][AArch64] Use fmov for G_FCONSTANT when possible	Jessica Paquette	2019-05-01	1	-2/+46
\| \| \| \| \| \| \| \| \| \|	This adds support for using fmov rather than a standard mov to materialize G_FCONSTANT when it's safe to do so. Update arm64-fast-isel-materialize.ll and select-constant.mir to show that the selection is correct. llvm-svn: 359734
*	[X86][SSE] Fold scalar horizontal add/sub for non-0/1 element extractions	Simon Pilgrim	2019-05-01	1	-6/+11
\| \| \| \| \| \| \| \|	We already perform horizontal add/sub if we extract from elements 0 and 1, this patch extends it to non-0/1 element extraction indices (as long as they are from the lowest 128-bit vector). Differential Revision: https://reviews.llvm.org/D61263 llvm-svn: 359707
*	[AMDGPU] gfx1010 GCNRegBankReassign pass	Stanislav Mekhanoshin	2019-05-01	4	-0/+803
\| \| \| \| \| \| \| \|	Reassign registers to reduce register bank conflicts. Differential Revision: https://reviews.llvm.org/D61344 llvm-svn: 359704
*	[AMDGPU] gfx1010 GCNNSAReassign pass	Stanislav Mekhanoshin	2019-05-01	4	-0/+362
\| \| \| \| \| \| \| \|	Convert NSA into non-NSA images. Differential Revision: https://reviews.llvm.org/D61341 llvm-svn: 359700
*	[AMDGPU] gfx1010 MIMG implementation	Stanislav Mekhanoshin	2019-05-01	12	-161/+922
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D61339 llvm-svn: 359698
*	[AMDGPU] gfx1010 DS implementation	Stanislav Mekhanoshin	2019-05-01	3	-165/+221
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D61332 llvm-svn: 359696
*	Fix 80 column violation. NFCI.	Simon Pilgrim	2019-05-01	1	-5/+6
\| \| \| \|	llvm-svn: 359694
*	[X86][SSE] Add demanded elts support for X86ISD::PMULDQ\PMULUDQ	Simon Pilgrim	2019-05-01	1	-3/+24
\| \| \| \| \| \|	Add to SimplifyDemandedVectorEltsForTargetNode and SimplifyDemandedBitsForTargetNode llvm-svn: 359686
*	[X86][SSE] Add SSE vector shift support to SimplifyDemandedVectorEltsForTargetNode vector splitting	Simon Pilgrim	2019-05-01	1	-0/+21
\| \| \| \| \| \|	llvm-svn: 359680
*	[X86][SSE] Split 512-bit -> 128-bit vector directly in SimplifyDemandedVectorEltsForTargetNode	Simon Pilgrim	2019-05-01	1	-1/+4
\| \| \| \| \| \|	llvm-svn: 359678
*	[X86][SSE] Add 512-bit vector support to SimplifyDemandedVectorEltsForTargetNode vector splitting	Simon Pilgrim	2019-05-01	1	-8/+15
\| \| \| \| \| \|	llvm-svn: 359677
*	[X86][SSE] Add X86ISD::PACKSS\PACKUS to SimplifyDemandedVectorEltsForTargetNode vector splitting	Simon Pilgrim	2019-05-01	1	-1/+7
\| \| \| \| \| \|	llvm-svn: 359673
*	[X86][SSE] Add X86ISD::UNPCKL\UNPCK to SimplifyDemandedVectorEltsForTargetNode vector splitting	Simon Pilgrim	2019-05-01	1	-2/+4
\| \| \| \| \| \|	llvm-svn: 359670
*	[X86][SSE] Move extract_subvector(pshufb) fold to SimplifyDemandedVectorEltsForTargetNode	Simon Pilgrim	2019-05-01	1	-12/+3
\| \| \| \| \| \| \| \|	This lets us hit more cases than combineExtractSubvector and allows us to reuse more code. llvm-svn: 359669
*	[X86] SimplifyDemandedVectorEltsForTargetNode - pull out vector halving code. NFCI.	Simon Pilgrim	2019-05-01	1	-10/+13
\| \| \| \| \| \| \| \|	Pull out the HADD/HSUB code to halve vector widths if the upper half isn't used - prep work for adding support for other opcodes. llvm-svn: 359667
*	[X86][SSE] Extract i1 elements from vXi1 bool vectors	Simon Pilgrim	2019-05-01	1	-0/+33
\| \| \| \| \| \| \| \|	This is an alternative to D59669 which more aggressively extracts i1 elements from vXi1 bool vectors using a MOVMSK. Differential Revision: https://reviews.llvm.org/D61189 llvm-svn: 359666
*	[X86FixupLEAs] Hoist the calls to isLEA out of the 3 separate functions and put it in the basic block instruction loop. NFC	Craig Topper	2019-05-01	1	-14/+9
\| \| \| \| \| \| \| \|	No need to check it 3 different times. Just do it once at the top of the loop. llvm-svn: 359658
*	Revert "[llvm] r359313 - [PowerPC] Update P9 vector costs for insert/extract element"	David L. Jones	2019-05-01	1	-29/+0
\| \| \| \| \| \| \| \|	This causes segfaults during optimized builds. More details, including a reproducer, are on the llvm-commits thread for r359313. llvm-svn: 359648