path: root/llvm/lib/Target
Commit message (Author, Date; Files, Lines)
...
* [AMDGPU] gfx1010 loop alignment (Stanislav Mekhanoshin, 2019-05-03; 2 files, -0/+78)
| Differential Revision: https://reviews.llvm.org/D61529
| llvm-svn: 359935
* [COFF, ARM64] Fix ABI implementation of struct returns (Mandeep Singh Grang, 2019-05-03; 3 files, -2/+76)
| Summary: Refer to the ABI doc at: https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#return-values
| Related clang patch: D60349
| Reviewers: rnk, efriedma, TomTan, ssijaric
| Reviewed By: rnk, efriedma
| Subscribers: mstorsjo, javed.absar, kristof.beyls, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D60348
| llvm-svn: 359934
* [hexagon] change AsmParser assertion to error (Brian Cain, 2019-05-03; 1 file, -10/+10)
| For immediates that can't be evaluated in assembler-mapped instructions, we should return 'invalid operand' instead of asserting.
| llvm-svn: 359905
* [X86] Allow assembly parser to accept x/y/z suffixes on non-memory vfpclassps/pd and on memory forms in intel syntax (Craig Topper, 2019-05-03; 1 file, -5/+26)
| The x/y/z suffix is needed to disambiguate the memory form in AT&T syntax since no xmm/ymm/zmm register is mentioned. For consistency, we should also allow it for the register and broadcast forms, where it's not needed. This matches gas. The printing code will still only use the suffix for the memory form, where it is needed.
| llvm-svn: 359903
* [X86] LowerToHorizontalOp - Tidyup calls to getHopForBuildVector. NFCI. (Simon Pilgrim, 2019-05-03; 1 file, -15/+7)
| Merge the if() tests for the various HADD/SUB + Subtarget tests
| llvm-svn: 359901
* AMDGPU: Select VOP3 form of sub (Matt Arsenault, 2019-05-03; 1 file, -2/+2)
| The VOP3 form should always be the preferred selection form to be shrunk later.
| The r600 sub test needs to be split out because it asserts on the arguments in the new test during the calling convention lowering.
| llvm-svn: 359899
* AMDGPU: Support shrinking add with FI in SIFoldOperands (Matt Arsenault, 2019-05-03; 1 file, -35/+37)
| Avoids test regression in a future patch
| llvm-svn: 359898
* AMDGPU: Remove redundant patterns for shifts (Matt Arsenault, 2019-05-03; 1 file, -9/+4)
| llvm-svn: 359895
* AMDGPU: Remove redundant patterns for sub (Matt Arsenault, 2019-05-03; 1 file, -4/+0)
| There were 2 patterns for sub, one selecting to sub and one to subrev. Only one of these will succeed, so remove the reversed one.
| llvm-svn: 359894
* AMDGPU: Replace shrunk instruction with dummy implicit_def (Matt Arsenault, 2019-05-03; 1 file, -4/+8)
| This was broken if the original operand was killed. The kill flag would appear on both instructions, and fail the verifier. Keep the kill flag, but remove the operands from the old instruction. This has an added benefit of really reducing the use count for future folds.
| Ideally the pass would be structured more like PeepholeOptimizer, so that this hack to avoid breaking instruction iterators would not be needed.
| llvm-svn: 359891
* [X86] Remove repeated variables. NFCI. (Simon Pilgrim, 2019-05-03; 1 file, -2/+0)
| llvm-svn: 359889
* Avoid cppcheck operator precedence warnings. NFCI. (Simon Pilgrim, 2019-05-03; 4 files, -5/+5)
| Prefer ((X & Y) ? A : B) to (X & Y ? A : B)
| llvm-svn: 359884
* AMDGPU: Fix incorrect commute with sub when folding immediates (Matt Arsenault, 2019-05-03; 1 file, -1/+4)
| When a fold of an immediate into a sub/subrev required shrinking the instruction, the wrong VOP2 opcode was used. This was using the VOP2 equivalent of the original instruction, not the commuted instruction with the inverted opcode.
| llvm-svn: 359883
* [X86] LowerMULH - remove unused Lo/Hi vector indices. NFCI. (Simon Pilgrim, 2019-05-03; 1 file, -5/+2)
| Leftover from before we had the extract128BitVector helpers.
| llvm-svn: 359871
* Reduce variable scope to just the if() block it's actually used in. NFCI. (Simon Pilgrim, 2019-05-03; 1 file, -2/+1)
| llvm-svn: 359869
* [X86] Add more one checks to masked compare patterns that were missed in r358358. (Craig Topper, 2019-05-03; 1 file, -46/+48)
| This covers the patterns we use for widening 128/256 comparisons to 512-bit when AVX512VL isn't supported.
| llvm-svn: 359863
* [AArch64][MC] Reject "add x0, x1, w2, lsl #1" etc. (Eli Friedman, 2019-05-03; 1 file, -3/+5)
| Looks like just a minor oversight in the parsing code.
| Fixes https://bugs.llvm.org/show_bug.cgi?id=41504.
| Differential Revision: https://reviews.llvm.org/D60840
| llvm-svn: 359855
* [X86] Remove LEA16r references from X86FixupLEAs. NFCI (Craig Topper, 2019-05-02; 1 file, -9/+2)
| As far as I know, we never emit LEA16r
| llvm-svn: 359840
* [X86] Correct the register class for specific mask register constraints in getRegForInlineAsmConstraint when the VT is a scalar type (Craig Topper, 2019-05-02; 1 file, -0/+28)
| The default implementation in the base class for TargetLowering::getRegForInlineAsmConstraint doesn't work for mask registers when the VT is a scalar integer type, since the only legal mask types are vXi1. So we end up just getting whatever the first register class is that contains the register. Currently this appears to be VK1, but it's really dependent on the order in which tablegen outputs the register classes.
| Some code in the caller ends up looking up the type for this register class, finds v1i1, and then generates a copyfromreg from the physical k-register with the v1i1 type. Then it generates an any_extend from v1i1 to the scalar VT, which isn't legal. This bad any_extend sticks around until isel, where it selects a MOVZX32rr8 with a v1i1 input or maybe an i8 input. Not sure, but eventually we pick up a copy from VK1 to GR8 in MachineIR, which isn't supported. This leads to a failure in physical register copying.
| This patch uses the scalar type to find a VK class of the right size. In the attached test case this will be VK16. This causes a bitcast from vk16 to i16 to be generated instead of an any_extend. This will be properly iseled to a VK16 to GR32 copy and a GR32->GR16 extract_subreg.
| Fixes PR41678
| Differential Revision: https://reviews.llvm.org/D61453
| llvm-svn: 359837
* [AArch64] Update for Exynos (Evandro Menezes, 2019-05-02; 3 files, -82/+18)
| Fix the forwarding of multiplication results for Exynos M4.
| llvm-svn: 359834
* [X86] Remove string literal from an if. NFC (Craig Topper, 2019-05-02; 1 file, -2/+1)
| This if used to be an assert that got refactored into an if, but left the string literal behind.
| Fixes PR41718
| llvm-svn: 359833
* [SelectionDAG] remove constant folding limitations based on FP exceptions (Sanjay Patel, 2019-05-02; 2 files, -8/+0)
| We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops), so it does not make sense to have them here in the DAG either. Nothing else in the backend tries to preserve exceptions (again outside of strict ops), so I don't see how this could have ever worked for real code that cares about FP exceptions.
| There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least partially) to preserve exceptions without even asking if the target supports FP exceptions. Those should be corrected in subsequent patches.
| Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.
| Differential Revision: https://reviews.llvm.org/D61331
| llvm-svn: 359791
* [X86][SSE] lowerAddSubToHorizontalOp - enable ymm extraction+fold (Simon Pilgrim, 2019-05-02; 1 file, -6/+5)
| Limiting scalar hadd/hsub generation to the lowest xmm looks to be unnecessary - we will be extracting one upper xmm either way, and we can remove a shuffle by using the hop, which is in line with what shouldUseHorizontalOp expects to happen anyway.
| Testing on btver2 (the main target for fast-hops) shows this is beneficial even for float ops where we have a 'shuffle' to extract the float result: https://godbolt.org/z/0R-U-K
| Differential Revision: https://reviews.llvm.org/D61426
| llvm-svn: 359786
* [X86][SSE] Move shouldUseHorizontalOp inside isHorizontalBinOp. NFCI. (Simon Pilgrim, 2019-05-02; 1 file, -13/+15)
| Matches what we do for lowerAddSubToHorizontalOp and will make it easier to peek through subvectors to help fix PR39921
| llvm-svn: 359782
* [ARM GlobalISel] Fixup r359768 (Diana Picus, 2019-05-02; 1 file, -2/+1)
| Get rid of local variable used only in assertion.
| llvm-svn: 359772
* [ARM GlobalISel] Select extensions to < 32 bits (Diana Picus, 2019-05-02; 1 file, -5/+2)
| Select G_SEXT and G_ZEXT with destination types smaller than 32 bits in the exact same way as 32 bits. This overwrites the higher bits, but that should be ok since all legal users of types smaller than 32 bits ignore those bits anyway.
| llvm-svn: 359768
* [ARM GlobalISel] Legalize extensions to < 32 bits (Diana Picus, 2019-05-02; 1 file, -1/+1)
| Make it legal to extend from e.g. s1 to s8 or s16.
| llvm-svn: 359766
* [NFC][PowerPC] Return early if the element type is not byte-sized in combineBVOfConsecutiveLoads (Kang Zhang, 2019-05-02; 1 file, -0/+5)
| Summary: Based on Eli Friedman's comments in https://reviews.llvm.org/D60811, we'd better return early if the element type is not byte-sized in `combineBVOfConsecutiveLoads`.
| Reviewed By: efriedma
| Differential Revision: https://reviews.llvm.org/D61076
| llvm-svn: 359764
* [AMDGPU] gfx1010 lost VOP2 forms of some add/sub (Stanislav Mekhanoshin, 2019-05-02; 1 file, -0/+27)
| Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32.
| Differential Revision:
| llvm-svn: 359757
* [AMDGPU] gfx1010 allows VOP3 to have a literal (Stanislav Mekhanoshin, 2019-05-02; 7 files, -60/+133)
| Differential Revision: https://reviews.llvm.org/D61413
| llvm-svn: 359756
* [AMDGPU] gfx1010 constant bus limit (Stanislav Mekhanoshin, 2019-05-02; 4 files, -24/+136)
| Constant bus limit has increased to 2 with GFX10.
| Differential Revision: https://reviews.llvm.org/D61404
| llvm-svn: 359754
* [X86] Remove the redundant suffix in vfpclassp[d,s]'s broadcasting variant (Craig Topper, 2019-05-02; 1 file, -9/+9)
| The broadcasting variant for instruction vfpclassp[d,s] shouldn't use suffix q/l. So remove them from the template.
| Patch by Pengfei Wang
| Differential Revision: https://reviews.llvm.org/D61295
| llvm-svn: 359753
* [GlobalISel][AArch64] Use fmov for G_FCONSTANT when possible (Jessica Paquette, 2019-05-01; 1 file, -2/+46)
| This adds support for using fmov rather than a standard mov to materialize G_FCONSTANT when it's safe to do so.
| Update arm64-fast-isel-materialize.ll and select-constant.mir to show that the selection is correct.
| llvm-svn: 359734
* [X86][SSE] Fold scalar horizontal add/sub for non-0/1 element extractions (Simon Pilgrim, 2019-05-01; 1 file, -6/+11)
| We already perform horizontal add/sub if we extract from elements 0 and 1; this patch extends it to non-0/1 element extraction indices (as long as they are from the lowest 128-bit vector).
| Differential Revision: https://reviews.llvm.org/D61263
| llvm-svn: 359707
* [AMDGPU] gfx1010 GCNRegBankReassign pass (Stanislav Mekhanoshin, 2019-05-01; 4 files, -0/+803)
| Reassign registers to reduce register bank conflicts.
| Differential Revision: https://reviews.llvm.org/D61344
| llvm-svn: 359704
* [AMDGPU] gfx1010 GCNNSAReassign pass (Stanislav Mekhanoshin, 2019-05-01; 4 files, -0/+362)
| Convert NSA into non-NSA images.
| Differential Revision: https://reviews.llvm.org/D61341
| llvm-svn: 359700
* [AMDGPU] gfx1010 MIMG implementation (Stanislav Mekhanoshin, 2019-05-01; 12 files, -161/+922)
| Differential Revision: https://reviews.llvm.org/D61339
| llvm-svn: 359698
* [AMDGPU] gfx1010 DS implementation (Stanislav Mekhanoshin, 2019-05-01; 3 files, -165/+221)
| Differential Revision: https://reviews.llvm.org/D61332
| llvm-svn: 359696
* Fix 80 column violation. NFCI. (Simon Pilgrim, 2019-05-01; 1 file, -5/+6)
| llvm-svn: 359694
* [X86][SSE] Add demanded elts support for X86ISD::PMULDQ\PMULUDQ (Simon Pilgrim, 2019-05-01; 1 file, -3/+24)
| Add to SimplifyDemandedVectorEltsForTargetNode and SimplifyDemandedBitsForTargetNode
| llvm-svn: 359686
* [X86][SSE] Add SSE vector shift support to SimplifyDemandedVectorEltsForTargetNode vector splitting (Simon Pilgrim, 2019-05-01; 1 file, -0/+21)
| llvm-svn: 359680
* [X86][SSE] Split 512-bit -> 128-bit vector directly in SimplifyDemandedVectorEltsForTargetNode (Simon Pilgrim, 2019-05-01; 1 file, -1/+4)
| llvm-svn: 359678
* [X86][SSE] Add 512-bit vector support to SimplifyDemandedVectorEltsForTargetNode vector splitting (Simon Pilgrim, 2019-05-01; 1 file, -8/+15)
| llvm-svn: 359677
* [X86][SSE] Add X86ISD::PACKSS\PACKUS to SimplifyDemandedVectorEltsForTargetNode vector splitting (Simon Pilgrim, 2019-05-01; 1 file, -1/+7)
| llvm-svn: 359673
* [X86][SSE] Add X86ISD::UNPCKL\UNPCK to SimplifyDemandedVectorEltsForTargetNode vector splitting (Simon Pilgrim, 2019-05-01; 1 file, -2/+4)
| llvm-svn: 359670
* [X86][SSE] Move extract_subvector(pshufb) fold to SimplifyDemandedVectorEltsForTargetNode (Simon Pilgrim, 2019-05-01; 1 file, -12/+3)
| This lets us hit more cases than combineExtractSubvector and allows us to reuse more code.
| llvm-svn: 359669
* [X86] SimplifyDemandedVectorEltsForTargetNode - pull out vector halving code. NFCI. (Simon Pilgrim, 2019-05-01; 1 file, -10/+13)
| Pull out the HADD/HSUB code to halve vector widths if the upper half isn't used - prep work for adding support for other opcodes.
| llvm-svn: 359667
* [X86][SSE] Extract i1 elements from vXi1 bool vectors (Simon Pilgrim, 2019-05-01; 1 file, -0/+33)
| This is an alternative to D59669 which more aggressively extracts i1 elements from vXi1 bool vectors using a MOVMSK.
| Differential Revision: https://reviews.llvm.org/D61189
| llvm-svn: 359666
* [X86FixupLEAs] Hoist the calls to isLEA out of the 3 separate functions and put it in the basic block instruction loop. NFC (Craig Topper, 2019-05-01; 1 file, -14/+9)
| No need to check it 3 different times. Just do it once at the top of the loop.
| llvm-svn: 359658
* Revert "[llvm] r359313 - [PowerPC] Update P9 vector costs for insert/extract element" (David L. Jones, 2019-05-01; 1 file, -29/+0)
| This causes segfaults during optimized builds. More details, including a reproducer, are on the llvm-commits thread for r359313.
| llvm-svn: 359648
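
The sub/subrev entries in this log (r359883, r359894) turn on the fact that subtraction does not commute: swapping the operands of a sub only preserves the result if the opcode is switched to its reversed form as well. The following is a minimal toy model of that rule in Python, assuming "sub" computes src0 - src1 and "subrev" computes src1 - src0. The names `OPS`, `REVERSED`, `commute`, and `evaluate` are hypothetical helpers for illustration and are not taken from the LLVM sources.

```python
# Toy model of commuting a non-commutative instruction (not LLVM code).
# Assumption: "sub" computes src0 - src1 and "subrev" computes src1 - src0.
OPS = {
    "sub": lambda src0, src1: src0 - src1,
    "subrev": lambda src0, src1: src1 - src0,
}

# Swapping the operands is only valid together with the inverted opcode.
REVERSED = {"sub": "subrev", "subrev": "sub"}


def commute(opcode, src0, src1):
    """Swap the two operands and switch to the reversed opcode."""
    return REVERSED[opcode], src1, src0


def evaluate(opcode, src0, src1):
    """Compute the result of one (opcode, src0, src1) instruction."""
    return OPS[opcode](src0, src1)


original = ("sub", 10, 3)
commuted = commute(*original)

# The commuted instruction produces the same value as the original.
assert evaluate(*commuted) == evaluate(*original)

# Swapping the operands while keeping the original opcode changes the value;
# this is the kind of mismatch the r359883 message describes.
wrong = ("sub", 3, 10)
assert evaluate(*wrong) != evaluate(*original)
```

In this model, any later rewrite of a commuted instruction (such as shrinking it to a shorter encoding) has to start from the opcode returned by `commute`, not from the opcode of the instruction before commuting.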