path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
...
* AMDGPU/GlobalISel: Allow scalar s1 and/or/xor | Matt Arsenault | 2019-07-15 | 1 | -6/+91
    If a 1-bit value is in a 32-bit VGPR, the scalar opcodes set SCC to whether the result is 0. If the inputs are SCC, these can be copied to a 32-bit SGPR to produce an SCC result.
    llvm-svn: 366125
* AMDGPU/GlobalISel: Select G_AND/G_OR/G_XOR | Matt Arsenault | 2019-07-15 | 2 | -0/+66
    llvm-svn: 366121
* AMDGPU/GlobalISel: Don't constrain source register of VCC copies | Matt Arsenault | 2019-07-15 | 1 | -0/+20
    This is a hack until I come up with a better way of dealing with the pseudo-register banks used for boolean values. If the use instruction constrains the register, the selector for the def instruction won't see that the bank was VCC. A 1-bit SReg_32 could ambiguously have been SCCRegBank or VCCRegBank in wave32. This is necessary to successfully select branches with an and/or/xor condition.
    llvm-svn: 366120
* AMDGPU/GlobalISel: Fix selecting vcc->vcc bank copies | Matt Arsenault | 2019-07-15 | 1 | -10/+12
    The extra test change is correct, although how it arrives there is a bug that needs work. With wave32, the test for isVCC ambiguously reports true for an SCC or VCC source. A new allocatable pseudo register class for SCC may be necessary.
    llvm-svn: 366119
* AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC | Matt Arsenault | 2019-07-15 | 1 | -0/+4
    llvm-svn: 366118
* AMDGPU/GlobalISel: Fix handling of sgpr (not scc bank) s1 to VCC | Matt Arsenault | 2019-07-15 | 1 | -17/+23
    This was emitting a copy from a 32-bit register to a 64-bit.
    llvm-svn: 366117
* AMDGPU/GlobalISel: Custom legalize G_INSERT_VECTOR_ELT | Matt Arsenault | 2019-07-15 | 2 | -1/+33
    llvm-svn: 366116
* AMDGPU/GlobalISel: Custom legalize G_EXTRACT_VECTOR_ELT | Matt Arsenault | 2019-07-15 | 2 | -1/+36
    Turn the constant cases into G_EXTRACTs.
    llvm-svn: 366115
* AMDGPU/GlobalISel: Fix G_ICMP for wave32 | Matt Arsenault | 2019-07-15 | 1 | -2/+2
    llvm-svn: 366114
* AMDGPU/GlobalISel: Widen vector extracts | Matt Arsenault | 2019-07-15 | 1 | -5/+8
    llvm-svn: 366103
* AMDGPU/GlobalISel: Handle llvm.amdgcn.if.break | Matt Arsenault | 2019-07-15 | 2 | -0/+32
    llvm-svn: 366102
* AMDGPU/GlobalISel: Select llvm.amdgcn.end.cf | Matt Arsenault | 2019-07-15 | 2 | -0/+19
    llvm-svn: 366099
* AMDGPU: Add 24-bit mul intrinsics | Matt Arsenault | 2019-07-15 | 2 | -0/+132
    Insert these during codegenprepare. This works around a DAG issue where generic combines eliminate the 'and' asserting the high bits are zero, which then exposes an unknown read source to the mul combine. It isn't worth the hassle of trying to insert an AssertZext or something to try to deal with it.
    llvm-svn: 366094
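    As a rough illustration of the arithmetic behind a 24-bit multiply (not the code from this commit), the sketch below models an unsigned 24-bit multiply that keeps the low 32 bits of the product. The names mulU24 and checkMulU24 are hypothetical. When the high 8 bits of both operands are known to be zero, which is what the masking 'and' asserts, the result agrees with an ordinary 32-bit multiply.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an unsigned 24-bit multiply: only the low 24 bits of
// each operand take part, and the low 32 bits of the product are returned.
static uint32_t mulU24(uint32_t a, uint32_t b) {
  const uint32_t Mask24 = 0x00FFFFFFu;
  uint64_t product = uint64_t(a & Mask24) * uint64_t(b & Mask24);
  return static_cast<uint32_t>(product);
}

// Small self-check showing when the 24-bit and 32-bit multiplies agree.
inline void checkMulU24() {
  uint32_t a = 0x00123456u, b = 0x00000789u;
  // High 8 bits of both operands are zero: same result as a 32-bit multiply.
  assert(mulU24(a, b) == a * b);
  // High bits set: the 24-bit multiply ignores them, so the results differ.
  uint32_t c = 0x01000002u;
  assert(mulU24(c, 3u) == 6u);
  assert(c * 3u == 0x03000006u);
}
```

    If the information that the high bits are zero is lost, replacing a 32-bit multiply with the 24-bit form is no longer known to be safe, which is the situation the message describes.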
* [AMDGPU] Copy missing predicate from pseudo to real | Stanislav Mekhanoshin | 2019-07-15 | 1 | -0/+1
    NFC at the moment, needed for future commit.
    Differential Revision: https://reviews.llvm.org/D64761
    llvm-svn: 366092
* AMDGPU/GlobalISel: Select easy cases for G_BUILD_VECTOR | Matt Arsenault | 2019-07-15 | 1 | -0/+4
    llvm-svn: 366087
* AMDGPU/GlobalISel: RegBankSelect for G_CONCAT_VECTORS | Matt Arsenault | 2019-07-15 | 1 | -1/+2
    llvm-svn: 366086
* [AMDGPU] fixed scheduler crash in gfx908 | Stanislav Mekhanoshin | 2019-07-15 | 1 | -2/+2
    For some reason the scheduler can send down an SUnit without an instruction.
    Differential Revision: https://reviews.llvm.org/D64709
    llvm-svn: 366074
* [AMDGPU][MC][GFX9][GFX10] Added support of GET_DOORBELL message | Dmitry Preobrazhensky | 2019-07-15 | 3 | -4/+10
    Reviewers: artem.tamazov, arsenm
    Differential Revision: https://reviews.llvm.org/D64729
    llvm-svn: 366071
* [AMDGPU][MC] Corrected encoding of src0 for DS_GWS_* instructions | Dmitry Preobrazhensky | 2019-07-15 | 1 | -3/+5
    See bug 42599: https://bugs.llvm.org/show_bug.cgi?id=42599
    Reviewers: artem.tamazov, arsenm
    Differential Revision: https://reviews.llvm.org/D64716
    llvm-svn: 366067
* Remove set but unused variable. | Bill Wendling | 2019-07-15 | 1 | -5/+1
    llvm-svn: 366041
* [AMDGPU] use v32f32 for 3 mfma intrinsics | Stanislav Mekhanoshin | 2019-07-12 | 5 | -10/+32
    These should really use v32f32, but were defined as v32i32 due to the lack of the v32f32 type.
    Differential Revision: https://reviews.llvm.org/D64667
    llvm-svn: 365972
* AMDGPU: Drop remnants of byval support for shaders | Matt Arsenault | 2019-07-12 | 1 | -2/+1
    Before 2018, mesa used to use byval interchangeably with inreg, which didn't really make sense. Fix tests still using it to avoid breaking in a future commit.
    llvm-svn: 365953
* Fix missing use of defined() in include guard | David Tenty | 2019-07-12 | 1 | -1/+1
    Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D64657
    llvm-svn: 365952
* [AMDGPU] Extend MIMG opcode to 8 bits | Stanislav Mekhanoshin | 2019-07-12 | 2 | -22/+28
    This is NFC, but required for future commit.
    Differential Revision: https://reviews.llvm.org/D64649
    llvm-svn: 365940
* [AMDGPU] Fix DPP combiner check for exec modification | Jay Foad | 2019-07-12 | 4 | -29/+63
    Summary: r363675 changed the exec modification helper function, now called execMayBeModifiedBeforeUse, so that if no UseMI is specified it checks all instructions in the basic block, even beyond the last use. That meant that the DPP combiner no longer worked in any basic block that ended with a control flow instruction, and in particular it didn't work on code sequences generated by the atomic optimizer.
    Fix it by reinstating the old behaviour but in a new helper function execMayBeModifiedBeforeAnyUse, and limiting the number of instructions scanned.
    Reviewers: arsenm, vpykhtin
    Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D64393
    llvm-svn: 365910
* [AMDGPU] Restrict v_cndmask_b32 abs/neg modifiers to f32 | Jay Foad | 2019-07-12 | 3 | -2/+14
    Summary: D64497 allowed abs/neg source modifiers on v_cndmask_b32 but it doesn't make any sense to apply them to f16 operands; they would interpret the bits of the value as an f32, giving nonsensical results. This patch restricts them to f32 operands.
    Reviewers: arsenm, hakzsam
    Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D64636
    llvm-svn: 365904
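    To see why f32-style modifiers are meaningless on an f16 operand, consider the simplified model below. It is a sketch under stated assumptions, not the hardware definition: the f16 value is assumed to sit in the low 16 bits of a 32-bit register, and the modifiers are assumed to act on bit 31, the f32 sign bit. The helper names applyF32Neg, applyF32Abs, f16SignBit and checkModifiers are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed model of an f32 "neg" source modifier: flip bit 31 of the operand.
constexpr uint32_t applyF32Neg(uint32_t bits) { return bits ^ 0x80000000u; }

// Assumed model of an f32 "abs" source modifier: clear bit 31 of the operand.
constexpr uint32_t applyF32Abs(uint32_t bits) { return bits & 0x7FFFFFFFu; }

// An f16 held in the low half of a 32-bit register has its sign in bit 15.
constexpr bool f16SignBit(uint32_t reg) { return ((reg >> 15) & 1u) != 0; }

inline void checkModifiers() {
  const uint32_t f16One = 0x00003C00u;      // f16 encoding of 1.0
  const uint32_t f16MinusOne = 0x0000BC00u; // f16 encoding of -1.0
  // "neg" leaves the f16 payload untouched, so the value is not negated.
  assert((applyF32Neg(f16One) & 0xFFFFu) == 0x3C00u);
  assert(!f16SignBit(applyF32Neg(f16One)));
  // "abs" does not clear the f16 sign bit, so -1.0 stays negative.
  assert(applyF32Abs(f16MinusOne) == f16MinusOne);
  assert(f16SignBit(applyF32Abs(f16MinusOne)));
}
```

    Under these assumptions the modifiers touch a bit that is not part of the f16 value, which matches the "nonsensical results" the summary mentions.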
* Delete dead stores | Fangrui Song | 2019-07-12 | 3 | -10/+6
    llvm-svn: 365903
* [AMDGPU] Skip calculating callee saved registers for entry function. | Michael Liao | 2019-07-11 | 1 | -1/+5
    Reviewers: arsenm
    Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D64596
    llvm-svn: 365846
* AMDGPU: s_waitcnt field should be treated as unsigned | Matt Arsenault | 2019-07-11 | 2 | -3/+7
    Also make it an ImmLeaf, so it should work with global isel as well, which was part of the point of moving it in the first place.
    llvm-svn: 365842
* [AMDGPU] Fixed asan error with agpr spilling | Stanislav Mekhanoshin | 2019-07-11 | 1 | -1/+4
    Instruction was used after it was erased.
    llvm-svn: 365837
* [AMDGPU] gfx908 agpr spilling | Stanislav Mekhanoshin | 2019-07-11 | 7 | -45/+367
    Differential Revision: https://reviews.llvm.org/D64594
    llvm-svn: 365833
* [AMDGPU] gfx908 hazard recognizer | Stanislav Mekhanoshin | 2019-07-11 | 2 | -1/+233
    Differential Revision: https://reviews.llvm.org/D64593
    llvm-svn: 365829
* [AMDGPU] gfx908 scheduling | Stanislav Mekhanoshin | 2019-07-11 | 3 | -0/+163
    Differential Revision: https://reviews.llvm.org/D64590
    llvm-svn: 365826
* [AMDGPU] gfx908 mfma support | Stanislav Mekhanoshin | 2019-07-11 | 16 | -62/+548
    Differential Revision: https://reviews.llvm.org/D64584
    llvm-svn: 365824
* AMDGPU/GlobalISel: Move kernel argument handling to separate function | Matt Arsenault | 2019-07-11 | 2 | -42/+61
    llvm-svn: 365782
* Remove some redundant code from r290372 and improve a comment. | Jay Foad | 2019-07-11 | 1 | -5/+3
    llvm-svn: 365741
* [AMDGPU] gfx908 atomic fadd and atomic pk_fadd | Stanislav Mekhanoshin | 2019-07-11 | 7 | -4/+195
    Differential Revision: https://reviews.llvm.org/D64435
    llvm-svn: 365717
* [AMDGPU] gfx908 dot instruction support | Stanislav Mekhanoshin | 2019-07-11 | 1 | -0/+30
    Differential Revision: https://reviews.llvm.org/D64431
    llvm-svn: 365715
* GlobalISel: Legalization for G_FMINNUM/G_FMAXNUM | Matt Arsenault | 2019-07-10 | 2 | -1/+57
    llvm-svn: 365658
* AMDGPU: Serialize mode from MachineFunctionInfo | Matt Arsenault | 2019-07-10 | 3 | -1/+32
    llvm-svn: 365653
* [AMDGPU] Allow abs/neg source modifiers on v_cndmask_b32 | Jay Foad | 2019-07-10 | 1 | -7/+8
    Summary: D59191 added support for these modifiers in the assembler and disassembler. This patch just teaches instruction selection that it can use them.
    Reviewers: arsenm, tstellar
    Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D64497
    llvm-svn: 365640
* AMDGPU/GlobalISel: Add support for wide loads >= 256-bits | Tom Stellard | 2019-07-10 | 4 | -37/+219
    Summary: This adds support for the most commonly used wide load types: <8xi32>, <16xi32>, <4xi64>, and <8xi64>
    Reviewers: arsenm
    Reviewed By: arsenm
    Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, Petar.Avramovic, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D57399
    llvm-svn: 365586
* GlobalISel: Implement lower for G_FCOPYSIGN | Matt Arsenault | 2019-07-09 | 1 | -3/+2
    In SelectionDAG AMDGPU treated these as legal, but this was mostly because the bitcasts required for FP types were painful. Theoretically the bitpattern should eventually match to bfi, so don't bother trying to get the patterns to import.
    llvm-svn: 365583
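    The bit pattern in question can be sketched as follows. This is a minimal illustration assuming 32-bit IEEE 754 floats, not the legalizer code itself. The function names bfi, copySignViaBits and checkCopySign are hypothetical, and the operand order of bfi here is chosen for readability rather than taken from any instruction definition.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Bitfield insert: take bits from `a` where `mask` is set, otherwise from `b`.
static uint32_t bfi(uint32_t mask, uint32_t a, uint32_t b) {
  return (a & mask) | (b & ~mask);
}

// copysign expressed on the integer bit patterns: keep the magnitude bits of
// `mag` (everything except bit 31) and take the sign bit from `sign`.
static float copySignViaBits(float mag, float sign) {
  uint32_t m, s;
  std::memcpy(&m, &mag, sizeof m);
  std::memcpy(&s, &sign, sizeof s);
  uint32_t r = bfi(0x7FFFFFFFu, m, s);
  float out;
  std::memcpy(&out, &r, sizeof out);
  return out;
}

inline void checkCopySign() {
  assert(copySignViaBits(1.5f, -2.0f) == -1.5f);
  assert(copySignViaBits(-3.0f, 1.0f) == 3.0f);
}
```

    Lowering to and/or operations on the integer bits produces exactly this mask-and-merge shape, which is why a later pattern match to a bitfield insert is plausible.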
* AMDGPU/GlobalISel: Fix legality for G_BUILD_VECTOR | Matt Arsenault | 2019-07-09 | 1 | -7/+4
    llvm-svn: 365575
* [AMDGPU] gfx908 v_pk_fmac_f16 support | Stanislav Mekhanoshin | 2019-07-09 | 2 | -4/+10
    Differential Revision: https://reviews.llvm.org/D64433
    llvm-svn: 365573
* [AMDGPU] gfx908 mAI instructions, MC part | Stanislav Mekhanoshin | 2019-07-09 | 19 | -18/+674
    Differential Revision: https://reviews.llvm.org/D64446
    llvm-svn: 365563
* [X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it | Craig Topper | 2019-07-09 | 2 | -5/+11
    Basically the problem is that X86 doesn't set the Fast flag from allowsMemoryAccess on certain CPUs due to slow unaligned memory subtarget features. This prevents bitcasts from being folded into loads and stores. But all vector loads and stores of the same width are the same cost on X86. This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it.
    Differential Revision: https://reviews.llvm.org/D64295
    llvm-svn: 365549
* [AMDGPU] gfx908 register file changes | Stanislav Mekhanoshin | 2019-07-09 | 6 | -50/+621
    Differential Revision: https://reviews.llvm.org/D64438
    llvm-svn: 365546
* [AMDGPU] gfx908 target | Stanislav Mekhanoshin | 2019-07-09 | 5 | -0/+100
    Differential Revision: https://reviews.llvm.org/D64429
    llvm-svn: 365525
* [AMDGPU] Created a sub-register class for the return address operand in the return instruction. | Christudasan Devadasan | 2019-07-09 | 3 | -12/+15
    Function return instruction lowering currently uses the fixed register pair s[30:31] for holding the return address. It can be any SGPR pair other than the CSRs. Created an SGPR pair sub-register class exclusive of the CSRs, and used this regclass while lowering the return instruction.
    Reviewed By: arsenm
    Differential Revision: https://reviews.llvm.org/D63924
    llvm-svn: 365512