summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIInstructions.td
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU/GlobalISel: Fix import of integer med3Matt Arsenault2020-01-091-8/+30
| | | | | This isn't too useful now, since nothing is currently trying to form min/max from cmp+select.
* AMDGPU/GlobalISel: Fix add of neg inline constant patternMatt Arsenault2020-01-091-1/+14
|
* AMDGPU/GlobalISel: Fix readfirstlane pattern importMatt Arsenault2020-01-071-1/+1
| | | | | The imm folding optimization pattern failed to import. The instruction pattern was already working, but failing to fail on SGPR inputs.
* AMDGPU/GlobalISel: Partially fix llvm.amdgcn.kill pattern importMatt Arsenault2020-01-071-4/+4
| | | | | Tests deferred since the existing DAG test depends on some other operations, but isn't far from working as-is.
* AMDGPU: Fix legalizing f16 fpowMatt Arsenault2020-01-061-0/+1
| | | | | | The existing test only covered one case for r600. The use of mul_legacy also looks suspicious to me, but leave it for now. The patterns are also not making use of source modifiers.
* AMDGPU: Use ImmLeaf for inline immediate predicatesMatt Arsenault2020-01-061-5/+5
|
* AMDGPU/GlobalISel: Select scalar v2s16 G_BUILD_VECTORMatt Arsenault2020-01-061-18/+23
|
* DAG: Use TargetConstant for FENCE operandsMatt Arsenault2020-01-021-1/+1
|
* AMDGPU/GlobalISel: Select llvm.amdgcn.fmad.ftzMatt Arsenault2019-12-301-3/+4
|
* AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHGMatt Arsenault2019-10-251-0/+11
| | | | | | | | Custom lower this to a target instruction with the merge operands. I think it might be better to directly select this and emit a REG_SEQUENCE, but this would be more work since it would require splitting the tablegen patterns for these cases from the other atomics.
* AMDGPU: Select basic interp directly from intrinsicsMatt Arsenault2019-10-211-6/+6
| | | | llvm-svn: 375457
* [AMDGPU] Support mov dpp with 64 bit operandsStanislav Mekhanoshin2019-10-151-0/+26
| | | | | | | | | | We define mov/update dpp intrinsics as overloaded but do not support i64, which is a practically useful type. Fix the selection and lowering. Differential Revision: https://reviews.llvm.org/D68673 llvm-svn: 374910
* GlobalISel: Add target pre-isel instructionsMatt Arsenault2019-10-071-0/+10
| | | | | | | | | | | | | | Allows targets to introduce regbankselectable pseudo-instructions. Currently the closet feature to this is an intrinsic. However this requires creating a public intrinsic declaration. This litters the public intrinsic namespace with operations we don't necessarily want to expose to IR producers, and would rather leave as private to the backend. Use a new instruction bit. A previous attempt tried to keep using enum value ranges, but it turned into a mess. llvm-svn: 373937
* AMDGPU/GlobalISel: Select s1 src G_SITOFP/G_UITOFPMatt Arsenault2019-10-011-6/+6
| | | | llvm-svn: 373298
* AMDGPU/GlobalISel: Add support for init.exec intrinsicsMatt Arsenault2019-10-011-19/+11
| | | | | | | TThe existing wave32 behavior seems broken and incomplete, but this reproduces it. llvm-svn: 373296
* Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"Matt Arsenault2019-09-191-6/+6
| | | | | | | | | This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338
* Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"Hans Wennborg2019-09-191-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC*. > > Since now intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. > > Most of the work here is to cleanup target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, simlar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_* instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372314
* GlobalISel: Don't materialize immarg arguments to intrinsicsMatt Arsenault2019-09-191-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Encode them directly as an imm argument to G_INTRINSIC*. Since now intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to cleanup target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. The GlobalISelEmitter needs to treat timm as a special case of a leaf node, simlar to MachineBasicBlock operands. This should also enable handling of patterns for some G_* instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372285
* AMDGPU/GlobalISel: Select S16->S32 fptointMatt Arsenault2019-09-161-2/+2
| | | | llvm-svn: 371950
* AMDGPU/GlobalISel: Select s32->s16 G_[US]ITOFPMatt Arsenault2019-09-161-2/+2
| | | | llvm-svn: 371949
* AMDGPU/GlobalISel: Fix VALU s16 fnegMatt Arsenault2019-09-161-0/+10
| | | | llvm-svn: 371948
* AMDGPU/GlobalISel: Select G_CTPOPMatt Arsenault2019-09-131-0/+6
| | | | llvm-svn: 371798
* DAG/GlobalISel: Correct type profile of bitcount opsMatt Arsenault2019-09-131-1/+1
| | | | | | | | The result integer does not need to be the same width as the input. AMDGPU, NVPTX, and Hexagon all have patterns working around the types matching. GlobalISel defines these as being different type indexes. llvm-svn: 371797
* GlobalISel/TableGen: Handle REG_SEQUENCE patternsMatt Arsenault2019-09-101-14/+36
| | | | | | | | The scalar f64 patterns don't work yet because they fail on multiple results from the unused implicit def of scc in the result bit operation. llvm-svn: 371542
* AMDGPU/GlobalISel: Select G_FABS/G_FNEGMatt Arsenault2019-09-101-48/+82
| | | | | | | | | | | f64 doesn't work yet because tablegen currently doesn't handlde REG_SEQUENCE. This does regress some multi use VALU fneg cases since now the immediate remains in an SGPR, and more moves are used for legalizing the xor. This is a SIFixSGPRCopies deficiency. llvm-svn: 371540
* AMDGPU: Remove pointless wrapper nodes for init.exec intrinsicsMatt Arsenault2019-09-091-7/+6
| | | | llvm-svn: 371364
* Revert "AMDGPU: Fix iterator error when lowering SI_END_CF"Matt Arsenault2019-08-201-1/+0
| | | | | | | This reverts r367500 and r369203. This is causing various test failures. llvm-svn: 369417
* Reapply "AMDGPU: Split block for si_end_cf"Matt Arsenault2019-08-011-0/+1
| | | | | | This reverts commit r359363, reapplying r357634 llvm-svn: 367500
* [AMDGPU] Add llvm.amdgcn.softwqm intrinsicCarl Ritson2019-07-261-0/+4
| | | | | | | | | | | | | | | | | Add llvm.amdgcn.softwqm intrinsic which behaves like llvm.amdgcn.wqm only if there is other WQM computation in the shader. Reviewers: nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64935 llvm-svn: 367097
* [AMDGPU] use v32f32 for 3 mfma intrinsicsStanislav Mekhanoshin2019-07-121-0/+12
| | | | | | | | | These should really use v32f32, but were defined as v32i32 due to the lack of the v32f32 type. Differential Revision: https://reviews.llvm.org/D64667 llvm-svn: 365972
* [AMDGPU] Restrict v_cndmask_b32 abs/neg modifiers to f32Jay Foad2019-07-121-2/+2
| | | | | | | | | | | | | | | | | | Summary: D64497 allowed abs/neg source modifiers on v_cndmask_b32 but it doesn't make any sense to apply them to f16 operands; they would interpret the bits of the value as an f32, giving nonsensical results. This patch restricts them to f32 operands. Reviewers: arsenm, hakzsam Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64636 llvm-svn: 365904
* [AMDGPU] gfx908 agpr spillingStanislav Mekhanoshin2019-07-111-2/+45
| | | | | | Differential Revision: https://reviews.llvm.org/D64594 llvm-svn: 365833
* [AMDGPU] gfx908 mfma supportStanislav Mekhanoshin2019-07-111-0/+10
| | | | | | Differential Revision: https://reviews.llvm.org/D64584 llvm-svn: 365824
* [AMDGPU] Allow abs/neg source modifiers on v_cndmask_b32Jay Foad2019-07-101-7/+8
| | | | | | | | | | | | | | | | | Summary: D59191 added support for these modifiers in the assembler and disassembler. This patch just teaches instruction selection that it can use them. Reviewers: arsenm, tstellar Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64497 llvm-svn: 365640
* AMDGPU: Correct properties for adjcallstack* pseudosMatt Arsenault2019-07-011-0/+4
| | | | | | | These should be SALU writes, and these are lowered to instructions that def SCC. llvm-svn: 364859
* AMDGPU: Write LDS objects out as global symbols in code generationNicolai Haehnle2019-06-251-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section index introduced with a previous change. The linker is then expected to resolve relocations, which are also emitted. Initially disabled for HSA and PAL environments until they have caught up in terms of linker and runtime loader. Some notes: - The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered to a constant at compile times, which means some tests can no longer be applied. The current "solution" is a terrible hack, but the intrinsic isn't used by Mesa, so we can keep it for now. - We no longer know the full LDS size per kernel at compile time, which means that we can no longer generate a relevant error message at compile time. It would be possible to add a check for the size of individual variables, but ultimately the linker will have to perform the final check. Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275 Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61494 llvm-svn: 364297
* [AMDGPU] gfx1010 core wave32 changesStanislav Mekhanoshin2019-06-201-18/+16
| | | | | | Differential Revision: https://reviews.llvm.org/D63204 llvm-svn: 363934
* [AMDGPU] gfx10 wave32 patternsStanislav Mekhanoshin2019-06-181-7/+66
| | | | | | Differential Revision: https://reviews.llvm.org/D63511 llvm-svn: 363729
* AMDGPU: Fold readlane from copy of SGPR or immMatt Arsenault2019-06-181-0/+7
| | | | | | These may be inserted to assert uniformity somewhere. llvm-svn: 363670
* AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0Nicolai Haehnle2019-06-161-1/+6
| | | | | | | | | | | | | | | | | | | | | Summary: Instead of encoding a high-word of 0 using a fake TargetGlobalAddress, just use a literal target constant. This simplifies some subsequent changes. The generated assembly is now more explicit about the kind of relocation that is to be used. Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61491 llvm-svn: 363516
* [AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32Stanislav Mekhanoshin2019-06-131-1/+6
| | | | | | Differential Revision: https://reviews.llvm.org/D63301 llvm-svn: 363339
* [AMDGPU] gfx1010 base changes for wave32Stanislav Mekhanoshin2019-06-131-0/+18
| | | | | | Differential Revision: https://reviews.llvm.org/D63293 llvm-svn: 363299
* AMDGPU: Assume call pseudos are convergentMatt Arsenault2019-05-211-0/+6
| | | | | | | There should probably be nonconvergent versions, but my guess is it doesn't matter in practice. llvm-svn: 361331
* [AMDGPU] gfx1010: use fmac instructionsStanislav Mekhanoshin2019-05-041-1/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D61527 llvm-svn: 359959
* Revert "AMDGPU: Split block for si_end_cf"Mark Searles2019-04-271-1/+0
| | | | | | | | | | This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4. We discovered some internal test failures, so reverting for now. Differential Revision: https://reviews.llvm.org/D61213 llvm-svn: 359363
* [AMDGPU] predicate and feature refactoringStanislav Mekhanoshin2019-04-051-4/+4
| | | | | | | | | We have done some predicate and feature refactoring lately but did not upstream it. This is to sync. Differential revision: https://reviews.llvm.org/D60292 llvm-svn: 357791
* AMDGPU: Split block for si_end_cfMatt Arsenault2019-04-031-0/+1
| | | | | | | | | | | | | | | Relying on no spill or other code being inserted before this was precarious. It relied on code diligently checking isBasicBlockPrologue which is likely to be forgotten. Ideally this could be done earlier, but this doesn't work because of phis. Any other instruction can't be placed before them, so we have to accept the position being incorrect during SSA. This avoids regressions in the fast register allocator rewrite from inverting the direction. llvm-svn: 357634
* [AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure.Neil Henning2019-04-011-0/+7
| | | | | | | | | | | | | | | | | | | | | | | This change incorporates an effort by Connor Abbot to change how we deal with WWM operations potentially trashing valid values in inactive lanes. Previously, the SIFixWWMLiveness pass would work out which registers were being trashed within WWM regions, and ensure that the register allocator did not have any values it was depending on resident in those registers if the WWM section would trash them. This worked perfectly well, but would cause sometimes severe register pressure when the WWM section resided before divergent control flow (or at least that is where I mostly observed it). This fix instead runs through the WWM sections and pre allocates some registers for WWM. It then reserves these registers so that the register allocator cannot use them. This results in a significant register saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just this change!). Differential Revision: https://reviews.llvm.org/D59295 llvm-svn: 357400
* AMDGPU: Fix missing scc implicit def on s_andn2_b64_termMatt Arsenault2019-03-271-18/+13
| | | | | | | Introduce new helper class to copy properties directly from the base instruction. llvm-svn: 357089
* AMDGPU: wave_barrier is not isBarrierMatt Arsenault2019-03-271-1/+0
| | | | | | | This is not a control flow instruction, so should not be marked as isBarrier. This fixes a verifier error if followed by unreachable. llvm-svn: 357081
OpenPOWER on IntegriCloud