path: root/llvm/lib/Target/AMDGPU/SIInstructions.td
Commit message | Author | Age | Files | Lines
...
* AMDGPU: Remove unneeded implicit exec uses/defs | Matt Arsenault | 2016-08-27 | 1 | -40/+29
    SI_BREAK, SI_IF_BREAK, and SI_ELSE_BREAK do not def exec.
    SI_IF_BREAK and SI_ELSE_BREAK do not read it either.
    llvm-svn: 279909
* AMDGPU: Select mulhi 24-bit instructions | Matt Arsenault | 2016-08-27 | 1 | -2/+2
    llvm-svn: 279902
* AMDGPU: Move cndmask pseudo to be isel pseudo | Matt Arsenault | 2016-08-27 | 1 | -0/+1
    There's only one use of this for the convenience of a pattern.
    I think v_mov_b64_pseudo should also be moved, but SIFoldOperands
    does currently make use of it.
    llvm-svn: 279901
* AMDGPU: Fix sched type for branches | Matt Arsenault | 2016-08-27 | 1 | -1/+1
    llvm-svn: 279900
* AMDGPU: Remove register operand from si_mask_branch | Matt Arsenault | 2016-08-27 | 1 | -1/+1
    It isn't used for anything, and is also misleading since it could be
    spilled at the end of the block, so it can't be relied on. There ends up
    being a verifier error about using an undefined register since the spill
    kills the register.
    llvm-svn: 279899
* AMDGCN/SI: Implement readlane/readfirstlane intrinsics | Changpeng Fang | 2016-08-24 | 1 | -4/+5
    Summary:
    This patch implements readlane/readfirstlane intrinsics.
    TODO: need to define a new register class to consider the case that the
    source could be a vector register or M0.
    Reviewed by: arsenm and tstellarAMD
    Differential Revision: http://reviews.llvm.org/D22489
    llvm-svn: 279660
* AMDGPU : Add V_SAD_U32 instruction pattern. | Wei Ding | 2016-08-24 | 1 | -0/+19
    Differential Revision: http://reviews.llvm.org/D23069
    llvm-svn: 279629
* AMDGPU: Split SILowerControlFlow into two pieces | Matt Arsenault | 2016-08-22 | 1 | -0/+1
    Do most of the lowering in a pre-RA pass. Keep the skip jump insertion
    late, plus a few other things that require more work to move out.
    One concern I have is now there may be COPY instructions which do not
    have the necessary implicit exec uses if they will be lowered to
    v_mov_b32.
    This has a positive effect on SGPR usage in shader-db.
    llvm-svn: 279464
* [SelectionDAG] Rename fextend -> fpextend, fround -> fpround, frnd -> fround | Michael Kuperstein | 2016-08-18 | 1 | -2/+2
    The names of the tablegen defs now match the names of the ISD nodes.
    This makes the world a slightly saner place, as previously "fround"
    matched ISD::FP_ROUND and not ISD::FROUND.
    Differential Revision: https://reviews.llvm.org/D23597
    llvm-svn: 279129
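A rename like this is easy to get wrong with sequential search-and-replace, because `fround` is both an old name and a new name. The following is a minimal Python sketch of a single-pass rename; the helper `rename_defs` and the sample pattern string are hypothetical illustrations, not the tooling used for this commit.

```python
import re

# Old TableGen def name -> new name, taken from the commit title.
RENAMES = {
    "fextend": "fpextend",  # now matches ISD::FP_EXTEND
    "fround": "fpround",    # now matches ISD::FP_ROUND
    "frnd": "fround",       # now matches ISD::FROUND
}

# Whole-word match so that longer identifiers are left alone.
_PATTERN = re.compile(r"\b(" + "|".join(RENAMES) + r")\b")


def rename_defs(text: str) -> str:
    """Hypothetical helper: rename all three defs in one pass.

    A single pass ensures that 'frnd' -> 'fround' is not rewritten a
    second time into 'fpround'.
    """
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], text)


if __name__ == "__main__":
    print(rename_defs("(fround (frnd (fextend f16:$x)))"))
```

Running two separate substitutions in the order `frnd -> fround`, then `fround -> fpround`, would collapse both names into `fpround`, which is the confusion the commit message describes.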
* AMDGPU : Fix QSAD and MQSAD instructions' incorrect data type. | Wei Ding | 2016-08-18 | 1 | -1/+1
    Differential Revision: http://reviews.llvm.org/D23689
    llvm-svn: 279126
* [AMDGPU] add s_incperflevel/s_decperflevel intrinsics. | Valery Pykhtin | 2016-08-18 | 1 | -2/+12
    Differential revision: https://reviews.llvm.org/D23666
    llvm-svn: 279106
* AMDGPU : Add intrinsic for instruction v_cvt_pk_u8_f32 | Wei Ding | 2016-08-11 | 1 | -0/+5
    Differential Revision: http://reviews.llvm.org/D23336
    llvm-svn: 278403
* AMDGPU : Add LLVM intrinsics for SAD related instructions. | Wei Ding | 2016-08-11 | 1 | -5/+15
    Differential Revision: http://reviews.llvm.org/D23133
    llvm-svn: 278354
* AMDGPU/SI: Implement amdgcn image intrinsics with sampler | Changpeng Fang | 2016-08-10 | 1 | -3/+133
    Summary:
    This patch defines and implements amdgcn image intrinsics with sampler.
    1. Define the vdata type to be llvm_anyfloat_ty, the address type to be
       llvm_anyfloat_ty, and the rsrc type to be llvm_anyint_ty. As a result,
       we expect the intrinsic name to have three suffixes to overload each
       of these three types;
    2. D128 as well as two other flags are implied in the three types; for
       example, if you use v8i32 as the resource type, then r128 is 0!
    3. Don't expose the TFE flag; the other flags are exposed in the
       instruction order: unrm, glc, slc, lwe and da.
    Differential Revision: http://reviews.llvm.org/D22838
    Reviewed by: arsenm and tstellarAMD
    llvm-svn: 278291
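To make the "three suffixes" rule concrete, here is a small Python sketch that assembles an overloaded intrinsic name from the three type suffixes and checks them against the type classes in the summary. The function `overloaded_name`, the base name `llvm.amdgcn.image.sample`, and the example types are assumptions chosen for illustration; only the float/float/int classes and the flag order come from the commit message.

```python
import re

# Flag operands in the order listed in the commit message (TFE is not exposed).
FLAG_ORDER = ("unrm", "glc", "slc", "lwe", "da")

# Matches scalar or vector type suffixes such as "f32", "v4f32", "v8i32".
_TYPE_RE = re.compile(r"^(?:v\d+)?([fi])\d+$")


def _kind(type_suffix: str) -> str:
    """Return 'f' for float types and 'i' for integer types."""
    m = _TYPE_RE.match(type_suffix)
    if m is None:
        raise ValueError(f"unrecognized type suffix: {type_suffix}")
    return m.group(1)


def overloaded_name(base: str, vdata: str, addr: str, rsrc: str) -> str:
    """Hypothetical helper: base name plus one suffix per overloaded type.

    vdata and addr must be float types (llvm_anyfloat_ty) and rsrc must be
    an integer type (llvm_anyint_ty), as stated in the summary.
    """
    if _kind(vdata) != "f" or _kind(addr) != "f":
        raise ValueError("vdata and addr must be floating-point types")
    if _kind(rsrc) != "i":
        raise ValueError("rsrc must be an integer type")
    return ".".join((base, vdata, addr, rsrc))


if __name__ == "__main__":
    # Assumed example: v8i32 resource type, for which the summary says r128 is 0.
    print(overloaded_name("llvm.amdgcn.image.sample", "v4f32", "v4f32", "v8i32"))
```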
* AMDGPU: s_setpc_b64 should be an indirect branch | Matt Arsenault | 2016-08-10 | 1 | -1/+2
    llvm-svn: 278278
* AMDGPU: Set sizes on control flow pseudos | Matt Arsenault | 2016-08-10 | 1 | -9/+17
    llvm-svn: 278276
* AMDGPU: Change insertion point of si_mask_branch | Matt Arsenault | 2016-08-10 | 1 | -2/+2
    Insert before the skip branch if one is created. This is a somewhat more
    natural placement relative to the skip branches, and makes it possible to
    implement analyzeBranch for skip blocks.
    The test changes are mostly due to a quirk where the block label is not
    emitted if there is a terminator that is not also a branch.
    llvm-svn: 278273
* AMDGPU: Stay in WQM for non-intrinsic stores | Nicolai Haehnle | 2016-08-02 | 1 | -4/+4
    Summary:
    Two types of stores are possible in pixel shaders: stores to memory that
    are explicitly requested at the API level, and stores that are an
    implementation detail of register spilling or lowering of arrays.

    For the first kind of store, we must ensure that helper pixels have no
    effect and hence WQM must be disabled. The second kind of store must
    always be executed, because the written value may be loaded again in a
    way that is relevant for helper pixels as well -- and there are no
    externally visible effects anyway.

    This is a candidate for the 3.9 release branch.
    Reviewers: arsenm, tstellarAMD, mareko
    Subscribers: arsenm, kzhuravl, llvm-commits
    Differential Revision: https://reviews.llvm.org/D22675
    llvm-svn: 277504
* [AMDGPU] refactor DS instruction definitions. NFC. | Valery Pykhtin | 2016-08-01 | 1 | -263/+0
    Differential revision: https://reviews.llvm.org/D22522
    llvm-svn: 277344
* AMDGPU: Set s_setpc_b64 as a terminator | Matt Arsenault | 2016-07-30 | 1 | -0/+3
    llvm-svn: 277259
* AMDGPU : Add intrinsics for compare with the full wavefront result | Wei Ding | 2016-07-28 | 1 | -0/+65
    Differential Revision: http://reviews.llvm.org/D22482
    llvm-svn: 276998
* AMDGPU: add execfix flag to SI_ELSE | Nicolai Haehnle | 2016-07-28 | 1 | -2/+6
    Summary:
    SI_ELSE is lowered into two parts:

        s_or_saveexec_b64 dst, src      (at the start of the basic block)
        s_xor_b64 exec, exec, dst       (at the end of the basic block)

    The idea is that dst contains the exec mask of the preceding IF block.
    It can happen that SIWholeQuadMode decides to switch from WQM to Exact
    mode inside the basic block that contains SI_ELSE, in which case it
    introduces an instruction

        s_and_b64 exec, exec, s[...]

    which masks out bits that can correspond to both the IF and the ELSE
    paths. So the resulting sequence must be:

        s_or_saveexec_b64 dst, src
        s_and_b64 exec, exec, s[...]    <-- added by SIWholeQuadMode
        s_and_b64 dst, dst, exec        <-- added by SILowerControlFlow
        s_xor_b64 exec, exec, dst

    Whether to add the additional s_and_b64 dst, dst, exec is currently
    determined via the ExecModified tracking. With this change, it is
    instead determined by an additional flag on SI_ELSE which is set by
    SIWholeQuadMode.

    Finally: It also occurred to me that an alternative approach for the
    long run is for SILowerControlFlow to unconditionally emit

        s_or_saveexec_b64 dst, src
        ...
        s_and_b64 dst, dst, exec
        s_xor_b64 exec, exec, dst

    and have a pass that detects and cleans up the "redundant AND with exec"
    pattern where possible. This could be useful anyway, because we also add
    instructions

        s_and_b64 vcc, exec, vcc

    before s_cbranch_scc (in moveToALU), and those are often redundant. I
    have some pending changes to how KILL is lowered that could also benefit
    from such a cleanup pass.

    In any case, this current patch could help in the short term with the
    whole ExecModified business.
    Reviewers: tstellarAMD, arsenm
    Subscribers: arsenm, llvm-commits, kzhuravl
    Differential Revision: https://reviews.llvm.org/D22846
    llvm-svn: 276972
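The reason for the extra `s_and_b64 dst, dst, exec` can be checked by simulating the mask arithmetic. The Python sketch below models the lowered SI_ELSE sequence on plain integers; the function names, the `live_mask` parameter, and the 4-lane example masks are assumptions made for illustration (real wavefront masks are 64 bits wide), and the model ignores SCC and everything other than the exec and dst values.

```python
MASK64 = (1 << 64) - 1


def s_or_saveexec_b64(exec_mask: int, src: int):
    """Model of s_or_saveexec_b64: dst = exec; exec = src | exec."""
    return exec_mask, (src | exec_mask) & MASK64


def lower_si_else(exec_mask: int, src: int, live_mask=None, execfix=False):
    """Toy model of the SI_ELSE sequence from the commit message.

    live_mask models the operand of the s_and_b64 that SIWholeQuadMode
    inserts when switching from WQM to Exact mode (None: no switch).
    Returns (dst, exec) after the final s_xor_b64.
    """
    dst, exec_mask = s_or_saveexec_b64(exec_mask, src)
    if live_mask is not None:
        exec_mask &= live_mask      # s_and_b64 exec, exec, s[...]
    if execfix:
        dst &= exec_mask            # s_and_b64 dst, dst, exec
    exec_mask ^= dst                # s_xor_b64 exec, exec, dst
    return dst, exec_mask


if __name__ == "__main__":
    then_lanes = 0b0011   # lanes that ran the IF block (exec on entry)
    else_lanes = 0b1100   # lanes saved for the ELSE block (src)
    live = 0b0101         # lanes 1 and 3 are helper pixels

    _, no_fix = lower_si_else(then_lanes, else_lanes, live, execfix=False)
    _, fixed = lower_si_else(then_lanes, else_lanes, live, execfix=True)
    print(f"without fix: {no_fix:04b}, with fix: {fixed:04b}")
```

In this example the desired ELSE mask is `else_lanes & live`, that is `0b0100`. Without the fix the XOR re-enables helper lane 1, which belongs to the IF path; with the fix only lane 2 remains active.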
* AMDGPU: Use implicit_def for selecting anyext | Matt Arsenault | 2016-07-26 | 1 | -4/+7
    llvm-svn: 276819
* AMDGPU: Add fp legacy instruction intrinsics | Matt Arsenault | 2016-07-26 | 1 | -2/+3
    This could use some additional optimization work to use mad/mac legacy.
    llvm-svn: 276764
* AMDGPU: Delete more dead code | Matt Arsenault | 2016-07-22 | 1 | -7/+0
    Remove dead code from r600 intrinsic removal. Remove unset members,
    rename StackSize to be less ambiguous.
    llvm-svn: 276436
* AMDGPU: Fix i1 fp_to_int | Matt Arsenault | 2016-07-22 | 1 | -0/+10
    R600's i1 fp_to_uint selected but was incorrect according to what
    instcombine constant folds to.
    llvm-svn: 276435
* AMDGPU: Only use legal inline immediates with kill pseudo | Matt Arsenault | 2016-07-19 | 1 | -1/+1
    All that matters is whether the value is negative or positive, so use a
    constant that doesn't require an instruction to materialize.
    These should really just emit the write to exec directly, but for now
    stick with the kill pseudo-terminator.
    llvm-svn: 275988
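As an illustration of "a constant that doesn't require an instruction to materialize", the Python sketch below checks whether a 32-bit value falls into the Southern Islands inline-constant set (integers from -16 to 64, and the floats 0.5, 1.0, 2.0, 4.0 and their negations). The helper names are hypothetical, the set is an assumption about the SI generation only (later generations add more values), and the sketch is not the check LLVM itself performs.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE-754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


# Assumed SI-era inline float constants: +/-0.5, +/-1.0, +/-2.0, +/-4.0.
_INLINE_FLOAT_BITS = {
    f32_bits(sign * mag)
    for sign in (1.0, -1.0)
    for mag in (0.5, 1.0, 2.0, 4.0)
}


def is_inline_imm32(bits: int) -> bool:
    """Hypothetical check: can this 32-bit pattern be an inline constant?"""
    bits &= 0xFFFFFFFF
    signed = bits - (1 << 32) if bits & 0x80000000 else bits
    return -16 <= signed <= 64 or bits in _INLINE_FLOAT_BITS


if __name__ == "__main__":
    # -1.0 is negative and inline; -3.0 is negative but needs a literal.
    for value in (-1.0, -3.0):
        print(value, hex(f32_bits(value)), is_inline_imm32(f32_bits(value)))
```

Since only the sign of the operand matters for the kill pseudo, any negative inline constant such as -1.0 serves the purpose without an extra move.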
* AMDGPU: Expand register indexing pseudos in custom inserter | Matt Arsenault | 2016-07-19 | 1 | -8/+18
    This is to help move SILowerControlFlow to before regalloc.

    There are a couple of tradeoffs with this. The complete CFG is visible
    to more passes, the loop body avoids an extra copy of m0, vcc isn't
    required, and immediate offsets can be shrunk into s_movk_i32.

    The disadvantage is the register allocator doesn't understand that the
    single lane's vector is dead within the loop body, so an extra register
    is used to outlive the loop block when expanding the VGPR -> m0 loop.
    This also now results in worse waitcnt insertion before the loop instead
    of after for pending operations at the point of the indexing, but that
    should be fixed by future improvements to cross block waitcnt insertion.

    v_movreld_b32's operands are now modeled more correctly since vdst is
    not a true output. This is kind of a hack to treat vdst as a use
    operand. Extra checking is required in the verifier since I can't seem
    to get tablegen to emit an implicit operand for a virtual register.
    llvm-svn: 275934
* AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32 | Matt Arsenault | 2016-07-18 | 1 | -1/+1
    llvm-svn: 275871
* AMDGPU/R600: Replace barrier intrinsics | Matt Arsenault | 2016-07-18 | 1 | -11/+0
    llvm-svn: 275870
* AMDGPU: Follow up to r275203 | Matt Arsenault | 2016-07-12 | 1 | -2/+10
    I meant to squash this into it.
    llvm-svn: 275220
* AMDGPU: Add LLVM IR Intrinsic for v_lerp_u8 | Wei Ding | 2016-07-12 | 1 | -0/+4
    Differential Revision: http://reviews.llvm.org/D22239
    llvm-svn: 275197
* AMDGPU: Unify MOVRELSOffset and MOVRELDOffset | Nicolai Haehnle | 2016-07-12 | 1 | -2/+2
    Summary:
    Previously, constant index insertelements would be turned into
    SI_INDIRECT_DST, which is bound to prevent some optimization
    opportunities. Worse, it misled the heuristic that decides whether
    immediates should be lowered to S_MOV_B32 or V_MOV_B32 in a way that
    resulted in unnecessary v_readfirstlanes.
    Reviewers: arsenm, tstellarAMD
    Subscribers: arsenm, kzhuravl, llvm-commits
    Differential Revision: http://reviews.llvm.org/D22217
    llvm-svn: 275160
* AMDGPU: Cleanup pseudoinstructions | Matt Arsenault | 2016-07-12 | 1 | -51/+45
    llvm-svn: 275133
* AMDGPU: Fix missing scc def on control flow pseudos | Matt Arsenault | 2016-07-12 | 1 | -2/+2
    These are all expanded to instructions that include an scc def.
    llvm-svn: 275132
* Revert "AMDGPU: Remove unused control flow intrinsic" | Matt Arsenault | 2016-07-09 | 1 | -0/+5
    llvm-svn: 274978
* AMDGPU: Improve offset folding for register indexing | Matt Arsenault | 2016-07-09 | 1 | -22/+8
    llvm-svn: 274954
* AMDGPU: Remove unused control flow intrinsic | Matt Arsenault | 2016-07-08 | 1 | -5/+0
    llvm-svn: 274939
* [AMDGPU] fix ds_swizzle_b32 opcode for VI (bz 28371) | Valery Pykhtin | 2016-07-08 | 1 | -1/+1
    Differential Revision: http://reviews.llvm.org/D22049
    llvm-svn: 274852
* AMDGPU: Move si_mask_branch register operand to be a use | Matt Arsenault | 2016-07-08 | 1 | -1/+1
    llvm-svn: 274818
* [AMDGPU] fix ds_write_src2 encoding (bz26027) | Valery Pykhtin | 2016-07-07 | 1 | -2/+2
    Differential revision: http://reviews.llvm.org/D22041
    llvm-svn: 274756
* [AMDGPU] rename DS_1A1D_Off8_NORET to DS_1A2D_Off8_NORET as ds_write2xx use 2 source registers. NFC. | Valery Pykhtin | 2016-07-05 | 1 | -4/+4
    llvm-svn: 274556
* AMDGPU/SI: Remove hack for selecting < 32-bit loads to MUBUF instructions | Tom Stellard | 2016-07-04 | 1 | -4/+4
    Summary:
    The isGlobalLoad() query was returning true for constant address space
    loads with memory types smaller than 32 bits, which is wrong. This logic
    has been replaced with PatFrag in the TableGen files, to provide the
    same functionality.
    Reviewers: arsenm
    Subscribers: arsenm, kzhuravl, llvm-commits
    Differential Revision: http://reviews.llvm.org/D21696
    llvm-svn: 274521
* AMDGPU: Cleanup subtarget handling. | Matt Arsenault | 2016-06-24 | 1 | -2/+2
    Split AMDGPUSubtarget into amdgcn/r600 specific subclasses.
    This removes most of the static_casting of the basic codegen classes
    everywhere, and tries to restrict the features visible on the wrong
    target.
    llvm-svn: 273652
* AMDGPU: readlane/writelane do not read exec | Matt Arsenault | 2016-06-23 | 1 | -1/+2
    llvm-svn: 273525
* AMDGPU: Fix liveness when expanding m0 loop | Matt Arsenault | 2016-06-22 | 1 | -6/+7
    llvm-svn: 273514
* AMDGPU/SI: Define an intrinsic to expose ds_swizzle_b32 | Changpeng Fang | 2016-06-22 | 1 | -0/+12
    Reviewers: tstellarAMD, arsenm
    Differential Revision: http://reviews.llvm.org/D21533
    llvm-svn: 273496
* AMDGPU: Fix verifier errors in SILowerControlFlow | Matt Arsenault | 2016-06-22 | 1 | -53/+46
    The main sin this was committing was using terminator instructions in
    the middle of the block, and then not updating the block successors /
    predecessors. Split the blocks up to avoid this and introduce new pseudo
    instructions for branches taken with exec masking.
    Also use a pseudo instead of emitting s_endpgm and erasing it in the
    special case of a non-void return.
    llvm-svn: 273467
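The block-splitting idea can be pictured with a toy model: whenever a terminator is followed by a non-terminator, the block ends there and a new one begins, so terminators only ever appear at the end of a block. The Python sketch below works on plain lists of opcode strings; `split_at_terminators`, the `TERMINATORS` set, and the sample instruction list are hypothetical and do not reflect the MachineBasicBlock API or the actual pass.

```python
# Assumed set of terminator opcodes for this toy example only.
TERMINATORS = {"s_cbranch_execz", "s_branch", "s_endpgm"}


def split_at_terminators(instrs):
    """Split a flat instruction list so terminators only end blocks.

    Returns a list of blocks; in this toy model each block implicitly
    falls through to the next one, which stands in for updating the
    successor / predecessor lists.
    """
    blocks, current = [], []
    for i, inst in enumerate(instrs):
        current.append(inst)
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        # Close the block when a terminator is followed by ordinary code.
        if inst in TERMINATORS and nxt is not None and nxt not in TERMINATORS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    body = ["v_add_f32", "s_cbranch_execz", "v_mul_f32", "s_branch"]
    for n, block in enumerate(split_at_terminators(body)):
        print(f"bb{n}: {block}")
```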
* AMDGPU: Temporarily select trap to s_endpgm | Matt Arsenault | 2016-06-17 | 1 | -0/+1
    This should select to s_trap, but that requires additional work to set
    up and enable the trap handler. For now emit s_endpgm so bugpoint stops
    getting stuck on the unsupported call to abort. Emit a warning that this
    will only terminate the wave and not really trap.
    llvm-svn: 273062
* AMDGPU: Remove llvm.SI.tid intrinsic | Matt Arsenault | 2016-06-17 | 1 | -6/+0
    Mesa doesn't emit this for llvm >= 3.8 anymore.
    llvm-svn: 273050