summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIInstructions.td
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Add replacement export intrinsicsMatt Arsenault2017-01-171-0/+9
| | | | llvm-svn: 292205
* AMDGPU: Remove dead patternMatt Arsenault2017-01-171-5/+0
| | | | | | | This is the unsafe conversion pattern, but not guarded by an unsafe math check. It is also already done in LegalizeDAG. llvm-svn: 292173
* [AMDGPU] Implement f16 fcopysign and fcopysign(f32, f64)Konstantin Zhuravlyov2017-01-131-0/+31
| | | | | | Differential Revision: https://reviews.llvm.org/D28496 llvm-svn: 291954
* AMDGPU: Fix sext_inreg for i1 in i16Matt Arsenault2017-01-111-0/+5
| | | | | | | | This produces worse code when i16 is legal, mostly due to combines getting confused by conversions inserted for uniform 16-bit operations. llvm-svn: 291717
* AMDGPU: Implement f16 fcanonicalizeMatt Arsenault2016-12-221-0/+5
| | | | llvm-svn: 290300
* AMDGPU: Select branch on undef to uniform scc branchMatt Arsenault2016-12-151-0/+6
| | | | llvm-svn: 289877
* AMDGPU: Assembler support for vintrp instructionsMatt Arsenault2016-12-151-6/+6
| | | | llvm-svn: 289866
* AMDGPU: Change vintrp printingMatt Arsenault2016-12-141-6/+6
| | | | llvm-svn: 289664
* AMDGPU: Fix handling of 16-bit immediatesMatt Arsenault2016-12-101-3/+2
| | | | | | | | | | | | | | | | | | Since 32-bit instructions with 32-bit input immediate behavior are used to materialize 16-bit constants in 32-bit registers for 16-bit instructions, determining the legality based on the size is incorrect. Change operands to have the size specified in the type. Also adds a workaround for a disassembler bug that produces an immediate MCOperand for an operand that is supposed to be OPERAND_REGISTER. The assembler appears to accept out of bounds immediates and truncates them, but this seems to be an issue for 32-bit already. llvm-svn: 289306
* AMDGPU: Fix vintrp disassemblyMatt Arsenault2016-12-101-11/+11
| | | | llvm-svn: 289292
* AMDGPU: Change vintrp printing to better match scMatt Arsenault2016-12-101-3/+3
| | | | | | | Some of the immediates need to be printed differently eventually. llvm-svn: 289291
* AMDGPU: Make f16 ConstantFP legalMatt Arsenault2016-12-081-0/+13
| | | | | | | | | | | | | | Not having this legal led to combine failures, resulting in dumb things like bitcasts of constants not being folded away. The only reason I'm leaving the v_mov_b32 hack that f32 already uses is to avoid madak formation test regressions. PeepholeOptimizer has an ordering issue where the immediate fold attempt is into the sgpr->vgpr copy instead of the actual use. Running it twice avoids that problem. llvm-svn: 289096
* AMDGPU: Fix operand name for v_interp_*Matt Arsenault2016-12-061-16/+16
| | | | | | Other VOP instructions call the output vdst llvm-svn: 288856
* AMDGPU: Refactor exp instructionsMatt Arsenault2016-12-051-8/+2
| | | | | | | | | | | | | | | Structure the definitions a bit more like the other classes. The main change here is to split EXP with the done bit set to a separate opcode, so we can set mayLoad = 1 so that it won't be reordered before the other exp stores, since this has the special constraint that if the done bit is set then this should be the last exp in she shader. Previously all exp instructions were inferred to have unmodeled side effects. llvm-svn: 288695
* AMDGPU/SI: Use float as the operand type for amdgcn.interp intrinsicsTom Stellard2016-11-261-2/+2
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26724 llvm-svn: 287962
* Fix spelling mistakes in AMDGPU target comments. NFC.Simon Pilgrim2016-11-181-2/+2
| | | | | | Identified by Pedro Giffuni in PR27636. llvm-svn: 287333
* [AMDGPU] Refactor v_mac_{f16, f32} patterns into a class NFCKonstantin Zhuravlyov2016-11-161-23/+18
| | | | | | Differential Revision: https://reviews.llvm.org/D26711 llvm-svn: 287077
* [AMDGPU] Handle f16 select{_cc}Konstantin Zhuravlyov2016-11-161-10/+12
| | | | | | | | | | - Select `select` to `v_cndmask_b32` - Expand `select_cc` - Refactor patterns Differential Revision: https://reviews.llvm.org/D26714 llvm-svn: 287074
* [AMDGPU] Add wave barrier builtinStanislav Mekhanoshin2016-11-151-0/+11
| | | | | | | | | | | The wave barrier represents the discardable barrier. Its main purpose is to carry convergent attribute, thus preventing illegal CFG optimizations. All lanes in a wave come to convergence point simultaneously with SIMT, thus no special instruction is needed in the ISA. The barrier is discarded during code generation. Differential Revision: https://reviews.llvm.org/D26585 llvm-svn: 287007
* AMDGPU: Fix f16 fabs/fnegMatt Arsenault2016-11-151-0/+15
| | | | llvm-svn: 286931
* [AMDGPU] Add f16 support (VI+)Konstantin Zhuravlyov2016-11-131-3/+58
| | | | | | Differential Revision: https://reviews.llvm.org/D25975 llvm-svn: 286753
* AMDGPU: Add VI i16 supportTom Stellard2016-11-101-50/+55
| | | | | | | | Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 286464
* Revert "AMDGPU: Add VI i16 support"Tom Stellard2016-11-041-55/+50
| | | | | | This reverts commit r285939 and r285948. These broke some conformance tests. llvm-svn: 285995
* AMDGPU: Add VI i16 supportTom Stellard2016-11-031-50/+55
| | | | | | | | Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 285939
* AMDGPU: Default to using scalar mov to materialize immediateMatt Arsenault2016-11-011-6/+6
| | | | | | | | | | | | This is the conservatively correct way because it's easy to move or replace a scalar immediate. This was incorrect in the case when the register class wasn't known from the static instruction definition, but still needed to be an SGPR. The main example of this is inlineasm has an SGPR constraint. Also start verifying the register classes of inlineasm operands. llvm-svn: 285762
* AMDGPU: Fix counting si_mask_branch as 4 bytesMatt Arsenault2016-10-261-1/+0
| | | | llvm-svn: 285202
* [AMDGPU] Emit 32-bit lo/hi got and pc relative variant kinds for external ↵Konstantin Zhuravlyov2016-10-141-2/+3
| | | | | | | | and global address space variables Differential Revision: https://reviews.llvm.org/D25562 llvm-svn: 284196
* AMDGPU: Add instruction definitions for VGPR indexingMatt Arsenault2016-10-121-0/+4
| | | | | | | VI added a second method of indexing into VGPRs besides using v_movrel* llvm-svn: 284027
* AMDGPU: Remove scheduling info from si_mask_branchMatt Arsenault2016-10-061-0/+2
| | | | llvm-svn: 283475
* AMDGPU: Use unsigned compare for eq/neMatt Arsenault2016-09-301-2/+2
| | | | | | | | | | For some reason there are both of these available, except for scalar 64-bit compares which only has u64. I'm not sure why there are both (I'm guessing it's for the one bit inputs we don't use), but for consistency always using the unsigned one. llvm-svn: 282832
* AMDGPU: Partially fix control flow at -O0Matt Arsenault2016-09-291-2/+23
| | | | | | | | | | | | | | | Fixes to allow spilling all registers at the end of the block work with exec modifications. Don't emit s_and_saveexec_b64 for if lowering, and instead emit copies. Mark control flow mask instructions as terminators to get correct spill code placement with fast regalloc, and then have a separate optimization pass form the saveexec. This should work if SGPRs are spilled to VGPRs, but will likely fail in the case that an SGPR spills to memory and no workitem takes a divergent branch. llvm-svn: 282667
* [AMDGPU] Refactor VOP1 and VOP2 instruction TD definitionsValery Pykhtin2016-09-231-422/+0
| | | | | | Differential revision: https://reviews.llvm.org/D24738 llvm-svn: 282234
* [AMDGPU] Refactor VOP3 instruction TD definitionsValery Pykhtin2016-09-201-250/+0
| | | | | | Differential revision: https://reviews.llvm.org/D24664 llvm-svn: 281965
* [AMDGPU] Refactor VOPC instruction TD definitionsValery Pykhtin2016-09-191-303/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D24546 llvm-svn: 281903
* AMDGPU: Fix broken FrameIndex handlingMatt Arsenault2016-09-171-0/+5
| | | | | | | | | | | | | | | | | We were trying to avoid using a FrameIndex operand in non-pointer operands in a convoluted way, and would break because of using TargetFrameIndex. The TargetFrameIndex should only be used in the case where it makes sense to fold it as part of the addressing mode, otherwise it requires materialization like a normal constant. This wasn't working reliably and failed in the added testcase, hitting the assert when processing the frame index. The TargetFrameIndex was coming from trying to produce an AssertZext limiting the maximum stack size. I'm not sure this was correct to begin with, because it is apparently possible to have a single workitem dispatch that requires all 4G of private memory. llvm-svn: 281824
* AMDGPU: Rename spill operands to match real instructionMatt Arsenault2016-09-171-3/+3
| | | | llvm-svn: 281823
* AMDGPU: Allow some control flow intrinsics to be CSEdMatt Arsenault2016-09-161-3/+20
| | | | | | | | | | | These clean up some unnecessary or instructions in cases with complex loops. In the original testcase I noticed this, the same or with exec was repeated 5 or 6 times in a row. With this only one is emitted or sometimes a copy. llvm-svn: 281786
* AMDGPU: Improve splitting 64-bit bit ops by constantsMatt Arsenault2016-09-141-2/+2
| | | | | | | | This addresses a TODO to handle operations besides and. This also starts eliminating no-op operations with a constant that can emerge later. llvm-svn: 281488
* [AMDGPU] Refactor MUBUF/MTBUF instructionsValery Pykhtin2016-09-101-528/+1
| | | | | | Differential revision: https://reviews.llvm.org/D24295 llvm-svn: 281137
* AMDGPU: Implement is{LoadFrom|StoreTo}FrameIndexMatt Arsenault2016-09-101-7/+7
| | | | llvm-svn: 281128
* AMDGPU: Fix scheduling info for spill pseudosMatt Arsenault2016-09-101-2/+3
| | | | | | | These defaulted to Write32Bit. I don't think this actually matters since these don't exist during scheduling. llvm-svn: 281127
* AMDGPU: Fix immediate folding logic when shrinking instructionsMatt Arsenault2016-09-091-1/+1
| | | | | | | | | | If the literal is being folded into src0, it doesn't matter if it's an SGPR because it's being replaced with the literal. Also fixes initially selecting 32-bit versions of some instructions which also confused commuting. llvm-svn: 281117
* AMDGPU] Assembler: better support for immediate literals in assembler.Sam Kolton2016-09-091-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Prevously assembler parsed all literals as either 32-bit integers or 32-bit floating-point values. Because of this we couldn't support f64 literals. E.g. in instruction "v_fract_f64 v[0:1], 0.5", literal 0.5 was encoded as 32-bit literal 0x3f000000, which is incorrect and will be interpreted as 3.0517578125E-5 instead of 0.5. Correct encoding is inline constant 240 (optimal) or 32-bit literal 0x3FE00000 at least. With this change the way immediate literals are parsed is changed. All literals are always parsed as 64-bit values either integer or floating-point. Then we convert parsed literals to correct form based on information about type of operand parsed (was literal floating or binary) and type of expected instruction operands (is this f32/64 or b32/64 instruction). Here are rules how we convert literals: - We parsed fp literal: - Instruction expects 64-bit operand: - If parsed literal is inlinable (e.g. v_fract_f64_e32 v[0:1], 0.5) - then we do nothing this literal - Else if literal is not-inlinable but instruction requires to inline it (e.g. this is e64 encoding, v_fract_f64_e64 v[0:1], 1.5) - report error - Else literal is not-inlinable but we can encode it as additional 32-bit literal constant - If instruction expect fp operand type (f64) - Check if low 32 bits of literal are zeroes (e.g. v_fract_f64 v[0:1], 1.5) - If so then do nothing - Else (e.g. v_fract_f64 v[0:1], 3.1415) - report warning that low 32 bits will be set to zeroes and precision will be lost - set low 32 bits of literal to zeroes - Instruction expects integer operand type (e.g. s_mov_b64_e32 s[0:1], 1.5) - report error as it is unclear how to encode this literal - Instruction expects 32-bit operand: - Convert parsed 64 bit fp literal to 32 bit fp. Allow lose of precision but not overflow or underflow - Is this literal inlinable and are we required to inline literal (e.g. v_trunc_f32_e64 v0, 0.5) - do nothing - Else report error - Do nothing. We can encode any other 32-bit fp literal (e.g. v_trunc_f32 v0, 10000000.0) - Parsed binary literal: - Is this literal inlinable (e.g. v_trunc_f32_e32 v0, 35) - do nothing - Else, are we required to inline this literal (e.g. v_trunc_f32_e64 v0, 35) - report error - Else, literal is not-inlinable and we are not required to inline it - Are high 32 bit of literal zeroes or same as sign bit (32 bit) - do nothing (e.g. v_trunc_f32 v0, 0xdeadbeef) - Else - report error (e.g. v_trunc_f32 v0, 0x123456789abcdef0) For this change it is required that we know operand types of instruction (are they f32/64 or b32/64). I added several new register operands (they extend previous register operands) and set operand types to corresponding types: ''' enum OperandType { OPERAND_REG_IMM32_INT, OPERAND_REG_IMM32_FP, OPERAND_REG_INLINE_C_INT, OPERAND_REG_INLINE_C_FP, } ''' This is not working yet: - Several tests are failing - Problems with predicate methods for inline immediates - LLVM generated assembler parts try to select e64 encoding before e32. More changes are required for several AsmOperands. Reviewers: vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl, artem.tamazov Differential Revision: https://reviews.llvm.org/D22922 llvm-svn: 281050
* [AMDGPU] Refactor FLAT TD instructionsValery Pykhtin2016-09-051-0/+1
| | | | | | Differential revision: https://reviews.llvm.org/D24072 llvm-svn: 280655
* AMDGPU: Set sizes of spill pseudosMatt Arsenault2016-09-031-0/+5
| | | | llvm-svn: 280595
* AMDGPU: Fix an interaction between WQM and polygon stipplingNicolai Haehnle2016-09-031-0/+1
| | | | | | | | | | | | | | | | | | | | | Summary: This fixes a rare bug in polygon stippling with non-monolithic pixel shaders. The underlying problem is as follows: the prolog part contains the polygon stippling sequence, i.e. a kill. The main part then enables WQM based on the _reduced_ exec mask, effectively undoing most of the polygon stippling. Since we cannot know whether polygon stippling will be used, the main part of a non-monolithic shader must always return to exact mode to fix this problem. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23131 llvm-svn: 280589
* AMDGPU: Fix spilling of m0Matt Arsenault2016-09-031-4/+5
| | | | | | | | | readlane/writelane do not support using m0 as the output/input. Constrain the register class of spill vregs to try to avoid this, but also handle spilling of the physreg when necessary by inserting an additional copy to a normal SGPR. llvm-svn: 280584
* AMDGPU/SI: MIMG TD Refactoring.Changpeng Fang2016-09-011-510/+0
| | | | | | | | | | | | | | Summary: Created a new td file MIMGInstructions.td which contains all definitions of MIMG related instructions. Reviewed by: kzhuravl, vpykhtin Differential Revision: http://reviews.llvm.org/D24106 llvm-svn: 280385
* [AMDGPU] Scalar Memory instructions TD refactoringValery Pykhtin2016-09-011-112/+1
| | | | | | Differential revision: https://reviews.llvm.org/D23996 llvm-svn: 280349
* [AMDGPU] Refactor SOP instructions TD files.Valery Pykhtin2016-08-301-501/+2
| | | | | | Differential revision: https://reviews.llvm.org/D23617 llvm-svn: 280101
OpenPOWER on IntegriCloud