path: root/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
Commit message (Author, Date, Files, Lines -/+)
...
* AMDGPU: Preserve vcc undef flags when inverting branch (Matt Arsenault, 2016-11-07, 1 file, -3/+16)
If the branch was on a read-undef of vcc, passes that used analyzeBranch to invert the branch condition wouldn't preserve the undef flag, resulting in a verifier error. Fixes verifier failures in a future commit. Also fix a verifier error when inserting the copy for the vccz corruption bug.
llvm-svn: 286133
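A minimal sketch of the idea, not the actual upstream diff (the helper name and the explicit condition operand are assumptions): when re-emitting the inverted branch, the undef/kill flags of the original condition register use are carried over instead of adding a bare use of vcc.

    // Sketch only: InvertedBrDesc and OrigCond are stand-ins for whatever the
    // branch-inversion code has in hand.
    #include "llvm/CodeGen/MachineBasicBlock.h"
    #include "llvm/CodeGen/MachineInstrBuilder.h"
    using namespace llvm;

    static void emitInvertedBranch(MachineBasicBlock &MBB, const DebugLoc &DL,
                                   const MCInstrDesc &InvertedBrDesc,
                                   MachineBasicBlock *Target,
                                   const MachineOperand &OrigCond) {
      // Copy the read-undef (and kill) state of the original vcc use so the
      // verifier still sees consistent liveness on the rebuilt branch.
      BuildMI(&MBB, DL, InvertedBrDesc)
          .addMBB(Target)
          .addReg(OrigCond.getReg(),
                  getUndefRegState(OrigCond.isUndef()) |
                      getKillRegState(OrigCond.isKill()));
    }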
* AMDGPU: Refactor copyPhysReg (Matt Arsenault, 2016-11-07, 1 file, -99/+27)
Separate the subregister splitting logic to re-use later.
llvm-svn: 286118
* AMDGPU: Allow additional implicit operands on MOVRELS instructions (Nicolai Haehnle, 2016-11-02, 1 file, -1/+4)
Summary: The post-RA scheduler occasionally uses additional implicit operands when the vector implicit operand as a whole is killed, but some subregisters are still live because they are directly referenced later. Unfortunately, this seems incredibly subtle to reproduce.
Fixes piglit spec/glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-wr.shader_test and others.
Reviewers: arsenm, tstellarAMD
Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D25656
llvm-svn: 285835
* AMDGPU: Default to using scalar mov to materialize immediate (Matt Arsenault, 2016-11-01, 1 file, -0/+22)
This is the conservatively correct way because it's easy to move or replace a scalar immediate. This was incorrect in the case when the register class wasn't known from the static instruction definition, but still needed to be an SGPR. The main example of this is inline asm with an SGPR constraint. Also start verifying the register classes of inline asm operands.
llvm-svn: 285762
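As a rough illustration of the policy (a sketch with an assumed helper name, assuming the usual SIInstrInfo.cpp includes, not the code in this commit), the vector mov is only chosen when the destination class is known to take VGPRs:

    // Sketch: pick the move opcode for materializing a 32-bit immediate.
    // AMDGPU::S_MOV_B32 / AMDGPU::V_MOV_B32_e32 are real opcodes; the
    // surrounding function is a stand-in for the selection logic.
    static unsigned pickImmMoveOpcode(const TargetRegisterClass *DstRC,
                                      const SIRegisterInfo &TRI) {
      if (DstRC && TRI.hasVGPRs(DstRC))
        return AMDGPU::V_MOV_B32_e32;
      // Conservative default: a scalar immediate is easy to copy or replace
      // later, and it also satisfies SGPR inline-asm constraints.
      return AMDGPU::S_MOV_B32;
    }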
* AMDGPU: Workaround for instruction size with literals (Matt Arsenault, 2016-11-01, 1 file, -1/+12)
Instructions with a 32-bit base encoding and an optional 32-bit literal encoded after them report their size as 4 for the disassembler. Consider these literals when computing the MachineInstr size. This fixes problems in BranchRelaxation caused by inconsistent size estimates.
llvm-svn: 285743
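For context, a self-contained sketch of the size rule being worked around. The inline-constant test below mirrors the documented SI source encodings (small integers plus a handful of float bit patterns); it is not the code in SIInstrInfo.cpp.

    #include <cstdint>

    // A 32-bit source that is not an inline constant has to be emitted as an
    // extra literal dword after the 4-byte base encoding.
    static bool isInlineConstant32(int64_t Imm) {
      if (Imm >= -16 && Imm <= 64) // small signed integers
        return true;
      switch (static_cast<uint32_t>(Imm)) { // selected float bit patterns
      case 0x3f000000: case 0xbf000000:     // +/-0.5
      case 0x3f800000: case 0xbf800000:     // +/-1.0
      case 0x40000000: case 0xc0000000:     // +/-2.0
      case 0x40800000: case 0xc0800000:     // +/-4.0
        return true;
      default:
        return false;
      }
    }

    static unsigned instSizeInBytes(unsigned DescSize, int64_t SrcImm) {
      // 4-byte base encoding plus an optional trailing literal dword.
      return (DescSize == 4 && !isInlineConstant32(SrcImm)) ? DescSize + 4
                                                            : DescSize;
    }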
* AMDGPU: Use 1/2pi inline imm on VI (Matt Arsenault, 2016-10-29, 1 file, -2/+4)
I'm guessing at how it is supposed to be printed.
llvm-svn: 285490
* AMDGPU/SI: Don't use non-0 waitcnt values when waiting on Flat instructions (Tom Stellard, 2016-10-28, 1 file, -0/+14)
Summary: Flat instructions can return out of order, so we always need to wait for all outstanding flat operations.
Reviewers: tony-tye, arsenm
Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl
Differential Revision: https://reviews.llvm.org/D25998
llvm-svn: 285479
* AMDGPU: Add definitions for scalar store instructions (Matt Arsenault, 2016-10-28, 1 file, -0/+12)
Also add the glc bit to the scalar loads, since it exists on VI and changes the caching behavior. This currently has an assembler bug where the glc bit is incorrectly accepted on SI/CI, which do not have it.
llvm-svn: 285463
* AMDGPU: Fix using incorrect private resource with no allocation (Matt Arsenault, 2016-10-28, 1 file, -2/+9)
It's possible to have a use of the private resource descriptor or scratch wave offset registers even though there are no allocated stack objects. This would result in continuing to use the maximum number of reserved registers. This could go over the number of SGPRs available on VI, or violate the SGPR limit requested by the function attributes.
llvm-svn: 285435
* AMDGPU: Fix counting si_mask_branch as 4 bytes (Matt Arsenault, 2016-10-26, 1 file, -0/+1)
llvm-svn: 285202
* AMDGPU: Fix Two Address problems with v_movreld (Nicolai Haehnle, 2016-10-24, 1 file, -0/+26)
Summary: The v_movreld machine instruction is used with three operands that are in a sense tied to each other (the explicit VGPR_32 def and the implicit VGPR_NN def and use). There is no way to express that using the currently available operand bits, and indeed there are cases where the Two Address instructions pass does the wrong thing.
This patch introduces a new set of pseudo instructions that are identical in intended semantics to v_movreld, but they only have two tied operands. Having to add a new set of pseudo instructions is admittedly annoying, but it's a fairly straightforward and solid approach. The only alternative I see is to try to teach the Two Address instructions pass about Three Address instructions, and I'm afraid that's trickier and is going to end up more fragile.
Note that v_movrels does not suffer from this problem, and so this patch does not touch it.
This fixes several GL45-CTS.shaders.indexing.* tests.
Reviewers: tstellarAMD, arsenm
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25633
llvm-svn: 284980
* [AMDGPU] Emit 32-bit lo/hi got and pc relative variant kinds for external and global address space variables (Konstantin Zhuravlyov, 2016-10-14, 1 file, -3/+8)
Differential Revision: https://reviews.llvm.org/D25562
llvm-svn: 284196
* AMDGPU: Initial implementation of VGPR indexing mode (Matt Arsenault, 2016-10-12, 1 file, -1/+16)
This is the most basic handling of the indirect access pseudos using GPR indexing mode. This currently only enables the mode for a single v_mov_b32 and then disables it. This is much more complicated to use than the movrel instructions, so a new optimization pass is probably needed to fold the access into the uses and keep the mode enabled for them.
llvm-svn: 284031
* BranchRelaxation: Support expanding unconditional branches (Matt Arsenault, 2016-10-06, 1 file, -9/+178)
AMDGPU needs to expand unconditional branches in a new block with an indirect branch.
llvm-svn: 283464
* AMDGPU: Use unsigned compare for eq/ne (Matt Arsenault, 2016-09-30, 1 file, -1/+1)
For some reason both signed and unsigned variants are available, except for scalar 64-bit compares, which only have u64. I'm not sure why there are both (I'm guessing it's for the one-bit inputs we don't use), but for consistency always use the unsigned one.
llvm-svn: 282832
* AMDGPU: Partially fix control flow at -O0 (Matt Arsenault, 2016-09-29, 1 file, -1/+18)
Fixes to make spilling all registers at the end of the block work with exec modifications. Don't emit s_and_saveexec_b64 for if lowering, and instead emit copies. Mark control flow mask instructions as terminators to get correct spill code placement with fast regalloc, and then have a separate optimization pass form the saveexec.
This should work if SGPRs are spilled to VGPRs, but will likely fail in the case that an SGPR spills to memory and no workitem takes a divergent branch.
llvm-svn: 282667
* AMDGPU: Use i64 scalar compare instructions (Matt Arsenault, 2016-09-17, 1 file, -0/+2)
VI added eq/ne for i64, so use them.
llvm-svn: 281800
* AMDGPU: Use SOPK compare instructions (Matt Arsenault, 2016-09-16, 1 file, -0/+15)
llvm-svn: 281780
* Finish renaming remaining analyzeBranch functions (Matt Arsenault, 2016-09-14, 1 file, -2/+2)
llvm-svn: 281535
* Make analyzeBranch family of instruction names consistent (Matt Arsenault, 2016-09-14, 1 file, -1/+1)
analyzeBranch was renamed to use a lowercase first letter; rename the related set to match.
llvm-svn: 281506
* AArch64: Use TTI branch functions in branch relaxation (Matt Arsenault, 2016-09-14, 1 file, -2/+17)
The main change is to return the code size from InsertBranch/RemoveBranch.
Patch mostly by Tim Northover.
llvm-svn: 281505
* AMDGPU: Support commuting a FrameIndex operand (Matt Arsenault, 2016-09-13, 1 file, -9/+16)
llvm-svn: 281369
* AMDGPU: Do not clobber SCC in SIWholeQuadMode (Nicolai Haehnle, 2016-09-12, 1 file, -5/+13)
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D22198
llvm-svn: 281230
* AMDGPU: Implement is{LoadFrom|StoreTo}FrameIndex (Matt Arsenault, 2016-09-10, 1 file, -6/+56)
llvm-svn: 281128
* AMDGPU: Fix immediate folding logic when shrinking instructions (Matt Arsenault, 2016-09-09, 1 file, -7/+7)
If the literal is being folded into src0, it doesn't matter if it's an SGPR because it's being replaced with the literal. Also fixes initially selecting 32-bit versions of some instructions which also confused commuting.
llvm-svn: 281117
* [AMDGPU] Assembler: better support for immediate literals in assembler (Sam Kolton, 2016-09-09, 1 file, -4/+6)
Summary: Previously the assembler parsed all literals as either 32-bit integers or 32-bit floating-point values. Because of this we couldn't support f64 literals. E.g. in the instruction "v_fract_f64 v[0:1], 0.5", the literal 0.5 was encoded as the 32-bit literal 0x3f000000, which is incorrect and will be interpreted as 3.0517578125E-5 instead of 0.5. The correct encoding is inline constant 240 (optimal) or at least the 32-bit literal 0x3FE00000.
With this change the way immediate literals are parsed is changed. All literals are always parsed as 64-bit values, either integer or floating-point. Then we convert parsed literals to the correct form based on information about the type of the parsed literal (was it floating-point or binary) and the type of the expected instruction operands (is this an f32/64 or b32/64 instruction).
Here are the rules for how we convert literals:
- We parsed an fp literal:
  - Instruction expects a 64-bit operand:
    - If the parsed literal is inlinable (e.g. v_fract_f64_e32 v[0:1], 0.5) - then we do nothing with this literal
    - Else if the literal is not inlinable but the instruction requires it to be inlined (e.g. this is e64 encoding, v_fract_f64_e64 v[0:1], 1.5) - report an error
    - Else the literal is not inlinable but we can encode it as an additional 32-bit literal constant:
      - If the instruction expects an fp operand type (f64):
        - Check if the low 32 bits of the literal are zeroes (e.g. v_fract_f64 v[0:1], 1.5). If so then do nothing
        - Else (e.g. v_fract_f64 v[0:1], 3.1415) - report a warning that the low 32 bits will be set to zeroes and precision will be lost, then set the low 32 bits of the literal to zeroes
      - If the instruction expects an integer operand type (e.g. s_mov_b64_e32 s[0:1], 1.5) - report an error, as it is unclear how to encode this literal
  - Instruction expects a 32-bit operand:
    - Convert the parsed 64-bit fp literal to 32-bit fp. Allow loss of precision but not overflow or underflow
    - Is this literal inlinable and are we required to inline the literal (e.g. v_trunc_f32_e64 v0, 0.5) - do nothing; else report an error
    - Otherwise do nothing. We can encode any other 32-bit fp literal (e.g. v_trunc_f32 v0, 10000000.0)
- We parsed a binary literal:
  - Is this literal inlinable (e.g. v_trunc_f32_e32 v0, 35) - do nothing
  - Else, are we required to inline this literal (e.g. v_trunc_f32_e64 v0, 35) - report an error
  - Else, the literal is not inlinable and we are not required to inline it:
    - Are the high 32 bits of the literal zeroes or the same as the sign bit (32 bit) - do nothing (e.g. v_trunc_f32 v0, 0xdeadbeef)
    - Else - report an error (e.g. v_trunc_f32 v0, 0x123456789abcdef0)
For this change it is required that we know the operand types of the instruction (are they f32/64 or b32/64). I added several new register operands (they extend the previous register operands) and set operand types to corresponding types:
'''
enum OperandType {
  OPERAND_REG_IMM32_INT,
  OPERAND_REG_IMM32_FP,
  OPERAND_REG_INLINE_C_INT,
  OPERAND_REG_INLINE_C_FP,
}
'''
This is not working yet:
- Several tests are failing
- Problems with predicate methods for inline immediates
- LLVM-generated assembler parts try to select the e64 encoding before e32. More changes are required for several AsmOperands.
Reviewers: vpykhtin, tstellarAMD
Subscribers: arsenm, kzhuravl, artem.tamazov
Differential Revision: https://reviews.llvm.org/D22922
llvm-svn: 281050
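The f64 rule above boils down to keeping only the high dword of the double's bit pattern. A standalone sketch of that step (not the assembler code itself):

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Encode a non-inlinable f64 literal as the single 32-bit literal dword
    // the hardware supports: the high 32 bits of the IEEE-754 pattern.
    // Non-zero low bits cannot be represented, so warn and drop them.
    static uint32_t encodeF64Literal(double Val) {
      uint64_t Bits;
      std::memcpy(&Bits, &Val, sizeof(Bits));
      if (static_cast<uint32_t>(Bits) != 0)
        std::fprintf(stderr, "warning: low 32 bits of f64 literal are lost\n");
      return static_cast<uint32_t>(Bits >> 32); // 0.5 -> 0x3FE00000
    }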
* AMDGPU: Sign extend constants when splitting them (Matt Arsenault, 2016-09-08, 1 file, -3/+2)
Otherwise this will confuse later passes which try to look at the immediate value and don't truncate first.
llvm-svn: 280974
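A minimal sketch of the splitting rule (the helper name is an assumption): each 32-bit half is stored sign-extended, since MachineOperand immediates are held as int64_t and later passes read them without truncating.

    #include <cstdint>
    #include <utility>

    // Split a 64-bit immediate into the two values used by the 32-bit halves
    // of the expansion; the casts through int32_t give the sign-extended form.
    static std::pair<int64_t, int64_t> splitImm64(uint64_t Imm) {
      int64_t Lo = static_cast<int32_t>(Imm & 0xffffffffu);
      int64_t Hi = static_cast<int32_t>(Imm >> 32);
      return {Lo, Hi};
    }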
* AMDGPU: Support commuting with immediate in src0 (Matt Arsenault, 2016-09-08, 1 file, -97/+71)
llvm-svn: 280970
* [AMDGPU] Wave and register controls (Konstantin Zhuravlyov, 2016-09-06, 1 file, -1/+1)
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch
Patch by Tom Stellard and Konstantin Zhuravlyov
Differential Revision: https://reviews.llvm.org/D21562
llvm-svn: 280747
* AMDGPU/SI: Teach SIInstrInfo::FoldImmediate() to fold immediates into copies (Tom Stellard, 2016-09-06, 1 file, -2/+27)
Summary: I put this code here because I want to re-use it in a few other places. This supersedes some of the immediate folding code we have in SIFoldOperands. I think the peephole optimizer is probably a better place for folding immediates into copies, since it does some register coalescing at the same time.
This will also make it easier to transition SIFoldOperands into a smarter pass, where it looks at all uses of an instruction at once to determine the optimal way to fold operands. Right now, the pass just considers one operand at a time.
Reviewers: arsenm
Subscribers: wdng, nhaehnle, arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23402
llvm-svn: 280744
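A hedged sketch of the shape of that folding, assuming the usual SIInstrInfo.cpp includes; this is an illustration of the idea, not the committed implementation.

    // Rewrite `%dst = COPY %src` into a direct move when %src was defined by
    // `%src = S_MOV_B32 <imm>`. The opcode follows the destination class.
    static bool foldImmIntoCopy(MachineInstr &UseMI, const MachineInstr &DefMI,
                                const SIInstrInfo &TII,
                                const SIRegisterInfo &TRI,
                                const MachineRegisterInfo &MRI) {
      if (!UseMI.isCopy() || DefMI.getOpcode() != AMDGPU::S_MOV_B32 ||
          !DefMI.getOperand(1).isImm())
        return false;

      unsigned DstReg = UseMI.getOperand(0).getReg();
      // getRegClass is only valid for virtual registers; a real version would
      // also handle physical destinations.
      const TargetRegisterClass *RC = MRI.getRegClass(DstReg);
      unsigned MovOpc = TRI.isSGPRClass(RC) ? AMDGPU::S_MOV_B32
                                            : AMDGPU::V_MOV_B32_e32;
      UseMI.setDesc(TII.get(MovOpc));
      UseMI.getOperand(1).ChangeToImmediate(DefMI.getOperand(1).getImm());
      return true;
    }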
* AMDGPU: Set sizes of spill pseudos (Matt Arsenault, 2016-09-03, 1 file, -3/+1)
llvm-svn: 280595
* AMDGPU: Fix spilling of m0 (Matt Arsenault, 2016-09-03, 1 file, -14/+13)
readlane/writelane do not support using m0 as the output/input. Constrain the register class of spill vregs to try to avoid this, but also handle spilling of the physreg when necessary by inserting an additional copy to a normal SGPR.
llvm-svn: 280584
* AMDGPU/SI: Query AA, if available, in areMemAccessesTriviallyDisjoint() (Tom Stellard, 2016-08-29, 1 file, -0/+11)
Summary: The SILoadStoreOptimizer will need to use AliasAnalysis here in order to move it before scheduling.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23813
llvm-svn: 279963
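A sketch of the kind of query this adds; the helper name and the exact guards are assumptions, and the MemoryLocation construction follows the LLVM API of that era rather than the committed code.

    #include "llvm/Analysis/AliasAnalysis.h"
    #include "llvm/CodeGen/MachineInstr.h"
    #include "llvm/CodeGen/MachineMemOperand.h"
    using namespace llvm;

    // Ask IR-level alias analysis whether two single-memoperand accesses are
    // provably disjoint; fall back to "don't know" otherwise.
    static bool memAccessesDisjointByAA(const MachineInstr &MIa,
                                        const MachineInstr &MIb,
                                        AliasAnalysis *AA) {
      if (!AA || !MIa.hasOneMemOperand() || !MIb.hasOneMemOperand())
        return false;
      const MachineMemOperand *MMOa = *MIa.memoperands_begin();
      const MachineMemOperand *MMOb = *MIb.memoperands_begin();
      if (!MMOa->getValue() || !MMOb->getValue())
        return false;
      return AA->isNoAlias(
          MemoryLocation(MMOa->getValue(), MMOa->getSize(), MMOa->getAAInfo()),
          MemoryLocation(MMOb->getValue(), MMOb->getSize(), MMOb->getAAInfo()));
    }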
* AMDGPU: Move cndmask pseudo to be isel pseudo (Matt Arsenault, 2016-08-27, 1 file, -23/+0)
There's only one use of this for the convenience of a pattern. I think v_mov_b64_pseudo should also be moved, but SIFoldOperands does currently make use of it.
llvm-svn: 279901
* Replace "fallthrough" comments with LLVM_FALLTHROUGHJustin Bogner2016-08-171-1/+1
| | | | | | | This is a mechanical change of comments in switches like fallthrough, fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead. llvm-svn: 278902
* AMDGPU: Fix not estimating MBB operand sizes correctly (Matt Arsenault, 2016-08-13, 1 file, -2/+20)
llvm-svn: 278590
* AMDGPU: Remove unnecessary cast (Matt Arsenault, 2016-08-10, 1 file, -4/+2)
llvm-svn: 278274
* MachineFunction: Return reference for getFrameInfo(); NFC (Matthias Braun, 2016-07-28, 1 file, -6/+6)
getFrameInfo() never returns nullptr, so we should use a reference instead of a pointer.
llvm-svn: 277017
* AMDGPU/SI: Don't use reserved VGPRs for SGPR spilling (Tom Stellard, 2016-07-28, 1 file, -1/+2)
Summary: We were using reserved VGPRs for SGPR spilling, and this was causing some programs with a workgroup size of 1024 to use more than 64 registers, which is illegal.
Reviewers: arsenm, mareko, nhaehnle
Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D22032
llvm-svn: 276980
* AMDGPU: Make AMDGPUMachineFunction fields private (Matt Arsenault, 2016-07-26, 1 file, -1/+1)
ABIArgOffset is a problem because properly setting the KernArgSize requires that the reserved area before the real kernel arguments be correctly aligned, which requires fixing clover.
llvm-svn: 276766
* AMDGPU: Expand register indexing pseudos in custom inserter (Matt Arsenault, 2016-07-19, 1 file, -0/+51)
This is to help move SILowerControlFlow to before regalloc. There are a couple of tradeoffs with this. The complete CFG is visible to more passes, the loop body avoids an extra copy of m0, vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.
The disadvantage is the register allocator doesn't understand that the single lane's vector is dead within the loop body, so an extra register is used to outlive the loop block when expanding the VGPR -> m0 loop. This also now results in worse waitcnt insertion before the loop instead of after for pending operations at the point of the indexing, but that should be fixed by future improvements to cross block waitcnt insertion.
v_movreld_b32's operands are now modeled more correctly since vdst is not a true output. This is kind of a hack to treat vdst as a use operand. Extra checking is required in the verifier since I can't seem to get tablegen to emit an implicit operand for a virtual register.
llvm-svn: 275934
* AMDGPU: Fix verifier error from partially undef copy (Matt Arsenault, 2016-07-15, 1 file, -5/+3)
In this situation:
  %VGPR2<def> = BUFFER_LOAD_DWORD_OFFSET %SGPR8_SGPR9_SGPR10_SGPR11,
  %VGPR7<def,tied3> = V_MAC_F32_e32 %VGPR0<undef>, %VGPR1<kill>, %VGPR7<kill,tied0>, %EXEC<imp-use>
  %VGPR3_VGPR4_VGPR5_VGPR6<def> = COPY %VGPR0_VGPR1_VGPR2_VGPR3
  %VGPR4<def> = COPY %VGPR2
The copy for VGPR1 -> VGPR4 was an error from reading undefined VGPR1, but VGPR4 is defined immediately after this copy.
llvm-svn: 275635
* Rename AnalyzeBranch* to analyzeBranch*. (Jacques Pienaar, 2016-07-15, 1 file, -2/+1)
Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect.
Reviewers: tstellarAMD, mcrosier
Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai
Differential Revision: https://reviews.llvm.org/D22409
llvm-svn: 275564
* AMDGPU: Cleanup pseudoinstructions (Matt Arsenault, 2016-07-12, 1 file, -5/+0)
llvm-svn: 275133
* AMDGPU: Move R600 only pieces into R600 classes (Matt Arsenault, 2016-07-09, 1 file, -8/+0)
llvm-svn: 274979
* AMDGPU: Improve offset folding for register indexing (Matt Arsenault, 2016-07-09, 1 file, -1/+2)
llvm-svn: 274954
* AMDGPU: Simplify isSchedulingBoundary (Matt Arsenault, 2016-07-09, 1 file, -5/+4)
llvm-svn: 274953
* AMDGPU: Remove implicit iterator conversions, NFC (Duncan P. N. Exon Smith, 2016-07-08, 1 file, -6/+6)
Remove remaining implicit conversions from MachineInstrBundleIterator to MachineInstr* from the AMDGPU backend. In most cases, I made them less attractive by preferring MachineInstr& or using a range-based for loop.
Once all the backends are fixed I'll make the operator explicit so that this doesn't bitrot back.
llvm-svn: 274906
* AMDGPU: Fix folding SGPRs into madak/madmk src0 (Matt Arsenault, 2016-07-05, 1 file, -3/+11)
Because of the special immediate operand, the constant bus is already used so SGPRs are never useful. r263212 changed the name of the immediate operand, which broke the verifier check for the restriction.
llvm-svn: 274564
* CodeGen: Use MachineInstr& in TargetInstrInfo, NFC (Duncan P. N. Exon Smith, 2016-06-30, 1 file, -430/+420)
This is mostly a mechanical change to make the TargetInstrInfo API take MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator) when the argument is expected to be a valid MachineInstr. This is a general API improvement.
Although it would be possible to do this one function at a time, that would demand a quadratic amount of churn since many of these functions call each other. Instead I've done everything as a block and just updated what was necessary.
These are mostly mechanical fixes: adding and removing `*` and `&` operators. The only non-mechanical change is to split ARMBaseInstrInfo::getOperandLatencyImpl out from ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a `MachineInstr*` which it updated to the instruction bundle leader; now, the latter calls the former either with the same `MachineInstr&` or the bundle leader. As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753.
Note: I updated WebAssembly, Lanai, and AVR (despite being off-by-default) since it turned out to be easy. I couldn't run tests for AVR since llc doesn't link with it turned on.
llvm-svn: 274189