summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [CodeGen] Rename MachineInstrBuilder::addOperand. NFCDiana Picus2017-01-131-4/+4
| | | | | | | | | | | Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion. Differential Revision: https://reviews.llvm.org/D28556 llvm-svn: 291891
* AMDGPU: Fix shrinking of addc/subb.Matt Arsenault2017-01-111-7/+25
| | | | | | To shrink to VOP2 the input carry must also be VCC. llvm-svn: 291720
* AMDGPU: Fix breaking VOP3 v_add_i32sMatt Arsenault2017-01-111-1/+11
| | | | | | | This was shrinking the instruction even though the carry output register was a virtual register, not known VCC. llvm-svn: 291716
* AMDGPU: Fix handling of 16-bit immediatesMatt Arsenault2016-12-101-8/+11
| | | | | | | | | | | | | | | | | | Since 32-bit instructions with 32-bit input immediate behavior are used to materialize 16-bit constants in 32-bit registers for 16-bit instructions, determining the legality based on the size is incorrect. Change operands to have the size specified in the type. Also adds a workaround for a disassembler bug that produces an immediate MCOperand for an operand that is supposed to be OPERAND_REGISTER. The assembler appears to accept out of bounds immediates and truncates them, but this seems to be an issue for 32-bit already. llvm-svn: 289306
* [AMDGPU] Add f16 support (VI+)Konstantin Zhuravlyov2016-11-131-0/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D25975 llvm-svn: 286753
* AMDGPU: Use brev for materializing SGPR constantsMatt Arsenault2016-11-011-11/+29
| | | | | | This is already done with VGPR immediates and saves 4 bytes. llvm-svn: 285765
* Use StringRef in Pass/PassManager APIs (NFC)Mehdi Amini2016-10-011-3/+1
| | | | llvm-svn: 283004
* AMDGPU: Use unsigned compare for eq/neMatt Arsenault2016-09-301-5/+5
| | | | | | | | | | For some reason there are both of these available, except for scalar 64-bit compares which only has u64. I'm not sure why there are both (I'm guessing it's for the one bit inputs we don't use), but for consistency always using the unsigned one. llvm-svn: 282832
* AMDGPU: Use SOPK compare instructionsMatt Arsenault2016-09-161-0/+64
| | | | llvm-svn: 281780
* Revert "AMDGPU: Use SOPK compare instructions"Matt Arsenault2016-09-141-62/+0
| | | | | | Accidentally committed llvm-svn: 281514
* AMDGPU: Use SOPK compare instructionsMatt Arsenault2016-09-141-0/+62
| | | | llvm-svn: 281513
* AMDGPU: Fix immediate folding logic when shrinking instructionsMatt Arsenault2016-09-091-8/+2
| | | | | | | | | | If the literal is being folded into src0, it doesn't matter if it's an SGPR because it's being replaced with the literal. Also fixes initially selecting 32-bit versions of some instructions which also confused commuting. llvm-svn: 281117
* AMDGPU: Try to commute when selecting s_addk_i32/s_mulk_i32Matt Arsenault2016-09-081-9/+14
| | | | llvm-svn: 280972
* AMDGPU: Fix adding duplicate implicit exec usesMatt Arsenault2016-09-031-1/+15
| | | | | | | | I'm not sure if this should be considered a bug in copyImplicitOps or not, but implicit operands that are part of the static instruction definition should not be copied. llvm-svn: 280594
* AMDGPU/SI: Improve register allocation hints for sopk instructionsTom Stellard2016-08-291-0/+1
| | | | | | | | | | | | | | | | | | | Summary: For shrinking SOPK instructions, we were creating a hint to tell the register allocator to use the register allocated for src0 for the dst operand as well. However, this seems to not work sometimes depending on the order virtual registers are assigned physical registers. To fix this, I've added a second allocation hint which does the reverse, asks that the register allocated for dst is used for src0. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23862 llvm-svn: 279968
* AMDGPU: Expand register indexing pseudos in custom inserterMatt Arsenault2016-07-191-1/+5
| | | | | | | | | | | | | | | | | | | | | | | This is to help moveSILowerControlFlow to before regalloc. There are a couple of tradeoffs with this. The complete CFG is visible to more passes, the loop body avoids an extra copy of m0, vcc isn't required, and immediate offsets can be shrunk into s_movk_i32. The disadvantage is the register allocator doesn't understand that the single lane's vector is dead within the loop body, so an extra register is used to outlive the loop block when expanding the VGPR -> m0 loop. This also now results in worse waitcnt insertion before the loop instead of after for pending operations at the point of the indexing, but that should be fixed by future improvements to cross block waitcnt insertion. v_movreld_b32's operands are now modeled more correctly since vdst is not a true output. This is kind of a hack to treat vdst as a use operand. Extra checking is required in the verifier since I can't seem to get tablegen to emit an implicit operand for a virtual register. llvm-svn: 275934
* CodeGen: Use MachineInstr& in TargetInstrInfo, NFCDuncan P. N. Exon Smith2016-06-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is mostly a mechanical change to make TargetInstrInfo API take MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator) when the argument is expected to be a valid MachineInstr. This is a general API improvement. Although it would be possible to do this one function at a time, that would demand a quadratic amount of churn since many of these functions call each other. Instead I've done everything as a block and just updated what was necessary. This is mostly mechanical fixes: adding and removing `*` and `&` operators. The only non-mechanical change is to split ARMBaseInstrInfo::getOperandLatencyImpl out from ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a `MachineInstr*` which it updated to the instruction bundle leader; now, the latter calls the former either with the same `MachineInstr&` or the bundle leader. As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753. Note: I updated WebAssembly, Lanai, and AVR (despite being off-by-default) since it turned out to be easy. I couldn't run tests for AVR since llc doesn't link with it turned on. llvm-svn: 274189
* AMDGPU: Cleanup subtarget handling.Matt Arsenault2016-06-241-2/+3
| | | | | | | | | Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target. llvm-svn: 273652
* AMDGPU: Preserve undef flag on vcc when shrinking v_cndmask_b32Matt Arsenault2016-06-201-16/+13
| | | | | | | | | The implicit operand is added by the initial instruction construction, so this was adding an additional vcc use. The original one was missing the undef flag the original condition had, so the verifier would complain. llvm-svn: 273182
* AMDGPU: Properly initialize SIShrinkInstructionsMatt Arsenault2016-06-091-8/+2
| | | | llvm-svn: 272336
* Add optimization bisect opt-in calls for AMDGPU passesAndrew Kaylor2016-04-251-0/+3
| | | | | | Differential Revision: http://reviews.llvm.org/D19450 llvm-svn: 267485
* AMDGPU/SI: Optimize adjacent s_nop instructionsMatt Arsenault2016-04-251-0/+27
| | | | | | | | | | | | Use the operand for how long to wait. This is somewhat distasteful, since it would be better to just emit s_nop with the right argument in the first place. This would require changing TII::insertNoop to emit N operands, which would be easy. Slightly more problematic is the post-RA scheduler and hazard recognizer represent nops as a single null node, and would require inventing another way of representing N nops. llvm-svn: 267456
* AMDGPU: Use s_addk_i32 / s_mulk_i32Matt Arsenault2016-04-161-12/+45
| | | | llvm-svn: 266506
* AMDGPU: Materialize sign bits with bfrevMatt Arsenault2016-03-111-0/+24
| | | | | | | If a constant is the same as the reverse of an inline immediate, this is 4 bytes smaller than having to embed a 32-bit literal. llvm-svn: 263201
* AMDGPU: Simplify boolean conditional return statementsMatt Arsenault2016-03-021-4/+1
| | | | | | Patch by Richard Thomson llvm-svn: 262536
* [AMDGPU] Rename $dst operand to $vdst for VOP instructions.Tom Stellard2016-02-161-2/+2
| | | | | | | | | | | | | | Summary: This change renames output operand for VOP instructions from dst to vdst. This is needed to enable decoding named operands for disassembler. Reviewers: vpykhtin, tstellarAMD, arsenm Subscribers: arsenm, llvm-commits, nhaustov Projects: #llvm-amdgpu-spb Differential Revision: http://reviews.llvm.org/D16920 llvm-svn: 260986
* AMDGPU: Add MachineInstr overloads for instruction format testsMatt Arsenault2015-10-201-2/+1
| | | | llvm-svn: 250797
* AMDGPU: Simplify debug printingMatt Arsenault2015-09-101-1/+1
| | | | llvm-svn: 247345
* AMDGPU/SI: Remove VCCRegMatt Arsenault2015-08-081-2/+11
| | | | llvm-svn: 244380
* AMDGPU/SI: Remove source uses of VCCRegMatt Arsenault2015-08-081-11/+32
| | | | llvm-svn: 244379
* AMDGPU/SI: Add support for shrinking v_cndmask_b32_e32 instructionsTom Stellard2015-07-141-6/+27
| | | | | | | | | | Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11061 llvm-svn: 242146
* AMDGPU/SI: Select mad patterns to v_mac_f32Tom Stellard2015-07-131-2/+14
| | | | | | | | | The two-address instruction pass will convert these back to v_mad_f32 if necessary. Differential Revision: http://reviews.llvm.org/D11060 llvm-svn: 242038
* AMDGPU/SI: The SIShrinkInstructions pass should only fold immediates with ↵Tom Stellard2015-07-091-1/+1
| | | | | | | | | one use This is convered by existing testcases and will be exposed by a future commit. llvm-svn: 241817
* R600 -> AMDGPU renameTom Stellard2015-06-131-0/+272
llvm-svn: 239657
OpenPOWER on IntegriCloud