path: root/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
Commit message (Author, Age, Files, Lines)
...
* AMDGPU: Try to fix (non-clang?) bot builds (Matt Arsenault, 2016-11-07, 1 file, -10/+10)
| | | | llvm-svn: 286120
* AMDGPU: Refactor copyPhysReg (Matt Arsenault, 2016-11-07, 1 file, -0/+103)
| | | | | | Separate the subregister splitting logic to re-use later. llvm-svn: 286118
* AMDGPU: Stop creating unused virtual registers (Matt Arsenault, 2016-11-01, 1 file, -2/+5)
| | | | | | | These are only used in the spill to VMEM path. Move them to the one use. llvm-svn: 285756
* AMDGPU: Fix using incorrect private resource with no allocation (Matt Arsenault, 2016-10-28, 1 file, -1/+12)
| | | | | | | | | | | It's possible to have a use of the private resource descriptor or scratch wave offset registers even though there are no allocated stack objects. This would result in continuing to use the maximum number of reserved registers. This could go over the number of SGPRs available on VI, or violate the SGPR limit requested by the function attributes. llvm-svn: 285435
* Reapply "AMDGPU: Don't use offen if it is 0" (Matt Arsenault, 2016-10-26, 1 file, -9/+95)
| | | | | | This reverts r283003 llvm-svn: 285203
* AMDGPU: Fix use-after-frees (Nicolai Haehnle, 2016-10-14, 1 file, -1/+1)
| | | | | | | | | | Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25312 llvm-svn: 284215
* AMDGPU: Do not re-use tmpreg in spill/restore lowering (Matthias Braun, 2016-10-05, 1 file, -2/+2)
| | | | | | | | | The register scavenging code does not support multiple definitions of the same vreg. Differential Revision: https://reviews.llvm.org/D25220 llvm-svn: 283369
* AMDGPU: Factor SGPR spilling into separate functions (Matt Arsenault, 2016-10-04, 1 file, -129/+160)
| | | | llvm-svn: 283175
* AMDGPU: Fix typo (Matt Arsenault, 2016-10-03, 1 file, -1/+1)
| | | | llvm-svn: 283108
* Revert "AMDGPU: Don't use offen if it is 0" (Mehdi Amini, 2016-10-01, 1 file, -95/+9)
| | | | | | | This reverts commit r282999. Tests are not passing: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038 llvm-svn: 283003
* AMDGPU: Don't use offen if it is 0 (Matt Arsenault, 2016-10-01, 1 file, -9/+95)
| | | | | | This removes many re-initializations of a base register to 0. llvm-svn: 282999
* AMDGPU: Rename spill operands to match real instruction (Matt Arsenault, 2016-09-17, 1 file, -10/+10)
| | | | llvm-svn: 281823
* AMDGPU/SI: Add support for triples with the mesa3d operating system (Tom Stellard, 2016-09-16, 1 file, -1/+2)
| | | | | | | | | | | | | | Summary: mesa3d will use the same kernel calling convention as amdhsa, but it will handle everything else like the default 'unknown' OS type. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22783 llvm-svn: 281779
* AMDGPU: Remove code I think is dead (Matt Arsenault, 2016-09-13, 1 file, -27/+3)
| | | | | | | | As far as I can tell, resolveFrameIndex is supposed to be called with a legal offset, so inserting an add shouldn't be necessary. llvm-svn: 281372
* AMDGPU: Implement is{LoadFrom|StoreTo}FrameIndex (Matt Arsenault, 2016-09-10, 1 file, -2/+2)
| | | | llvm-svn: 281128
* [AMDGPU] Assembler: better support for immediate literals in assembler. (Sam Kolton, 2016-09-09, 1 file, -2/+4)
| | | | Summary: Previously, the assembler parsed all literals as either 32-bit integers or 32-bit floating-point values. Because of this we couldn't support f64 literals. E.g. in instruction "v_fract_f64 v[0:1], 0.5", literal 0.5 was encoded as 32-bit literal 0x3f000000, which is incorrect and will be interpreted as 3.0517578125E-5 instead of 0.5. Correct encoding is inline constant 240 (optimal) or 32-bit literal 0x3FE00000 at least. With this change the way immediate literals are parsed is changed. All literals are always parsed as 64-bit values, either integer or floating-point. Then we convert parsed literals to the correct form based on information about the type of operand parsed (was the literal floating or binary) and the type of expected instruction operands (is this an f32/64 or b32/64 instruction). Here are the rules for how we convert literals: - We parsed fp literal: - Instruction expects 64-bit operand: - If parsed literal is inlinable (e.g. v_fract_f64_e32 v[0:1], 0.5) - then we do nothing with this literal - Else if literal is not-inlinable but instruction requires to inline it (e.g. this is e64 encoding, v_fract_f64_e64 v[0:1], 1.5) - report error - Else literal is not-inlinable but we can encode it as additional 32-bit literal constant - If instruction expects fp operand type (f64) - Check if low 32 bits of literal are zeroes (e.g. v_fract_f64 v[0:1], 1.5) - If so then do nothing - Else (e.g. v_fract_f64 v[0:1], 3.1415) - report warning that low 32 bits will be set to zeroes and precision will be lost - set low 32 bits of literal to zeroes - Instruction expects integer operand type (e.g. s_mov_b64_e32 s[0:1], 1.5) - report error as it is unclear how to encode this literal - Instruction expects 32-bit operand: - Convert parsed 64 bit fp literal to 32 bit fp.
Allow loss of precision but not overflow or underflow - Is this literal inlinable and are we required to inline literal (e.g. v_trunc_f32_e64 v0, 0.5) - do nothing - Else report error - Do nothing. We can encode any other 32-bit fp literal (e.g. v_trunc_f32 v0, 10000000.0) - Parsed binary literal: - Is this literal inlinable (e.g. v_trunc_f32_e32 v0, 35) - do nothing - Else, are we required to inline this literal (e.g. v_trunc_f32_e64 v0, 35) - report error - Else, literal is not-inlinable and we are not required to inline it - Are the high 32 bits of the literal zeroes or the same as the sign bit (32 bit) - do nothing (e.g. v_trunc_f32 v0, 0xdeadbeef) - Else - report error (e.g. v_trunc_f32 v0, 0x123456789abcdef0) For this change it is required that we know operand types of instruction (are they f32/64 or b32/64). I added several new register operands (they extend previous register operands) and set operand types to corresponding types: ''' enum OperandType { OPERAND_REG_IMM32_INT, OPERAND_REG_IMM32_FP, OPERAND_REG_INLINE_C_INT, OPERAND_REG_INLINE_C_FP, } ''' This is not working yet: - Several tests are failing - Problems with predicate methods for inline immediates - LLVM generated assembler parts try to select e64 encoding before e32. More changes are required for several AsmOperands. Reviewers: vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl, artem.tamazov Differential Revision: https://reviews.llvm.org/D22922 llvm-svn: 281050
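The bit-level checks described in the commit message above can be sketched as follows. This is only an illustration under stated assumptions: the helper names (`bitsOf`, `highHalf`, `lowHalfIsZero`, `fitsIn32Bits`) are hypothetical and are not LLVM APIs, and the sketch assumes `double` is a 64-bit IEEE-754 value. It shows why 0.5 as an f64 needs the 32-bit literal 0x3FE00000, when zeroing the low 32 bits is lossless, and when a 64-bit integer literal can be carried in 32 bits.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; none of these are LLVM APIs.

// Raw IEEE-754 bit pattern of a double (assumes a 64-bit double).
inline std::uint64_t bitsOf(double Value) {
  static_assert(sizeof(std::uint64_t) == sizeof(double),
                "double must be 64 bits wide");
  std::uint64_t Bits = 0;
  std::memcpy(&Bits, &Value, sizeof(Bits));
  return Bits;
}

// High 32 bits of an f64 bit pattern: the part a 32-bit literal can carry.
constexpr std::uint32_t highHalf(std::uint64_t DoubleBits) {
  return static_cast<std::uint32_t>(DoubleBits >> 32);
}

// True when the low 32 bits are zero, so dropping them loses no precision.
constexpr bool lowHalfIsZero(std::uint64_t DoubleBits) {
  return (DoubleBits & 0xFFFFFFFFull) == 0;
}

// True when the high 32 bits of an integer literal are all zeroes, or are
// all ones while bit 31 is set (a sign extension of a 32-bit value).
constexpr bool fitsIn32Bits(std::uint64_t Literal) {
  return (Literal >> 32) == 0 ||
         ((Literal >> 32) == 0xFFFFFFFFull && ((Literal >> 31) & 1) != 0);
}
```

With these helpers, `highHalf(bitsOf(0.5))` gives 0x3FE00000, the literal named in the message, and `fitsIn32Bits` accepts 0xdeadbeef but rejects 0x123456789abcdef0, matching the two binary-literal examples.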
* [AMDGPU] Wave and register controls (Konstantin Zhuravlyov, 2016-09-06, 1 file, -123/+178)
| | | | | | | | | | | | | | - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747
* AMDGPU: Fix spilling of m0 (Matt Arsenault, 2016-09-03, 1 file, -2/+26)
| | | | | | | | | readlane/writelane do not support using m0 as the output/input. Constrain the register class of spill vregs to try to avoid this, but also handle spilling of the physreg when necessary by inserting an additional copy to a normal SGPR. llvm-svn: 280584
* AMDGPU/SI: Implement a custom MachineSchedStrategy (Tom Stellard, 2016-08-29, 1 file, -0/+6)
| | | | Summary: GCNSchedStrategy re-uses most of GenericScheduler; it just uses a different method to compute the excess and critical register pressure limits. It's not enabled by default; to enable it you need to pass -misched=gcn to llc. Shader DB stats: 32464 shaders in 17874 tests. Totals: SGPRS: 1542846 -> 1643125 (6.50 %); VGPRS: 1005595 -> 904653 (-10.04 %); Spilled SGPRs: 29929 -> 27745 (-7.30 %); Spilled VGPRs: 334 -> 352 (5.39 %); Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread; Code Size: 36688188 -> 37034900 (0.95 %) bytes; LDS: 1913 -> 1913 (0.00 %) blocks; Max Waves: 254101 -> 265125 (4.34 %); Wait states: 0 -> 0 (0.00 %). Totals from affected shaders: SGPRS: 1338220 -> 1438499 (7.49 %); VGPRS: 886221 -> 785279 (-11.39 %); Spilled SGPRs: 29869 -> 27685 (-7.31 %); Spilled VGPRs: 334 -> 352 (5.39 %); Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread; Code Size: 34315716 -> 34662428 (1.01 %) bytes; LDS: 1551 -> 1551 (0.00 %) blocks; Max Waves: 188127 -> 199151 (5.86 %); Wait states: 0 -> 0 (0.00 %). Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23688 llvm-svn: 279995
* XXX (Tom Stellard, 2016-08-26, 1 file, -1/+1)
| | | | llvm-svn: 279868
* AMDGPU/SI: Use a better method for determining the largest pressure sets (Tom Stellard, 2016-08-26, 1 file, -9/+28)
| | | | | | | | | | | | | | | | | | | | | | Summary: There are a few different sgpr pressure sets, but we only care about the one which covers all of the sgprs. We were using hard-coded register pressure set names to determine the reg set id for the biggest sgpr set. However, we were using the wrong name, and this method is pretty fragile, since the reg pressure set names may change. The new method just looks for the pressure set that contains the most reg units and sets that set as our SGPR pressure set. We've also adopted the same technique for determining our VGPR pressure set. Reviewers: arsenm Subscribers: MatzeB, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23687 llvm-svn: 279867
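The selection rule described above (pick the pressure set with the most register units instead of matching a hard-coded name) can be sketched like this. It is a simplified stand-in, not the code from the commit: the `PressureSet` struct, the `largestSet` function, and the unit counts in `ExampleSets` are all hypothetical and exist only to show the idea.

```cpp
// Hypothetical, simplified model of a register pressure set.
struct PressureSet {
  bool CoversSGPRs;      // true for SGPR sets, false for VGPR sets
  unsigned NumRegUnits;  // how many register units the set contains
};

// Returns the index of the set of the requested kind that has the most
// register units, or -1 if no set of that kind exists. Ties keep the
// earliest index.
constexpr int largestSet(const PressureSet *Sets, int NumSets, bool WantSGPRs) {
  int Best = -1;
  for (int I = 0; I < NumSets; ++I) {
    if (Sets[I].CoversSGPRs != WantSGPRs)
      continue;
    if (Best < 0 || Sets[I].NumRegUnits > Sets[Best].NumRegUnits)
      Best = I;
  }
  return Best;
}

// Made-up data: several SGPR sets of different sizes and two VGPR sets.
constexpr PressureSet ExampleSets[] = {
    {true, 8}, {true, 104}, {false, 256}, {false, 128}, {true, 16}};
```

Because the choice depends only on unit counts, renaming a pressure set in the generated tables would not change which index is selected, which is the fragility the commit message says the old name-based lookup had.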
* AMDGPU: Remove custom getSubReg (Matt Arsenault, 2016-08-11, 1 file, -72/+10)
| | | | | | | This was kind of confusing, the subregister class shouldn't really be necessary. llvm-svn: 278362
* MachineFunction: Return reference for getFrameInfo(); NFC (Matthias Braun, 2016-07-28, 1 file, -10/+10)
| | | | | | | getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017
* AMDGPU/SI: Don't use reserved VGPRs for SGPR spilling (Tom Stellard, 2016-07-28, 1 file, -3/+6)
| | | | | | | | | | | | | | | Summary: We were using reserved VGPRs for SGPR spilling and this was causing some programs with a workgroup size of 1024 to use more than 64 registers, which is illegal. Reviewers: arsenm, mareko, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22032 llvm-svn: 276980
* AMDGPU: Add HSA dispatch id intrinsic (Matt Arsenault, 2016-07-22, 1 file, -1/+2)
| | | | llvm-svn: 276437
* AMDGPU/SI: Emit the number of SGPR and VGPR spills (Marek Olsak, 2016-07-13, 1 file, -0/+2)
| | | | Summary: v2: don't count SGPRs spilled to scratch twice. I think this is sufficient. It doesn't count private memory usage, which happens often and uses scratch but isn't technically a spill. The private memory usage can be computed by: [scratch_per_thread - vgpr_spills - a random multiple of SGPR spills]. The fact that SGPR spills add very high numbers to the scratch size makes that computation a guessing game, but I don't have a solution to that. Reviewers: tstellarAMD Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D22197 llvm-svn: 275288
* AMDGPU: Enable trackLivenessAfterRegAlloc (Matt Arsenault, 2016-07-11, 1 file, -0/+5)
| | | | | | This has caught a number of bugs. llvm-svn: 275131
* AMDGPU: fix local stack slot allocation bugs (Nicolai Haehnle, 2016-07-11, 1 file, -2/+8)
| | | | | | | | | | | | | | | | | | | | | | | Summary: The main bug fix here is using the 32-bit encoding of V_ADD_I32 in materializeFrameBaseRegister and resolveFrameIndex, so that arbitrary immediates work. The second part is that we may now require the SegmentWaveByteOffset even when there are initially no stack objects and VGPR spilling isn't enabled, for stack slots that are allocated later. This means that some bits become effectively dead and can be cleaned up. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96602 Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21551 llvm-svn: 275108
* CodeGen: Use MachineInstr& in TargetInstrInfo, NFC (Duncan P. N. Exon Smith, 2016-06-30, 1 file, -1/+1)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is mostly a mechanical change to make TargetInstrInfo API take MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator) when the argument is expected to be a valid MachineInstr. This is a general API improvement. Although it would be possible to do this one function at a time, that would demand a quadratic amount of churn since many of these functions call each other. Instead I've done everything as a block and just updated what was necessary. This is mostly mechanical fixes: adding and removing `*` and `&` operators. The only non-mechanical change is to split ARMBaseInstrInfo::getOperandLatencyImpl out from ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a `MachineInstr*` which it updated to the instruction bundle leader; now, the latter calls the former either with the same `MachineInstr&` or the bundle leader. As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753. Note: I updated WebAssembly, Lanai, and AVR (despite being off-by-default) since it turned out to be easy. I couldn't run tests for AVR since llc doesn't link with it turned on. llvm-svn: 274189
* AMDGPU: Cleanup subtarget handling. (Matt Arsenault, 2016-06-24, 1 file, -26/+22)
| | | | | | | | | Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target. llvm-svn: 273652
* AMDGPU/SI: Propagate the Kill flag in storeRegToStackSlot and eliminateFrameIndex (Changpeng Fang, 2016-06-16, 1 file, -12/+26)
| | | | Reviewers: arsenm, tstellarAMD Differential Revision: http://reviews.llvm.org/21438 llvm-svn: 272958
* AMDGPU: Remove incorrect assertion (Matt Arsenault, 2016-06-09, 1 file, -4/+0)
| | | | | | | I'm still not sure under what circumstances the offset here is non-0, but private memory is not limited to 27-bits. llvm-svn: 272337
* [AMDGPU][NFC] Rename ReserveTrapVGPRs -> ReserveRegs (Konstantin Zhuravlyov, 2016-05-24, 1 file, -3/+3)
| | | | | | Differential Revision: http://reviews.llvm.org/D20081 llvm-svn: 270594
* AMDGPU: Fix verifier error when spilling undef subreg (Matt Arsenault, 2016-05-18, 1 file, -3/+11)
| | | | llvm-svn: 270002
* AMDGPU/SI: Use v_readfirstlane_b32 when restoring SGPRs spilled to scratch (Tom Stellard, 2016-05-02, 1 file, -2/+1)
| | | | | | | | | We were using v_readlane_b32 with the lane set to zero, but this won't work if thread 0 is not active. Differential Revision: http://reviews.llvm.org/D19745 llvm-svn: 268295
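The difference between the two instructions can be sketched with a small model of the execution mask. This is an illustration, not hardware-accurate simulation code: `firstActiveLane` and `laneZeroIsActive` are hypothetical names, and the sketch assumes a 64-lane wavefront where bit N of the mask says whether lane N is active. As far as I understand the GCN ISA, v_readfirstlane_b32 reads from the lowest active lane and falls back to lane 0 when no lane is active.

```cpp
#include <cstdint>

// Lane that a "read first lane" operation would pick: the lowest set bit of
// the execution mask, or lane 0 when the mask is empty.
constexpr unsigned firstActiveLane(std::uint64_t Exec) {
  if (Exec == 0)
    return 0;
  unsigned Lane = 0;
  while (((Exec >> Lane) & 1) == 0)
    ++Lane;
  return Lane;
}

// A load executed under this mask only writes active lanes, so reading a
// fixed lane 0 afterwards is only meaningful when lane 0 was active.
constexpr bool laneZeroIsActive(std::uint64_t Exec) {
  return (Exec & 1) != 0;
}
```

For a mask such as 0xC (only lanes 2 and 3 active), `laneZeroIsActive` is false, so lane 0 of the temporary VGPR was never written by the reload, while `firstActiveLane` returns 2, a lane that does hold the reloaded value.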
* AMDGPU/SI: Set the kill flag on temp VGPRs used to restore SGPRs from scratch (Tom Stellard, 2016-05-02, 1 file, -1/+1)
| | | | | | | | | | | | | | | | | | | | | Summary: When we restore an SGPR value from scratch, we first load it into a temporary VGPR and then use v_readlane_b32 to copy the value from the VGPR back into an SGPR. We weren't setting the kill flag on the VGPR in the v_readlane_b32 instruction, so the register scavenger wasn't able to re-use this temp value later. I wasn't able to create a lit test for this. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19744 llvm-svn: 268287
* AMDGPU/SI: Enable the post-ra scheduler (Tom Stellard, 2016-04-30, 1 file, -16/+11)
| | | | | | | | | | | | | | Summary: This includes a hazard recognizer implementation to replace some of the hazard handling we had during frame index elimination. Reviewers: arsenm Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18602 llvm-svn: 268143
* [AMDGPU] Move reserved vgpr count for trap handler usage to SIMachineFunctionInfo + minor commenting changes (Konstantin Zhuravlyov, 2016-04-26, 1 file, -2/+3)
| | | | Differential Revision: http://reviews.llvm.org/D19537 llvm-svn: 267573
* [AMDGPU] Reserve VGPRs for trap handler usage if instructed (Konstantin Zhuravlyov, 2016-04-26, 1 file, -0/+11)
| | | | | | Differential Revision: http://reviews.llvm.org/D19235 llvm-svn: 267563
* AMDGPU: Add queue ptr intrinsic (Matt Arsenault, 2016-04-25, 1 file, -1/+2)
| | | | llvm-svn: 267451
* Silence some "initialized but unused" warnings from MSVC -- the function being called is a static function, so there's no need for an instance variable. NFC. (Aaron Ballman, 2016-04-18, 1 file, -13/+2)
| | | | llvm-svn: 266616
* AMDGPU: Enable LocalStackSlotAllocation pass (Matt Arsenault, 2016-04-16, 1 file, -0/+138)
| | | | | | | | | | | This resolves more frame indexes early and folds the immediate offsets into the scratch mubuf instructions. This cleans up a lot of the mess that's currently emitted, such as emitting add 0s and repeatedly initializing the same register to 0 when spilling. llvm-svn: 266508
* AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit (Tom Stellard, 2016-04-14, 1 file, -52/+73)
| | | | Summary: For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use fewer registers, as we have to run more waves per SIMD. This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults to 256, as that has no wave restrictions. Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug. Reviewers: mareko, arsenm, tstellarAMD, nhaehnle Subscribers: FireBurn, kerberizer, llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18340 Patch By: Bas Nieuwenhuizen llvm-svn: 266337
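The relationship between workgroup size and register budget can be worked through with a short calculation. This is a rough sketch, not the logic in the patch: the function names are hypothetical, and the constants (64 work-items per wavefront, 4 SIMDs per compute unit, 256 VGPRs per lane on each SIMD) are my understanding of GCN hardware rather than anything stated in this log. It also ignores register allocation granularity.

```cpp
// Assumed GCN hardware parameters (not taken from the commit itself).
constexpr unsigned WavefrontSize = 64;  // work-items per wave
constexpr unsigned SIMDsPerCU = 4;      // SIMD units in one compute unit
constexpr unsigned VGPRsPerSIMD = 256;  // VGPRs available per lane on a SIMD

// Integer division that rounds up.
constexpr unsigned divideCeil(unsigned Numerator, unsigned Denominator) {
  return (Numerator + Denominator - 1) / Denominator;
}

// Waves each SIMD must hold at once so that a whole workgroup fits in one
// compute unit. WorkGroupSize must be at least 1.
constexpr unsigned wavesPerSIMD(unsigned WorkGroupSize) {
  return divideCeil(divideCeil(WorkGroupSize, WavefrontSize), SIMDsPerCU);
}

// Upper bound on VGPRs per wave for that many resident waves.
constexpr unsigned maxVGPRsPerWave(unsigned WorkGroupSize) {
  return VGPRsPerSIMD / wavesPerSIMD(WorkGroupSize);
}
```

Under these assumptions a workgroup of 256 needs one wave per SIMD and keeps the full 256 VGPRs, which fits the statement that the default "has no wave restrictions", while a workgroup of 1024 needs four waves per SIMD and is limited to 64 VGPRs, the same limit the 2016-07-28 entry above refers to.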
* AMDGPU/SI: Add support for spilling VGPRs without having to scavenge registers (Tom Stellard, 2016-04-13, 1 file, -10/+27)
| | | | | | | | | | | | | | | Summary: When we are spilling SGPRs to scratch memory, we usually don't have free SGPRs to do the address calculation, so we need to re-use the ScratchOffset register for the calculation. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18917 llvm-svn: 266244
* [AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA) (Artem Tamazov, 2016-04-13, 1 file, -1/+25)
| | | | Tests added along with implemented feature. Note that there is a small leftover of an unnecessary MI scheduling issue (more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix the false regression. TODO: Support for TTMP quads, comma-separated syntax in "[]" and more. Differential Revision: http://reviews.llvm.org/D17825 llvm-svn: 266205
* AMDGPU/SI: Add MachineBasicBlock parameter to SIInstrInfo::insertWaitStates (Tom Stellard, 2016-04-07, 1 file, -2/+2)
| | | | | | | | | | | | Summary: This makes it possible to insert nops at the end of blocks. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18549 llvm-svn: 265678
* AMDGPU: Cache information about register pressure sets (Tom Stellard, 2016-03-23, 1 file, -24/+33)
| | | | | | | | | | We can statically decide whether or not a register pressure set is for SGPRs or VGPRs, so we don't need to re-compute this information in SIRegisterInfo::getRegPressureSetLimit(). Differential Revision: http://reviews.llvm.org/D14805 llvm-svn: 264126
* AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsics (Nicolai Haehnle, 2016-03-10, 1 file, -3/+3)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa to implement the GL_ARB_shader_image_load_store extension. The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling and loads). However, this is not currently implemented. For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller" opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32 is actually implemented since GLSL also only has a vec4 variant of the store instructions, although it's conceivable that Mesa will want to be smarter about this in the future. BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which has a legacy name, pretends not to access memory, and does not capture the full flexibility of the instruction. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17277 llvm-svn: 263140
* AMDGPU/SI: Add support for spilling SGPRs to scratch buffer (Tom Stellard, 2016-03-04, 1 file, -17/+69)
| | | | | | | | | | | | | | Summary: This is necessary for when we run out of VGPRs and can no longer use v_{read,write}_lane for spilling SGPRs. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17592 llvm-svn: 262732
* AMDGPU/SI: Enable frame index scavenging during PrologEpilogueInserter (Tom Stellard, 2016-03-04, 1 file, -7/+13)
| | | | Summary: This allows us to use virtual registers when we need extra registers for inserting spill instructions in SIRegisterInfo::eliminateFrameIndex(). Once all the frame indices have been eliminated, the PrologEpilogueInserter does an extra pass over the program to replace all virtual registers with physical ones. This allows us to make more efficient use of our emergency spill slots, so we only need to create one. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17591 llvm-svn: 262728