summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* [AMDGPU] Move register related queries to subtarget classKonstantin Zhuravlyov2017-02-081-208/+10
| | | | | | Differential Revision: https://reviews.llvm.org/D29318 llvm-svn: 294440
* AMDGPU add support for spilling to a user sgpr pointed buffersTom Stellard2017-01-251-4/+6
| | | | | | | | | | | | | | | | | Summary: This lets you select which sort of spilling you want, either s[0:1] or 64-bit loads from s[0:1]. Patch By: Dave Airlie Reviewers: nhaehnle, arsenm, tstellarAMD Reviewed By: arsenm Subscribers: mareko, llvm-commits, kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D25428 llvm-svn: 293000
* [AMDGPU] Do not allow register coalescer to create big superregsStanislav Mekhanoshin2017-01-181-0/+20
| | | | | | | | | | | | | | | | Limit register coalescer by not allowing it to artificially increase size of registers beyond dword. Such super-registers are in fact register sequences and not distinct HW registers. With more super-regs we would need to allocate adjacent registers and constraint regalloc more than needed. Moreover, our super registers are overlapping. For instance we have VGPR0_VGPR1_VGPR2, VGPR1_VGPR2_VGPR3, VGPR2_VGPR3_VGPR4 etc, which complicates registers allocation even more, resulting in excessive spilling. Differential Revision: https://reviews.llvm.org/D28782 llvm-svn: 292413
* [CodeGen] Rename MachineInstrBuilder::addOperand. NFCDiana Picus2017-01-131-8/+8
| | | | | | | | | | | Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion. Differential Revision: https://reviews.llvm.org/D28556 llvm-svn: 291891
* Extract LaneBitmask into a separate typeKrzysztof Parzyszek2016-12-151-1/+2
| | | | | | | | | | | | Specifically avoid implicit conversions from/to integral types to avoid potential errors when changing the underlying type. For example, a typical initialization of a "full" mask was "LaneMask = ~0u", which would result in a value of 0x00000000FFFFFFFF if the type was extended to uint64_t. Differential Revision: https://reviews.llvm.org/D27454 llvm-svn: 289820
* AMDGPU: Fix handling of 16-bit immediatesMatt Arsenault2016-12-101-13/+0
| | | | | | | | | | | | | | | | | | Since 32-bit instructions with 32-bit input immediate behavior are used to materialize 16-bit constants in 32-bit registers for 16-bit instructions, determining the legality based on the size is incorrect. Change operands to have the size specified in the type. Also adds a workaround for a disassembler bug that produces an immediate MCOperand for an operand that is supposed to be OPERAND_REGISTER. The assembler appears to accept out of bounds immediates and truncates them, but this seems to be an issue for 32-bit already. llvm-svn: 289306
* AMDGPU/SI: Don't reserve XNACK when it's disabledMarek Olsak2016-12-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: This frees 2 additional scalar registers. These are results from all of my 3 patches combined: Polaris: Spilled SGPRs: 2231 -> 1517 (-32.00 %) Tonga: Spilled SGPRs: 3829 -> 2608 (-31.89 %) Spilled VGPRs: 100 -> 84 (-16.00 %) Tonga even spills SGPRs via VGPRs to scratch. That's a compute shader limited to 64 VGPRs. Reviewers: tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27151 llvm-svn: 289262
* AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objectsMarek Olsak2016-12-091-6/+15
| | | | | | | | | | | | Summary: This frees 2 scalar registers. Reviewers: tstellarAMD Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27150 llvm-svn: 289261
* AMDGPU/SI: Allow using SGPRs 96-101 on VIMarek Olsak2016-12-091-5/+7
| | | | | | | | | | | | | | | Summary: There is no point in setting SGPRS=104, because VI allocates SGPRs in multiples of 16, so 104 -> 112. That enables us to use all 102 SGPRs for general purposes. Reviewers: tstellarAMD Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27149 llvm-svn: 289260
* [AMDGPU] Fix number of reserved SGPRs on CI to reflect flat scratch useStanislav Mekhanoshin2016-12-081-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D27225 llvm-svn: 289095
* AMDGPU: Properly implement SIRegisterInfo::isFrameOffsetLegal and ↵Nicolai Haehnle2016-12-081-5/+21
| | | | | | | | | | | | | | | | | | | | | | | | | needsFrameBaseReg Summary: Without the fix to isFrameOffsetLegal to consider the instruction's immediate offset, the new test case hits the corresponding assertion in resolveFrameIndex, because the LocalStackSlotAllocation pass re-uses a different base register. With only the fix to isFrameOffsetLegal, code quality reduces in a bunch of places because frame base registers are added where they're not needed. This is addressed by properly implementing needsFrameBaseReg, which also helps to avoid unnecessary zero frame indices in a bunch of other places. Fixes piglit glsl-1.50/execution/variable-indexing/gs-output-array-vec4-index-wr.shader_test Reviewers: arsenm, tstellarAMD Subscribers: qcolombet, kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D27344 llvm-svn: 289048
* AMDGPU: remove a couple of unused variablesSaleem Abdulrasool2016-12-031-14/+2
| | | | | | | | | | | | | | | | | lib/Target/AMDGPU/SIRegisterInfo.cpp: In member function 'void llvm::SIRegisterInfo::spillSGPR(llvm::MachineBasicBlock::iterator, int, llvm::RegScavenger*) const': lib/Target/AMDGPU/SIRegisterInfo.cpp:572:30: warning: variable 'SubRC' set but not used [-Wunused-but-set-variable] const TargetRegisterClass *SubRC = nullptr; ^ lib/Target/AMDGPU/SIRegisterInfo.cpp: In member function 'void llvm::SIRegisterInfo::restoreSGPR(llvm::MachineBasicBlock::iterator, int, llvm::RegScavenger*) const': lib/Target/AMDGPU/SIRegisterInfo.cpp:723:30: warning: variable 'SubRC' set but not used [-Wunused-but-set-variable] const TargetRegisterClass *SubRC = nullptr; ^ The variable was assigned to, but never used. The functions called did not mutate state. Simplify the logic and remove the variable. Identified by gcc 5.4.0. llvm-svn: 288601
* AMDGPU: Use wider scalar spills for SGPR spillingMatt Arsenault2016-12-021-15/+70
| | | | | | | | | | | | | | | | Since the spill is for the whole wave, these don't have the swizzling problems that vector stores do and a single 4-byte allocation is enough to spill a 64 element register. This should reduce the number of spill instructions and put all the spills for a register in the same cacheline. This should save allocated private size, but for now it doesn't. The extra slots are allocated for each component, but never used because the frame layout is essentially finalized before frame indices are replaced. For always using the scalar store path, this should probably be moved into processFunctionBeforeFrameFinalized. llvm-svn: 288445
* AMDGPU: Materialize frame index before addMatt Arsenault2016-11-291-1/+6
| | | | | | | | | | | It isn't generally safe to fold the frame index directly into the operand since it will possibly not be an inline immediate after it is expanded. This surprisingly seems to produce better code, since the FI doesn't prevent folding other immediate operands. llvm-svn: 288185
* AMDGPU/SI: Add back reverted SGPR spilling code, but disable itMarek Olsak2016-11-251-75/+200
| | | | | | suggested as a better solution by Matt llvm-svn: 287942
* Revert "AMDGPU: Implement SGPR spilling with scalar stores"Marek Olsak2016-11-251-99/+7
| | | | | | This reverts commit 4404d0d6e354e80dd7f8f0a0e12d8ad809cf007e. llvm-svn: 287936
* Revert "AMDGPU: Fix MMO when splitting spill"Marek Olsak2016-11-251-71/+44
| | | | | | This reverts commit 79d4f8b8b1ce430c3d5dac4fc72a9eebaed24fe1. llvm-svn: 287935
* Revert "AMDGPU: Fix adding extra implicit def of register"Marek Olsak2016-11-251-25/+14
| | | | | | This reverts commit e834ce5976567575621901fb967b8018b9916d71. llvm-svn: 287934
* Revert "AMDGPU: Fix not setting kill flag on temp reg when spilling"Marek Olsak2016-11-251-1/+1
| | | | | | This reverts commit 057bbbe4ae170247ba37f08f2e70ef185267d1bb. llvm-svn: 287933
* Revert "AMDGPU: Make m0 unallocatable"Marek Olsak2016-11-251-1/+1
| | | | | | This reverts commit 124ad83dae04514f943902446520c859adee0e96. llvm-svn: 287932
* Revert "AMDGPU: Remove m0 spilling code"Marek Olsak2016-11-251-3/+37
| | | | | | This reverts commit f18de36554eb22416f8ba58e094e0272523a4301. llvm-svn: 287931
* Revert "AMDGPU: Preserve m0 value when spilling"Marek Olsak2016-11-251-34/+5
| | | | | | This reverts commit a5a179ffd94fd4136df461ec76fb30f04afa87ce. llvm-svn: 287930
* AMDGPU: Preserve m0 value when spillingMatt Arsenault2016-11-241-5/+34
| | | | llvm-svn: 287844
* TRI: Add hook to pass scavenger during frame eliminationMatt Arsenault2016-11-241-0/+10
| | | | | | | | | | | | The scavenger was not passed if requiresFrameIndexScavenging was enabled. I need to be able to test for the availability of an unallocatable register here, so I can't create a virtual register for it. It might be better to just always use the scavenger and stop creating virtual registers. llvm-svn: 287843
* AMDGPU: Remove m0 spilling codeMatt Arsenault2016-11-241-37/+3
| | | | | | Since m0 isn't allocatable it should never be spilled anymore. llvm-svn: 287842
* AMDGPU: Make m0 unallocatableMatt Arsenault2016-11-241-1/+1
| | | | | | | | | | | m0 may need to be written for spill code, so we don't want general code uses relying on the value stored in it. This introduces a few code quality regressions where copies from m0 are not coalesced into copies of a copy of m0. llvm-svn: 287841
* AMDGPU: Fix not setting kill flag on temp reg when spillingMatt Arsenault2016-11-231-1/+1
| | | | llvm-svn: 287808
* AMDGPU: Fix adding extra implicit def of registerMatt Arsenault2016-11-231-14/+25
| | | | | | | In the scalar case, there's no reason to add an additional def of the same register. llvm-svn: 287807
* AMDGPU: Fix MMO when splitting spillMatt Arsenault2016-11-231-44/+71
| | | | | | | | | | The size and offset were wrong. The size of the object was being used for the size of the access, when here it is really being split into 4-byte accesses. The underlying object size is set in the MachinePointerInfo, which also didn't have the offset set. llvm-svn: 287806
* Fix spelling mistakes in AMDGPU target comments. NFC.Simon Pilgrim2016-11-181-1/+1
| | | | | | Identified by Pedro Giffuni in PR27636. llvm-svn: 287333
* AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies passTom Stellard2016-11-161-11/+14
| | | | | | | | | | | | | | | | | | | | | | Summary: 1. Don't try to copy values to and from the same register class. 2. Replace copies with of registers with immediate values with v_mov/s_mov instructions. The main purpose of this change is to make MachineSink do a better job of determining when it is beneficial to split a critical edge, since the pass assumes that copies will become move instructions. This prevents a regression in uniform-cfg.ll if we enable critical edge splitting for AMDGPU. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23408 llvm-svn: 287131
* AMDGPU: Implement SGPR spilling with scalar storesMatt Arsenault2016-11-131-7/+99
| | | | | | | | | | | | | | | | nThis avoids the nasty problems caused by using memory instructions that read the exec mask while spilling / restoring registers used for control flow masking, but only for VI when these were added. This always uses the scalar stores when enabled currently, but it may be better to still try to spill to a VGPR and use this on the fallback memory path. The cache also needs to be flushed before wave termination if a scalar store is used. llvm-svn: 286766
* AMDGPU: Try to fix (non-clang?) bot buildsMatt Arsenault2016-11-071-10/+10
| | | | llvm-svn: 286120
* AMDGPU: Refactor copyPhysRegMatt Arsenault2016-11-071-0/+103
| | | | | | Separate the subregister splitting logic to re-use later. llvm-svn: 286118
* AMDGPU: Stop creating unused virtual registersMatt Arsenault2016-11-011-2/+5
| | | | | | | These are only used in the spill to VMEM path. Move them to the one use. llvm-svn: 285756
* AMDGPU: Fix using incorrect private resource with no allocationMatt Arsenault2016-10-281-1/+12
| | | | | | | | | | | It's possible to have a use of the private resource descriptor or scratch wave offset registers even though there are no allocated stack objects. This would result in continuing to use the maximum number reserved registers. This could go over the number of SGPRs available on VI, or violate the SGPR limit requested by the function attributes. llvm-svn: 285435
* Reapply "AMDGPU: Don't use offen if it is 0"Matt Arsenault2016-10-261-9/+95
| | | | | | This reverts r283003 llvm-svn: 285203
* AMDGPU: Fix use-after-freesNicolai Haehnle2016-10-141-1/+1
| | | | | | | | | | Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25312 llvm-svn: 284215
* AMDGPU: Do not re-use tmpreg in spill/restore loweringMatthias Braun2016-10-051-2/+2
| | | | | | | | | The register scavenging code does not support multiple definitions of the same vreg. Differential Revision: https://reviews.llvm.org/D25220 llvm-svn: 283369
* AMDGPU: Factor SGPR spilling into separate functionsMatt Arsenault2016-10-041-129/+160
| | | | llvm-svn: 283175
* AMDGPU: Fix typoMatt Arsenault2016-10-031-1/+1
| | | | llvm-svn: 283108
* Revert "AMDGPU: Don't use offen if it is 0"Mehdi Amini2016-10-011-95/+9
| | | | | | | This reverts commit r282999. Tests are not passing: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038 llvm-svn: 283003
* AMDGPU: Don't use offen if it is 0Matt Arsenault2016-10-011-9/+95
| | | | | | This removes many re-initializations of a base register to 0. llvm-svn: 282999
* AMDGPU: Rename spill operands to match real instructionMatt Arsenault2016-09-171-10/+10
| | | | llvm-svn: 281823
* AMDGPU/SI: Add support for triples with the mesa3d operating systemTom Stellard2016-09-161-1/+2
| | | | | | | | | | | | | | Summary: mesa3d will use the same kernel calling convention as amdhsa, but it will handle everything else like the default 'unknown' OS type. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22783 llvm-svn: 281779
* AMDGPU: Remove code I think is deadMatt Arsenault2016-09-131-27/+3
| | | | | | | | As far as I can tell, resolveFrameIndex is supposed to be called with a legal offset, so inserting an add shouldn't be necessary. llvm-svn: 281372
* AMDGPU: Implement is{LoadFrom|StoreTo}FrameIndexMatt Arsenault2016-09-101-2/+2
| | | | llvm-svn: 281128
* AMDGPU] Assembler: better support for immediate literals in assembler.Sam Kolton2016-09-091-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Prevously assembler parsed all literals as either 32-bit integers or 32-bit floating-point values. Because of this we couldn't support f64 literals. E.g. in instruction "v_fract_f64 v[0:1], 0.5", literal 0.5 was encoded as 32-bit literal 0x3f000000, which is incorrect and will be interpreted as 3.0517578125E-5 instead of 0.5. Correct encoding is inline constant 240 (optimal) or 32-bit literal 0x3FE00000 at least. With this change the way immediate literals are parsed is changed. All literals are always parsed as 64-bit values either integer or floating-point. Then we convert parsed literals to correct form based on information about type of operand parsed (was literal floating or binary) and type of expected instruction operands (is this f32/64 or b32/64 instruction). Here are rules how we convert literals: - We parsed fp literal: - Instruction expects 64-bit operand: - If parsed literal is inlinable (e.g. v_fract_f64_e32 v[0:1], 0.5) - then we do nothing this literal - Else if literal is not-inlinable but instruction requires to inline it (e.g. this is e64 encoding, v_fract_f64_e64 v[0:1], 1.5) - report error - Else literal is not-inlinable but we can encode it as additional 32-bit literal constant - If instruction expect fp operand type (f64) - Check if low 32 bits of literal are zeroes (e.g. v_fract_f64 v[0:1], 1.5) - If so then do nothing - Else (e.g. v_fract_f64 v[0:1], 3.1415) - report warning that low 32 bits will be set to zeroes and precision will be lost - set low 32 bits of literal to zeroes - Instruction expects integer operand type (e.g. s_mov_b64_e32 s[0:1], 1.5) - report error as it is unclear how to encode this literal - Instruction expects 32-bit operand: - Convert parsed 64 bit fp literal to 32 bit fp. Allow lose of precision but not overflow or underflow - Is this literal inlinable and are we required to inline literal (e.g. v_trunc_f32_e64 v0, 0.5) - do nothing - Else report error - Do nothing. We can encode any other 32-bit fp literal (e.g. v_trunc_f32 v0, 10000000.0) - Parsed binary literal: - Is this literal inlinable (e.g. v_trunc_f32_e32 v0, 35) - do nothing - Else, are we required to inline this literal (e.g. v_trunc_f32_e64 v0, 35) - report error - Else, literal is not-inlinable and we are not required to inline it - Are high 32 bit of literal zeroes or same as sign bit (32 bit) - do nothing (e.g. v_trunc_f32 v0, 0xdeadbeef) - Else - report error (e.g. v_trunc_f32 v0, 0x123456789abcdef0) For this change it is required that we know operand types of instruction (are they f32/64 or b32/64). I added several new register operands (they extend previous register operands) and set operand types to corresponding types: ''' enum OperandType { OPERAND_REG_IMM32_INT, OPERAND_REG_IMM32_FP, OPERAND_REG_INLINE_C_INT, OPERAND_REG_INLINE_C_FP, } ''' This is not working yet: - Several tests are failing - Problems with predicate methods for inline immediates - LLVM generated assembler parts try to select e64 encoding before e32. More changes are required for several AsmOperands. Reviewers: vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl, artem.tamazov Differential Revision: https://reviews.llvm.org/D22922 llvm-svn: 281050
* [AMDGPU] Wave and register controlsKonstantin Zhuravlyov2016-09-061-123/+178
| | | | | | | | | | | | | | - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747
* AMDGPU: Fix spilling of m0Matt Arsenault2016-09-031-2/+26
| | | | | | | | | readlane/writelane do not support using m0 as the output/input. Constrain the register class of spill vregs to try to avoid this, but also handle spilling of the physreg when necessary by inserting an additional copy to a normal SGPR. llvm-svn: 280584
OpenPOWER on IntegriCloud