path: root/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
Commit message | Author | Age | Files | Lines
* AMDGPU: Figure out private memory regs after lowering | Matt Arsenault | 2017-07-18 | 1 | -0/+4
    Introduce pseudo-registers for registers needed for stack access, which are replaced during finalizeLowering. Note these pseudo-registers are currently only used for the used register location, and not for determining their input argument register. This is better because it avoids the need to try to predict whether a call will be emitted from the IR, and also detects stack objects introduced by legalization. Test changes are from the HasStackObjects check being more accurate since stack objects introduced during legalization are now known.
    llvm-svn: 308325
* AMDGPU: Partially fix implicit.buffer.ptr intrinsic handling | Matt Arsenault | 2017-06-26 | 1 | -6/+5
    This should not be treated as a different version of private_segment_buffer. These are distinct things with different uses and register classes, and require the function argument info to have more context about the function's type and environment. Also add missing test coverage for the intrinsic, and emit an error for HSA. This also uncovers that the intrinsic is broken unless there happen to be stack objects.
    llvm-svn: 306264
* AMDGPU: Fix scratch wave offset relative FI expansion | Matt Arsenault | 2017-06-19 | 1 | -9/+20
    The offset may not be an inline immediate, so this needs to be materialized into a register. The post-RA run of SIShrinkInstructions is able to fold it later if it can.
    llvm-svn: 305761
* AMDGPU: Work around build special casing .inc files | Matt Arsenault | 2017-06-08 | 1 | -1/+2
    It complains because it assumes these were autogenerated files in the source directory.
    llvm-svn: 305005
* AMDGPU: Use correct register names in inline assembly | Matt Arsenault | 2017-06-08 | 1 | -0/+59
    Fixes using physical registers in inline asm from clang.
    llvm-svn: 305004
* Sort the remaining #include lines in include/... and lib/.... | Chandler Carruth | 2017-06-06 | 1 | -1/+1
    I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days.
    I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch.
    This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files.
    Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again).
    llvm-svn: 304787
* AMDGPU: Start defining a calling convention | Matt Arsenault | 2017-05-17 | 1 | -8/+35
    Partially implement callee-side for arguments and return values. byval doesn't work properly, and most likely neither do sret or other on-stack return values.
    llvm-svn: 303308
* AMDGPU: Expand frame indexes to be relative to scratch wave offset | Matt Arsenault | 2017-05-17 | 1 | -6/+71
    In order for an arbitrary callee to access an object in a caller's stack frame, the 32-bit offset used as the private pointer needs to be relative to the kernel's scratch wave offset register. Convert to this by finding the difference from the current stack frame and scaling by the wavefront size.
    llvm-svn: 303303
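The conversion described in this message can be sketched as plain integer arithmetic. This is only an illustration under assumptions that the log does not state: that the two registers hold per-wave byte offsets, that the object offset is per-lane, and that "scaling" means dividing the per-wave difference by the wavefront size. The function and parameter names are hypothetical and do not come from SIRegisterInfo.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the LLVM implementation.
// frameOffsetReg / scratchWaveOffsetReg: assumed per-wave byte offsets.
// objectOffset: assumed per-lane byte offset of the object in its frame.
// wavefrontSize: lanes per wave, assumed to be a power of two (e.g. 64).
inline uint32_t privatePointerFor(uint32_t frameOffsetReg,
                                  uint32_t scratchWaveOffsetReg,
                                  uint32_t objectOffset,
                                  uint32_t wavefrontSize) {
  assert(wavefrontSize != 0 && (wavefrontSize & (wavefrontSize - 1)) == 0);
  assert(frameOffsetReg >= scratchWaveOffsetReg);
  // Difference between the current frame and the kernel's scratch base.
  uint32_t waveDiff = frameOffsetReg - scratchWaveOffsetReg;
  // Assumed direction of the scaling: per-wave bytes to per-lane bytes.
  uint32_t laneDiff = waveDiff / wavefrontSize;
  return laneDiff + objectOffset;
}
```

In a kernel, where the frame register equals the scratch wave offset register, the difference is zero and the result is just the object offset.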
* AMDGPU: Use appropriate soffset for spilling | Matt Arsenault | 2017-05-17 | 1 | -13/+13
    This needs to be the frame offset register, and not the global scratch wave offset register. For kernels, these are the same.
    llvm-svn: 303287
* [AMDGPU] Merge M0 initializations | Stanislav Mekhanoshin | 2017-04-24 | 1 | -0/+3
    Merges equivalent initializations of M0 and hoists them into a common dominator block. Technically the same code can be used with any register, physical or virtual.
    Differential Revision: https://reviews.llvm.org/D32279
    llvm-svn: 301228
* Move size and alignment information of regclass to TargetRegisterInfo | Krzysztof Parzyszek | 2017-04-24 | 1 | -28/+31
    1. RegisterClass::getSize() is split into two functions:
       - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const;
       - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const;
    2. RegisterClass::getAlignment() is replaced by:
       - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const;
    This will allow making those values depend on subtarget features in the future.
    Differential Revision: https://reviews.llvm.org/D31783
    llvm-svn: 301221
* Fix typo | Matt Arsenault | 2017-04-18 | 1 | -1/+1
    llvm-svn: 300597
* [AMDGPU] added SIInstrInfo::getAddNoCarry() helper | Stanislav Mekhanoshin | 2017-04-14 | 1 | -3/+1
    Addressed the rest of the post-submit comments from D31993.
    Differential Revision: https://reviews.llvm.org/D32057
    llvm-svn: 300288
* Revert "Correct register pressure calculation in presence of subregs" | Stanislav Mekhanoshin | 2017-02-24 | 1 | -16/+0
    This reverts commit r296009. It broke one out-of-tree target and also does not account for all partial lines added or removed when calculating PressureDiff.
    llvm-svn: 296182
* Correct register pressure calculation in presence of subregs | Stanislav Mekhanoshin | 2017-02-23 | 1 | -0/+16
    If a subreg is used in an instruction it counts as a whole superreg for the purpose of register pressure calculation. This patch corrects the improper register pressure calculation by examining the operand's lane mask.
    Differential Revision: https://reviews.llvm.org/D29835
    llvm-svn: 296009
* AMDGPU: Don't use stack space for SGPR->VGPR spills | Matt Arsenault | 2017-02-21 | 1 | -23/+88
    Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them after. I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks since everything asserts if it tries to use a dead frame index later. The unused emergency stack slot seems to still be left behind, so an additional 4 bytes is still wasted.
    llvm-svn: 295753
* AMDGPU: Merge initial gfx9 support | Matt Arsenault | 2017-02-18 | 1 | -0/+6
    llvm-svn: 295554
* [AMDGPU] Override PSet for M0 | Stanislav Mekhanoshin | 2017-02-10 | 1 | -0/+8
    This change returns an empty PSet list for the M0 register. Otherwise its PSet as defined by tablegen is SReg_32. This results in incorrect register pressure calculation every time an instruction uses M0. Such uses count as SReg_32 PSet and inadequately increase pressure on SGPRs.
    Differential Revision: https://reviews.llvm.org/D29798
    llvm-svn: 294691
* [AMDGPU] Implement register pressure callbacks | Stanislav Mekhanoshin | 2017-02-08 | 1 | -0/+31
    Implement getRegPressureLimit and getRegPressureSetLimit callbacks in SIRegisterInfo. This makes the standard converge scheduler behave almost the same as GCNScheduler, sometimes slightly better, sometimes a bit worse. In general it is also possible to switch GCNScheduler to use these callbacks instead of getMaxWaves(), which also makes GCNScheduler slightly better on some tests and slightly worse on others. A big win is the behavior with the converge scheduler. Note, these are used not only by scheduling, but also in places like MachineLICM.
    Differential Revision: https://reviews.llvm.org/D29700
    llvm-svn: 294518
* [AMDGPU] Move register related queries to subtarget class | Konstantin Zhuravlyov | 2017-02-08 | 1 | -208/+10
    Differential Revision: https://reviews.llvm.org/D29318
    llvm-svn: 294440
* AMDGPU add support for spilling to a user sgpr pointed buffers | Tom Stellard | 2017-01-25 | 1 | -4/+6
    Summary: This lets you select which sort of spilling you want, either s[0:1] or 64-bit loads from s[0:1].
    Patch By: Dave Airlie
    Reviewers: nhaehnle, arsenm, tstellarAMD
    Reviewed By: arsenm
    Subscribers: mareko, llvm-commits, kzhuravl, wdng, yaxunl, tony-tye
    Differential Revision: https://reviews.llvm.org/D25428
    llvm-svn: 293000
* [AMDGPU] Do not allow register coalescer to create big superregs | Stanislav Mekhanoshin | 2017-01-18 | 1 | -0/+20
    Limit the register coalescer by not allowing it to artificially increase the size of registers beyond a dword. Such super-registers are in fact register sequences and not distinct HW registers. With more super-regs we would need to allocate adjacent registers and constrain regalloc more than needed. Moreover, our super registers are overlapping. For instance we have VGPR0_VGPR1_VGPR2, VGPR1_VGPR2_VGPR3, VGPR2_VGPR3_VGPR4 etc, which complicates register allocation even more, resulting in excessive spilling.
    Differential Revision: https://reviews.llvm.org/D28782
    llvm-svn: 292413
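The overlap between register tuples mentioned in this message can be made concrete with a small sketch. It only generates names in the style quoted above and tests whether two same-width tuples share a register; `tupleNames` and `tuplesOverlap` are hypothetical helpers for illustration and are not part of LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds names such as "VGPR0_VGPR1_VGPR2" for every run of `width`
// consecutive registers taken from `numRegs` registers.
std::vector<std::string> tupleNames(unsigned numRegs, unsigned width) {
  assert(width >= 1 && width <= numRegs);
  std::vector<std::string> names;
  for (unsigned first = 0; first + width <= numRegs; ++first) {
    std::string name;
    for (unsigned i = 0; i < width; ++i) {
      if (i != 0)
        name += '_';
      name += "VGPR" + std::to_string(first + i);
    }
    names.push_back(name);
  }
  return names;
}

// Two tuples of the same width share at least one register exactly when
// their first registers are fewer than `width` apart.
bool tuplesOverlap(unsigned firstA, unsigned firstB, unsigned width) {
  unsigned dist = firstA > firstB ? firstA - firstB : firstB - firstA;
  return dist < width;
}
```

With five registers and width three, every pair among the three generated tuples overlaps, which is the allocation constraint the message describes.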
* [CodeGen] Rename MachineInstrBuilder::addOperand. NFC | Diana Picus | 2017-01-13 | 1 | -8/+8
    Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion.
    Differential Revision: https://reviews.llvm.org/D28556
    llvm-svn: 291891
* Extract LaneBitmask into a separate type | Krzysztof Parzyszek | 2016-12-15 | 1 | -1/+2
    Specifically avoid implicit conversions from/to integral types to avoid potential errors when changing the underlying type. For example, a typical initialization of a "full" mask was "LaneMask = ~0u", which would result in a value of 0x00000000FFFFFFFF if the type was extended to uint64_t.
    Differential Revision: https://reviews.llvm.org/D27454
    llvm-svn: 289820
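The pitfall quoted in this message is easy to reproduce in isolation. The sketch below assumes a 32-bit `unsigned`; `Mask64` is a hypothetical wrapper written for illustration, not LLVM's `LaneBitmask`, and only shows how an explicit constructor plus named factories avoid the implicit widening.

```cpp
#include <cstdint>

// The pitfall: ~0u is computed as a (here assumed 32-bit) unsigned value
// and only then zero-extended to 64 bits, leaving the upper half clear.
constexpr uint64_t widenedFromUnsigned() {
  return static_cast<uint64_t>(~0u);
}

// Hypothetical wrapper type. The explicit constructor rejects implicit
// conversions from integers, and all() builds the full mask at full width.
class Mask64 {
public:
  explicit constexpr Mask64(uint64_t bits) : Bits(bits) {}
  static constexpr Mask64 all() { return Mask64(~uint64_t(0)); }
  static constexpr Mask64 none() { return Mask64(0); }
  constexpr uint64_t bits() const { return Bits; }

private:
  uint64_t Bits;
};

static_assert(widenedFromUnsigned() == 0x00000000FFFFFFFFull,
              "upper 32 bits are lost when ~0u is widened");
static_assert(Mask64::all().bits() == 0xFFFFFFFFFFFFFFFFull,
              "the named factory yields a full 64-bit mask");
```

With such a type, `Mask64 m = ~0u;` does not compile, so the truncated mask cannot be created by accident.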
* AMDGPU: Fix handling of 16-bit immediates | Matt Arsenault | 2016-12-10 | 1 | -13/+0
    Since 32-bit instructions with 32-bit input immediate behavior are used to materialize 16-bit constants in 32-bit registers for 16-bit instructions, determining the legality based on the size is incorrect. Change operands to have the size specified in the type.
    Also adds a workaround for a disassembler bug that produces an immediate MCOperand for an operand that is supposed to be OPERAND_REGISTER.
    The assembler appears to accept out of bounds immediates and truncates them, but this seems to be an issue for 32-bit already.
    llvm-svn: 289306
* AMDGPU/SI: Don't reserve XNACK when it's disabled | Marek Olsak | 2016-12-09 | 1 | -1/+1
    Summary: This frees 2 additional scalar registers.
    These are results from all of my 3 patches combined:
    Polaris:
      Spilled SGPRs: 2231 -> 1517 (-32.00 %)
    Tonga:
      Spilled SGPRs: 3829 -> 2608 (-31.89 %)
      Spilled VGPRs: 100 -> 84 (-16.00 %)
    Tonga even spills SGPRs via VGPRs to scratch. That's a compute shader limited to 64 VGPRs.
    Reviewers: tstellarAMD
    Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
    Differential Revision: https://reviews.llvm.org/D27151
    llvm-svn: 289262
* AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objects | Marek Olsak | 2016-12-09 | 1 | -6/+15
    Summary: This frees 2 scalar registers.
    Reviewers: tstellarAMD
    Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
    Differential Revision: https://reviews.llvm.org/D27150
    llvm-svn: 289261
* AMDGPU/SI: Allow using SGPRs 96-101 on VI | Marek Olsak | 2016-12-09 | 1 | -5/+7
    Summary: There is no point in setting SGPRS=104, because VI allocates SGPRs in multiples of 16, so 104 -> 112. That enables us to use all 102 SGPRs for general purposes.
    Reviewers: tstellarAMD
    Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
    Differential Revision: https://reviews.llvm.org/D27149
    llvm-svn: 289260
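The "104 -> 112" step in this summary is a round-up to the allocation granule. A minimal sketch of that arithmetic follows; `roundUpToGranule` is a hypothetical name, and the granule of 16 is taken from the message rather than from any LLVM API.

```cpp
// Rounds `count` up to the next multiple of `granule`.
// `granule` must be nonzero.
constexpr unsigned roundUpToGranule(unsigned count, unsigned granule) {
  return (count + granule - 1) / granule * granule;
}

// A request for 104 SGPRs lands on the same 112-register allocation as
// any other request between 97 and 112.
static_assert(roundUpToGranule(104, 16) == 112, "104 rounds up to 112");
static_assert(roundUpToGranule(97, 16) == 112, "97 rounds up to 112");
static_assert(roundUpToGranule(96, 16) == 96, "multiples are unchanged");
```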
* [AMDGPU] Fix number of reserved SGPRs on CI to reflect flat scratch use | Stanislav Mekhanoshin | 2016-12-08 | 1 | -0/+2
    Differential Revision: https://reviews.llvm.org/D27225
    llvm-svn: 289095
* AMDGPU: Properly implement SIRegisterInfo::isFrameOffsetLegal and needsFrameBaseReg | Nicolai Haehnle | 2016-12-08 | 1 | -5/+21
    Summary: Without the fix to isFrameOffsetLegal to consider the instruction's immediate offset, the new test case hits the corresponding assertion in resolveFrameIndex, because the LocalStackSlotAllocation pass re-uses a different base register.
    With only the fix to isFrameOffsetLegal, code quality reduces in a bunch of places because frame base registers are added where they're not needed. This is addressed by properly implementing needsFrameBaseReg, which also helps to avoid unnecessary zero frame indices in a bunch of other places.
    Fixes piglit glsl-1.50/execution/variable-indexing/gs-output-array-vec4-index-wr.shader_test
    Reviewers: arsenm, tstellarAMD
    Subscribers: qcolombet, kzhuravl, wdng, yaxunl, tony-tye, llvm-commits
    Differential Revision: https://reviews.llvm.org/D27344
    llvm-svn: 289048
* AMDGPU: remove a couple of unused variables | Saleem Abdulrasool | 2016-12-03 | 1 | -14/+2
    lib/Target/AMDGPU/SIRegisterInfo.cpp: In member function 'void llvm::SIRegisterInfo::spillSGPR(llvm::MachineBasicBlock::iterator, int, llvm::RegScavenger*) const':
    lib/Target/AMDGPU/SIRegisterInfo.cpp:572:30: warning: variable 'SubRC' set but not used [-Wunused-but-set-variable]
      const TargetRegisterClass *SubRC = nullptr;
                                 ^
    lib/Target/AMDGPU/SIRegisterInfo.cpp: In member function 'void llvm::SIRegisterInfo::restoreSGPR(llvm::MachineBasicBlock::iterator, int, llvm::RegScavenger*) const':
    lib/Target/AMDGPU/SIRegisterInfo.cpp:723:30: warning: variable 'SubRC' set but not used [-Wunused-but-set-variable]
      const TargetRegisterClass *SubRC = nullptr;
                                 ^
    The variable was assigned to, but never used. The functions called did not mutate state. Simplify the logic and remove the variable. Identified by gcc 5.4.0.
    llvm-svn: 288601
* AMDGPU: Use wider scalar spills for SGPR spilling | Matt Arsenault | 2016-12-02 | 1 | -15/+70
    Since the spill is for the whole wave, these don't have the swizzling problems that vector stores do and a single 4-byte allocation is enough to spill a 64 element register. This should reduce the number of spill instructions and put all the spills for a register in the same cacheline.
    This should save allocated private size, but for now it doesn't. The extra slots are allocated for each component, but never used because the frame layout is essentially finalized before frame indices are replaced. For always using the scalar store path, this should probably be moved into processFunctionBeforeFrameFinalized.
    llvm-svn: 288445
* AMDGPU: Materialize frame index before add | Matt Arsenault | 2016-11-29 | 1 | -1/+6
    It isn't generally safe to fold the frame index directly into the operand since it will possibly not be an inline immediate after it is expanded. This surprisingly seems to produce better code, since the FI doesn't prevent folding other immediate operands.
    llvm-svn: 288185
* AMDGPU/SI: Add back reverted SGPR spilling code, but disable it | Marek Olsak | 2016-11-25 | 1 | -75/+200
    Suggested as a better solution by Matt.
    llvm-svn: 287942
* Revert "AMDGPU: Implement SGPR spilling with scalar stores" | Marek Olsak | 2016-11-25 | 1 | -99/+7
    This reverts commit 4404d0d6e354e80dd7f8f0a0e12d8ad809cf007e.
    llvm-svn: 287936
* Revert "AMDGPU: Fix MMO when splitting spill" | Marek Olsak | 2016-11-25 | 1 | -71/+44
    This reverts commit 79d4f8b8b1ce430c3d5dac4fc72a9eebaed24fe1.
    llvm-svn: 287935
* Revert "AMDGPU: Fix adding extra implicit def of register" | Marek Olsak | 2016-11-25 | 1 | -25/+14
    This reverts commit e834ce5976567575621901fb967b8018b9916d71.
    llvm-svn: 287934
* Revert "AMDGPU: Fix not setting kill flag on temp reg when spilling" | Marek Olsak | 2016-11-25 | 1 | -1/+1
    This reverts commit 057bbbe4ae170247ba37f08f2e70ef185267d1bb.
    llvm-svn: 287933
* Revert "AMDGPU: Make m0 unallocatable" | Marek Olsak | 2016-11-25 | 1 | -1/+1
    This reverts commit 124ad83dae04514f943902446520c859adee0e96.
    llvm-svn: 287932
* Revert "AMDGPU: Remove m0 spilling code" | Marek Olsak | 2016-11-25 | 1 | -3/+37
    This reverts commit f18de36554eb22416f8ba58e094e0272523a4301.
    llvm-svn: 287931
* Revert "AMDGPU: Preserve m0 value when spilling" | Marek Olsak | 2016-11-25 | 1 | -34/+5
    This reverts commit a5a179ffd94fd4136df461ec76fb30f04afa87ce.
    llvm-svn: 287930
* AMDGPU: Preserve m0 value when spilling | Matt Arsenault | 2016-11-24 | 1 | -5/+34
    llvm-svn: 287844
* TRI: Add hook to pass scavenger during frame elimination | Matt Arsenault | 2016-11-24 | 1 | -0/+10
    The scavenger was not passed if requiresFrameIndexScavenging was enabled. I need to be able to test for the availability of an unallocatable register here, so I can't create a virtual register for it. It might be better to just always use the scavenger and stop creating virtual registers.
    llvm-svn: 287843
* AMDGPU: Remove m0 spilling code | Matt Arsenault | 2016-11-24 | 1 | -37/+3
    Since m0 isn't allocatable it should never be spilled anymore.
    llvm-svn: 287842
* AMDGPU: Make m0 unallocatable | Matt Arsenault | 2016-11-24 | 1 | -1/+1
    m0 may need to be written for spill code, so we don't want general code uses relying on the value stored in it. This introduces a few code quality regressions where copies from m0 are not coalesced into copies of a copy of m0.
    llvm-svn: 287841
* AMDGPU: Fix not setting kill flag on temp reg when spilling | Matt Arsenault | 2016-11-23 | 1 | -1/+1
    llvm-svn: 287808
* AMDGPU: Fix adding extra implicit def of register | Matt Arsenault | 2016-11-23 | 1 | -14/+25
    In the scalar case, there's no reason to add an additional def of the same register.
    llvm-svn: 287807
* AMDGPU: Fix MMO when splitting spill | Matt Arsenault | 2016-11-23 | 1 | -44/+71
    The size and offset were wrong. The size of the object was being used for the size of the access, when here it is really being split into 4-byte accesses. The underlying object size is set in the MachinePointerInfo, which also didn't have the offset set.
    llvm-svn: 287806
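The distinction this message draws between the object size and the per-access size can be sketched as follows. The code is a simplified model, not the LLVM memory-operand API: `Access` and `splitSpill` are hypothetical names, and the only facts taken from the message are the 4-byte access size and the need for a per-access offset.

```cpp
#include <cassert>
#include <vector>

// One memory access produced by splitting a spill slot.
struct Access {
  unsigned offset; // byte offset of this access within the spilled object
  unsigned size;   // size of this access in bytes, not of the whole object
};

// Splits a spilled object of `objectSize` bytes into 4-byte accesses,
// each carrying its own offset.
std::vector<Access> splitSpill(unsigned objectSize) {
  const unsigned EltSize = 4;
  assert(objectSize % EltSize == 0 && "sketch only handles dword multiples");
  std::vector<Access> parts;
  for (unsigned off = 0; off < objectSize; off += EltSize)
    parts.push_back({off, EltSize});
  return parts;
}
```

For a 16-byte object this yields four accesses of 4 bytes at offsets 0, 4, 8 and 12, rather than four accesses that each claim the full 16 bytes at offset 0.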
* Fix spelling mistakes in AMDGPU target comments. NFC. | Simon Pilgrim | 2016-11-18 | 1 | -1/+1
    Identified by Pedro Giffuni in PR27636.
    llvm-svn: 287333
* AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies pass | Tom Stellard | 2016-11-16 | 1 | -11/+14
    Summary:
    1. Don't try to copy values to and from the same register class.
    2. Replace copies of registers with immediate values with v_mov/s_mov instructions.
    The main purpose of this change is to make MachineSink do a better job of determining when it is beneficial to split a critical edge, since the pass assumes that copies will become move instructions.
    This prevents a regression in uniform-cfg.ll if we enable critical edge splitting for AMDGPU.
    Reviewers: arsenm
    Subscribers: arsenm, kzhuravl, llvm-commits
    Differential Revision: https://reviews.llvm.org/D23408
    llvm-svn: 287131