path: root/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
Commit message | Author | Age | Files | Lines
...
* AMDGPU: Factor SGPR spilling into separate functions | Matt Arsenault | 2016-10-04 | 1 | -0/+6
  llvm-svn: 283175
* Revert "AMDGPU: Don't use offen if it is 0" | Mehdi Amini | 2016-10-01 | 1 | -5/+5
  This reverts commit r282999. Tests are not passing:
  http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038
  llvm-svn: 283003
* AMDGPU: Don't use offen if it is 0 | Matt Arsenault | 2016-10-01 | 1 | -5/+5
  This removes many re-initializations of a base register to 0.
  llvm-svn: 282999
* Remove unnecessary call to getAllocatableRegClass | Matt Arsenault | 2016-09-07 | 1 | -7/+0
  This reapplies r252565 and r252674, effectively reverting r252956.
  This allows VS_32/VS_64 to be unallocatable like they should be.
  llvm-svn: 280783
* [AMDGPU] Wave and register controls | Konstantin Zhuravlyov | 2016-09-06 | 1 | -14/+64
  - Implemented amdgpu-flat-work-group-size attribute
  - Implemented amdgpu-num-active-waves-per-eu attribute
  - Implemented amdgpu-num-sgpr attribute
  - Implemented amdgpu-num-vgpr attribute
  - Dynamic LDS constraints are in a separate patch
  Patch by Tom Stellard and Konstantin Zhuravlyov
  Differential Revision: https://reviews.llvm.org/D21562
  llvm-svn: 280747
* AMDGPU: fix mismatch tags, NFC | Saleem Abdulrasool | 2016-08-29 | 1 | -1/+1
  llvm-svn: 280006
* AMDGPU/SI: Implement a custom MachineSchedStrategy | Tom Stellard | 2016-08-29 | 1 | -0/+2
  Summary:
  GCNSchedStrategy re-uses most of GenericScheduler; it just uses a different method to compute the excess and critical register pressure limits.
  It's not enabled by default; to enable it you need to pass -misched=gcn to llc.
  Shader DB stats: 32464 shaders in 17874 tests
  Totals:
    SGPRS: 1542846 -> 1643125 (6.50 %)
    VGPRS: 1005595 -> 904653 (-10.04 %)
    Spilled SGPRs: 29929 -> 27745 (-7.30 %)
    Spilled VGPRs: 334 -> 352 (5.39 %)
    Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
    Code Size: 36688188 -> 37034900 (0.95 %) bytes
    LDS: 1913 -> 1913 (0.00 %) blocks
    Max Waves: 254101 -> 265125 (4.34 %)
    Wait states: 0 -> 0 (0.00 %)
  Totals from affected shaders:
    SGPRS: 1338220 -> 1438499 (7.49 %)
    VGPRS: 886221 -> 785279 (-11.39 %)
    Spilled SGPRs: 29869 -> 27685 (-7.31 %)
    Spilled VGPRs: 334 -> 352 (5.39 %)
    Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
    Code Size: 34315716 -> 34662428 (1.01 %) bytes
    LDS: 1551 -> 1551 (0.00 %) blocks
    Max Waves: 188127 -> 199151 (5.86 %)
    Wait states: 0 -> 0 (0.00 %)
  Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick
  Subscribers: arsenm, kzhuravl, llvm-commits
  Differential Revision: https://reviews.llvm.org/D23688
  llvm-svn: 279995
* AMDGPU/SI: Use a better method for determining the largest pressure sets | Tom Stellard | 2016-08-26 | 1 | -4/+11
  Summary:
  There are a few different sgpr pressure sets, but we only care about the one which covers all of the sgprs. We were using hard-coded register pressure set names to determine the reg set id for the biggest sgpr set. However, we were using the wrong name, and this method is pretty fragile, since the reg pressure set names may change.
  The new method just looks for the pressure set that contains the most reg units and sets that set as our SGPR pressure set. We've also adopted the same technique for determining our VGPR pressure set.
  Reviewers: arsenm
  Subscribers: MatzeB, arsenm, llvm-commits, kzhuravl
  Differential Revision: https://reviews.llvm.org/D23687
  llvm-svn: 279867
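  Illustration (not part of the commit): the selection rule described in the message above, "pick the pressure set that contains the most reg units", can be sketched in standalone C++. The `PressureSet` struct and `findLargestPressureSet` function are hypothetical names invented for this sketch; they are not the LLVM API and not the code from the patch, which works on the target's real register pressure sets.

  ```cpp
  #include <cstddef>
  #include <vector>

  // Hypothetical stand-in for a register pressure set. Only the number of
  // register units it covers matters for this sketch.
  struct PressureSet {
    unsigned NumRegUnits;
  };

  // Returns the index of the set covering the most register units, or -1 if
  // the list is empty. On a tie, the first such set wins.
  int findLargestPressureSet(const std::vector<PressureSet> &Sets) {
    int Best = -1;
    for (std::size_t I = 0; I < Sets.size(); ++I) {
      if (Best < 0 || Sets[I].NumRegUnits > Sets[Best].NumRegUnits)
        Best = static_cast<int>(I);
    }
    return Best;
  }
  ```

  Selecting by size rather than by name avoids the fragility the message mentions: the result does not depend on how the pressure sets happen to be named.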
* AMDGPU: Remove custom getSubReg | Matt Arsenault | 2016-08-11 | 1 | -6/+0
  This was kind of confusing, the subregister class shouldn't really be necessary.
  llvm-svn: 278362
* AMDGPU/SI: Don't use reserved VGPRs for SGPR spilling | Tom Stellard | 2016-07-28 | 1 | -1/+2
  Summary:
  We were using reserved VGPRs for SGPR spilling and this was causing some programs with a workgroup size of 1024 to use more than 64 registers, which is illegal.
  Reviewers: arsenm, mareko, nhaehnle
  Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl
  Differential Revision: https://reviews.llvm.org/D22032
  llvm-svn: 276980
* AMDGPU: Enable trackLivenessAfterRegAlloc | Matt Arsenault | 2016-07-11 | 1 | -0/+1
  This has caught a number of bugs.
  llvm-svn: 275131
* AMDGPU: Cleanup subtarget handling. | Matt Arsenault | 2016-06-24 | 1 | -6/+7
  Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target.
  llvm-svn: 273652
* AMDGPU/SI: Propagate the Kill flag in storeRegToStackSlot and eliminateFrameIndex | Changpeng Fang | 2016-06-16 | 1 | -1/+1
  Reviewers: arsenm, tstellarAMD
  Differential Revision: http://reviews.llvm.org/21438
  llvm-svn: 272958
* AMDGPU/SI: Enable the post-ra scheduler | Tom Stellard | 2016-04-30 | 1 | -0/+2
  Summary:
  This includes a hazard recognizer implementation to replace some of the hazard handling we had during frame index elimination.
  Reviewers: arsenm
  Subscribers: qcolombet, arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D18602
  llvm-svn: 268143
* AMDGPU: Enable LocalStackSlotAllocation pass | Matt Arsenault | 2016-04-16 | 1 | -0/+21
  This resolves more frame indexes early and folds the immediate offsets into the scratch mubuf instructions. This cleans up a lot of the mess that's currently emitted, such as emitting add 0s and repeatedly initializing the same register to 0 when spilling.
  llvm-svn: 266508
* AMDGPU/SI: Add support for spilling VGPRs without having to scavenge registers | Tom Stellard | 2016-04-13 | 1 | -1/+2
  Summary:
  When we are spilling SGPRs to scratch memory, we usually don't have free SGPRs to do the address calculation, so we need to re-use the ScratchOffset register for the calculation.
  Reviewers: arsenm
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D18917
  llvm-svn: 266244
* AMDGPU: Cache information about register pressure sets | Tom Stellard | 2016-03-23 | 1 | -0/+4
  We can statically decide whether or not a register pressure set is for SGPRs or VGPRs, so we don't need to re-compute this information in SIRegisterInfo::getRegPressureSetLimit().
  Differential Revision: http://reviews.llvm.org/D14805
  llvm-svn: 264126
* AMDGPU: Add SIWholeQuadMode pass | Nicolai Haehnle | 2016-03-21 | 1 | -2/+5
  Summary:
  Whole quad mode is already enabled for pixel shaders that compute derivatives, but it must be suspended for instructions that cause a shader to have side effects (i.e. stores and atomics).
  This pass addresses the issue by storing the real (initial) live mask in a register, masking EXEC before instructions that require exact execution and (re-)enabling WQM where required.
  This pass is run before register coalescing so that we can use machine SSA for analysis.
  The changes in this patch expose a problem with the second machine scheduling pass: target independent instructions like COPY implicitly use EXEC when they operate on VGPRs, but this fact is not encoded in the MIR. This can lead to miscompilation because instructions are moved past changes to EXEC.
  This patch fixes the problem by adding use-implicit operands to target independent instructions. Some general codegen passes are relaxed to work with such implicit use operands.
  Reviewers: arsenm, tstellarAMD, mareko
  Subscribers: MatzeB, arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D18162
  llvm-svn: 263982
* AMDGPU: R600 code splitting cleanup | Matt Arsenault | 2016-03-11 | 1 | -3/+3
  Move a few functions only used by R600 to R600 specific code, fix header macros to stop using R600, mark classes as final.
  llvm-svn: 263204
* AMDGPU/SI: Enable frame index scavenging during PrologEpilogueInserter | Tom Stellard | 2016-03-04 | 1 | -1/+3
  Summary:
  This allows us to use virtual registers when we need extra registers for inserting spill instructions in SIRegisterInfo::eliminateFrameIndex().
  Once all the frame indices have been eliminated, the PrologEpilogueInserter does an extra pass over the program to replace all virtual registers with physical ones.
  This allows us to make more efficient use of our emergency spill slots, so we only need to create one.
  Reviewers: arsenm
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D17591
  llvm-svn: 262728
* AMDGPU: Set flat_scratch from flat_scratch_init reg | Matt Arsenault | 2016-02-12 | 1 | -1/+3
  This was hardcoded to the static private size, but this would be missing the offset and additional size for someday when we have dynamic sizing.
  Also stops always initializing flat_scratch even when unused.
  In the future we should stop emitting this unless flat instructions are used to access private memory. For example this will initialize it almost always on VI because flat is used for global access.
  llvm-svn: 260658
* AMDGPU/SI: Make sure MIMG descriptors and samplers stay in SGPRs | Tom Stellard | 2016-02-11 | 1 | -0/+4
  Summary:
  It's possible to have resource descriptors and samplers stored in VGPRs, either by a VMEM instruction or in the case of samplers, floating-point calculations. When this happens, we need to use v_readfirstlane to copy these values back to sgprs.
  Reviewers: mareko, arsenm
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D17102
  llvm-svn: 260599
* AMDGPU/SI: Add SI Machine Scheduler | Nicolai Haehnle | 2016-01-13 | 1 | -0/+6
  Summary:
  It is off by default, but can be used with --misched=si
  Patch by: Axel Davy
  Reviewers: arsenm, tstellarAMD, nhaehnle
  Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D11885
  llvm-svn: 257609
* AMDGPU: Optimize VOP2 operand legalization | Matt Arsenault | 2015-12-01 | 1 | -0/+7
  Don't use commuteInstruction, and don't commute if doing so will not improve legality. Skip the more complex checks for literal operands and constant bus restrictions, which are not a concern for VOP2 instructions because src1 does not accept SGPRs or constants and few implicitly read vcc.
  This gets called quite a few times and the attempts at commuting are a significant fraction of the time spent in SIFixSGPRCopies, so it's somewhat worthwhile to optimize.
  With this patch and others leading up to it, this reduces the compile time of SIFixSGPRCopies on some of the LuxMark 2 kernels from ~8ms to ~5ms on my system.
  llvm-svn: 254452
* AMDGPU: Rework how private buffer passed for HSA | Matt Arsenault | 2015-11-30 | 1 | -0/+9
  If we know we have stack objects, we reserve the registers that the private buffer resource and wave offset are passed and use them directly.
  If not, reserve the last 5 SGPRs just in case we need to spill. After register allocation, try to pick the next available registers instead of the last SGPRs, and then insert copies from the inputs to the reserved registers in the prologue.
  This also only selectively enables all of the input registers which are really required instead of always enabling them.
  llvm-svn: 254331
* AMDGPU: Rename enums to be consistent with HSA code object terminology | Matt Arsenault | 2015-11-30 | 1 | -10/+12
  llvm-svn: 254330
* AMDGPU: Add llvm.amdgcn.dispatch.ptr intrinsic | Tom Stellard | 2015-11-26 | 1 | -0/+1
  Summary:
  This returns a pointer to the dispatch packet, which can be used to load information about the kernel dispatch.
  Reviewers: arsenm
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D14898
  llvm-svn: 254116
* Revert "Remove unnecessary call to getAllocatableRegClass" | Tom Stellard | 2015-11-12 | 1 | -0/+7
  This reverts commit r252565.
  This also includes the revert of the commit mentioned below in order to avoid breaking tests in AMDGPU:
  Revert "AMDGPU: Set isAllocatable = 0 on VS_32/VS_64"
  This reverts commit r252674.
  llvm-svn: 252956
* AMDGPU: Set isAllocatable = 0 on VS_32/VS_64 | Matt Arsenault | 2015-11-11 | 1 | -7/+0
  llvm-svn: 252674
* AMDGPU: Hack for VS_32 register pressure | Matt Arsenault | 2015-11-06 | 1 | -0/+7
  For some reason VS_32 ends up factoring into the pressure heuristics even though we should never see a virtual register with this class.
  When SGPRs are reserved for register spilling, this for some reason triggers reg-crit scheduling.
  Setting isAllocatable = 0 may help with this since that seems to remove it from the default implementation's generated table.
  llvm-svn: 252321
* AMDGPU/SI: Re-order PreloadedValue enum and number entries based on init order | Tom Stellard | 2015-10-01 | 1 | -9/+12
  Reviewers: arsenm
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D12451
  llvm-svn: 248978
* AMDGPU: Don't handle invalid reg classes in helper functions | Matt Arsenault | 2015-09-26 | 1 | -6/+0
  No tests hit these and it would be better to have checks like this explicit where they are used.
  llvm-svn: 248655
* Introduce target hook for optimizing register copies | Matt Arsenault | 2015-09-24 | 1 | -0/+5
  Allow a target to do something other than search for copies that will avoid cross register bank copies.
  Implement for SI by only rewriting the most basic copies, so it should look through anything like a subregister extract.
  I'm not entirely satisfied with this because it seems like eliminating a reg_sequence that isn't fully used should work generically for all targets without them having to override something. However, it seems to be tricky to have a simple implementation of this without rewriting to invalid kinds of subregister copies on some targets.
  I'm not sure if there is currently a generic way to easily check if a subregister index would be valid for the current use. The current set of TargetRegisterInfo::get*Class functions don't quite behave like I would expect (e.g. getSubClassWithSubReg returns the maximal register class rather than the minimal), so I'm not sure how to make the generic test keep searching if SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg.
  Making the default implementation to check for simple copies breaks a variety of ARM and x86 tests by producing illegal subregister uses.
  The ARM tests are not actually changed since it should still be using the same sharesSameRegisterFile implementation, this just relaxes them to not check for specific registers.
  llvm-svn: 248478
* AMDGPU: Remove dead code | Matt Arsenault | 2015-09-19 | 1 | -4/+0
  getCFGStructurizerRegClass is not used for SI, so move it into R600 specific stuff.
  llvm-svn: 248087
* AMDGPU: Make sure to reserve super registers | Matt Arsenault | 2015-08-26 | 1 | -0/+3
  I think this could potentially have broken if one of the super registers were allocated that contain v254/v255.
  llvm-svn: 246051
* R600 -> AMDGPU rename | Tom Stellard | 2015-06-13 | 1 | -0/+131
  llvm-svn: 239657
* Revert "AMDGPU: Add core backend files for R600/SI codegen v6" | Tom Stellard | 2012-07-16 | 1 | -47/+0
  This reverts commit 4ea70107c5e51230e9e60f0bf58a0f74aa4885ea.
  llvm-svn: 160303
* AMDGPU: Add core backend files for R600/SI codegen v6 | Tom Stellard | 2012-07-16 | 1 | -0/+47
  llvm-svn: 160270