summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU] Use GCNRPTracker dumper methods in schedulerStanislav Mekhanoshin2017-05-161-18/+7
| | | | | | Differential Revision: https://reviews.llvm.org/D33244 llvm-svn: 303186
* [AMDGPU] Cache live-ins and register pressure in schedulerStanislav Mekhanoshin2017-05-161-71/+140
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using LIS can be quite expensive, so caching of calculated region live-ins and pressure is implemented. It does two things: 1. Caches the info for the second stage when we schedule with decreased target occupancy. 2. Tracks the basic block from top to bottom thus eliminating the need to scan whole register file liveness at every region split in the middle of the block. The scheduling is now done in 3 stages instead of two, with the first one being really a no-op and only used to collect scheduling regions as sent by the scheduler driver. There is no functional change to the current behavior, only compilation speed is affected. In general computeBlockPressure() could be simplified if we switch to backward RP tracker, because scheduler sends regions within a block starting from the last upward. We could use a natural order of upward tracker to seamlessly change between regions of the same block, since live reg set of a previous tracked region would become a live-out of the next region. That however requires fixing upward tracker to properly account defs and uses of the same instruction as both are contributing to the current pressure. When we converge on the produced pressure we should be able to switch between them back and forth. In addition, backward tracker is less expensive as it uses LIS in recede less often than forward uses it in advance. At the moment the worst known case compilation time has improved from 26 minutes to 8.5. Differential Revision: https://reviews.llvm.org/D33117 llvm-svn: 303184
* [AMDGPU] Turn register pressure estimation into forward trackerStanislav Mekhanoshin2017-05-161-112/+39
| | | | | | | | | | This factors register pressure estimation mechanism from the GCNSchedStrategy into the forward tracker to unify interface with other strategies and expose it to other interested phases. Differential Revision: https://reviews.llvm.org/D33105 llvm-svn: 303179
* [AMDGPU] Fix incorrect register pressure calculationStanislav Mekhanoshin2017-05-111-2/+3
| | | | | | | | | Earlier fix D32572 introduced a bug where live-ins were calculated for basic block instead of scheduling region. This change fixes it. Differential Revision: https://reviews.llvm.org/D33086 llvm-svn: 302812
* AMDGPU: Fix assert in schedulerKonstantin Zhuravlyov2017-04-271-1/+2
| | | | | | | | Assert is triggered if DBG_VALUE is first instruction in BB Differential Revision: https://reviews.llvm.org/D32572 llvm-svn: 301511
* [AMDGPU] Fix recorded region boundaries in max-occupancy schedulerStanislav Mekhanoshin2017-03-281-10/+5
| | | | | | | | | | This is incorrect to record region boundaries before scheduling, it may change after scheduling. As a result second pass may see less instructions to schedule than it should. Differential Revision: https://reviews.llvm.org/D31434 llvm-svn: 298945
* [AMDGPU] Iterative scheduling infrastructure + minimal registry schedulerValery Pykhtin2017-03-211-3/+1
| | | | | | Differential revision: https://reviews.llvm.org/D31046 llvm-svn: 298368
* [AMDGPU] Remove getBidirectionalReasonRankStanislav Mekhanoshin2017-03-111-13/+1
| | | | | | | | | | | | | | This method inverts the Reason field of a scheduling candidate. It does right comparison between RegCritical and RegExcess, but everything else is broken. In fact it can prefer less strong reason such as Weak over RegCritical because Weak > -RegCritical. The CandReason enum is properly sorted, so just remove artificial ranking. Differential Revision: https://reviews.llvm.org/D30557 llvm-svn: 297536
* [AMDGPU] Add second pass of the schedulerStanislav Mekhanoshin2017-02-281-5/+97
| | | | | | | | | | | If during scheduling we have identified that we cannot keep optimistic occupancy increase critical register pressure limit and try scheduling of the whole function again. In this case blocks with smaller pressure will have a chance for better scheduling. Differential Revision: https://reviews.llvm.org/D30442 llvm-svn: 296506
* [AMDGPU] New method to estimate register pressureStanislav Mekhanoshin2017-02-281-21/+135
| | | | | | | | | | | | | | | | | | | | | | | | This change introduces new method to estimate register pressure in GCNScheduler. Standard RPTracker gives huge error due to the following reasons: 1. It does not account for live-ins or live-outs if value is not used in the region itself. That creates a huge error in a very common case if there are a lot of live-thu registers. 2. It does not properly count subregs. 3. It assumes a register used as an input operand can be reused as an output. This is not always possible by itself, this is not what RA will finally do in many cases for various reasons not limited to RA's inability to do so, and this is not so if the value is actually a live-thu. In addition we can now see clear separation between live-in pressure which we cannot change with the scheduling and tentative pressure which we can change. Differential Revision: https://reviews.llvm.org/D30439 llvm-svn: 296491
* [AMDGPU] Fix read-undef flags when schedule is revertedStanislav Mekhanoshin2017-02-281-12/+15
| | | | | | | | | | | | | If two subregs of the same register are defined and we need to revert schedule changing def order, we will end up with both instructions having def,read-undef flags because adjustLaneLiveness() will only set this flag but will not remove it. Fix this by removing read-undef flags before calling adjustLaneLiveness. Differential Revision: https://reviews.llvm.org/D30428 llvm-svn: 296484
* [AMDGPU] Revert failed schedulingStanislav Mekhanoshin2017-02-151-29/+88
| | | | | | | | | | | | | | This patch reverts region's scheduling to the original untouched state in case if we have have decreased occupancy. In addition it switches to use TargetRegisterInfo occupancy callback for pressure limits instead of gradually increasing limits which were just passed by. We are going to stay with the best schedule so we do not need to tolerate worsened scheduling anymore. Differential Revision: https://reviews.llvm.org/D29971 llvm-svn: 295206
* [AMDGPU] Move register related queries to subtarget classKonstantin Zhuravlyov2017-02-081-2/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D29318 llvm-svn: 294440
* [AMDGPU] Fix GCNSchedStrategy.cpp debug outputStanislav Mekhanoshin2017-02-061-2/+2
| | | | | | | | There is typo in the debug output: top and bottom candidates are switched. Differential Revision: https://reviews.llvm.org/D29608 llvm-svn: 294257
* [AMDGPU] Account workgroup size in LDS occupancy limitsStanislav Mekhanoshin2017-02-011-1/+2
| | | | | | | | | | | | | | | | | | Functions matching LDS use to occupancy return results for a workgroup of 64 workitems. The numbers has to be adjusted for bigger workgroups. For example a workgroup of size 256 already occupies 4 waves just by itself. Given that all numbers of LDS use in the compiler are per workgroup, occupancy shall be multiplied by 4 in this case. Each 64 workitems still limited by the same number, but 4 subrgoups 64 workitems each can afford 4 times more LDS to get the same occupancy. In addition change initializes LDS size in the subtarget to a real value for SI+ targets. This is required since LDS size is a variable in these calculations. Differential Revision: https://reviews.llvm.org/D29423 llvm-svn: 293837
* [AMDGPU] Fix typo in GCNSchedStrategyValery Pykhtin2017-01-261-1/+1
| | | | | | Differential revision: https://reviews.llvm.org/D28980 llvm-svn: 293171
* AMDGPU/SI: Allow using SGPRs 96-101 on VIMarek Olsak2016-12-091-1/+1
| | | | | | | | | | | | | | | Summary: There is no point in setting SGPRS=104, because VI allocates SGPRs in multiples of 16, so 104 -> 112. That enables us to use all 102 SGPRs for general purposes. Reviewers: tstellarAMD Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27149 llvm-svn: 289260
* AMDGPU: Whitespace fixesMatt Arsenault2016-11-011-2/+2
| | | | llvm-svn: 285659
* [AMDGPU] Wave and register controlsKonstantin Zhuravlyov2016-09-061-2/+2
| | | | | | | | | | | | | | - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747
* AMDGPU/SI: Implement a custom MachineSchedStrategyTom Stellard2016-08-291-0/+312
Summary: GCNSchedStrategy re-uses most of GenericScheduler, it's just uses a different method to compute the excess and critical register pressure limits. It's not enabled by default, to enable it you need to pass -misched=gcn to llc. Shader DB stats: 32464 shaders in 17874 tests Totals: SGPRS: 1542846 -> 1643125 (6.50 %) VGPRS: 1005595 -> 904653 (-10.04 %) Spilled SGPRs: 29929 -> 27745 (-7.30 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 36688188 -> 37034900 (0.95 %) bytes LDS: 1913 -> 1913 (0.00 %) blocks Max Waves: 254101 -> 265125 (4.34 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 1338220 -> 1438499 (7.49 %) VGPRS: 886221 -> 785279 (-11.39 %) Spilled SGPRs: 29869 -> 27685 (-7.31 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 34315716 -> 34662428 (1.01 %) bytes LDS: 1551 -> 1551 (0.00 %) blocks Max Waves: 188127 -> 199151 (5.86 %) Wait states: 0 -> 0 (0.00 %) Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23688 llvm-svn: 279995
OpenPOWER on IntegriCloud