summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU] Revert scheduling to reduce spillingStanislav Mekhanoshin2020-01-031-2/+11
| | | | | | | | | | We can revert region schedule if new schedule decreases occupancy. However, if we already have only one wave we would accept any new schedule even if it blows up register pressure. Such schedule may result in quite heavy spilling which can be avoided if we reject this new schedule. Differential Revision: https://reviews.llvm.org/D72181
* [AMDGPU] Add VerifyScheduling support.Jay Foad2019-10-011-0/+18
| | | | | | | | | | | | | | | | Summary: This is cut and pasted from the corresponding GenericScheduler functions. Reviewers: arsenm, atrick, tstellar, vpykhtin Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68264 llvm-svn: 373346
* Add tracing in pickNodeFromQueue.Jay Foad2019-09-251-0/+1
| | | | | | | This matches GenericScheduler::pickNodeFromQueue, from which this function was mostly cut and pasted. llvm-svn: 372829
* AMDGPU: Fix typoMatt Arsenault2019-09-061-4/+4
| | | | llvm-svn: 371249
* AMDGPU: Avoid constructing new std::vector in initCandidateMatt Arsenault2019-09-051-2/+2
| | | | | | | | | | | | Approximately 30% of the time was spent in the std::vector constructor. In one testcase this pushes the scheduler to being the second slowest pass. I'm not sure I understand why these vector are necessary. The default scheduler initCandidate seems to use some pre-existing vectors for the pressure. llvm-svn: 371136
* [AMDGPU] Speed up live-in virtual register set computaion in ↵Valery Pykhtin2019-06-181-2/+26
| | | | | | | | GCNScheduleDAGMILive. Differential revision: https://reviews.llvm.org/D62401 llvm-svn: 363661
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* AMDGPU: Refactor Subtarget classesTom Stellard2018-07-111-2/+2
| | | | | | | | | | | | | | | | | Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation. Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 llvm-svn: 336851
* [AMDGPU] Small refactoring in the schedulerStanislav Mekhanoshin2018-06-041-18/+3
| | | | | | | | After last changes some code can be simplified. Differential Revision: https://reviews.llvm.org/D47661 llvm-svn: 333934
* [AMDGPU] Track occupancy in MFIStanislav Mekhanoshin2018-05-311-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep track of achieved occupancy in SIMachineFunctionInfo. At the moment we have a lot of duplicated or even missed code to query and maintain occupancy info. Record it in the MFI and query in a single call. Interfaces: - getOccupancy() - returns current recorded achieved occupancy. - getMinAllowedOccupancy() - returns lesser of the achieved occupancy and the lowest occupancy we are ready to tolerate. For example if a kernel is memory bound we are ready to tolerate 4 waves. - limitOccupancy() - record occupancy level if we have to lower it. - increaseOccupancy() - record occupancy if scheduler managed to increase the occupancy. MFI takes care of integrating different checks affecting occupancy, including LDS use and waves-per-eu attribute. Note that scheduler starts with not yet known register pressure, so has to record either limit or increase in occupancy after it is done. Later passes can just query a resulting value. New interface is used in the active scheduler and NFC wrt its work. Changes are also made to experimental schedulers to use it and record an occupancy after they are done. Before the change waves-per-eu was ignored by experimental schedulers and tolerance window for memory bound kernels was not used. Differential Revision: https://reviews.llvm.org/D47509 llvm-svn: 333629
* [AMDGPU] Add perf hints to functionsStanislav Mekhanoshin2018-05-251-1/+11
| | | | | | | | | | | | | | | This is adoption of HSAIL perfhint pass. Two types of hints are produced: 1. Function is memory bound. 2. Kernel can use wave limiter. Currently these hints are used in the scheduler. If a function is suspected to be memory bound we allow occupancy to decrease to 4 waves in the course of scheduling. Differential Revision: https://reviews.llvm.org/D46992 llvm-svn: 333289
* Rename DEBUG macro to LLVM_DEBUG.Nicola Zaghen2018-05-141-40/+37
| | | | | | | | | | | | | | | | The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows: - git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g' - git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM - Manual change to APInt - Manually chage DOCS as regex doesn't match it. In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one. Differential Revision: https://reviews.llvm.org/D43624 llvm-svn: 332240
* [AMDGPU] Fix amdgpu-waves-per-eu accounting in schedulerStanislav Mekhanoshin2018-05-121-2/+5
| | | | | | | | | | We cannot query this attribute from a subtarget given a machine function. At this point attribute itself is already unavailable and can only be obtained through MFI. Differential Revision: https://reviews.llvm.org/D46781 llvm-svn: 332166
* [DebugInfo] Examine all uses of isDebugValue() for debug instructions.Shiva Chen2018-05-091-3/+3
| | | | | | | | | | | | | | | | | | Because we create a new kind of debug instruction, DBG_LABEL, we need to check all passes which use isDebugValue() to check MachineInstr is debug instruction or not. When expelling debug instructions, we should expel both DBG_VALUE and DBG_LABEL. So, I create a new function, isDebugInstr(), in MachineInstr to check whether the MachineInstr is debug instruction or not. This patch has no new test case. I have run regression test and there is no difference in regression test. Differential Revision: https://reviews.llvm.org/D45342 Patch by Hsiangkai Wang. llvm-svn: 331844
* [NFC] fix trivial typos in commentsHiroshi Inoue2018-01-221-1/+1
| | | | | | "the the" -> "the" llvm-svn: 323074
* MachineFunction: Return reference from getFunction(); NFCMatthias Braun2017-12-151-2/+2
| | | | | | The Function can never be nullptr so we can return a reference. llvm-svn: 320884
* Recommit CodeGen: Fix assertion in machine inst sheduler due to llvm.dbg.valueYaxun Liu2017-12-151-8/+11
| | | | | | The regression on ppc64 was not due to this commit. llvm-svn: 320788
* Revert CodeGen: Fix assertion in machine inst sheduler due to llvm.dbg.valueYaxun Liu2017-12-141-11/+8
| | | | | | This commit might have caused regression on ppc64. Revert it to verify that. llvm-svn: 320712
* CodeGen: Fix assertion in machine inst sheduler due to llvm.dbg.valueYaxun Liu2017-12-131-8/+11
| | | | | | | | | | | | | | | | | Two issues were found about machine inst scheduler when compiling ProRender with -g for amdgcn target: GCNScheduleDAGMILive::schedule tries to update LiveIntervals for DBG_VALUE, which it should not since DBG_VALUE is not mapped in LiveIntervals. when DBG_VALUE is the last instruction of MBB, ScheduleDAGInstrs::buildSchedGraph and ScheduleDAGMILive::scheduleMI does not move RPTracker properly, which causes assertion. This patch fixes that. Differential Revision: https://reviews.llvm.org/D41132 llvm-svn: 320650
* AMDGPU: Fix crash when scheduling DBG_VALUEMatt Arsenault2017-12-051-1/+5
| | | | | | | | | | | | | | This calls handleMove with a DBG_VALUE instruction, which isn't tracked by LiveIntervals. I'm not sure this is the correct place to fix this. The generic scheduler seems to have more deliberate region selection that skips dbg_value. The test is also really hard to reduce. I haven't been able to figure out what exactly causes this particular case to try moving the dbg_value. llvm-svn: 319732
* [CodeGen] Unify MBB reference format in both MIR and debug outputFrancis Visoiu Mistrih2017-12-041-3/+2
| | | | | | | | | | | | | | | | As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g' * find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665
* [CodeGen] Rename DEBUG_TYPE to match passnamesEvandro Menezes2017-07-111-1/+1
| | | | | | | | | Rename missing DEBUG_TYPE "machine-scheduler" from backend files, which were absent from https://reviews.llvm.org/rL303921. Differential revision: https://reviews.llvm.org/D35231 llvm-svn: 307719
* [AMDGPU] Use GCNRPTracker dumper methods in schedulerStanislav Mekhanoshin2017-05-161-18/+7
| | | | | | Differential Revision: https://reviews.llvm.org/D33244 llvm-svn: 303186
* [AMDGPU] Cache live-ins and register pressure in schedulerStanislav Mekhanoshin2017-05-161-71/+140
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using LIS can be quite expensive, so caching of calculated region live-ins and pressure is implemented. It does two things: 1. Caches the info for the second stage when we schedule with decreased target occupancy. 2. Tracks the basic block from top to bottom thus eliminating the need to scan whole register file liveness at every region split in the middle of the block. The scheduling is now done in 3 stages instead of two, with the first one being really a no-op and only used to collect scheduling regions as sent by the scheduler driver. There is no functional change to the current behavior, only compilation speed is affected. In general computeBlockPressure() could be simplified if we switch to backward RP tracker, because scheduler sends regions within a block starting from the last upward. We could use a natural order of upward tracker to seamlessly change between regions of the same block, since live reg set of a previous tracked region would become a live-out of the next region. That however requires fixing upward tracker to properly account defs and uses of the same instruction as both are contributing to the current pressure. When we converge on the produced pressure we should be able to switch between them back and forth. In addition, backward tracker is less expensive as it uses LIS in recede less often than forward uses it in advance. At the moment the worst known case compilation time has improved from 26 minutes to 8.5. Differential Revision: https://reviews.llvm.org/D33117 llvm-svn: 303184
* [AMDGPU] Turn register pressure estimation into forward trackerStanislav Mekhanoshin2017-05-161-112/+39
| | | | | | | | | | This factors register pressure estimation mechanism from the GCNSchedStrategy into the forward tracker to unify interface with other strategies and expose it to other interested phases. Differential Revision: https://reviews.llvm.org/D33105 llvm-svn: 303179
* [AMDGPU] Fix incorrect register pressure calculationStanislav Mekhanoshin2017-05-111-2/+3
| | | | | | | | | Earlier fix D32572 introduced a bug where live-ins were calculated for basic block instead of scheduling region. This change fixes it. Differential Revision: https://reviews.llvm.org/D33086 llvm-svn: 302812
* AMDGPU: Fix assert in schedulerKonstantin Zhuravlyov2017-04-271-1/+2
| | | | | | | | Assert is triggered if DBG_VALUE is first instruction in BB Differential Revision: https://reviews.llvm.org/D32572 llvm-svn: 301511
* [AMDGPU] Fix recorded region boundaries in max-occupancy schedulerStanislav Mekhanoshin2017-03-281-10/+5
| | | | | | | | | | This is incorrect to record region boundaries before scheduling, it may change after scheduling. As a result second pass may see less instructions to schedule than it should. Differential Revision: https://reviews.llvm.org/D31434 llvm-svn: 298945
* [AMDGPU] Iterative scheduling infrastructure + minimal registry schedulerValery Pykhtin2017-03-211-3/+1
| | | | | | Differential revision: https://reviews.llvm.org/D31046 llvm-svn: 298368
* [AMDGPU] Remove getBidirectionalReasonRankStanislav Mekhanoshin2017-03-111-13/+1
| | | | | | | | | | | | | | This method inverts the Reason field of a scheduling candidate. It does right comparison between RegCritical and RegExcess, but everything else is broken. In fact it can prefer less strong reason such as Weak over RegCritical because Weak > -RegCritical. The CandReason enum is properly sorted, so just remove artificial ranking. Differential Revision: https://reviews.llvm.org/D30557 llvm-svn: 297536
* [AMDGPU] Add second pass of the schedulerStanislav Mekhanoshin2017-02-281-5/+97
| | | | | | | | | | | If during scheduling we have identified that we cannot keep optimistic occupancy increase critical register pressure limit and try scheduling of the whole function again. In this case blocks with smaller pressure will have a chance for better scheduling. Differential Revision: https://reviews.llvm.org/D30442 llvm-svn: 296506
* [AMDGPU] New method to estimate register pressureStanislav Mekhanoshin2017-02-281-21/+135
| | | | | | | | | | | | | | | | | | | | | | | | This change introduces new method to estimate register pressure in GCNScheduler. Standard RPTracker gives huge error due to the following reasons: 1. It does not account for live-ins or live-outs if value is not used in the region itself. That creates a huge error in a very common case if there are a lot of live-thu registers. 2. It does not properly count subregs. 3. It assumes a register used as an input operand can be reused as an output. This is not always possible by itself, this is not what RA will finally do in many cases for various reasons not limited to RA's inability to do so, and this is not so if the value is actually a live-thu. In addition we can now see clear separation between live-in pressure which we cannot change with the scheduling and tentative pressure which we can change. Differential Revision: https://reviews.llvm.org/D30439 llvm-svn: 296491
* [AMDGPU] Fix read-undef flags when schedule is revertedStanislav Mekhanoshin2017-02-281-12/+15
| | | | | | | | | | | | | If two subregs of the same register are defined and we need to revert schedule changing def order, we will end up with both instructions having def,read-undef flags because adjustLaneLiveness() will only set this flag but will not remove it. Fix this by removing read-undef flags before calling adjustLaneLiveness. Differential Revision: https://reviews.llvm.org/D30428 llvm-svn: 296484
* [AMDGPU] Revert failed schedulingStanislav Mekhanoshin2017-02-151-29/+88
| | | | | | | | | | | | | | This patch reverts region's scheduling to the original untouched state in case if we have have decreased occupancy. In addition it switches to use TargetRegisterInfo occupancy callback for pressure limits instead of gradually increasing limits which were just passed by. We are going to stay with the best schedule so we do not need to tolerate worsened scheduling anymore. Differential Revision: https://reviews.llvm.org/D29971 llvm-svn: 295206
* [AMDGPU] Move register related queries to subtarget classKonstantin Zhuravlyov2017-02-081-2/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D29318 llvm-svn: 294440
* [AMDGPU] Fix GCNSchedStrategy.cpp debug outputStanislav Mekhanoshin2017-02-061-2/+2
| | | | | | | | There is typo in the debug output: top and bottom candidates are switched. Differential Revision: https://reviews.llvm.org/D29608 llvm-svn: 294257
* [AMDGPU] Account workgroup size in LDS occupancy limitsStanislav Mekhanoshin2017-02-011-1/+2
| | | | | | | | | | | | | | | | | | Functions matching LDS use to occupancy return results for a workgroup of 64 workitems. The numbers has to be adjusted for bigger workgroups. For example a workgroup of size 256 already occupies 4 waves just by itself. Given that all numbers of LDS use in the compiler are per workgroup, occupancy shall be multiplied by 4 in this case. Each 64 workitems still limited by the same number, but 4 subrgoups 64 workitems each can afford 4 times more LDS to get the same occupancy. In addition change initializes LDS size in the subtarget to a real value for SI+ targets. This is required since LDS size is a variable in these calculations. Differential Revision: https://reviews.llvm.org/D29423 llvm-svn: 293837
* [AMDGPU] Fix typo in GCNSchedStrategyValery Pykhtin2017-01-261-1/+1
| | | | | | Differential revision: https://reviews.llvm.org/D28980 llvm-svn: 293171
* AMDGPU/SI: Allow using SGPRs 96-101 on VIMarek Olsak2016-12-091-1/+1
| | | | | | | | | | | | | | | Summary: There is no point in setting SGPRS=104, because VI allocates SGPRs in multiples of 16, so 104 -> 112. That enables us to use all 102 SGPRs for general purposes. Reviewers: tstellarAMD Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27149 llvm-svn: 289260
* AMDGPU: Whitespace fixesMatt Arsenault2016-11-011-2/+2
| | | | llvm-svn: 285659
* [AMDGPU] Wave and register controlsKonstantin Zhuravlyov2016-09-061-2/+2
| | | | | | | | | | | | | | - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747
* AMDGPU/SI: Implement a custom MachineSchedStrategyTom Stellard2016-08-291-0/+312
Summary: GCNSchedStrategy re-uses most of GenericScheduler, it's just uses a different method to compute the excess and critical register pressure limits. It's not enabled by default, to enable it you need to pass -misched=gcn to llc. Shader DB stats: 32464 shaders in 17874 tests Totals: SGPRS: 1542846 -> 1643125 (6.50 %) VGPRS: 1005595 -> 904653 (-10.04 %) Spilled SGPRs: 29929 -> 27745 (-7.30 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 36688188 -> 37034900 (0.95 %) bytes LDS: 1913 -> 1913 (0.00 %) blocks Max Waves: 254101 -> 265125 (4.34 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 1338220 -> 1438499 (7.49 %) VGPRS: 886221 -> 785279 (-11.39 %) Spilled SGPRs: 29869 -> 27685 (-7.31 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 34315716 -> 34662428 (1.01 %) bytes LDS: 1551 -> 1551 (0.00 %) blocks Max Waves: 188127 -> 199151 (5.86 %) Wait states: 0 -> 0 (0.00 %) Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23688 llvm-svn: 279995
OpenPOWER on IntegriCloud