summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* Remove \brief commands from doxygen comments.Adrian Prantl2018-05-011-4/+4
| | | | | | | | | | | | | | | | We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272
* [AMDGPU][Waitcnt] As of gfx7, VMEM operations do not increment the export ↵Mark Searles2018-04-261-1/+1
| | | | | | | | counter and the input registers are available in the next instruction; update the waitcnt pass to take this into account. Differential Revision: https://reviews.llvm.org/D46067 llvm-svn: 330954
* [AMDGPU] Waitcnt pass: add debug optionsMark Searles2018-04-251-11/+75
| | | | | | | | | | | | | | | | | - Add "amdgpu-waitcnt-forcezero" to force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) - Add debug counters to control force emit of s_waitcnt instrs; debug counters: si-insert-waitcnts-forceexp: force emit s_waitcnt expcnt(0) instrs si-insert-waitcnts-forcevm: force emit s_waitcnt lgkmcnt(0) instrs si-insert-waitcnts-forcelgkm: force emit s_waitcnt vmcnt(0) instrs - Add some debug statements Note that a variant of this patch was previously committed/reverted. Differential Revision: https://reviews.llvm.org/D45888 llvm-svn: 330862
* [AMDGPU][Waitcnt] NFC. Cleanup some code/naming consistency:Mark Searles2018-04-241-38/+38
| | | | | | - s/SWaitcnt/Waitcnt s/WaitCnt/Waitcnt llvm-svn: 330730
* [AMDGPU] Do not only rely on BB number when finding bottom loopMark Searles2018-04-191-20/+45
| | | | | | | | We should also check that the "bottom" basic block of a loopis a successor of the "header" basic block, otherwise we don't propagate the information correctly when the CFG is complex. This fixes an important rendering problem with Wolfsentein 2, because of one vector-memory wait was missing. Differential Revision: https://reviews.llvm.org/D43831 llvm-svn: 330337
* [AMDGPU] Waitcnt pass: Modify the waitcnt pass to propagate info in the case ↵Mark Searles2018-03-141-13/+29
| | | | | | | | of a single basic block loop. mergeInputScoreBrackets() does this for us; update it so that it processes the single bb's score bracket when processing the single bb's preds. It is, after all, a pred of itself, so it's score bracket is needed. Differential Revision: https://reviews.llvm.org/D44434 llvm-svn: 327583
* [AMDGPU] Make note of existing waitcnt instrs; this is add-on work related ↵Mark Searles2018-02-191-18/+16
| | | | | | to suppression of redundant waitcnt instrs. It is necessary to make note of these existing waitcnt instrs so that we do not fall into an infinite loop when handling loops. Also, [NFC] some minor code clean-up. llvm-svn: 325524
* [AMDGPU] Combine adjacent waitcounts in a single strongest waitStanislav Mekhanoshin2018-02-151-7/+44
| | | | | | Differential Revision: https://reviews.llvm.org/D43350 llvm-svn: 325299
* [AMDGPU] Fixed wait count reuseStanislav Mekhanoshin2018-02-081-50/+40
| | | | | | | | | | | | | The code reusing existing wait counts is incorrect since it keeps adding new operands to an old instruction instead of replacing the immediate. It was also effectively switched off by the condition that wait count is not an AMDGPU::S_WAITCNT. Also switched to BuildMI instead of creating instructions directly. Differential Revision: https://reviews.llvm.org/D42997 llvm-svn: 324547
* [AMDGPU] Suppress redundant waitcnt instrs.Mark Searles2018-02-071-18/+38
| | | | | | | | | | | | 1. Run the memory legalizer prior to the waitcnt pass; keep the policy that the waitcnt pass does not remove any waitcnts within the incoming IR. 2. The waitcnt pass doesn't (yet) track waitcnts that exist prior to the waitcnt pass (it just skips over them); because the waitcnt pass is ignorant of them, it may insert a redundant waitcnt. To avoid this, check the prev instr. If it and the to-be-inserted waitcnt are the same, suppress the insertion. We keep the existing waitcnt under the assumption that whomever, e.g., the memory legalizer, inserted it knows what they were doing. 3. Follow-on work: teach the waitcnt pass to record the pre-existing waitcnts for better waitcnt production. Differential Revision: https://reviews.llvm.org/D42854 llvm-svn: 324440
* [AMDGPU] Revert "[AMDGPU] Add options for waitcnt pass debugging; add instr ↵Mark Searles2018-01-301-63/+8
| | | | | | | | | | | | | count in debug output." Patch caused a buildbot failure; arg; http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/17373/s\ teps/build_Lld/logs/stdio : /Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1563:18: error: unused variable 'InstCnt' [-Werror,-Wunused-variable] static int32_t InstCnt = 0; " This reverts commit 4f4a7d61e306b67044d9f16bc2016fee806bc2cc. llvm-svn: 323791
* [AMDGPU] Add options for waitcnt pass debugging; add instr count in debug ↵Mark Searles2018-01-301-8/+63
| | | | | | | | | | | | | | | output. -amdgpu-waitcnt-forcezero={1|0} Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -amdgpu-waitcnt-forceexp=<n> Force emit a s_waitcnt expcnt(0) before the first <n> instrs -amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs -amdgpu-waitcnt-forcevm=<n> Force emit a s_waitcnt vmcnt(0) before the first <n> instrs This patch was pushed ( abb190fd51cd2f9a9eef08c024e109f7f7e909fc ), which caused a buildbot failure, reverted ( 6227480d74da507cf8e1b4bcaffbdb9fb875b4b8 ), and then updated to fix buildbot failures (this patch). Differential Revision: https://reviews.llvm.org/D40091 llvm-svn: 323788
* [NFC] fix trivial typos in comments and documentsHiroshi Inoue2018-01-291-1/+1
| | | | | | "to to" -> "to" llvm-svn: 323628
* [AMDGPU] Revert "[AMDGPU] Add options for waitcnt pass debugging; add instr ↵Mark Searles2017-12-071-62/+8
| | | | | | | | | | | | | | count in debug output." Patch caused a buildbot failure; http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/15733/steps/build_Lld/logs/stdio : lib/Target/AMDGPU/SIInsertWaitcnts.cpp:396:11: error: private field 'InstCnt' is not used [-Werror,-Wunused-private-field] int32_t InstCnt = 0; ^ 1 error generated. " This reverts commit 71627f79010aafe74fdcba901bba28dd7caa0869. llvm-svn: 320086
* [AMDGPU] Add options for waitcnt pass debugging; add instr count in debug ↵Mark Searles2017-12-071-8/+62
| | | | | | | | | | | | | output. -amdgpu-waitcnt-forcezero={1|0} Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -amdgpu-waitcnt-forceexp=<n> Force emit a s_waitcnt expcnt(0) before the first <n> instrs -amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs -amdgpu-waitcnt-forcevm=<n> Force emit a s_waitcnt vmcnt(0) before the first <n> instrs Differential Revision: https://reviews.llvm.org/D40091 llvm-svn: 320084
* AMDGPU: fix missing s_waitcntTim Corringham2017-12-041-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: The pass that inserts s_waitcnt instructions where needed propagated info used to track dependencies for each block by iterating over the predecessor blocks. The iteration was terminated when a predecessor that had not yet been processed was encountered. Any info in blocks later in the list was therefore not processed, leading to the possiblility of a required s_waitcnt not being inserted. The fix is simply to change the "break" to "continue" for the relevant loops, so that all visited blocks are processed. This is likely what was intended when the code was written. There is no test case provided for this fix because: 1) the only example that reproduces this is large and resistant to being reduced 2) the change is trivial Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D40544 llvm-svn: 319651
* AMDGPU: Move hazard avoidance out of waitcnt pass.Matt Arsenault2017-11-171-54/+0
| | | | | | | | | | This is mostly moving VMEM clause breaking into the hazard recognizer. Also move another hazard currently handled in the waitcnt pass. Also stops breaking clauses unless xnack is enabled. llvm-svn: 318557
* Fix warnings discovered by rL317076. [-Wunused-private-field]NAKAMURA Takumi2017-11-011-2/+0
| | | | llvm-svn: 317091
* [AMDGPU] NFC: test commitEvgeny Mankov2017-08-161-10/+10
| | | | llvm-svn: 311019
* [AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use ↵Eugene Zelenko2017-08-081-53/+68
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 310328
* AMDGPU: Partially fix improper reliance on memoperandsMatt Arsenault2017-07-211-17/+26
| | | | | | | There are 2 more places doing this, but I'm not sure what they are doing and don't make any sense to me llvm-svn: 308770
* AMDGPU: Don't track lgkmcnt for global_/scratch_ instructionsMatt Arsenault2017-07-211-4/+7
| | | | llvm-svn: 308766
* [AMDGPU] Fix uninit'ed var (RevisitLoop)Mark Searles2017-06-051-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D33907 llvm-svn: 304729
* AMDGPU: Make auto waitcnt before barrier a featureKonstantin Zhuravlyov2017-06-021-1/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D33793 llvm-svn: 304571
* [AMDGPU] Fix bugs in new waitcnt pass. Add test.Mark Searles2017-05-311-4/+22
| | | | | | | | | | | - new waitcnt pass remains off by default; -enable-si-insert-waitcnts=1 to enable it - fix handling of PERMUTE ops - fix insertion of waitcnt instrs at function begin/end ( port of analogous code that was added to old waitcnt pass ) - add new test Differential Revision: https://reviews.llvm.org/D33114 llvm-svn: 304311
* [AMDGPU] In the new waitcnt insertion pass, use getHeader Kannan Narayanan2017-05-051-5/+5
| | | | | | | | instead of getTopBlock to find the loop header. Differential Revision: https://reviews.llvm.org/D32831 llvm-svn: 302290
* Move size and alignment information of regclass to TargetRegisterInfoKrzysztof Parzyszek2017-04-241-2/+2
| | | | | | | | | | | | | | | 1. RegisterClass::getSize() is split into two functions: - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const; - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const; 2. RegisterClass::getAlignment() is replaced by: - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const; This will allow making those values depend on subtarget features in the future. Differential Revision: https://reviews.llvm.org/D31783 llvm-svn: 301221
* [AMDGPU] Add a new pass to insert waitcnts. Leave under an option for testing.Kannan Narayanan2017-04-121-0/+1863
Based on comments in https://reviews.llvm.org/D31161. llvm-svn: 300023
OpenPOWER on IntegriCloud