summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIInsertWaits.cpp
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Make auto waitcnt before barrier a featureKonstantin Zhuravlyov2017-06-021-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D33793 llvm-svn: 304571
* Move size and alignment information of regclass to TargetRegisterInfoKrzysztof Parzyszek2017-04-241-5/+5
| | | | | | | | | | | | | | | 1. RegisterClass::getSize() is split into two functions: - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const; - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const; 2. RegisterClass::getAlignment() is replaced by: - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const; This will allow making those values depend on subtarget features in the future. Differential Revision: https://reviews.llvm.org/D31783 llvm-svn: 301221
* AMDGPU: Insert wait at start of callee functionsMatt Arsenault2017-04-111-0/+14
| | | | llvm-svn: 300000
* AMDGPU: Rename SI_RETURNMatt Arsenault2017-03-211-2/+2
| | | | | | | | This is used for a specific type of return to a shader part's epilog code. Rename to try avoiding confusion from a true call's return. llvm-svn: 298452
* AMDGPU: Don't wait at end of block with a trivial successorMatt Arsenault2017-03-081-2/+14
| | | | | | | | | | If there is only one successor, and that successor only has one predecessor the wait can obviously be delayed until uses or the end of the next block. This avoids code quality regressions when there are trivial fallthrough blocks inserted for structurization. llvm-svn: 297251
* [AMDGPU] Add target information that is required by tools to metadataKonstantin Zhuravlyov2017-02-081-13/+12
| | | | | | Differential Revision: https://reviews.llvm.org/D28760#fb670e28 llvm-svn: 294449
* [AMDGPU] Fix some Clang-tidy modernize and Include What You Use warnings; ↵Eugene Zelenko2017-01-211-21/+24
| | | | | | other minor fixes (NFC). llvm-svn: 292688
* AMDGPU/SI: Implement sendmsghalt intrinsicJan Vesely2017-01-041-2/+3
| | | | | | | | v2: expose using amdgcn prefix Differential Revision: https://reviews.llvm.org/D23511 llvm-svn: 290977
* AMDGPU: Refactor exp instructionsMatt Arsenault2016-12-051-5/+5
| | | | | | | | | | | | | | | Structure the definitions a bit more like the other classes. The main change here is to split EXP with the done bit set to a separate opcode, so we can set mayLoad = 1 so that it won't be reordered before the other exp stores, since this has the special constraint that if the done bit is set then this should be the last exp in she shader. Previously all exp instructions were inferred to have unmodeled side effects. llvm-svn: 288695
* AMDGPU: Rename flat operands to match mubufMatt Arsenault2016-11-291-6/+6
| | | | | | | | | | Use vaddr/vdst for the same purposes. This also fixes a beg in SIInsertWaits for the operand check. The stored value operand is currently called data0 in the single offset case, not data. llvm-svn: 288188
* AMDGPU/SI: Add back reverted SGPR spilling code, but disable itMarek Olsak2016-11-251-1/+42
| | | | | | suggested as a better solution by Matt llvm-svn: 287942
* Revert "AMDGPU: Implement SGPR spilling with scalar stores"Marek Olsak2016-11-251-42/+1
| | | | | | This reverts commit 4404d0d6e354e80dd7f8f0a0e12d8ad809cf007e. llvm-svn: 287936
* AMDGPU: Implement SGPR spilling with scalar storesMatt Arsenault2016-11-131-1/+42
| | | | | | | | | | | | | | | | nThis avoids the nasty problems caused by using memory instructions that read the exec mask while spilling / restoring registers used for control flow masking, but only for VI when these were added. This always uses the scalar stores when enabled currently, but it may be better to still try to spill to a VGPR and use this on the fallback memory path. The cache also needs to be flushed before wave termination if a scalar store is used. llvm-svn: 286766
* AMDGPU: Preserve vcc undef flags when inverting branchMatt Arsenault2016-11-071-4/+6
| | | | | | | | | | | | | If the branch was on a read-undef of vcc, passes that used analyzeBranch to invert the branch condition wouldn't preserve the undef flag resulting in a verifier error. Fixes verifier failures in a future commit. Also fix verifier error when inserting copy for vccz corruption bug. llvm-svn: 286133
* AMDGPU/SI: Don't use non-0 waitcnt values when waiting on Flat instructionsTom Stellard2016-10-281-2/+11
| | | | | | | | | | | | | | Summary: Flat instruction can return out of order, so we need always need to wait for all the outstanding flat operations. Reviewers: tony-tye, arsenm Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl Differential Revision: https://reviews.llvm.org/D25998 llvm-svn: 285479
* [AMDGPU] Refactor waitcnt encodingKonstantin Zhuravlyov2016-10-111-12/+16
| | | | | | | | | | | | | - Refactor bit packing/unpacking - Calculate bit mask given bit shift and bit width - Introduce function for decoding bits of waitcnt - Introduce function for encoding bits of waitcnt - Introduce function for getting waitcnt mask (instead of using bare numbers) - Introduce function fot getting max waitcnt(s) (instead of using bare numbers) Differential Revision: https://reviews.llvm.org/D25298 llvm-svn: 283919
* Use StringRef in Pass/PassManager APIs (NFC)Mehdi Amini2016-10-011-1/+1
| | | | llvm-svn: 283004
* [AMDGPU] Choose VMCNT, EXPCNT, LGKMCNT masks and shifts based on the isa versionKonstantin Zhuravlyov2016-09-301-6/+10
| | | | | | Differential Revision: https://reviews.llvm.org/D24973 llvm-svn: 282877
* [AMDGPU] Ask subtarget if waitcnt instruction is needed before barrier ↵Konstantin Zhuravlyov2016-09-301-2/+3
| | | | | | | | instruction Differential Revision: https://reviews.llvm.org/D24985 llvm-svn: 282875
* AMDGPU: Remove implicit iterator conversions, NFCDuncan P. N. Exon Smith2016-07-081-1/+1
| | | | | | | | | | | Remove remaining implicit conversions from MachineInstrBundleIterator to MachineInstr* from the AMDGPU backend. In most cases, I made them less attractive by preferring MachineInstr& or using a ranged-based for loop. Once all the backends are fixed I'll make the operator explicit so that this doesn't bitrot back. llvm-svn: 274906
* AMDGPU: Cleanup subtarget handling.Matt Arsenault2016-06-241-10/+8
| | | | | | | | | Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target. llvm-svn: 273652
* AMDGPU/SI: Use the hazard recognizer to break SMEM soft clausesTom Stellard2016-05-021-2/+1
| | | | | | | | | | | | | | | Summary: Add support for detecting hazards in SMEM soft clauses, so that we only break the clauses when necessary, either by adding s_nop or re-ordering other alu instructions. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18870 llvm-svn: 268260
* AMDGPU/SI: Use hazard recognizer to detect DPP hazardsTom Stellard2016-05-021-55/+0
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18603 llvm-svn: 268247
* AMDGPU/SI: Remove wait state handling for SMRD in SIInsertWaitsTom Stellard2016-04-301-6/+0
| | | | | | This was supposed to be part of r268143. llvm-svn: 268154
* AMDGPU/SI: Add llvm.amdgcn.s.waitcnt.all intrinsicNicolai Haehnle2016-04-271-13/+70
| | | | | | | | | | | | | | | | | Summary: So it appears that to guarantee some of the ordering requirements of a GLSL memoryBarrier() executed in the shader, we need to emit an s_waitcnt. (We can't use an s_barrier, because memoryBarrier() may appear anywhere in the shader, in particular it may appear in non-uniform control flow.) Reviewers: arsenm, mareko, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19203 llvm-svn: 267729
* AMDGPU/SI: Insert wait states required after v_readfirstlane on SITom Stellard2016-04-121-0/+6
| | | | | | | | | | | | | | | | | Summary: We will be able to handle this case much better once the hazard recognizer is finished, but this conservative implementation fixes a hang with the piglit test: spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra Reviewers: arsenm, nhaehnle Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18988 llvm-svn: 266105
* AMDGPU/SI: Add MachineBasicBlock parameter to SIInstrInfo::insertWaitStatesTom Stellard2016-04-071-1/+1
| | | | | | | | | | | | Summary: This makes it possible to insert nops at the end of blocks. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18549 llvm-svn: 265678
* AMDGPU/SI: Handle wait states required for DPP instructionsTom Stellard2016-03-141-0/+55
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17543 llvm-svn: 263447
* AMDGPU/SI: Incomplete shader binaries need to finish execution at the endMarek Olsak2016-03-141-8/+0
| | | | | | | | | | Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D18058 llvm-svn: 263441
* AMDGPU: Simplify boolean conditional return statementsMatt Arsenault2016-03-021-4/+1
| | | | | | Patch by Richard Thomson llvm-svn: 262536
* AMDGPU: Fix bug 26659.Matt Arsenault2016-03-021-1/+1
| | | | | | | | Fix checking the same instruction twice instead of the second branch that uses vccz. I don't think this matters currently because s_branch_vccnz is always used currently. llvm-svn: 262457
* AMDGPU/SI: Fix s_waitcnt insertion for flat instructionsTom Stellard2016-02-191-2/+4
| | | | | | | | | | | | | | | | Summary: This was broken in r260694 which swapped the address and data operands for flat store instructions. The code in SIInsertWaits assumes that the data operand always comes before the address operand, so we need to add a special case for flat. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17366 llvm-svn: 261330
* AMDGPU/SI: Implement a work-around for smrd corrupting vccz bitTom Stellard2016-02-081-1/+55
| | | | | | | | | | | | | | Summary: We will hit this once we have enabled uniform branches. The smrd-vccz-bug.ll test will be added with the uniform branch commit. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16725 llvm-svn: 260137
* AMDGPU/SI: Correctly initialize SIInsertWaits passTom Stellard2016-02-051-5/+16
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16724 llvm-svn: 259894
* AMDGPU: waitcnt operand fixesTom Stellard2016-01-281-2/+2
| | | | | | | | | | | | | | | | Summary: Allow lgkmcnt up to 0xF (hardware allows that). Fix mask for ExpCnt in AMDGPUInstPrinter. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16314 Patch by: Nikolay Haustov llvm-svn: 259059
* AMDGPU/SI: Remove ending s_endpgm from non-void functionsMarek Olsak2016-01-131-0/+8
| | | | | | | | | | Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16035 llvm-svn: 257623
* AMDGPU/SI: Add s_waitcnt at the end of non-void functionsMarek Olsak2016-01-131-1/+7
| | | | | | | | | | | | | | Summary: v2: Make ReturnsVoid private, so that I can another 8 lines of code and look more productive. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16034 llvm-svn: 257622
* AMDGPU: Add MachineInstr overloads for instruction format testsMatt Arsenault2015-10-201-4/+4
| | | | llvm-svn: 250797
* AMDGPU: Fix unused variable warning in release buildMatt Arsenault2015-10-011-2/+2
| | | | llvm-svn: 249091
* AMDGPU: Make SIInsertWaits about a factor of 4 fasterMatt Arsenault2015-10-011-23/+22
| | | | | | | | | | | | | | | | | | This was the slowest target custom pass and was spending 80% of the time in getMinimalPhysRegClass which was called for every register operand. Try to use the statically known register class when possible from the instruction's MCOperandInfo. There are a few pseudo instructions which are not well behaved with unknown register classes which still require the expensive physical register class search. There are a few other possibilities for making this even faster, such as not inspecting implicit operands. For now those are checked because it is technically possible to have a scalar load into exec or vcc which can be implicitly used. llvm-svn: 249079
* AMDGPU: Fix recomputing dominator tree unnecessarilyMatt Arsenault2015-09-251-1/+5
| | | | | | | SIFixSGPRCopies does not modify the CFG, but this was being recomputed before running SIFoldOperands. llvm-svn: 248587
* AMDGPU: Add s_dcache_* instructionsMatt Arsenault2015-09-241-6/+14
| | | | llvm-svn: 248533
* AMDGPU/SI: Better handle s_wait insertionTom Stellard2015-08-211-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can wait on either VM, EXP or LGKM. The waits are independent. Without this patch, a wait inserted because of one of them would also wait for all the previous others. This patch makes s_wait only wait for the ones we need for the next instruction. Here's an example of subtle perf reduction this patch solves: This is without the patch: buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen s_load_dwordx4 s[44:47], s[8:9], 0xc s_waitcnt lgkmcnt(0) buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen s_load_dwordx4 s[48:51], s[8:9], 0x10 s_waitcnt vmcnt(1) buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen The s_waitcnt vmcnt(1) is useless. The reason it is added is because the last buffer_load_format_xyzw needs s[44:47], which was issued by the first s_load_dwordx4. It waits for all VM before that call to have finished. Internally after every instruction, 3 counters (for VM, EXP and LGTM) are updated after every instruction. For example buffer_load_format_xyzw will increase the VM counter, and s_load_dwordx4 the LGKM one. Without the patch, for every defined register, the current 3 counters are stored, and are used to know how long to wait when an instruction needs the register. Because of that, the s[44:47] counter includes that to use the register you need to wait for the previous buffer_load_format_xyzw. Instead this patch stores only the counters that matter for the register, and puts zero for the other ones, since we don't need any wait for them. Patch by: Axel Davy Differential Revision: http://reviews.llvm.org/D11883 llvm-svn: 245755
* Fix some comment typos.Benjamin Kramer2015-08-081-1/+1
| | | | llvm-svn: 244402
* R600 -> AMDGPU renameTom Stellard2015-06-131-0/+480
llvm-svn: 239657
OpenPOWER on IntegriCloud