author | Tom Stellard <thomas.stellard@amd.com> | 2015-08-21 22:47:27 +0000 |
---|---|---|
committer | Tom Stellard <thomas.stellard@amd.com> | 2015-08-21 22:47:27 +0000 |
commit | bd8a0856e2f959d313f8a8888ca468f02e6e4173 (patch) | |
tree | 48caf75cdd78caa7034c270a6f14ddf2ccc8b598 /llvm/lib/Target/AMDGPU/SIInsertWaits.cpp | |
parent | 540ac1aab4b48393e6b7e2bbe1ec020fa792e916 (diff) | |
AMDGPU/SI: Better handle s_wait insertion
s_waitcnt can wait on any of three counters: VM, EXP or LGKM.
The waits are independent.
Without this patch, a wait inserted because of one counter
would also wait for all earlier operations on the other counters.
This patch makes s_waitcnt wait only on the counters needed by the next
instruction.
Here is an example of a subtle performance loss that this patch fixes.
This is the code generated without the patch:
buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
s_load_dwordx4 s[44:47], s[8:9], 0xc
s_waitcnt lgkmcnt(0)
buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
s_load_dwordx4 s[48:51], s[8:9], 0x10
s_waitcnt vmcnt(1)
buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen
The s_waitcnt vmcnt(1) is useless.
It is inserted because the last
buffer_load_format_xyzw needs s[44:47], which is loaded
by the first s_load_dwordx4. The wait for that register also waits
for every VM operation issued before that s_load_dwordx4 to finish.
Internally, 3 counters (for VM, EXP and LGKM)
are updated after every instruction. For example, buffer_load_format_xyzw
will increase the VM counter, and s_load_dwordx4 the LGKM one.
Without the patch, the current values of all 3 counters are stored
for every defined register, and are used to know
how long to wait when an instruction needs the register.
Because of that, the counters stored for s[44:47] imply that using the
register requires waiting for the previous buffer_load_format_xyzw
instructions.
Instead, this patch stores only the counters that matter for the register,
and puts zero for the other ones, since we don't need any wait for them.
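
The difference between the two snapshot policies can be illustrated with a small standalone model. This is only a sketch and not the SIInsertWaits code: the Tracker type, requiredWait and vmWaitForExample are hypothetical names, the model assumes operations on one counter complete in issue order, and it ignores waits that were already emitted earlier.

```cpp
#include <array>

// Hypothetical, simplified model of the counter bookkeeping described above.
// Counter indices: one slot per hardware counter.
enum Kind { VM = 0, EXP = 1, LGKM = 2 };
using Counters = std::array<unsigned, 3>;

struct Tracker {
  Counters LastIssued{};  // running total of issued operations per counter
  bool PerCounterLimit;   // true models the patched behaviour

  explicit Tracker(bool Patched) : PerCounterLimit(Patched) {}

  // Issue one instruction and return the snapshot that would be stored
  // for the registers it defines.
  Counters issue(const Counters &Increment) {
    Counters Limit{};  // all zero: "no wait needed" for untouched counters
    for (unsigned i = 0; i < 3; ++i) {
      LastIssued[i] += Increment[i];
      if (Increment[i])
        Limit[i] = LastIssued[i];
    }
    return PerCounterLimit ? Limit : LastIssued;
  }

  // Count that a wait on counter K would need so that everything recorded
  // in Snapshot has completed, or -1 if no wait on K is needed at all.
  int requiredWait(const Counters &Snapshot, unsigned K) const {
    if (Snapshot[K] == 0)
      return -1;
    return static_cast<int>(LastIssued[K] - Snapshot[K]);
  }
};

// Replays the instruction sequence from the example and reports the vmcnt
// needed before the last buffer_load_format_xyzw reads s[44:47].
int vmWaitForExample(bool Patched) {
  const Counters VMOp{1, 0, 0};    // stands in for buffer_load_format_xyzw
  const Counters LGKMOp{0, 0, 1};  // stands in for s_load_dwordx4
  Tracker T(Patched);
  T.issue(VMOp);                   // load into v[8:11]
  T.issue(VMOp);                   // load into v[12:15]
  Counters S44 = T.issue(LGKMOp);  // defines s[44:47]
  T.issue(VMOp);                   // load into v[16:19]
  T.issue(LGKMOp);                 // defines s[48:51]
  return T.requiredWait(S44, VM);
}
```

In this model, vmWaitForExample(false) yields 1, which corresponds to the useless s_waitcnt vmcnt(1) in the example, while vmWaitForExample(true) yields -1, meaning no VM wait is requested for s[44:47].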
Patch by: Axel Davy
Differential Revision: http://reviews.llvm.org/D11883
llvm-svn: 245755
Diffstat (limited to 'llvm/lib/Target/AMDGPU/SIInsertWaits.cpp')
-rw-r--r-- | llvm/lib/Target/AMDGPU/SIInsertWaits.cpp | 7 |
1 files changed, 5 insertions, 2 deletions
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaits.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaits.cpp
index df76b457af3..2379b1fcf6a 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaits.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaits.cpp
@@ -246,10 +246,13 @@ void SIInsertWaits::pushInstruction(MachineBasicBlock &MBB,
 
   // Get the hardware counter increments and sum them up
   Counters Increment = getHwCounts(*I);
+  Counters Limit = ZeroCounts;
   unsigned Sum = 0;
 
   for (unsigned i = 0; i < 3; ++i) {
     LastIssued.Array[i] += Increment.Array[i];
+    if (Increment.Array[i])
+      Limit.Array[i] = LastIssued.Array[i];
     Sum += Increment.Array[i];
   }
 
@@ -300,11 +303,11 @@ void SIInsertWaits::pushInstruction(MachineBasicBlock &MBB,
 
       // Remember which registers we define
       if (Op.isDef())
-        DefinedRegs[j] = LastIssued;
+        DefinedRegs[j] = Limit;
 
       // and which one we are using
       if (Op.isUse())
-        UsedRegs[j] = LastIssued;
+        UsedRegs[j] = Limit;
     }
   }
 }