AMDGPU/SI: Better handle s_wait insertion

We can wait on VM, EXP or LGKM operations, and these waits are independent. Without this patch, a wait inserted because of one of them would also wait for all previously issued operations tracked by the others. This patch makes s_waitcnt wait only for the counters the next instruction actually needs.

Here is an example of a subtle performance loss that this patch fixes. Without the patch, the generated code is:

    buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
    buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
    s_load_dwordx4 s[44:47], s[8:9], 0xc
    s_waitcnt lgkmcnt(0)
    buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
    s_load_dwordx4 s[48:51], s[8:9], 0x10
    s_waitcnt vmcnt(1)
    buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen

The s_waitcnt vmcnt(1) is useless. It is added because the last buffer_load_format_xyzw needs s[44:47], which is written by the first s_load_dwordx4, and the wait forces every VM operation issued before that s_load_dwordx4 to finish.

Internally, three counters (for VM, EXP and LGKM) are updated after every instruction. For example, buffer_load_format_xyzw increments the VM counter, and s_load_dwordx4 increments the LGKM one. Without the patch, the current values of all three counters are stored for every defined register and are used to decide how long to wait when an instruction needs that register. Because of that, the counters stored for s[44:47] say that using the register also requires waiting for the preceding buffer_load_format_xyzw instructions.

Instead, this patch stores only the counters that matter for the register and puts zero in the other ones, since no wait is needed for them.

Patch by: Axel Davy

Differential Revision: http://reviews.llvm.org/D11883

llvm-svn: 245755
author: Tom Stellard <thomas.stellard@amd.com> 2015-08-21 22:47:27 +0000
committer: Tom Stellard <thomas.stellard@amd.com> 2015-08-21 22:47:27 +0000
commit: bd8a0856e2f959d313f8a8888ca468f02e6e4173
tree: 48caf75cdd78caa7034c270a6f14ddf2ccc8b598 /llvm/lib/Target/AMDGPU/SIInsertWaits.cpp
parent: 540ac1aab4b48393e6b7e2bbe1ec020fa792e916
1 file changed, 5 insertions, 2 deletions
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaits.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaits.cpp
index df76b457af3..2379b1fcf6a 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaits.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaits.cpp
@@ -246,10 +246,13 @@ void SIInsertWaits::pushInstruction(MachineBasicBlock &MBB,
 
   // Get the hardware counter increments and sum them up
   Counters Increment = getHwCounts(*I);
+  Counters Limit = ZeroCounts;
   unsigned Sum = 0;
 
   for (unsigned i = 0; i < 3; ++i) {
     LastIssued.Array[i] += Increment.Array[i];
+    if (Increment.Array[i])
+      Limit.Array[i] = LastIssued.Array[i];
     Sum += Increment.Array[i];
   }
 
@@ -300,11 +303,11 @@ void SIInsertWaits::pushInstruction(MachineBasicBlock &MBB,
 
       // Remember which registers we define
       if (Op.isDef())
-        DefinedRegs[j] = LastIssued;
+        DefinedRegs[j] = Limit;
 
       // and which one we are using
       if (Op.isUse())
-        UsedRegs[j] = LastIssued;
+        UsedRegs[j] = Limit;
     }
   }
 }