AMDGPU/SI: Optimize adjacent s_nop instructions

Use the operand for how long to wait.

This is somewhat distasteful, since it would be better to just emit s_nop with the right argument in the first place. This would require changing TII::insertNoop to emit N operands, which would be easy. Slightly more problematic is the post-RA scheduler and hazard recognizer represent nops as a single null node, and would require inventing another way of representing N nops.

llvm-svn: 267456
author: Matt Arsenault <Matthew.Arsenault@amd.com> 2016-04-25 19:53:22 +0000
committer: Matt Arsenault <Matthew.Arsenault@amd.com> 2016-04-25 19:53:22 +0000
commit: 074ea2851c4a4c5afeba2390d905eca062d66096 (patch)
tree: 1dcaeb0b4c4147d9464a9bc8452dfa6d5655c63d /llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
parent: 9ba19182be2cb43d84c2b46b2e0858be5fa51329 (diff)
download: bcm5719-llvm-074ea2851c4a4c5afeba2390d905eca062d66096.tar.gz
bcm5719-llvm-074ea2851c4a4c5afeba2390d905eca062d66096.zip
1 files changed, 27 insertions, 0 deletions
diff --git a/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp b/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
index ad3f63d2cea..346488d38e2 100644
--- a/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
+++ b/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
@@ -242,6 +242,33 @@ bool SIShrinkInstructions::runOnMachineFunction(MachineFunction &MF) {
         }
       }
 
+      // Combine adjacent s_nops to use the immediate operand encoding how long
+      // to wait.
+      //
+      // s_nop N
+      // s_nop M
+      //  =>
+      // s_nop (N + M)
+      if (MI.getOpcode() == AMDGPU::S_NOP &&
+          Next != MBB.end() &&
+          (*Next).getOpcode() == AMDGPU::S_NOP) {
+
+        MachineInstr &NextMI = *Next;
+        // The instruction encodes the amount to wait with an offset of 1,
+        // i.e. 0 is wait 1 cycle. Convert both to cycles and then convert back
+        // after adding.
+        uint8_t Nop0 = MI.getOperand(0).getImm() + 1;
+        uint8_t Nop1 = NextMI.getOperand(0).getImm() + 1;
+
+        // Make sure we don't overflow the bounds.
+        if (Nop0 + Nop1 <= 8) {
+          NextMI.getOperand(0).setImm(Nop0 + Nop1 - 1);
+          MI.eraseFromParent();
+        }
+
+        continue;
+      }
+
       // FIXME: We also need to consider movs of constant operands since
       // immediate operands are not folded if they have more than one use, and
       // the operand folding pass is unaware if the immediate will be free since
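
The combine in the hunk above can be modelled outside LLVM on plain integers. The following is only a sketch under stated assumptions: mergeAdjacentNops is a hypothetical helper, not an LLVM API, and it treats its input as a run of s_nop immediates with no other instructions in between. The 0..7 immediate range is inferred from the patch's 8-cycle bound.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone model of the patch's combine; not LLVM code.
// Each element is the immediate of one s_nop in a run of adjacent s_nops.
// An immediate of N means "wait N + 1 cycles", as the patch comment states.
std::vector<unsigned> mergeAdjacentNops(std::vector<unsigned> Imms) {
  std::vector<unsigned> Out;
  for (std::size_t I = 0; I < Imms.size(); ++I) {
    // Assumption: immediates lie in 0..7, matching the 8-cycle bound
    // checked in the patch.
    assert(Imms[I] <= 7 && "s_nop immediate out of the assumed range");

    if (I + 1 < Imms.size()) {
      // Convert both immediates to cycle counts before adding.
      unsigned Cycles0 = Imms[I] + 1;
      unsigned Cycles1 = Imms[I + 1] + 1;

      // Fold only if the sum still fits in a single s_nop.
      if (Cycles0 + Cycles1 <= 8) {
        // Mirror the patch: update the *next* nop and drop the current one,
        // so the merged value can keep accumulating on later iterations.
        Imms[I + 1] = Cycles0 + Cycles1 - 1;
        continue;
      }
    }
    // Either the last nop in the run or one that could not be folded.
    Out.push_back(Imms[I]);
  }
  return Out;
}
```

Because the merged value is written into the following nop, a run of single-cycle nops keeps collapsing until the 8-cycle limit is reached, at which point a fresh s_nop is started. The total number of wait cycles is unchanged in every case.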