AMDGPU: Remove spurious out branches after a kill

A sequence like this:

  v_cmpx_le_f32_e32 vcc, 0, v0
  s_branch BB0_30
  s_cbranch_execnz BB0_30
; BB#29:
  exp null off, off, off, off done vm
  s_endpgm
BB0_30:                                 ; %endif110

is likely wrong. The s_branch instruction unconditionally jumps to BB0_30, so the skip block (exp done + s_endpgm) inserted to perform the kill is never executed. This results in a GPU hang with Star Ruler 2.

The s_branch instruction is added during the "Control Flow Optimizer" pass, which seems to reorganize the basic blocks, while we assume that SI_KILL_TERMINATOR is always the last instruction in a basic block. Thus, after inserting a skip block we go straight to the next BB without looking at the instructions that follow the kill, and the s_branch op is never removed.

Instead, we should remove the unconditional out branches and let the s_cbranch_execnz skip the two instructions if the exec mask is non-zero.

This patch fixes the GPU hang and doesn't introduce any regressions with "make check".

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99019

Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>

llvm-svn: 292985
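The rule described above can be modelled outside of LLVM as a rough sketch. Everything in it is a hypothetical stand-in rather than the real pass: `Op`, `Inst`, `lowerBlock` and the `<skip>` label are invented for illustration, and the sketch assumes the conditional branch over the skip block ends up at the end of the block, as in the sequence quoted in the message.

```cpp
#include <string>
#include <vector>

// Toy stand-ins for machine instructions; these are NOT LLVM types.
enum class Op { Kill, Branch, CBranchExecNZ, Other };

struct Inst {
  Op op;
  std::string target; // branch destination label, empty for non-branches
};

// Rewrites one block the way the patched pass is described to behave:
// once a kill has caused a skip block to be inserted after this block,
// any later unconditional branch is dropped so that the conditional
// branch on the exec mask decides whether the skip block runs.
std::vector<Inst> lowerBlock(const std::vector<Inst> &block,
                             const std::string &nextLabel) {
  std::vector<Inst> out;
  std::string layoutSucc = nextLabel; // block that follows in layout order
  bool haveSkipBlock = false;

  for (const Inst &mi : block) {
    if (mi.op == Op::Kill) {
      out.push_back(mi);
      haveSkipBlock = true;
      layoutSucc = "<skip>"; // assumed: the skip block is now the successor
      continue;
    }
    // Drop branches to the layout successor and, after a skip block has
    // been inserted, every remaining unconditional branch.
    if (mi.op == Op::Branch && (mi.target == layoutSucc || haveSkipBlock))
      continue;
    out.push_back(mi);
  }

  // Jump over the skip block while some lanes are still alive.
  if (haveSkipBlock)
    out.push_back({Op::CBranchExecNZ, nextLabel});
  return out;
}
```

Without the `haveSkipBlock` check, the `Branch` in a block such as `{Kill, Branch BB0_30}` would survive and always jump past the skip block, which is the failure the message describes.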
author: Matt Arsenault <Matthew.Arsenault@amd.com> 2017-01-24 22:18:39 +0000
committer: Matt Arsenault <Matthew.Arsenault@amd.com> 2017-01-24 22:18:39 +0000
commit: bf67cf7e4b42207e9e48b1de16d11c49a47279cc (patch)
tree: 6ebc099c1dcf5e95cf206a45b955b1040cdab276 /llvm/lib/Target
parent: f1cf0278e8a90628eea80ace88f54f9035f3730d (diff)
download: bcm5719-llvm-bf67cf7e4b42207e9e48b1de16d11c49a47279cc.tar.gz
bcm5719-llvm-bf67cf7e4b42207e9e48b1de16d11c49a47279cc.zip
1 files changed, 9 insertions, 2 deletions
diff --git a/llvm/lib/Target/AMDGPU/SIInsertSkips.cpp b/llvm/lib/Target/AMDGPU/SIInsertSkips.cpp
index c6b420fce8a..9d6feaa94fb 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertSkips.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertSkips.cpp
@@ -263,6 +263,7 @@ bool SIInsertSkips::runOnMachineFunction(MachineFunction &MF) {
        BI != BE; BI = NextBB) {
     NextBB = std::next(BI);
     MachineBasicBlock &MBB = *BI;
+    bool HaveSkipBlock = false;
 
     if (!ExecBranchStack.empty() && ExecBranchStack.back() == &MBB) {
       // Reached convergence point for last divergent branch.
@@ -290,8 +291,14 @@ bool SIInsertSkips::runOnMachineFunction(MachineFunction &MF) {
       case AMDGPU::S_BRANCH:
         // Optimize out branches to the next block.
         // FIXME: Shouldn't this be handled by BranchFolding?
-        if (MBB.isLayoutSuccessor(MI.getOperand(0).getMBB()))
+        if (MBB.isLayoutSuccessor(MI.getOperand(0).getMBB())) {
           MI.eraseFromParent();
+        } else if (HaveSkipBlock) {
+          // Remove the given unconditional branch when a skip block has been
+          // inserted after the current one and let skip the two instructions
+          // performing the kill if the exec mask is non-zero.
+          MI.eraseFromParent();
+        }
         break;
 
       case AMDGPU::SI_KILL_TERMINATOR:
@@ -300,9 +307,9 @@ bool SIInsertSkips::runOnMachineFunction(MachineFunction &MF) {
 
         if (ExecBranchStack.empty()) {
           if (skipIfDead(MI, *NextBB)) {
+            HaveSkipBlock = true;
             NextBB = std::next(BI);
             BE = MF.end();
-            Next = MBB.end();
           }
         } else {
           HaveKill = true;