author | Matt Arsenault <Matthew.Arsenault@amd.com> | 2017-01-24 22:18:39 +0000 |
---|---|---|
committer | Matt Arsenault <Matthew.Arsenault@amd.com> | 2017-01-24 22:18:39 +0000 |
commit | bf67cf7e4b42207e9e48b1de16d11c49a47279cc (patch) | |
tree | 6ebc099c1dcf5e95cf206a45b955b1040cdab276 /llvm/lib/Target/AMDGPU/SIInsertSkips.cpp | |
parent | f1cf0278e8a90628eea80ace88f54f9035f3730d (diff) | |
AMDGPU: Remove spurious out branches after a kill
A sequence like this:
v_cmpx_le_f32_e32 vcc, 0, v0
s_branch BB0_30
s_cbranch_execnz BB0_30
; BB#29:
exp null off, off, off, off done vm
s_endpgm
BB0_30:
; %endif110
is likely wrong. The s_branch instruction will unconditionally jump
to BB0_30 and the skip block (exp done + endpgm) inserted for
performing the kill instruction will never be executed. This results
in a GPU hang with Star Ruler 2.
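The reachability argument above can be checked with a small toy model. This is only a sketch: the Op enum, the Inst struct and the helper names are hypothetical stand-ins invented for illustration, not LLVM or hardware definitions, and the model only tracks whether the wave ever reaches s_endpgm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy opcodes standing in for the instructions in the listing above.
enum class Op { CmpxKill, Branch, CBranchExecNZ, ExportDone, EndPgm, Other };

struct Inst {
  Op op;
  std::size_t target; // index to jump to; only used by the two branch opcodes
};

// Walks the toy program and reports whether the wave terminates via EndPgm.
// execAfterKill is the exec mask left by the kill compare (0 = all lanes dead).
bool reachesEndPgm(const std::vector<Inst> &prog, unsigned execAfterKill) {
  unsigned exec = ~0u;
  std::size_t pc = 0;
  // The step bound only guards against accidental backward-branch loops.
  for (int steps = 0; pc < prog.size() && steps < 64; ++steps) {
    const Inst &I = prog[pc];
    switch (I.op) {
    case Op::CmpxKill:      exec = execAfterKill; ++pc; break;
    case Op::Branch:        assert(I.target <= prog.size()); pc = I.target; break;
    case Op::CBranchExecNZ: pc = (exec != 0) ? I.target : pc + 1; break;
    case Op::EndPgm:        return true;
    default:                ++pc; break;
    }
  }
  return false;
}

// The sequence from the listing: s_branch precedes s_cbranch_execnz.
std::vector<Inst> buggySequence() {
  return {{Op::CmpxKill, 0}, {Op::Branch, 5}, {Op::CBranchExecNZ, 5},
          {Op::ExportDone, 0}, {Op::EndPgm, 0}, {Op::Other, 0}};
}

// The same sequence with the unconditional branch removed.
std::vector<Inst> fixedSequence() {
  return {{Op::CmpxKill, 0}, {Op::CBranchExecNZ, 4},
          {Op::ExportDone, 0}, {Op::EndPgm, 0}, {Op::Other, 0}};
}
```

In this model, with every lane killed (exec mask 0), the buggy sequence falls through to BB0_30 without ever reaching s_endpgm, while the sequence without the s_branch runs the skip block.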
The s_branch instruction is added during the "Control Flow Optimizer"
pass which seems to re-organize the basic blocks, and we assume
that SI_KILL_TERMINATOR is always the last instruction inside a
basic block. Thus, after inserting a skip block we just go to the
next BB without looking at the subsequent instructions after the
kill, and the s_branch op is never removed.
Instead, we should remove the unconditional out branch and let
s_cbranch_execnz skip the two skip-block instructions if the exec mask
is non-zero.
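The removal rule can be sketched on a toy instruction list. This is an illustration under stated assumptions, not the pass itself: Opcode, ToyInst and pruneBranches are hypothetical names, and the sketch assumes every kill in the block causes a skip block to be inserted right after it, which the real pass only does in some cases.

```cpp
#include <cassert>
#include <vector>

enum class Opcode { Kill, Branch, Other };

struct ToyInst {
  Opcode op;
  int target; // destination block id; meaningful only for Branch
};

// Hypothetical helper: returns the block with removable branches dropped.
// A branch is dropped if it targets the layout successor, or if a skip
// block has already been inserted for an earlier kill in this block.
std::vector<ToyInst> pruneBranches(const std::vector<ToyInst> &block,
                                   int layoutSuccessor) {
  assert(layoutSuccessor >= 0 && "toy block ids are non-negative");
  std::vector<ToyInst> kept;
  bool haveSkipBlock = false; // reset per block, as in the patch
  for (const ToyInst &inst : block) {
    if (inst.op == Opcode::Kill)
      haveSkipBlock = true; // assumption: the kill always gets a skip block
    bool removable = inst.op == Opcode::Branch &&
                     (inst.target == layoutSuccessor || haveSkipBlock);
    if (!removable)
      kept.push_back(inst);
  }
  return kept;
}
```

A branch that follows a kill is dropped even when its target is not the layout successor. A branch in a block without a kill is dropped only when it targets the next block.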
This patch fixes the GPU hang and doesn't introduce any regressions
with "make check".
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99019
Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>
llvm-svn: 292985
Diffstat (limited to 'llvm/lib/Target/AMDGPU/SIInsertSkips.cpp')
-rw-r--r-- | llvm/lib/Target/AMDGPU/SIInsertSkips.cpp | 11 |
1 files changed, 9 insertions, 2 deletions
diff --git a/llvm/lib/Target/AMDGPU/SIInsertSkips.cpp b/llvm/lib/Target/AMDGPU/SIInsertSkips.cpp
index c6b420fce8a..9d6feaa94fb 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertSkips.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertSkips.cpp
@@ -263,6 +263,7 @@ bool SIInsertSkips::runOnMachineFunction(MachineFunction &MF) {
        BI != BE; BI = NextBB) {
     NextBB = std::next(BI);
     MachineBasicBlock &MBB = *BI;
+    bool HaveSkipBlock = false;
 
     if (!ExecBranchStack.empty() && ExecBranchStack.back() == &MBB) {
       // Reached convergence point for last divergent branch.
@@ -290,8 +291,14 @@ bool SIInsertSkips::runOnMachineFunction(MachineFunction &MF) {
       case AMDGPU::S_BRANCH:
         // Optimize out branches to the next block.
         // FIXME: Shouldn't this be handled by BranchFolding?
-        if (MBB.isLayoutSuccessor(MI.getOperand(0).getMBB()))
+        if (MBB.isLayoutSuccessor(MI.getOperand(0).getMBB())) {
           MI.eraseFromParent();
+        } else if (HaveSkipBlock) {
+          // Remove the given unconditional branch when a skip block has been
+          // inserted after the current one and let skip the two instructions
+          // performing the kill if the exec mask is non-zero.
+          MI.eraseFromParent();
+        }
         break;
 
       case AMDGPU::SI_KILL_TERMINATOR:
@@ -300,9 +307,9 @@ bool SIInsertSkips::runOnMachineFunction(MachineFunction &MF) {
 
         if (ExecBranchStack.empty()) {
           if (skipIfDead(MI, *NextBB)) {
+            HaveSkipBlock = true;
             NextBB = std::next(BI);
             BE = MF.end();
-            Next = MBB.end();
           }
         } else {
           HaveKill = true;