author | Matt Arsenault <Matthew.Arsenault@amd.com> | 2016-04-25 19:53:22 +0000 |
---|---|---|
committer | Matt Arsenault <Matthew.Arsenault@amd.com> | 2016-04-25 19:53:22 +0000 |
commit | 074ea2851c4a4c5afeba2390d905eca062d66096 (patch) | |
tree | 1dcaeb0b4c4147d9464a9bc8452dfa6d5655c63d /llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp | |
parent | 9ba19182be2cb43d84c2b46b2e0858be5fa51329 (diff) | |
AMDGPU/SI: Optimize adjacent s_nop instructions
Use the immediate operand to encode how long to wait. This is somewhat
distasteful, since it would be better to just emit s_nop
with the right argument in the first place. This would require
changing TII::insertNoop to emit N operands, which would be easy.
Slightly more problematic is that the post-RA scheduler and hazard
recognizer represent nops as a single null node, so they would require
inventing another way of representing N nops.
llvm-svn: 267456
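
The merge arithmetic in this patch can be sketched outside of LLVM as follows. This is only an illustrative sketch: `mergeNopImms` is a hypothetical helper name, not an LLVM API, and the 0..7 immediate range is an assumption inferred from the 8-cycle bound the patch checks. As the patch comments state, the immediate encodes the wait with an offset of 1 (0 means wait 1 cycle), so both immediates are converted to cycles, added, and converted back.

```cpp
#include <cassert>

// Hypothetical helper (not part of LLVM): computes the immediate for a single
// s_nop that waits as long as two adjacent s_nops with immediates Imm0, Imm1.
// Returns false, leaving Merged untouched, when the combined wait exceeds the
// 8-cycle bound that the patch checks.
bool mergeNopImms(int Imm0, int Imm1, int &Merged) {
  // Assumption: valid immediates are 0..7, i.e. waits of 1..8 cycles.
  assert(Imm0 >= 0 && Imm0 <= 7 && "immediate out of assumed range");
  assert(Imm1 >= 0 && Imm1 <= 7 && "immediate out of assumed range");

  const int MaxCycles = 8;   // bound used by the patch
  int Cycles0 = Imm0 + 1;    // immediate 0 means "wait 1 cycle"
  int Cycles1 = Imm1 + 1;

  if (Cycles0 + Cycles1 > MaxCycles)
    return false;            // cannot be encoded in one s_nop; keep both

  Merged = Cycles0 + Cycles1 - 1; // convert total cycles back to an immediate
  return true;
}
```

For example, two `s_nop 0` instructions (1 cycle each) would merge into `s_nop 1` (2 cycles), while `s_nop 7` followed by `s_nop 0` would be left as is, since 9 cycles exceeds the bound.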
Diffstat (limited to 'llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp')
-rw-r--r-- | llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp | 27 |
1 files changed, 27 insertions, 0 deletions
diff --git a/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp b/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
index ad3f63d2cea..346488d38e2 100644
--- a/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
+++ b/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
@@ -242,6 +242,33 @@ bool SIShrinkInstructions::runOnMachineFunction(MachineFunction &MF) {
         }
       }
 
+      // Combine adjacent s_nops to use the immediate operand encoding how long
+      // to wait.
+      //
+      // s_nop N
+      // s_nop M
+      // =>
+      // s_nop (N + M)
+      if (MI.getOpcode() == AMDGPU::S_NOP &&
+          Next != MBB.end() &&
+          (*Next).getOpcode() == AMDGPU::S_NOP) {
+
+        MachineInstr &NextMI = *Next;
+        // The instruction encodes the amount to wait with an offset of 1,
+        // i.e. 0 is wait 1 cycle. Convert both to cycles and then convert back
+        // after adding.
+        uint8_t Nop0 = MI.getOperand(0).getImm() + 1;
+        uint8_t Nop1 = NextMI.getOperand(0).getImm() + 1;
+
+        // Make sure we don't overflow the bounds.
+        if (Nop0 + Nop1 <= 8) {
+          NextMI.getOperand(0).setImm(Nop0 + Nop1 - 1);
+          MI.eraseFromParent();
+        }
+
+        continue;
+      }
+
       // FIXME: We also need to consider movs of constant operands since
       // immediate operands are not folded if they have more than one use, and
       // the operand folding pass is unaware if the immediate will be free since