diff options
author | Taewook Oh <twoh@fb.com> | 2017-06-29 23:11:24 +0000 |
---|---|---|
committer | Taewook Oh <twoh@fb.com> | 2017-06-29 23:11:24 +0000 |
commit | 0e35ea3b7c63f43d27f505fac23bfa9b0c706c4b (patch) | |
tree | 365157739d9b4f6ebde359c4023714c05bf77894 /llvm/lib/CodeGen/TwoAddressInstructionPass.cpp | |
parent | b13eebe0cee4f0883c11bd98e2fe9be03a344c0a (diff) | |
download | bcm5719-llvm-0e35ea3b7c63f43d27f505fac23bfa9b0c706c4b.tar.gz bcm5719-llvm-0e35ea3b7c63f43d27f505fac23bfa9b0c706c4b.zip |
Remove redundant copy in recurrences
Summary:
If there is a chain of instructions formulating a recurrence, commuting operands can help removing a redundant copy. In the following example code,
```
BB#1: ; Loop Header
%vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
...
BB#6: ; Loop Latch
%vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
%vreg10<def,tied1> = ADD32rr %vreg1<kill,tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1,%vreg0
%vreg3<def,tied1> = ADD32rr %vreg2<kill,tied0>, %vreg10<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2,%vreg10
CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
%vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
JL_1 <BB#1>, %EFLAGS<imp-use,kill>
```
Existing two-address generation pass generates following code:
```
BB#1:
%vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
...
BB#6:
Predecessors according to CFG: BB#5 BB#4
%vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
%vreg10<def> = COPY %vreg1<kill>; GR32:%vreg10,%vreg1
%vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg0
%vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10
%vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2
CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
%vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
JL_1 <BB#1>, %EFLAGS<imp-use,kill>
JMP_1 <BB#7>
```
This is suboptimal because the assembly code generated has a redundant copy at the end of #BB6 to feed %vreg13 to BB#1:
```
.LBB0_6:
addl %esi, %edi
addl %ebx, %edi
cmpl $10, %edi
movl %edi, %esi
jl .LBB0_1
```
This redundant copy can be elimiated by making instructions in the recurrence chain to compute the value "into" the register that actually holds the feedback value. In this example, this can be achieved by commuting %vreg0 and %vreg1 to compute %vreg10. With that change, code after two-address generation becomes
```
BB#1:
%vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
...
BB#6: derived from LLVM BB %bb7
Predecessors according to CFG: BB#5 BB#4
%vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
%vreg10<def> = COPY %vreg0<kill>; GR32:%vreg10,%vreg0
%vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg1<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1
%vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10
%vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2
CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
%vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
JL_1 <BB#1>, %EFLAGS<imp-use,kill>
JMP_1 <BB#7>
```
and the final assembly does not have redundant copy:
```
.LBB0_6:
addl %edi, %eax
addl %ebx, %eax
cmpl $10, %eax
jl .LBB0_1
```
Reviewers: qcolombet, MatzeB, wmi
Reviewed By: wmi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31821
llvm-svn: 306758
Diffstat (limited to 'llvm/lib/CodeGen/TwoAddressInstructionPass.cpp')
-rw-r--r-- | llvm/lib/CodeGen/TwoAddressInstructionPass.cpp | 11 |
1 files changed, 9 insertions, 2 deletions
diff --git a/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp b/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp index 552a89f76ca..83c00e24d14 100644 --- a/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp +++ b/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp @@ -68,6 +68,13 @@ EnableRescheduling("twoaddr-reschedule", cl::desc("Coalesce copies by rescheduling (default=true)"), cl::init(true), cl::Hidden); +// Limit the number of dataflow edges to traverse when evaluating the benefit +// of commuting operands. +static cl::opt<unsigned> MaxDataFlowEdge( + "dataflow-edge-limit", cl::Hidden, cl::init(3), + cl::desc("Maximum number of dataflow edges to traverse when evaluating " + "the benefit of commuting operands")); + namespace { class TwoAddressInstructionPass : public MachineFunctionPass { MachineFunction *MF; @@ -637,10 +644,10 @@ isProfitableToCommute(unsigned regA, unsigned regB, unsigned regC, // To more generally minimize register copies, ideally the logic of two addr // instruction pass should be integrated with register allocation pass where // interference graph is available. - if (isRevCopyChain(regC, regA, 3)) + if (isRevCopyChain(regC, regA, MaxDataFlowEdge)) return true; - if (isRevCopyChain(regB, regA, 3)) + if (isRevCopyChain(regB, regA, MaxDataFlowEdge)) return false; // Since there are no intervening uses for both registers, then commute |