| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After creating a low-overhead loop, the loop update instruction was still
lingering around hurting performance. This removes dead loop update
instructions, which in our case are mostly SUBS instructions.
To support this, some helper functions were added to MachineLoopUtils and
ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses
before a particular loop instruction, respectively.
This is a first version that removes a SUBS instruction when there are no other
uses inside and outside the loop block, but there are some more interesting
cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which
shows that there is room for improvement. For example, we can't handle this
case yet:
..
dlstp.32 lr, r2
.LBB0_1:
mov r3, r2
subs r2, #4
vldrh.u32 q2, [r1], #8
vmov q1, q0
vmla.u32 q0, q2, r0
letp lr, .LBB0_1
@ %bb.2:
vctp.32 r3
..
which is a lot more tricky because r2 is not only used by the subs, but also by
the mov to r3, which is used outside the low-overhead loop by the vctp
instruction, and that requires a bit of a different approach, and I will follow
up on this.
Differential Revision: https://reviews.llvm.org/D71007
|
|
|
|
|
|
|
| |
With the extra optimisations we have done, these should now be fine to
enable by default. Which is what this patch does.
Differential Revision: https://reviews.llvm.org/D70968
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MVE has a basic symmetry between it's normal loads/store operations and
the masked variants. This means that masked loads and stores can use
pre-inc and post-inc addressing modes, just like the standard loads and
stores already do.
To enable that, this patch adds all the relevant infrastructure for
treating masked loads/stores addressing modes in the same way as normal
loads/stores.
This involves:
- Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra
Offset operand that is added after the PtrBase.
- Extending the IndexedModeActions from 8bits to 16bits to store the
legality of masked operations as well as normal ones. This array is
fairly small, so doubling the size still won't make it very large.
Offset masked loads can then be controlled with
setIndexedMaskedLoadAction, similar to standard loads.
- The same methods that combine to indexed loads, such as
CombineToPostIndexedLoadStore, are adjusted to handle masked loads in
the same way.
- The ARM backend is then adjusted to make use of these indexed masked
loads/stores.
- The X86 backend is adjusted to hopefully be no functional changes.
Differential Revision: https://reviews.llvm.org/D70176
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add some more helper functions to ReachingDefs to query the uses of
a given MachineInstr and also to query whether two MachineInstrs use
the same def of a register.
For Arm, while tail-predicating, these helpers are used in the
low-overhead loops to remove the dead code that calculates the number
of loop iterations.
Differential Revision: https://reviews.llvm.org/D70240
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch modifies ARMLowOverheadLoops to convert a predicated
vector low-overhead loop into a tail-predicatd one. This is currently
a very basic conversion, with the following restrictions:
- Operates only on single block loops.
- The loop can only contain a single vctp instruction.
- No other instructions can write to the vpr.
- We only allow a subset of the mve instructions in the loop.
TODO: Pass the number of elements, not the number of iterations to
dlstp/wlstp.
Differential Revision: https://reviews.llvm.org/D69945
|
|
|
|
|
|
|
|
| |
Unlike VPT, VPST just uses the current value of VPR.P0.
Differential Revision: https://reviews.llvm.org/D69037
llvm-svn: 375087
|
|
|
|
|
|
|
|
|
|
|
|
| |
The VCTP instruction will calculate the predicate masked based upon
the number of elements that need to be processed. I had inserted the
sub before the vctp intrinsic and supplied it as the operand, but
this is incorrect as the phi should directly feed the vctp. The sub
is calculating the value for the next iteration.
Differential Revision: https://reviews.llvm.org/D67921
llvm-svn: 373188
|
|
Remove any predicate that we replace with a vctp intrinsic, and try
to remove their operands too. Also look into the exit block to see if
there's any duplicates of the predicates that we've replaced and
clone the vctp to be used there instead.
Differential Revision: https://reviews.llvm.org/D67709
llvm-svn: 372567
|