author:    Ahmed Bougacha <ahmed.bougacha@gmail.com>  2015-03-10 20:45:38 +0000
committer: Ahmed Bougacha <ahmed.bougacha@gmail.com>  2015-03-10 20:45:38 +0000
commit:    fab5892f8b762a83d151976db4666895e5e4198b (patch)
tree:      dfa567ccee7be507bb58fd24a55c5028292e48dd /libcxx/test/std/thread/thread.threads/thread.thread.class/thread.thread.constr
parent:    e6cdf34116305bae21caeff1738625ce375bc196 (diff)
[AArch64] Avoid going through GPRs for across-vector instructions.
This adds a new node type for each across-vector-lanes intrinsic.
For instance, for addv, we have AArch64ISD::UADDV, such that:
(v4i32 (uaddv ...))
is the same as
(v4i32 (scalar_to_vector (i32 (int_aarch64_neon_uaddv ...))))
that is,
(v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)),
        (i32 (int_aarch64_neon_uaddv ...)), ssub))
In a combine, we transform all such across-vector-lanes intrinsics to:
(i32 (extract_vector_elt (uaddv ...), 0))
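Concretely, the combine boils down to a small SelectionDAG helper. This is a
minimal sketch under assumed names (combineAcrossLanesIntrinsic is
illustrative, not necessarily the exact code in this patch), where N is the
ISD::INTRINSIC_WO_CHAIN node for one of the across-lanes intrinsics and Opc
is the matching AArch64ISD opcode:

  static SDValue combineAcrossLanesIntrinsic(unsigned Opc, SDNode *N,
                                             SelectionDAG &DAG) {
    SDLoc DL(N);
    // Operand 0 of INTRINSIC_WO_CHAIN is the intrinsic ID; operand 1 is the
    // vector input. The new node (e.g. AArch64ISD::UADDV) formally yields a
    // vector type, with the scalar result living in lane 0.
    SDValue Reduce = DAG.getNode(Opc, DL, N->getOperand(1).getValueType(),
                                 N->getOperand(1));
    // Make the scalar use explicit so the lane-indexed patterns can fire.
    return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, N->getValueType(0),
                       Reduce, DAG.getConstant(0, DL, MVT::i64));
  }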
This has one big advantage: by making the extract_vector_elt explicit, we
enable the existing patterns for lane-aware instructions to fire.
This lets us avoid needlessly going through the GPRs. Consider:
  #include <arm_neon.h>

  uint32x4_t test_mul(uint32x4_t a, uint32x4_t b) {
    return vmulq_n_u32(a, vaddvq_u32(b));
  }
We now generate:
  addv.4s s1, v1
  mul.4s  v0, v0, v1[0]
instead of the previous:
  addv.4s s1, v1
  fmov    w8, s1
  dup.4s  v1, w8
  mul.4s  v0, v1, v0
The old sequence moved the reduction result out to a GPR (fmov w8, s1) and
broadcast it back into a vector register (dup.4s v1, w8); the new one stays
in the SIMD register file and feeds lane 0 straight into the by-element
multiply.
rdar://20044838
llvm-svn: 231840