author:    Ahmed Bougacha <ahmed.bougacha@gmail.com>  2015-03-10 20:45:38 +0000
committer: Ahmed Bougacha <ahmed.bougacha@gmail.com>  2015-03-10 20:45:38 +0000
commit:    fab5892f8b762a83d151976db4666895e5e4198b (patch)
tree:      dfa567ccee7be507bb58fd24a55c5028292e48dd /libcxx/test/std/thread/thread.threads/thread.thread.class/thread.thread.constr
parent:    e6cdf34116305bae21caeff1738625ce375bc196 (diff)
[AArch64] Avoid going through GPRs for across-vector instructions.
This adds a new node type for each across-vector-lanes intrinsic.
For instance, for addv, we have AArch64ISD::UADDV, such that:
(v4i32 (uaddv ...))
is the same as
(v4i32 (scalar_to_vector (i32 (int_aarch64_neon_uaddv ...))))
that is,
(v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)),
        (i32 (int_aarch64_neon_uaddv ...)), ssub))
In a combine, we transform all such across-vector-lanes intrinsics to:
(i32 (extract_vector_elt (uaddv ...), 0))
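Concretely, the combine boils down to a small SelectionDAG helper. This is a
minimal sketch under assumed names (combineAcrossLanesIntrinsic is
illustrative, not necessarily the exact code in this patch), where N is the
ISD::INTRINSIC_WO_CHAIN node for one of the across-lanes intrinsics and Opc
is the matching AArch64ISD opcode:

  static SDValue combineAcrossLanesIntrinsic(unsigned Opc, SDNode *N,
                                             SelectionDAG &DAG) {
    SDLoc DL(N);
    // Operand 0 of INTRINSIC_WO_CHAIN is the intrinsic ID; operand 1 is the
    // vector input. The new node (e.g. AArch64ISD::UADDV) formally yields a
    // vector type, with the scalar result living in lane 0.
    SDValue Reduce = DAG.getNode(Opc, DL, N->getOperand(1).getValueType(),
                                 N->getOperand(1));
    // Make the scalar use explicit so the lane-indexed patterns can fire.
    return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, N->getValueType(0),
                       Reduce, DAG.getConstant(0, DL, MVT::i64));
  }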
This has one big advantage: by making the extract_vector_elt explicit, we
enable the existing patterns for lane-aware instructions to fire.
This lets us avoid needlessly going through the GPRs. Consider:
  #include <arm_neon.h>

  uint32x4_t test_mul(uint32x4_t a, uint32x4_t b) {
    return vmulq_n_u32(a, vaddvq_u32(b));
  }
We now generate:
  addv.4s s1, v1
  mul.4s  v0, v0, v1[0]
instead of the previous:
  addv.4s s1, v1
  fmov    w8, s1
  dup.4s  v1, w8
  mul.4s  v0, v1, v0
The old sequence moved the reduction result out to a GPR (fmov w8, s1) and
broadcast it back into a vector register (dup.4s v1, w8); the new one stays
in the SIMD register file and feeds lane 0 straight into the by-element
multiply.
rdar://20044838
llvm-svn: 231840