| author | Ahmed Bougacha <ahmed.bougacha@gmail.com> | 2015-03-10 20:45:38 +0000 | 
|---|---|---|
| committer | Ahmed Bougacha <ahmed.bougacha@gmail.com> | 2015-03-10 20:45:38 +0000 | 
| commit | fab5892f8b762a83d151976db4666895e5e4198b (patch) | |
| tree | dfa567ccee7be507bb58fd24a55c5028292e48dd /libcxx/test/std/language.support/support.general | |
| parent | e6cdf34116305bae21caeff1738625ce375bc196 (diff) | |
[AArch64] Avoid going through GPRs for across-vector instructions.
This adds a new node type for each intrinsic.
For instance, for addv, we have AArch64ISD::UADDV, such that:
  (v4i32 (uaddv ...))
is the same as
  (v4i32 (scalar_to_vector (i32 (int_aarch64_neon_uaddv ...))))
that is,
  (v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)),
           (i32 (int_aarch64_neon_uaddv ...)), ssub))
In a combine, we transform all such across-vector-lanes intrinsics to:
  (i32 (extract_vector_elt (uaddv ...), 0))
This has one big advantage: by making the extract_vector_elt explicit, we
enable the existing patterns for lane-aware instructions to fire.
This lets us avoid needlessly going through the GPRs.  Consider:
    uint32x4_t test_mul(uint32x4_t a, uint32x4_t b) {
        return vmulq_n_u32(a, vaddvq_u32(b));
    }
We now generate:
    addv.4s  s1, v1
    mul.4s   v0, v0, v1[0]
instead of the previous:
    addv.4s  s1, v1
    fmov     w8, s1
    dup.4s   v1, w8
    mul.4s   v0, v1, v0
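For reference, what test_mul computes can be modeled in portable C without
arm_neon.h. This is only a sketch: the u32x4 type and the model_* helpers are
hypothetical stand-ins for uint32x4_t, vaddvq_u32 and vmulq_n_u32. It shows
the value both instruction sequences must produce, not how the backend
selects instructions.

```c
#include <stdint.h>

/* Hypothetical stand-in for uint32x4_t: four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } u32x4;

/* Models vaddvq_u32: sum all lanes into one scalar (wraps modulo 2^32). */
static uint32_t model_addv(u32x4 v) {
    uint32_t sum = 0;
    for (int i = 0; i < 4; ++i)
        sum += v.lane[i];
    return sum;
}

/* Models vmulq_n_u32: multiply every lane by the same scalar. */
static u32x4 model_mul_n(u32x4 v, uint32_t s) {
    u32x4 r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = v.lane[i] * s;
    return r;
}

/* Same shape as test_mul: scale each lane of a by the lane sum of b. */
static u32x4 model_test_mul(u32x4 a, u32x4 b) {
    return model_mul_n(a, model_addv(b));
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, the lane sum of b is 100, so
every lane of a is multiplied by 100. The new sequence reads that sum directly
from lane 0 of the addv result instead of moving it through a GPR first.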
rdar://20044838
llvm-svn: 231840
Diffstat (limited to 'libcxx/test/std/language.support/support.general')
0 files changed, 0 insertions, 0 deletions

