ARM: Improve codegen for vget_low_* and vget_high_* intrinsics.

These intrinsics use __builtin_shufflevector() to extract the low and high
half, respectively, of a 128-bit NEON vector. Currently, they're defined to
use bitcasts to simplify the emitter, so we get code like:

    uint16x4_t vget_low_u16(uint16x8_t __a) {
      return (uint16x4_t) __builtin_shufflevector((int64x2_t) __a,
                                                  (int64x2_t) __a,
                                                  0);
    }

While this works, those bitcasts go all the way through to the IR, resulting
in code like:

    %1 = bitcast <8 x i16> %in to <2 x i64>
    %2 = shufflevector <2 x i64> %1, <2 x i64> undef, <1 x i32> zeroinitializer
    %3 = bitcast <1 x i64> %2 to <4 x i16>

We can instead easily perform the operation directly on the input vector
like:

    uint16x4_t vget_low_u16(uint16x8_t __a) {
      return __builtin_shufflevector(__a, __a, 0, 1, 2, 3);
    }

Not only is that much easier to read on its own, it also results in cleaner
IR like:

    %1 = shufflevector <8 x i16> %in, <8 x i16> undef,
                       <4 x i32> <i32 0, i32 1, i32 2, i32 3>

This is both easier to read and easier for the back end to reason about,
since the operation no longer obscures the source vector behind bitcasts.

rdar://13894163

llvm-svn: 181865
author: Jim Grosbach <grosbach@apple.com> 2013-05-15 02:40:04 +0000
committer: Jim Grosbach <grosbach@apple.com> 2013-05-15 02:40:04 +0000
commit: d10f1c04aad0a845f4fb54bb7ec0c6568bd56753 (patch)
tree: 51c5a1bb2294f79bcffa381cb8dc8b238fb433d9 /clang/utils/TableGen/NeonEmitter.cpp
parent: 2006ba945f311b8e01fab6384026d1c01a238c95 (diff)
1 files changed, 9 insertions, 4 deletions
diff --git a/clang/utils/TableGen/NeonEmitter.cpp b/clang/utils/TableGen/NeonEmitter.cpp
index 34b955e8e9d..05505c99c99 100644
--- a/clang/utils/TableGen/NeonEmitter.cpp
+++ b/clang/utils/TableGen/NeonEmitter.cpp
@@ -1410,12 +1410,17 @@ static std::string GenOpString(OpKind op, const std::string &proto,
     s += ", (int64x1_t)__b, 0, 1);";
     break;
   case OpHi:
-    s += "(" + ts +
-      ")__builtin_shufflevector((int64x2_t)__a, (int64x2_t)__a, 1);";
+    // nElts is for the result vector, so the source is twice that number.
+    s += "__builtin_shufflevector(__a, __a";
+    for (unsigned i = nElts; i < nElts * 2; ++i)
+      s += ", " + utostr(i);
+    s+= ");";
     break;
   case OpLo:
-    s += "(" + ts +
-      ")__builtin_shufflevector((int64x2_t)__a, (int64x2_t)__a, 0);";
+    s += "__builtin_shufflevector(__a, __a";
+    for (unsigned i = 0; i < nElts; ++i)
+      s += ", " + utostr(i);
+    s+= ");";
     break;
   case OpDup:
     s += Duplicate(nElts, typestr, "__a") + ";";
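
The index arithmetic in the new OpLo and OpHi cases can be tried outside the emitter. The following is a minimal standalone sketch, not code from NeonEmitter.cpp: the helper name halfShuffle and its bool parameter are hypothetical, and std::to_string stands in for LLVM's utostr. It assumes, as the patch comment states, that nElts counts the lanes of the result, so the source vector has twice as many.

```cpp
#include <string>

// Hypothetical helper (not part of NeonEmitter.cpp): builds the shuffle
// expression that selects the low or high half of the source vector.
// nElts is the lane count of the result; the source has 2 * nElts lanes.
static std::string halfShuffle(unsigned nElts, bool high) {
  // The low half starts at lane 0, the high half at lane nElts.
  unsigned first = high ? nElts : 0;
  std::string s = "__builtin_shufflevector(__a, __a";
  for (unsigned i = first; i < first + nElts; ++i)
    s += ", " + std::to_string(i);  // std::to_string stands in for utostr
  s += ");";
  return s;
}
```

Under these assumptions, halfShuffle(4, false) yields the vget_low_u16 body shown in the commit message, and halfShuffle(4, true) selects lanes 4 through 7 for the matching high half.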