[SLP] Enable 64-bit wide vectorization on AArch64

ARM Neon has native support for half-sized vector registers (64 bits). This
is beneficial for example for 2D and 3D graphics. This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.

*** Performance Analysis

This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.

The results are with -O3 and PGO. A negative percentage is an improvement.
The testsuite was run with a sample size of 4.

** SPEC

* CFP2006/482.sphinx3  -3.34%

  A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
  This used to be a +22% regression before rL299482.

* CFP2000/177.mesa     -3.34%

* CINT2000/256.bzip2   +6.97%

  My current plan is to extend the fix in rL299482 to i16 which brings the
  regression down to +2.5%. There are also other problems with the codegen in
  this loop so there is further room for improvement.

** LLVM testsuite

* SingleSource/Benchmarks/Misc/ReedSolomon                 -10.75%

  There are multiple small SLP vectorizations outside the hot code. It's a bit
  surprising that it adds up to 10%. Some of this may be code-layout noise.

* MultiSource/Benchmarks/VersaBench/beamformer/beamformer  -8.40%

  The opt-viewer screenshot can be seen at F3218284. We start at a colder
  store but the tree leads us into the hottest loop.

* MultiSource/Applications/lambda-0.1.3/lambda             -2.68%

* MultiSource/Benchmarks/Bullet/bullet                     -2.18%

  This is using 3D vectors.

* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists  +6.67%

  Noise, binary is unchanged.

* MultiSource/Benchmarks/Ptrdist/anagram/anagram           +4.90%

  There is an additional SLP in the cold code. The test runs for ~1sec and
  prints out over 2000 lines. This is most likely noise.

* MultiSource/Applications/aha/aha                         +1.63%

* MultiSource/Applications/JM/lencod/lencod                +1.41%

* SingleSource/Benchmarks/Misc/richards_benchmark          +1.15%

Differential Revision: https://reviews.llvm.org/D31965

llvm-svn: 303116
author: Adam Nemet <anemet@apple.com> 2017-05-15 21:15:01 +0000
committer: Adam Nemet <anemet@apple.com> 2017-05-15 21:15:01 +0000
commit: e29686e5c17ea55536344d22df15fb58bb49b61f (patch)
tree: e363e4aa4bb8bffa44c57a91d37e14d03a6c404e /llvm/lib/Target/AArch64/AArch64Subtarget.cpp
parent: bd6e9e77a7941664196d4fdb1da76dd5519617b0 (diff)
download: bcm5719-llvm-e29686e5c17ea55536344d22df15fb58bb49b61f.tar.gz
bcm5719-llvm-e29686e5c17ea55536344d22df15fb58bb49b61f.zip
1 files changed, 8 insertions, 0 deletions
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
index abdeac019a1..1c81d34014f 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -91,6 +91,8 @@ void AArch64Subtarget::initializeProperties() {
   case Falkor:
     MaxInterleaveFactor = 4;
     VectorInsertExtractBaseCost = 2;
+    // FIXME: remove this to enable 64-bit SLP if performance looks good.
+    MinVectorRegisterBitWidth = 128;
     break;
   case Kryo:
     MaxInterleaveFactor = 4;
@@ -99,6 +101,8 @@ void AArch64Subtarget::initializeProperties() {
     PrefetchDistance = 740;
     MinPrefetchStride = 1024;
     MaxPrefetchIterationsAhead = 11;
+    // FIXME: remove this to enable 64-bit SLP if performance looks good.
+    MinVectorRegisterBitWidth = 128;
     break;
   case ThunderX2T99:
     CacheLineSize = 64;
@@ -108,6 +112,8 @@ void AArch64Subtarget::initializeProperties() {
     PrefetchDistance = 128;
     MinPrefetchStride = 1024;
     MaxPrefetchIterationsAhead = 4;
+    // FIXME: remove this to enable 64-bit SLP if performance looks good.
+    MinVectorRegisterBitWidth = 128;
     break;
   case ThunderX:
   case ThunderXT88:
@@ -116,6 +122,8 @@ void AArch64Subtarget::initializeProperties() {
     CacheLineSize = 128;
     PrefFunctionAlignment = 3;
     PrefLoopAlignment = 2;
+    // FIXME: remove this to enable 64-bit SLP if performance looks good.
+    MinVectorRegisterBitWidth = 128;
     break;
   case CortexA35: break;
   case CortexA53: break;
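
The hunks above opt four CPU families out of 64-bit SLP by pinning
MinVectorRegisterBitWidth to 128. A minimal standalone sketch of that opt-out
pattern follows. It is not the LLVM code: Cpu, SubtargetSketch, minLanes and
selfCheck are hypothetical names, and the default of 64 is an assumption
inferred from the commit title rather than something shown in this diff.

```cpp
#include <cassert>

// Hypothetical stand-in for the processor families touched by the diff.
enum class Cpu { Generic, Falkor, Kryo, ThunderX2T99, ThunderX };

// Hypothetical, heavily simplified model of a subtarget's tuning state.
struct SubtargetSketch {
  // Assumption: the default permits 64-bit (half-sized) NEON registers.
  unsigned MinVectorRegisterBitWidth = 64;

  explicit SubtargetSketch(Cpu C) {
    switch (C) {
    case Cpu::Falkor:
    case Cpu::Kryo:
    case Cpu::ThunderX2T99:
    case Cpu::ThunderX:
      // Mirrors the diff: these CPUs keep the previous 128-bit minimum.
      MinVectorRegisterBitWidth = 128;
      break;
    case Cpu::Generic:
      break;
    }
  }
};

// Element count of the narrowest vector a vectorizer would consider for
// elements of ElemBits bits, given the subtarget's minimum register width.
inline unsigned minLanes(const SubtargetSketch &ST, unsigned ElemBits) {
  return ST.MinVectorRegisterBitWidth / ElemBits;
}

// Small self-check of the behaviour described above.
inline void selfCheck() {
  // With a 64-bit minimum, a pair of floats already fills a register.
  assert(minLanes(SubtargetSketch(Cpu::Generic), 32) == 2);
  // The opted-out CPUs still need four floats.
  assert(minLanes(SubtargetSketch(Cpu::Kryo), 32) == 4);
}
```

Under these assumptions a generic core would accept a two-element float group
as an SLP candidate, while the four opted-out families would still require
four elements until their FIXME entries are removed.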