diff options
author | Adam Nemet <anemet@apple.com> | 2017-05-15 21:15:01 +0000 |
---|---|---|
committer | Adam Nemet <anemet@apple.com> | 2017-05-15 21:15:01 +0000 |
commit | e29686e5c17ea55536344d22df15fb58bb49b61f (patch) | |
tree | e363e4aa4bb8bffa44c57a91d37e14d03a6c404e /llvm/lib/Target/AArch64/AArch64Subtarget.cpp | |
parent | bd6e9e77a7941664196d4fdb1da76dd5519617b0 (diff) | |
download | bcm5719-llvm-e29686e5c17ea55536344d22df15fb58bb49b61f.tar.gz bcm5719-llvm-e29686e5c17ea55536344d22df15fb58bb49b61f.zip |
[SLP] Enable 64-bit wide vectorization on AArch64
ARM Neon has native support for half-sized vector registers (64 bits). This
is beneficial for example for 2D and 3D graphics. This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.
*** Performance Analysis
This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.
The results are with -O3 and PGO. A negative percentage is an improvement.
The testsuite was run with a sample size of 4.
** SPEC
* CFP2006/482.sphinx3 -3.34%
A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
This used to be a +22% regression before rL299482.
* CFP2000/177.mesa -3.34%
* CINT2000/256.bzip2 +6.97%
My current plan is to extend the fix in rL299482 to i16 which brings the
regression down to +2.5%. There are also other problems with the codegen in
this loop so there is further room for improvement.
** LLVM testsuite
* SingleSource/Benchmarks/Misc/ReedSolomon -10.75%
There are multiple small SLP vectorizations outside the hot code. It's a bit
surprising that it adds up to 10%. Some of this may be code-layout noise.
* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%
The opt-viewer screenshot can be seen at F3218284. We start at a colder store
but the tree leads us into the hottest loop.
* MultiSource/Applications/lambda-0.1.3/lambda -2.68%
* MultiSource/Benchmarks/Bullet/bullet -2.18%
This is using 3D vectors.
* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%
Noise, binary is unchanged.
* MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90%
There is an additional SLP in the cold code. The test runs for ~1sec and
prints out over 2000 lines. This is most likely noise.
* MultiSource/Applications/aha/aha +1.63%
* MultiSource/Applications/JM/lencod/lencod +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark +1.15%
Differential Revision: https://reviews.llvm.org/D31965
llvm-svn: 303116
Diffstat (limited to 'llvm/lib/Target/AArch64/AArch64Subtarget.cpp')
-rw-r--r-- | llvm/lib/Target/AArch64/AArch64Subtarget.cpp | 8 |
1 files changed, 8 insertions, 0 deletions
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp index abdeac019a1..1c81d34014f 100644 --- a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp +++ b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp @@ -91,6 +91,8 @@ void AArch64Subtarget::initializeProperties() { case Falkor: MaxInterleaveFactor = 4; VectorInsertExtractBaseCost = 2; + // FIXME: remove this to enable 64-bit SLP if performance looks good. + MinVectorRegisterBitWidth = 128; break; case Kryo: MaxInterleaveFactor = 4; @@ -99,6 +101,8 @@ void AArch64Subtarget::initializeProperties() { PrefetchDistance = 740; MinPrefetchStride = 1024; MaxPrefetchIterationsAhead = 11; + // FIXME: remove this to enable 64-bit SLP if performance looks good. + MinVectorRegisterBitWidth = 128; break; case ThunderX2T99: CacheLineSize = 64; @@ -108,6 +112,8 @@ void AArch64Subtarget::initializeProperties() { PrefetchDistance = 128; MinPrefetchStride = 1024; MaxPrefetchIterationsAhead = 4; + // FIXME: remove this to enable 64-bit SLP if performance looks good. + MinVectorRegisterBitWidth = 128; break; case ThunderX: case ThunderXT88: @@ -116,6 +122,8 @@ void AArch64Subtarget::initializeProperties() { CacheLineSize = 128; PrefFunctionAlignment = 3; PrefLoopAlignment = 2; + // FIXME: remove this to enable 64-bit SLP if performance looks good. + MinVectorRegisterBitWidth = 128; break; case CortexA35: break; case CortexA53: break; |