[SLP] respect target register width for GEP vectorization (PR43578)

We failed to account for the target register width (max vector factor)
when vectorizing starting from GEPs. This causes vectorization to
proceed to obviously illegal widths as in:
https://bugs.llvm.org/show_bug.cgi?id=43578

For x86, this also means that SLP can produce rogue AVX or AVX512
code even when the user specifies a narrower vector width.

The AArch64 test in ext-trunc.ll appears to be better using the
narrower width. I'm not exactly sure what getelementptr.ll is trying
to do, but it's testing with "-slp-threshold=-18", so I'm not worried
about those diffs.

The x86 test is an over-reduction from SPEC h264; this patch appears to
restore the perf loss caused by SLP when using -march=haswell.

Differential Revision: https://reviews.llvm.org/D68667

llvm-svn: 374183
author: Sanjay Patel <spatel@rotateright.com> 2019-10-09 16:32:49 +0000
committer: Sanjay Patel <spatel@rotateright.com> 2019-10-09 16:32:49 +0000
commit: df14bd315db94d286c0c75b4b6ee5d760f311399 (patch)
tree: 30217dada42bb2f1929d4b60fe6cdda6232f652e /llvm/lib/Transforms/Vectorize
parent: d037a5f06538f658f451c5dad106ec02f3cc56c2 (diff)
download: bcm5719-llvm-df14bd315db94d286c0c75b4b6ee5d760f311399.tar.gz
bcm5719-llvm-df14bd315db94d286c0c75b4b6ee5d760f311399.zip
1 files changed, 10 insertions, 4 deletions
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 99428c6c5de..75b4718392d 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -6981,10 +6981,16 @@ bool SLPVectorizerPass::vectorizeGEPIndices(BasicBlock *BB, BoUpSLP &R) {
     LLVM_DEBUG(dbgs() << "SLP: Analyzing a getelementptr list of length "
                       << Entry.second.size() << ".\n");
 
-    // We process the getelementptr list in chunks of 16 (like we do for
-    // stores) to minimize compile-time.
-    for (unsigned BI = 0, BE = Entry.second.size(); BI < BE; BI += 16) {
-      auto Len = std::min<unsigned>(BE - BI, 16);
+    // Process the GEP list in chunks suitable for the target's supported
+    // vector size. If a vector register can't hold 1 element, we are done.
+    unsigned MaxVecRegSize = R.getMaxVecRegSize();
+    unsigned EltSize = R.getVectorElementSize(Entry.second[0]);
+    if (MaxVecRegSize < EltSize)
+      continue;
+
+    unsigned MaxElts = MaxVecRegSize / EltSize;
+    for (unsigned BI = 0, BE = Entry.second.size(); BI < BE; BI += MaxElts) {
+      auto Len = std::min<unsigned>(BE - BI, MaxElts);
       auto GEPList = makeArrayRef(&Entry.second[BI], Len);
 
       // Initialize a set a candidate getelementptrs. Note that we use a
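
The chunking arithmetic in the hunk above can be tried outside of LLVM with a small standalone sketch. This is not the SLPVectorizer code: `chunkForRegister` and its parameter names are hypothetical, and the sketch assumes the register size and element size are given in the same unit (bits here). It only illustrates how the fixed chunk size of 16 is replaced by `MaxVecRegSize / EltSize`, and why a list is skipped when one element does not fit in a register.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper, not part of LLVM: split NumItems candidates into
// (start, length) chunks so that each chunk fits in one vector register.
// Both sizes are assumed to be in bits.
std::vector<std::pair<unsigned, unsigned>>
chunkForRegister(unsigned NumItems, unsigned MaxVecRegSizeBits,
                 unsigned EltSizeBits) {
  std::vector<std::pair<unsigned, unsigned>> Chunks;

  // Mirrors the early exit in the patch: if a register cannot hold even
  // one element, there is nothing to vectorize. The zero check is an
  // extra guard added for this sketch to avoid dividing by zero.
  if (EltSizeBits == 0 || MaxVecRegSizeBits < EltSizeBits)
    return Chunks;

  // Largest number of elements that fit in one register.
  unsigned MaxElts = MaxVecRegSizeBits / EltSizeBits;

  // Walk the list in steps of MaxElts; the last chunk may be shorter.
  for (unsigned BI = 0; BI < NumItems; BI += MaxElts) {
    unsigned Len = std::min(NumItems - BI, MaxElts);
    assert(Len > 0 && Len <= MaxElts && "chunk must fit in one register");
    Chunks.emplace_back(BI, Len);
  }
  return Chunks;
}
```

For example, `chunkForRegister(20, 128, 32)` yields five chunks of 4 elements each, whereas the previous fixed step of 16 would have formed one chunk of 16 and one of 4 regardless of the register width.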