| field | value | date |
|---|---|---|
| author | Sanjay Patel <spatel@rotateright.com> | 2019-11-27 13:33:11 -0500 |
| committer | Sanjay Patel <spatel@rotateright.com> | 2019-11-27 14:08:56 -0500 |
| commit | 5c166f1d1969e9c1e5b72aa672add429b9c22b53 (patch) | |
| tree | adf6302c8508cb2d3cf48fcf5e53eab409bfa65f /llvm/test/Transforms/LoopVectorize/X86 | |
| parent | 5c5e860535d8924a3d6eb950bb8a4945df01e9b7 (diff) | |
[x86] make SLM extract vector element more expensive than default
I'm not sure what the effect of this change will be on all of the affected
tests or a larger benchmark, but it fixes the horizontal add/sub problems
noted here:
https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc
The costs are based on reciprocal throughput numbers in Agner's tables for
PEXTR*; these appear to be very slow ops on Silvermont.
This is a small step towards the larger motivation discussed in PR43605:
https://bugs.llvm.org/show_bug.cgi?id=43605
Also, it seems likely that insert/extract is the source of perf regressions on
other CPUs (up to 30%) that were cited as part of the reason to revert D59710,
so maybe we'll extend the table-based approach to other subtargets.
Differential Revision: https://reviews.llvm.org/D70607
Diffstat (limited to 'llvm/test/Transforms/LoopVectorize/X86')
| mode | path | lines |
|---|---|---|
| -rw-r--r-- | llvm/test/Transforms/LoopVectorize/X86/interleaving.ll | 12 |

1 file changed, 6 insertions(+), 6 deletions(-)
```diff
diff --git a/llvm/test/Transforms/LoopVectorize/X86/interleaving.ll b/llvm/test/Transforms/LoopVectorize/X86/interleaving.ll
index 9294c92b575..f12f3570215 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/interleaving.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/interleaving.ll
@@ -1,6 +1,6 @@
 ; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine < %s | FileCheck %s --check-prefix=NORMAL
-; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine -mcpu=slm < %s | FileCheck %s --check-prefix=NORMAL
-; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine -mcpu=atom < %s | FileCheck %s --check-prefix=ATOM
+; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine -mcpu=slm < %s | FileCheck %s --check-prefix=SLOW
+; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine -mcpu=atom < %s | FileCheck %s --check-prefix=SLOW
 
 ; NORMAL-LABEL: foo
 ; NORMAL: %[[WIDE:.*]] = load <8 x i32>, <8 x i32>* %{{.*}}, align 4
@@ -8,10 +8,10 @@
 ; NORMAL: %[[STRIDED2:.*]] = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
 ; NORMAL: add nsw <4 x i32> %[[STRIDED2]], %[[STRIDED1]]
 
-; ATOM-LABEL: foo
-; ATOM: load i32
-; ATOM: load i32
-; ATOM: store i32
+; SLOW-LABEL: foo
+; SLOW: load i32
+; SLOW: load i32
+; SLOW: store i32
 define void @foo(i32* noalias nocapture %a, i32* noalias nocapture readonly %b) {
 entry:
   br label %for.body
```

