summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/LoopVectorize/X86
diff options
context:
space:
mode:
authorSanjay Patel <spatel@rotateright.com>2019-11-27 13:33:11 -0500
committerSanjay Patel <spatel@rotateright.com>2019-11-27 14:08:56 -0500
commit5c166f1d1969e9c1e5b72aa672add429b9c22b53 (patch)
treeadf6302c8508cb2d3cf48fcf5e53eab409bfa65f /llvm/test/Transforms/LoopVectorize/X86
parent5c5e860535d8924a3d6eb950bb8a4945df01e9b7 (diff)
downloadbcm5719-llvm-5c166f1d1969e9c1e5b72aa672add429b9c22b53.tar.gz
bcm5719-llvm-5c166f1d1969e9c1e5b72aa672add429b9c22b53.zip
[x86] make SLM extract vector element more expensive than default
I'm not sure what the effect of this change will be on all of the affected tests or a larger benchmark, but it fixes the horizontal add/sub problems noted here: https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc The costs are based on reciprocal throughput numbers in Agner's tables for PEXTR*; these appear to be very slow ops on Silvermont. This is a small step towards the larger motivation discussed in PR43605: https://bugs.llvm.org/show_bug.cgi?id=43605 Also, it seems likely that insert/extract is the source of perf regressions on other CPUs (up to 30%) that were cited as part of the reason to revert D59710, so maybe we'll extend the table-based approach to other subtargets. Differential Revision: https://reviews.llvm.org/D70607
Diffstat (limited to 'llvm/test/Transforms/LoopVectorize/X86')
-rw-r--r--llvm/test/Transforms/LoopVectorize/X86/interleaving.ll12
1 files changed, 6 insertions, 6 deletions
diff --git a/llvm/test/Transforms/LoopVectorize/X86/interleaving.ll b/llvm/test/Transforms/LoopVectorize/X86/interleaving.ll
index 9294c92b575..f12f3570215 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/interleaving.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/interleaving.ll
@@ -1,6 +1,6 @@
; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine < %s | FileCheck %s --check-prefix=NORMAL
-; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine -mcpu=slm < %s | FileCheck %s --check-prefix=NORMAL
-; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine -mcpu=atom < %s | FileCheck %s --check-prefix=ATOM
+; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine -mcpu=slm < %s | FileCheck %s --check-prefix=SLOW
+; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine -mcpu=atom < %s | FileCheck %s --check-prefix=SLOW
; NORMAL-LABEL: foo
; NORMAL: %[[WIDE:.*]] = load <8 x i32>, <8 x i32>* %{{.*}}, align 4
@@ -8,10 +8,10 @@
; NORMAL: %[[STRIDED2:.*]] = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
; NORMAL: add nsw <4 x i32> %[[STRIDED2]], %[[STRIDED1]]
-; ATOM-LABEL: foo
-; ATOM: load i32
-; ATOM: load i32
-; ATOM: store i32
+; SLOW-LABEL: foo
+; SLOW: load i32
+; SLOW: load i32
+; SLOW: store i32
define void @foo(i32* noalias nocapture %a, i32* noalias nocapture readonly %b) {
entry:
br label %for.body
OpenPOWER on IntegriCloud