| field | value | date |
|---|---|---|
| author | Sanjay Patel <spatel@rotateright.com> | 2019-11-27 13:33:11 -0500 |
| committer | Sanjay Patel <spatel@rotateright.com> | 2019-11-27 14:08:56 -0500 |
| commit | 5c166f1d1969e9c1e5b72aa672add429b9c22b53 (patch) | |
| tree | adf6302c8508cb2d3cf48fcf5e53eab409bfa65f /llvm/test/Transforms/LoopVectorize/X86 | |
| parent | 5c5e860535d8924a3d6eb950bb8a4945df01e9b7 (diff) | |
[x86] make SLM extract vector element more expensive than default
I'm not sure what the effect of this change will be on all of the affected
tests or a larger benchmark, but it fixes the horizontal add/sub problems
noted here:
https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc
The costs are based on reciprocal throughput numbers in Agner's tables for
PEXTR*; these appear to be very slow ops on Silvermont.
This is a small step towards the larger motivation discussed in PR43605:
https://bugs.llvm.org/show_bug.cgi?id=43605
Also, it seems likely that insert/extract is the source of perf regressions on
other CPUs (up to 30%) that were cited as part of the reason to revert D59710,
so maybe we'll extend the table-based approach to other subtargets.
Differential Revision: https://reviews.llvm.org/D70607
Diffstat (limited to 'llvm/test/Transforms/LoopVectorize/X86')
| mode | path | lines |
|---|---|---|
| -rw-r--r-- | llvm/test/Transforms/LoopVectorize/X86/interleaving.ll | 12 |

1 file changed, 6 insertions(+), 6 deletions(-)
```diff
diff --git a/llvm/test/Transforms/LoopVectorize/X86/interleaving.ll b/llvm/test/Transforms/LoopVectorize/X86/interleaving.ll
index 9294c92b575..f12f3570215 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/interleaving.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/interleaving.ll
@@ -1,6 +1,6 @@
 ; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine < %s | FileCheck %s --check-prefix=NORMAL
-; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine -mcpu=slm < %s | FileCheck %s --check-prefix=NORMAL
-; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine -mcpu=atom < %s | FileCheck %s --check-prefix=ATOM
+; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine -mcpu=slm < %s | FileCheck %s --check-prefix=SLOW
+; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine -mcpu=atom < %s | FileCheck %s --check-prefix=SLOW
 
 ; NORMAL-LABEL: foo
 ; NORMAL: %[[WIDE:.*]] = load <8 x i32>, <8 x i32>* %{{.*}}, align 4
@@ -8,10 +8,10 @@
 ; NORMAL: %[[STRIDED2:.*]] = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
 ; NORMAL: add nsw <4 x i32> %[[STRIDED2]], %[[STRIDED1]]
 
-; ATOM-LABEL: foo
-; ATOM: load i32
-; ATOM: load i32
-; ATOM: store i32
+; SLOW-LABEL: foo
+; SLOW: load i32
+; SLOW: load i32
+; SLOW: store i32
 define void @foo(i32* noalias nocapture %a, i32* noalias nocapture readonly %b) {
 entry:
   br label %for.body
```

