| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
This patch updates the cost of addq/subq (addition/subtraction of vectors of
64-bit elements) based on performance numbers for the SLM architecture.
Differential Revision: https://reviews.llvm.org/D33983
llvm-svn: 306974
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cost of an interleaved access was only implemented for AVX512. For other
X86 targets an overly conservative base cost was returned, which prevented
vectorization in cases where it is actually profitable.
This patch starts to add AVX2 costs for the most prominent cases of
interleaved accesses (strides 3 and 4 on chars, for now).
Note 1: Improvements of up to ~4x were observed in some of EEMBC's rgb
workloads; there is also a known issue of 15-30% degradations on some of these
workloads, associated with an interleaved access followed by type
promotion/widening; the resulting shuffle sequence is currently inefficient and
will be improved by a series of patches that extend the X86InterleavedAccess pass
(such as D34601 and more to follow).
Note 2: The costs in this patch do not reflect port pressure penalties which can
be very dominant in the case of interleaved accesses since most of the shuffle
operations are restricted to a single port. Further tuning, which may incorporate
these considerations, will be done on top of the upcoming improved shuffle
sequences (that is, along with the above-mentioned work to extend the
X86InterleavedAccess pass).
Differential Revision: https://reviews.llvm.org/D34023
llvm-svn: 306238
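To make the access pattern concrete, the following is a minimal sketch of a loop with stride-3 char accesses, the kind of interleaved group this cost change is about. The function name `rgbToBgr` and the channel-swap task are illustrative assumptions, not taken from the patch or from the EEMBC workloads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical example: swap the R and B channels of a packed RGB buffer.
// Each iteration reads three adjacent bytes and writes three adjacent bytes,
// so a vectorizer sees one stride-3 load group and one stride-3 store group.
std::vector<std::uint8_t> rgbToBgr(const std::vector<std::uint8_t> &rgb) {
  assert(rgb.size() % 3 == 0 && "expects whole RGB pixels");
  std::vector<std::uint8_t> bgr(rgb.size());
  for (std::size_t i = 0; i < rgb.size(); i += 3) {
    bgr[i] = rgb[i + 2];     // B
    bgr[i + 1] = rgb[i + 1]; // G
    bgr[i + 2] = rgb[i];     // R
  }
  return bgr;
}
```

Whether a loop like this is vectorized depends on the cost the target reports for the interleaved group, which is why an overly conservative cost can block a profitable transformation.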
|
|
|
|
|
|
|
|
|
| |
There are a couple of potential improvements as seen in the IR and asm:
1. We're unnecessarily extending to a larger type to compare values.
2. The codegen for (select cond, 1, -1) could avoid a cmov.
(Alternatively, we could change the order of the compares so that the select has a 0 operand.)
llvm-svn: 305802
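As a source-level illustration of the second point, here is a sketch of a three-way byte compare written two ways: once with a conditional that has the `(select cond, -1, 1)` shape, and once as plain arithmetic on the comparison results. The function names are hypothetical, and this makes no claim about what the backend actually emits for either form.

```cpp
#include <cassert>
#include <cstdint>

// Select-shaped form: after the equality check, the result is chosen
// between -1 and 1 by a single condition.
int threeWaySelect(std::uint8_t a, std::uint8_t b) {
  if (a == b)
    return 0;
  return a < b ? -1 : 1;
}

// Arithmetic form: subtract the two comparison results (each 0 or 1).
int threeWayArith(std::uint8_t a, std::uint8_t b) {
  return int(a > b) - int(a < b);
}

// Exhaustively confirm that both forms agree on every pair of bytes.
void checkAllBytePairs() {
  for (int a = 0; a < 256; ++a)
    for (int b = 0; b < 256; ++b)
      assert(threeWaySelect(std::uint8_t(a), std::uint8_t(b)) ==
             threeWayArith(std::uint8_t(a), std::uint8_t(b)));
}
```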
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This seems to be interacting badly with ASan somehow, causing false reports of
heap-buffer overflows: PR33514.
> Summary:
> The patch makes instruction count the highest priority for
> LSR solution for X86 (previously registers had highest priority).
>
> Reviewers: qcolombet
>
> Differential Revision: http://reviews.llvm.org/D30562
>
> From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 305720
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The patch makes instruction count the highest priority when choosing an
LSR solution for X86 (previously, registers had the highest priority).
Reviewers: qcolombet
Differential Revision: http://reviews.llvm.org/D30562
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 304824
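The effect of changing the priority can be pictured as changing the order of a lexicographic comparison. The sketch below is an assumption-laden simplification: `SolutionCost` and both comparison functions are hypothetical, and the real LSR cost has more components than the two shown here.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical two-field cost; LSR's real cost tracks more than this.
struct SolutionCost {
  unsigned Insns; // instructions needed by the candidate solution
  unsigned Regs;  // registers needed by the candidate solution
};

// Registers first, instruction count as tie-breaker (the previous priority).
bool betterRegsFirst(const SolutionCost &A, const SolutionCost &B) {
  return std::tie(A.Regs, A.Insns) < std::tie(B.Regs, B.Insns);
}

// Instruction count first, registers as tie-breaker (the new X86 priority).
bool betterInsnsFirst(const SolutionCost &A, const SolutionCost &B) {
  return std::tie(A.Insns, A.Regs) < std::tie(B.Insns, B.Regs);
}
```

With candidates `{3 insns, 5 regs}` and `{4 insns, 4 regs}`, the two orderings pick different winners, which is exactly the kind of decision this patch changes.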
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Expand the loop idiom test for memcpy to also recognize
unordered atomic memcpy. The only difference between recognizing
an unordered atomic memcpy and a normal memcpy is
that the loads and/or stores involved are unordered atomic operations.
Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html
Patch by Daniel Neilson!
Reviewers: reames, anna, skatkov
Reviewed By: reames, anna
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D33243
llvm-svn: 304806
|
|
|
|
| |
llvm-svn: 303342
|
|
|
|
| |
llvm-svn: 303300
|
|
|
|
| |
llvm-svn: 303293
|
|
|
|
| |
llvm-svn: 303283
|
|
|
|
| |
llvm-svn: 303023
|
|
|
|
| |
llvm-svn: 303022
|
|
|
|
| |
llvm-svn: 303021
|
|
|
|
|
|
| |
sequences
llvm-svn: 303017
|
|
|
|
|
|
|
|
| |
mask.
Tweak cost model to match what lowering actually does.
llvm-svn: 303013
|
|
|
|
| |
llvm-svn: 303012
|
|
|
|
| |
llvm-svn: 303010
|
|
|
|
|
|
| |
Account for subvector extraction/insertion. This helps prevent the vectorizers from selecting 256-bit vectors that will have to be split anyway on AVX1 targets.
llvm-svn: 302378
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.
Interleaved access vectorization enabled.
BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.
Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631
llvm-svn: 300052
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
LSV wants to know the maximum size that can be loaded to a vector register.
On X86, this always matches the maximum register width. Implement this
accordingly and add a test to make sure that LSV can vectorize up to the
maximum permissible width on X86.
Reviewers: delena, arsenm
Reviewed By: arsenm
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D31504
llvm-svn: 299589
|
|
|
|
|
|
| |
Prep work for PR31810
llvm-svn: 297876
|
|
|
|
| |
llvm-svn: 297824
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
getIntrinsicInstrCost() used to only compute scalarization cost based on types.
This patch improves this so that the actual arguments are checked when they are
available, in order to handle only unique non-constant operands.
Test updates:
Analysis/CostModel/X86/arith-fp.ll
Transforms/LoopVectorize/AArch64/interleaved_cost.ll
Transforms/LoopVectorize/ARM/interleaved_cost.ll
The improvement in getOperandsScalarizationOverhead() to differentiate on
constants made it necessary to update the interleaved_cost.ll tests even
though they do not relate to intrinsics.
Review: Hal Finkel
https://reviews.llvm.org/D29540
llvm-svn: 297705
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D29416
llvm-svn: 293932
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Refactoring to remove duplications of this method.
The new method getOperandsScalarizationOverhead() looks at the unique operands
present and adds extract costs for them. The old behaviour was to always add
extract costs for just one operand of the type, which still happens in
getArithmeticInstrCost() if no operands are provided by the caller.
This is a good start, but there are more places
that can be improved by using getOperandsScalarizationOverhead().
Review: Hal Finkel
https://reviews.llvm.org/D29017
llvm-svn: 293155
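The idea of charging extract costs only for the unique operands can be sketched as follows. This is not the LLVM implementation: the `Operand` struct, the integer value ids, the constant flag, and the flat per-element extract cost are all assumptions made for illustration.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical operand description: an id identifying the value, plus a
// flag saying whether it is a constant (constants need no extraction).
struct Operand {
  int ValueId;
  bool IsConstant;
};

// Charge one extract per vector element for each distinct non-constant
// operand. Repeated uses of the same value are only charged once.
unsigned operandsExtractCost(const std::vector<Operand> &Ops,
                             unsigned NumElts, unsigned ExtractCost) {
  std::set<int> Seen;
  unsigned Cost = 0;
  for (const Operand &Op : Ops) {
    if (Op.IsConstant)
      continue; // nothing to extract
    if (!Seen.insert(Op.ValueId).second)
      continue; // already accounted for
    Cost += NumElts * ExtractCost;
  }
  return Cost;
}
```

Under these assumptions, an operation such as `x * x` on a 4-element vector is charged for extracting `x` once rather than twice, and `x * 3` is charged for `x` only.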
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D28547
llvm-svn: 293040
|
|
|
|
| |
llvm-svn: 292613
|
|
|
|
|
|
| |
SHL v8i32 is already handled in the SSE41 cost table
llvm-svn: 292612
|
|
|
|
|
|
| |
We already have patterns in place to support 128/256-bit shifts without AVX512VL
llvm-svn: 292077
|
|
|
|
|
|
| |
has landed
llvm-svn: 292023
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D28447
llvm-svn: 291665
|
|
|
|
|
|
|
|
|
|
|
|
| |
Updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.
Also adds a special optimization case that replaces pmulld with a pmullw/pmulhw/pshuf
sequence when the real bit width of the operands is <= 16.
Differential Revision: https://reviews.llvm.org/D28104
llvm-svn: 291657
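The reason the replacement is valid when the operands fit in 16 bits can be shown with a per-lane scalar model: the low half of the product (what pmullw yields) and the high half of the signed product (what pmulhw yields) together form the full 32-bit result. The function name `mul16Halves` is hypothetical, and the shuffle that interleaves the halves in the vector sequence is modeled here as a shift and OR.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one lane: rebuild a 32-bit product from its two 16-bit
// halves, assuming both operands fit in a signed 16-bit value.
std::int32_t mul16Halves(std::int16_t a, std::int16_t b) {
  std::int32_t full = std::int32_t(a) * std::int32_t(b);
  std::uint16_t lo = std::uint16_t(full);                      // pmullw: low 16 bits
  std::uint16_t hi = std::uint16_t(std::uint32_t(full) >> 16); // pmulhw: high 16 bits
  // The shuffle step pairs each high half with its low half.
  return std::int32_t((std::uint32_t(hi) << 16) | lo);
}
```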
|
|
|
|
|
|
|
|
|
|
| |
The 'fast' costs should only apply to shifts by uniform constants (uniform non-constant shifts are lowered using the slow default implementation).
Logical shifts were not taking into account that we must mask the psrlw result, so the costs needed to be doubled.
Added missing AVX2/AVX512BW costs as well.
llvm-svn: 291391
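Why the mask is needed can be seen in a scalar model of one 16-bit lane holding two bytes: a 16-bit right shift lets bits of the upper byte leak into the top of the lower byte, so a second operation has to clear them. The function name `lshrBytesInWord` is a hypothetical illustration, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Logical right shift of two packed bytes using a 16-bit shift plus a mask.
std::uint16_t lshrBytesInWord(std::uint16_t lane, unsigned n) {
  assert(n < 8 && "shift amount must be smaller than the byte width");
  std::uint16_t shifted = std::uint16_t(lane >> n);       // 16-bit shift (psrlw)
  std::uint8_t byteMask = std::uint8_t(0xFFu >> n);       // bits valid in each byte
  std::uint16_t mask = std::uint16_t(byteMask * 0x0101u); // same mask in both bytes
  return std::uint16_t(shifted & mask);                   // clear the leaked bits (pand)
}
```

The shift and the mask are two operations per vector, which matches the doubling of the cost described above.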
|
|
|
|
|
|
| |
XOP was prematurely matching, doubling the cost of ashr/lshr uniform shifts.
llvm-svn: 291390
|
|
|
|
|
|
| |
SSE41 provides pmulld, which allows a simpler pslld/paddd/cvttps2dq/pmulld pattern than SSE2's use of pmuludq.
llvm-svn: 291372
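The four-instruction pattern turns a variable left shift into a multiply by a power of two, where the power of two is built through the floating-point exponent field. Below is a per-lane scalar sketch; the function name `shlViaMultiply` is hypothetical, IEEE-754 binary32 `float` is assumed, and the shift amount is limited to less than 31 so that the float-to-int conversion stays well defined in C++.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Compute x << n as x * 2^n, forming 2^n via the float exponent bits.
std::uint32_t shlViaMultiply(std::uint32_t x, unsigned n) {
  assert(n < 31 && "sketch only covers shift amounts below 31");
  // Move n into the exponent field (pslld by 23) and add the bit pattern
  // of 1.0f (paddd), giving the bit pattern of the float 2^n.
  std::uint32_t bits = (std::uint32_t(n) << 23) + 0x3f800000u;
  float pow2;
  std::memcpy(&pow2, &bits, sizeof pow2);
  // Truncating conversion to integer (cvttps2dq) yields 1 << n.
  std::uint32_t scale = std::uint32_t(std::int32_t(pow2));
  // 32-bit multiply keeping the low half (pmulld).
  return x * scale;
}
```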
|
|
|
|
| |
llvm-svn: 291366
|
|
|
|
|
|
| |
We were matching against general vector shift costs before the uniform splat costs
llvm-svn: 291365
|
|
|
|
|
|
| |
conversion.
llvm-svn: 291364
|
|
|
|
| |
llvm-svn: 291355
|
|
|
|
| |
llvm-svn: 291354
|
|
|
|
|
|
| |
Allows us to correctly fall through to the lower AVX1 costs if the lookup fails.
llvm-svn: 291353
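The fall-through behaviour can be sketched as a search over per-feature tables ordered from most to least specific. Everything here is hypothetical: the `Key` and `CostTable` types are not LLVM's table types, the entries are made-up numbers chosen only to show the control flow, and `-1` is used as an ad hoc "no entry" marker.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical key: (operation, vector type).
using Key = std::pair<std::string, std::string>;
using CostTable = std::map<Key, int>;

// Walk the tables from the most specific feature level to the least
// specific one and return the first hit, or -1 if no table has an entry.
int lookupCost(const std::vector<CostTable> &Tables, const Key &K) {
  for (const CostTable &Table : Tables) {
    auto It = Table.find(K);
    if (It != Table.end())
      return It->second;
  }
  return -1; // caller would fall back to a generic estimate
}

// Demo with invented costs: an AVX2-level table checked before an AVX1 one.
int demoCost(const std::string &Op, const std::string &Ty) {
  CostTable AVX2, AVX1;
  AVX2[Key{"mul", "v8i32"}] = 1;
  AVX1[Key{"mul", "v8i32"}] = 4;
  AVX1[Key{"add", "v8i32"}] = 2;
  return lookupCost({AVX2, AVX1}, Key{Op, Ty});
}
```

A miss in the first table must not end the search; the lookup continues into the lower-level table, which is the behaviour the commit restores.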
|
|
|
|
|
|
| |
order. NFCI.
llvm-svn: 291352
|
|
|
|
|
|
| |
v64i8 shuffles (PR31470)
llvm-svn: 291347
|
|
|
|
|
|
| |
Set the costs on the lowest target that supports the type.
llvm-svn: 291229
|
|
|
|
|
|
| |
Remove unnecessary braces, remove single-use variables, and keep LUT names to a consistent convention.
llvm-svn: 291187
|
|
|
|
| |
llvm-svn: 291165
|
|
|
|
| |
llvm-svn: 291163
|
|
|
|
|
|
| |
NFCI.
llvm-svn: 291162
|
|
|
|
|
|
| |
Removes need for yet another LUT.
llvm-svn: 291158
|
|
|
|
|
|
| |
Remove SSE2 256-bit entries - AVX targets will have used the SSE42 costs instead.
llvm-svn: 291152
|