| Commit message | Author | Age | Files | Lines |
... | |
|
|
|
| |
llvm-svn: 310645
|
|
|
|
|
|
| |
Cover most 128/256/512/1024-bit cases for vXf64/vXi64, vXf32/vXi32, vXi16 + vXi8
llvm-svn: 310641
|
|
|
|
| |
llvm-svn: 310633
|
|
|
|
|
|
| |
Add missing SK_PermuteSingleSrc costs for AVX2 targets and earlier, also added some of the simpler SK_PermuteTwoSrc costs to support splitting of SK_PermuteSingleSrc shuffles
llvm-svn: 310632
|
|
|
|
|
|
| |
Fixed label checks for all prefixes
llvm-svn: 310606
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds support for the new 32-bit vector float instructions of z14.
This includes:
- Enabling the instructions for the assembler/disassembler.
- CodeGen for the instructions, including new LLVM intrinsics.
- Scheduler description support for the instructions.
- Update to the vector cost function calculations.
In general, CodeGen support for the new v4f32 instructions closely
matches support for the existing v2f64 instructions.
llvm-svn: 308195
|
|
|
|
|
|
|
|
|
| |
This patch updates the cost of addq/subq (add/subtract of vectors of 64-bit values)
based on performance numbers for the SLM architecture.
Differential Revision: https://reviews.llvm.org/D33983
llvm-svn: 306974
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cost of an interleaved access was only implemented for AVX512. For other
X86 targets an overly conservative Base cost was returned, resulting in
avoiding vectorization where it is actually profitable to vectorize.
This patch starts to add costs for AVX2 for the most prominent cases of
interleaved accesses (stride 3,4 chars, for now).
Note 1: Improvements of up to ~4x were observed in some of EEMBC's rgb
workloads. There is also a known issue of 15-30% degradations on some of these
workloads, associated with an interleaved access followed by type
promotion/widening; the resulting shuffle sequence is currently inefficient and
will be improved by a series of patches that extend the X86InterleavedAccess pass
(such as D34601 and more to follow).
Note 2: The costs in this patch do not reflect port pressure penalties, which can
be very dominant in the case of interleaved accesses since most of the shuffle
operations are restricted to a single port. Further tuning, which may incorporate
these considerations, will be done on top of the upcoming improved shuffle
sequences (that is, along with the above-mentioned work to extend the
X86InterleavedAccess pass).
Differential Revision: https://reviews.llvm.org/D34023
llvm-svn: 306238
|
|
|
|
| |
llvm-svn: 305810
|
|
|
|
|
|
| |
The alphabetical progression isn't that useful
llvm-svn: 305808
|
|
|
|
|
|
|
| |
The default vector insert/extract cost is more profitable on Falkor than the
reduced cost.
llvm-svn: 303771
|
|
|
|
| |
llvm-svn: 303448
|
|
|
|
| |
llvm-svn: 303342
|
|
|
|
| |
llvm-svn: 303300
|
|
|
|
| |
llvm-svn: 303293
|
|
|
|
|
|
| |
This will make it a lot easier to test all the permutations of avx512
llvm-svn: 303290
|
|
|
|
| |
llvm-svn: 303283
|
|
|
|
|
|
|
|
| |
Such divisions will eventually be implemented with shifts which should
be reflected in the cost function.
Review: Ulrich Weigand
llvm-svn: 303254
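To illustrate why a shift-based lowering deserves a lower cost than a real divide, here is a minimal sketch of signed division by a power of two using only an add, a mask, and shifts. It assumes the divisions meant by r303254 are divisions by a constant power of two; the name `sdiv_pow2` and the `bits` parameter are hypothetical and do not come from the patch.

```python
def sdiv_pow2(x: int, k: int, bits: int = 32) -> int:
    """Divide x by 2**k, rounding toward zero, without a divide instruction.

    Assumes x fits in a signed integer of width `bits`.
    """
    # Arithmetic shift: -1 (all ones) for negative x, 0 otherwise.
    sign = x >> (bits - 1)
    # Negative values need a bias of 2**k - 1 so the final shift
    # rounds toward zero instead of toward negative infinity.
    bias = sign & ((1 << k) - 1)
    return (x + bias) >> k
```

The whole sequence is a handful of cheap operations, which is the kind of difference a cost function would want to reflect.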
|
|
|
|
| |
llvm-svn: 303023
|
|
|
|
| |
llvm-svn: 303022
|
|
|
|
| |
llvm-svn: 303021
|
|
|
|
|
|
| |
sequences
llvm-svn: 303017
|
|
|
|
|
|
|
|
| |
mask.
Tweak cost model to match what lowering actually does.
llvm-svn: 303013
|
|
|
|
| |
llvm-svn: 303012
|
|
|
|
| |
llvm-svn: 303010
|
|
|
|
|
|
|
| |
VOP3P instructions can encode access to either
half of the register.
llvm-svn: 302730
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The AArch64 instruction set has a few "widening" instructions (e.g., uaddl,
saddl, uaddw, etc.) that take one or more doubleword operands and produce
quadword results. The operands are automatically sign- or zero-extended as
appropriate. However, in LLVM IR, these extends are explicit. This patch
updates TTI to consider these widening instructions as single operations whose
cost is attached to the arithmetic instruction. It marks extends that are part
of a widening operation "free" and applies a sub-target specified overhead
(zero by default) to the arithmetic instructions.
Differential Revision: https://reviews.llvm.org/D32706
llvm-svn: 302582
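The costing rule described in r302582 can be sketched with a toy unit-cost model. This is only an illustration under stated assumptions, not the TTI implementation: the `Instr` class, the `WIDENING_OPS` set, and the `cost` function are hypothetical names, and the real patch applies further conditions that are not modeled here.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: opcodes that have a widening form on the target (e.g. uaddl, saddl).
WIDENING_OPS = {"add", "sub"}
EXTENDS = {"sext", "zext"}


@dataclass(frozen=True)
class Instr:
    opcode: str
    operands: tuple = ()


def is_widening(instr: Instr) -> bool:
    """True if an arithmetic op has an extend operand that could be folded into it."""
    return instr.opcode in WIDENING_OPS and any(
        op.opcode in EXTENDS for op in instr.operands
    )


def cost(instr: Instr, user: Optional[Instr] = None, widening_overhead: int = 0) -> int:
    """Toy model: every instruction costs 1 unless it is folded away."""
    if instr.opcode in EXTENDS and user is not None and is_widening(user):
        return 0  # the extend is part of the widening instruction, so it is "free"
    if is_widening(instr):
        return 1 + widening_overhead  # sub-target overhead, zero by default
    return 1
```

With `ext = Instr("zext")` and `add = Instr("add", (ext, ext))`, the extend costs 0 when queried with `user=add` and the add carries the whole cost, so the explicit IR extends no longer inflate the estimate.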
|
|
|
|
|
|
| |
Account for subvector extraction/insertion; this helps prevent the vectorizers from selecting 256-bit vectors that will have to be split anyhow on AVX1 targets.
llvm-svn: 302378
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes PR31789 - When loop-vectorize tries to use these intrinsics for a
non-default address space pointer we fail with a "Calling a function with a
bad signature!" assertion. This patch solves this by adding the 'vector of
pointers' argument as an overloaded type which will determine the address
space.
Differential revision: https://reviews.llvm.org/D31490
llvm-svn: 302018
|
|
|
|
| |
llvm-svn: 300078
|
|
|
|
|
|
| |
This did not get included in the previous commit for SystemZ cost functions.
llvm-svn: 300053
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.
Interleaved access vectorization enabled.
BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.
Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631
llvm-svn: 300052
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).
llvm-svn: 298444
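For readers who do not use Perl, the substitution in r298444 can be sketched as follows. The function name `mark_kernel` is hypothetical; like the Perl one-liner, which has no `/g` flag, it rewrites only the first match on each line.

```python
import re


def mark_kernel(line: str) -> str:
    """Rewrite 'define void' to 'define amdgpu_kernel void' once per line."""
    # count=1 mirrors Perl's s/// without /g: only the first match is replaced.
    return re.sub(r"define void", "define amdgpu_kernel void", line, count=1)
```

Applying it twice is harmless, because the rewritten text no longer contains `define void`. Lines that intentionally keep the non-kernel convention still have to be reverted by hand, as the commit message notes.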
|
|
|
|
|
|
|
|
|
| |
Don't call getScalarizationOverhead(RetTy, true, false) if RetTy is void type.
Review: Hal Finkel
https://reviews.llvm.org/D31024
llvm-svn: 297954
|
|
|
|
|
|
| |
Prep work for PR31810
llvm-svn: 297876
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
getIntrinsicInstrCost() used to only compute scalarization cost based on types.
This patch improves this so that the actual arguments are checked when they are
available, in order to handle only unique non-constant operands.
Test updates:
Analysis/CostModel/X86/arith-fp.ll
Transforms/LoopVectorize/AArch64/interleaved_cost.ll
Transforms/LoopVectorize/ARM/interleaved_cost.ll
The improvement in getOperandsScalarizationOverhead() to differentiate on
constants made it necessary to update the interleaved_cost.ll tests even
though they do not relate to intrinsics.
Review: Hal Finkel
https://reviews.llvm.org/D29540
llvm-svn: 297705
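The idea of charging scalarization overhead only for unique non-constant operands (r297705) can be sketched like this. It is a simplified illustration: the function name echoes getOperandsScalarizationOverhead(), but the signature, the `(name, is_constant)` operand encoding, and the flat `extract_cost` are assumptions made for the example.

```python
def operands_scalarization_overhead(operands, extract_cost):
    """Sum the extraction cost over unique, non-constant operands.

    operands: iterable of (name, is_constant) pairs (hypothetical encoding).
    extract_cost: cost of extracting all elements of one vector operand.
    """
    seen = set()
    overhead = 0
    for name, is_constant in operands:
        # Constants need no extraction, and a repeated operand is extracted once.
        if is_constant or name in seen:
            continue
        seen.add(name)
        overhead += extract_cost
    return overhead
```

A call whose operands are the same vector twice plus a constant is charged for a single extraction, whereas a purely type-based estimate would count three.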
|
|
|
|
|
|
|
|
|
|
|
|
| |
Newer ppc supports unaligned memory access, which reduces the cost of unaligned memory access significantly. This patch handles this case in PPCTTIImpl::getMemoryOpCost.
This patch fixes PR31492.
Differential Revision: https://reviews.llvm.org/D28630
This is a resubmit of r292680, which was reverted by r293092. The internal application failures were actually caused by a source code bug.
llvm-svn: 295506
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D29416
llvm-svn: 293932
|
|
|
|
| |
llvm-svn: 293793
|
|
|
|
|
|
|
|
|
|
| |
supports it"
This reverts commit r292680. It is causing significantly worse
performance and test timeouts in our internal builds. I have already
routed reproduction instructions your way.
llvm-svn: 293092
|
|
|
|
|
|
|
|
|
|
| |
Newer ppc supports unaligned memory access, which reduces the cost of unaligned memory access significantly. This patch handles this case in PPCTTIImpl::getMemoryOpCost.
This patch fixes PR31492.
Differential Revision: https://reviews.llvm.org/D28630
llvm-svn: 292680
|
|
|
|
|
|
| |
We already have patterns in place to support 128/256-bit shifts without AVX512VL
llvm-svn: 292077
|
|
|
|
|
|
|
|
| |
costs
Keep the tests though.
llvm-svn: 292076
|
|
|
|
|
|
|
|
| |
non-constant uniform values.
Use shuffle(scalar_to_vector, zeroinitializer) pattern instead of shuffle(vec, zeroinitializer)
llvm-svn: 292075
|
|
|
|
|
|
| |
has landed
llvm-svn: 292023
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D28447
llvm-svn: 291665
|
|
|
|
| |
llvm-svn: 291663
|
|
|
|
|
|
|
|
|
|
|
|
| |
Updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.
Special optimization case: pmulld is replaced with a pmullw/pmulhw/pshuf sequence
when the real bitwidth of the operands is <= 16.
Differential Revision: https://reviews.llvm.org/D28104
llvm-svn: 291657
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original code considered only v2i64 as slow for this feature. This patch
considers all 128-bit long vector types as slow candidates.
In internal tests, extending this feature to all 128-bit vector types
resulted in an overall improvement of 1% on Exynos M1.
Differential revision: https://reviews.llvm.org/D27998
llvm-svn: 291616
|
|
|
|
| |
llvm-svn: 291585
|