| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable the DAG optimization that converts vector div/rem with constants into
multiply+shifts sequences by expanding them early. This is needed since
ISD::SMUL_LOHI is 'Custom' lowered on SystemZ, and will therefore not be
available to BuildSDIV after legalization.
Better cost values for these instructions based on how they will be
implemented (a constant divisor is cheaper).
Review: Ulrich Weigand
https://reviews.llvm.org/D53196
llvm-svn: 345321
|
|
|
|
|
|
| |
Match codegen improvements from D53649/rL345256
llvm-svn: 345263
|
|
|
|
| |
llvm-svn: 345261
|
|
|
|
|
|
| |
ISD::MULHS/ISD::MULHU lowering of vXi8 types means we expand these in TargetLowering BuildSDIV/BuildUDIV.
llvm-svn: 345175
|
|
|
|
|
|
| |
Non-uniform division/remainder handling was added back at D49248/D50765 - so share the 'mul+sub' costs that already exist for uniform cases.
llvm-svn: 345164
|
|
|
|
|
|
| |
difference in lowering.
llvm-svn: 345048
|
|
|
|
| |
llvm-svn: 345045
|
|
|
|
|
|
| |
Part of a lot of cleanup necessary before PR39368.
llvm-svn: 345025
|
|
|
|
|
|
| |
Part of a lot of cleanup necessary before PR39368.
llvm-svn: 345023
|
|
|
|
|
|
| |
Came about while cleaning up general shuffle costs for PR39368
llvm-svn: 344966
|
|
|
|
|
|
| |
Just f64/i64 tests initially to demonstrate PR39368
llvm-svn: 344857
|
|
|
|
| |
llvm-svn: 344846
|
|
|
|
|
|
|
|
| |
Until mischeduler is clever enough to avoid spilling in a vectorized loop
with many (scalar) DLRs it is better to avoid high vectorization factors (8
and above).
llvm-svn: 344129
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A new function getNumVectorRegs() is better to use for the number of needed
vector registers instead of getNumberOfParts(). This is to make sure that the
number of vector registers (and typically operations) required for a vector
type is accurate.
getNumberOfParts() which was previously used works by splitting the vector
type until it is legal gives incorrect results for types with a non
power of two number of elements (rare).
A new static function getScalarSizeInBits() that also checks for a pointer
type and returns 64U for it since otherwise it gets a value of 0). Used in a
few places where Ty may be pointer.
Review: Ulrich Weigand
llvm-svn: 344115
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After recent improvements which makes better use of LOC instead of IPM, the
TTI cost functions also needs to be updated to reflect this.
This involves sext, zext and xor of i1.
The tests were updated so that for z13 the new costs are expected, while the
old costs are still checked for on zEC12.
Review: Ulrich Weigand
https://reviews.llvm.org/D51339
llvm-svn: 342207
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This was inheriting the cost from the AVX table, but should be legal under AVX512.
Reviewers: RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51267
llvm-svn: 340708
|
|
|
|
|
|
| |
Add ConstantInt analysis to getOperandInfo so we get more realistic div/rem expansion costs comparable to the vector costs.
llvm-svn: 336827
|
|
|
|
|
|
|
|
|
|
| |
We penalize general SDIV/UDIV costs but don't do the same for SREM/UREM.
This patch makes general vector SREM/UREM x20 as costly as scalar, the same approach as we do for SDIV/UDIV. The patch also extends the existing SDIV/UDIV constant costs for SREM/UREM - at the moment this means the additional cost of a MUL+SUB (see D48975).
Differential Revision: https://reviews.llvm.org/D48980
llvm-svn: 336486
|
|
|
|
|
|
| |
Normally InstCombine would have simplified these to SRL/AND instructions but we may still see these during SLP vectorization etc.
llvm-svn: 336371
|
|
|
|
|
|
| |
Add cost tests for fp ceil, floor, nearbyint, rint and trunc.
llvm-svn: 336122
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a custom trunc store lowering for v4i8 vector types.
Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h)
and default action for v4i8 is to extract each element and issue 4
byte stores.
A better strategy would be to extended the promoted v4i16 to v8i16
(with undef elements) and extract and store the word lane which
represents the v4i8 subvectores. The construction:
define void @foo(<4 x i16> %x, i8* nocapture %p) {
%0 = trunc <4 x i16> %x to <4 x i8>
%1 = bitcast i8* %p to <4 x i8>*
store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
ret void
}
Can be optimized from:
umov w8, v0.h[3]
umov w9, v0.h[2]
umov w10, v0.h[1]
umov w11, v0.h[0]
strb w8, [x0, #3]
strb w9, [x0, #2]
strb w10, [x0, #1]
strb w11, [x0]
ret
To:
xtn v0.8b, v0.8h
str s0, [x0]
ret
The patch also adjust the memory cost for autovectorization, so the C
code:
void foo (const int *src, int width, unsigned char *dst)
{
for (int i = 0; i < width; i++)
*dst++ = *src++;
}
can be vectorized to:
.LBB0_4: // %vector.body
// =>This Inner Loop Header: Depth=1
ldr q0, [x0], #16
subs x12, x12, #4 // =4
xtn v0.4h, v0.4s
xtn v0.8b, v0.8h
st1 { v0.s }[0], [x2], #4
b.ne .LBB0_4
Instead of byte operations.
llvm-svn: 335735
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SK_PermuteSingleSrc
AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) was being severely overestimated by the default shuffle expansion.
This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174.
I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more.
Differential Revision: https://reviews.llvm.org/D48172
llvm-svn: 335329
|
|
|
|
|
|
| |
These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does.
llvm-svn: 335216
|
|
|
|
|
|
| |
Precursor to fixing a regression with SLP vectorizer for supporting SELECT shuffles (vs the current ALTERNATE)
llvm-svn: 334714
|
|
|
|
|
|
| |
second src
llvm-svn: 334698
|
|
|
|
|
|
| |
the elements come from the second src
llvm-svn: 334623
|
|
|
|
|
|
| |
second src
llvm-svn: 334620
|
|
|
|
|
|
| |
the elements come from the second src
llvm-svn: 334616
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR33744)
As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources:
e.g. v4f32: <0,5,2,7> or <4,1,6,3>
This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline:
e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.
This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns.
Differential Revision: https://reviews.llvm.org/D47985
llvm-svn: 334513
|
|
|
|
|
|
|
|
|
|
| |
As discussed on D47985, identity shuffle masks should probably be free.
I've limited this to the case where the input and output types all match - but we could probably accept all cases.
Differential Revision: https://reviews.llvm.org/D47986
llvm-svn: 334506
|
|
|
|
| |
llvm-svn: 334486
|
|
|
|
| |
llvm-svn: 334351
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to v4i32 SHL, convert v8i16 shift amounts to scale factors instead to improve performance and reduce instruction count. We were already doing this for constant shifts, this adds variable shift support.
Reduces the serial nature of the codegen, which relies on chains of plendvb/pand+pandn+por shifts.
This is a step towards adding support for vXi16 vector rotates.
Differential Revision: https://reviews.llvm.org/D47546
llvm-svn: 334023
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TargetTransformInfo::getInstructionThroughput
This enables us to detect more fast path sdiv cases under cost analysis.
This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs.
Found while working on D46276
Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases.
Differential Revision: https://reviews.llvm.org/D46637
llvm-svn: 332969
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With custom lowering for vector MULLH{S,U}, it is now profitable to
vectorize a divide by constant loop for the custom types (v16i8, v8i16,
and v4i32). The cost if based on TargetLowering::Build{S,U}DIV which
uses a multiply by constant plus adjustment to express a divide by
constant.
Both {u,s}mull{2} are expressed as Instruction::Mul and shifts by
Instruction::AShr.
llvm-svn: 331873
|
|
|
|
|
|
| |
A future patch will require this and the diff is much better if we perform the split separately.
llvm-svn: 331867
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new shuffle kind useful for transposing a 2xn matrix. These
transpose shuffle masks read corresponding even- or odd-numbered vector
elements from two n-dimensional source vectors and write each result into
consecutive elements of an n-dimensional destination vector. The transpose
shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such,
this patch also considers transpose shuffles in the AArch64 implementation of
getShuffleCost.
Differential Revision: https://reviews.llvm.org/D45982
llvm-svn: 330941
|
|
|
|
| |
llvm-svn: 330852
|
|
|
|
|
|
|
| |
This patch adds a cost model test case for vector shuffles having transpose
masks. The given costs are inaccurate and will be updated in a follow-on patch.
llvm-svn: 330625
|
|
|
|
| |
llvm-svn: 330439
|
|
|
|
| |
llvm-svn: 330436
|
|
|
|
| |
llvm-svn: 330435
|
|
|
|
| |
llvm-svn: 330433
|
|
|
|
|
|
| |
Just reuses goldmont costs atm
llvm-svn: 330432
|
|
|
|
|
|
| |
We're mostly testing with generic isa attributes, but PR36550 will require testing of specific target's scheduler models as well.
llvm-svn: 330056
|
|
|
|
|
|
| |
Was proving cumbersome to test with/without fma
llvm-svn: 330054
|
|
|
|
| |
llvm-svn: 330052
|
|
|
|
| |
llvm-svn: 330051
|
|
|
|
| |
llvm-svn: 330050
|
|
|
|
|
|
|
| |
update_analyze_test_checks.py
NOTE: We're only really interested in the extractelement cost (which represents the entire reduction).
llvm-svn: 329504
|