| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
| |
llvm-svn: 284938
|
|
|
|
|
|
|
|
| |
bit integer vectors
We weren't checking for uniform const costs before the general cost, resulting in very high estimates.
llvm-svn: 284755
|
|
|
|
|
|
|
|
| |
uniform const power-of-2
Shows poor costings in AVX1/AVX512BW for certain vector types
llvm-svn: 284748
|
|
|
|
|
|
|
|
|
|
| |
integer vectors
We weren't accounting for legal types on every subtarget, meaning that many of the costs were using defaults.
We still don't correctly cost (or test) the 512-bit sdiv/udiv by uniform const cases, nor the power-of-2 cases.
llvm-svn: 284744
|
|
|
|
|
|
|
|
| |
bit integer vectors
Shows current bug in AVX1/AVX512BW costs for 256 bit vector types
llvm-svn: 284723
|
|
|
|
|
|
|
|
|
|
|
|
| |
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.
This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.
It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.
Differential Revision: https://reviews.llvm.org/D23808
llvm-svn: 284459
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR30433)
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30433
There are a couple of open questions about the codegen:
1. Should we let scalar ops be scalars and avoid vector constant loads/splats?
2. Should we have a pass to combine constants such as the inverted pair that we have here?
Differential Revision: https://reviews.llvm.org/D25165
llvm-svn: 283119
|
|
|
|
| |
llvm-svn: 283047
|
|
|
|
| |
llvm-svn: 283044
|
|
|
|
| |
llvm-svn: 283042
|
|
|
|
| |
llvm-svn: 281864
|
|
|
|
|
|
| |
There are more thorough tests found in vshift-*-cost.ll
llvm-svn: 279406
|
|
|
|
|
|
| |
add/sub/mul/and/or/xor tests
llvm-svn: 279405
|
|
|
|
| |
llvm-svn: 279404
|
|
|
|
| |
llvm-svn: 279403
|
|
|
|
| |
llvm-svn: 279402
|
|
|
|
|
|
| |
mul costs
llvm-svn: 279301
|
|
|
|
| |
llvm-svn: 279291
|
|
|
|
| |
llvm-svn: 279283
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shifts with a uniform but non-constant count were considered very expensive to
vectorize, because the splat of the uniform count and the shift would tend to
appear in different blocks. That made the splat invisible to ISel, and we'd
scalarize the shift at codegen time.
Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we
are able to select the appropriate vector shifts. This updates the cost model to
to take this into account by making shifts by a uniform cheap again.
Differential Revision: https://reviews.llvm.org/D23049
llvm-svn: 277782
|
|
|
|
| |
llvm-svn: 277718
|
|
|
|
| |
llvm-svn: 277716
|
|
|
|
|
|
|
|
| |
This patch adds costs for the vectorized implementations of CTPOP, the default values were seriously underestimating the cost of these and was encouraging vectorization on targets where serialized use of POPCNT would be much better.
Differential Revision: https://reviews.llvm.org/D22456
llvm-svn: 276104
|
|
|
|
| |
llvm-svn: 275725
|
|
|
|
|
|
|
|
|
| |
Make some AVX and AVX512 cast costs more precise.
Based on part of a patch by Elena Demikhovsky (D15604).
Differential Revision: http://reviews.llvm.org/D22064
llvm-svn: 275106
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is "cvtdq2ps" which does not appear to be particularly slow on any CPU
according to Agner's tables. Choosing "5" as a cost here as suggested in:
https://llvm.org/bugs/show_bug.cgi?id=21356
...but it seems very conservative given that the instruction is fully pipelined,
and I think these costs are supposed to model throughput.
Note that related costs are also most likely too high, but this fixes PR21356
and partly fixes PR28434.
llvm-svn: 274658
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cost model should not assume vector casts get completely scalarized, since
on targets that have vector support, the common case is a partial split up to
the legal vector size. So, when a vector cast gets split, the resulting casts
end up legal and cheap.
Instead of pessimistically assuming scalarization, base TTI can use the costs
the concrete TTI provides for the split vector, plus a fudge factor to account
for the cost of the split itself. This fudge factor is currently 1 by default,
except on AMDGPU where inserts and extracts are considered free.
Differential Revision: http://reviews.llvm.org/D21251
llvm-svn: 274642
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details).
This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.
The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.
Reviewed By: reames
Differential Revision: http://reviews.llvm.org/D17270
llvm-svn: 274043
|
|
|
|
|
|
| |
intrinsics" since some of the clang tests don't expect to see the updated signatures.
llvm-svn: 273895
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details).
This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.
The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.
Reviewed By: reames
Differential Revision: http://reviews.llvm.org/D17270
llvm-svn: 273892
|
|
|
|
| |
llvm-svn: 273316
|
|
|
|
|
|
|
|
| |
The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets, we should reflect the typical cost of this to encourage vectorization.
Differential Revision: http://reviews.llvm.org/D21521
llvm-svn: 273217
|
|
|
|
|
|
| |
To account for the fast PSHUFB implementation now available
llvm-svn: 272484
|
|
|
|
|
|
|
|
|
| |
The costs are somewhat hand-wavy, but should be much closer to the truth
than what we get from BasicTTI.
Differential Revision: http://reviews.llvm.org/D21156
llvm-svn: 272406
|
|
|
|
|
|
| |
Now that we have a nice fast VPPERM solution. Added framework for future intrinsic costs as well.
llvm-svn: 270537
|
|
|
|
| |
llvm-svn: 269770
|
|
|
|
| |
llvm-svn: 269594
|
|
|
|
|
|
|
|
|
|
|
|
| |
As discussed on PR24888, until SSE42 we don't have access to PCMPGTQ for v2i64 comparisons, but the cost models don't reflect this, resulting in over-optimistic vectorizaton.
This patch adds SSE2 'base level' costs that match what a typical target is capable of and only reduces the v2i64 costs at SSE42.
Technically SSE41 provides a PCMPEQQ v2i64 equality test, but as getCmpSelInstrCost doesn't give us a way to discriminate between comparison test types we can't easily make use of this, otherwise we could split the cost of integer equality and greater-than tests to give better costings of each.
Differential Revision: http://reviews.llvm.org/D20057
llvm-svn: 268972
|
|
|
|
|
|
| |
SSE2/SSE3/SSSE3/SSE41/SSE42 targets
llvm-svn: 268877
|
|
|
|
|
|
| |
count' cost tests
llvm-svn: 268859
|
|
|
|
|
|
| |
not SSE3/SSSE3 etc.
llvm-svn: 268763
|
|
|
|
| |
llvm-svn: 268761
|
|
|
|
|
|
| |
ctpop/ctlz/cttz/bitreverse/bswap
llvm-svn: 268738
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
rL256194 transforms truncations between vectors of integers into PACKUS/PACKSS
operations during DAG combine. This generates better code for truncate, so cost
of truncate needs to be changed but looks like it got changed only in SSE2 table
Whereas this change is also applicable for SSE4.1, so the cost of truncate needs
to be changed for that as well. Cost of “TRUNCATE v16i32 to v16i8” & “TRUNCATE
v16i16 to v16i8” should be same in SSE4.1 & SSE2 table. Removing their cost from
SSE4.1, so it will fall back to SSE2.
Reviewers: Simon Pilgrim
llvm-svn: 267123
|
|
|
|
|
|
|
|
| |
This reverts commit r266086.
It breaks the LTO build of gcc in SPEC2000.
llvm-svn: 266282
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a resubmittion of 263158 change.
This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.
The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.
Reviewed By: reames
Differential Revision: http://reviews.llvm.org/D17270
llvm-svn: 266086
|
|
|
|
|
|
|
|
|
| |
PPC has a vector popcount, this lets the vectorizer use the correct cost
for it. Tweak X86 test to use an intrinsic that's actually scalarized (we
have a somewhat efficient lowering for vector popcount using SSE, the
cost model finds that now).
llvm-svn: 265005
|
|
|
|
|
|
|
|
|
|
|
| |
This commit broke LTO builds. Reverting it to unbreak the bots while the
issue is investigated. See also:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160321/341002.html
This reverts r263158
llvm-svn: 264088
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.
The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.
Reviewed By: reames
Differential Revision: http://reviews.llvm.org/D17270
llvm-svn: 263158
|
|
|
|
|
|
|
|
|
| |
The cost is calculated for all X86 targets. When gather/scatter instruction
is not supported we calculate the cost of scalar sequence.
Differential revision: http://reviews.llvm.org/D15677
llvm-svn: 256519
|