summaryrefslogtreecommitdiffstats
path: root/llvm/test/Analysis/CostModel/X86
Commit message (Collapse)AuthorAgeFilesLines
* [CostModel][X86] Add missing scalar i64->f32 uitofp costsSimon Pilgrim2020-01-061-11/+11
|
* [x86] add cost model special-case for insert/extract from element 0Sanjay Patel2019-12-064-62/+62
| | | | | | | | | | | | | | | | | This is a follow-up to D70607 where we made any extract element on SLM more costly than default. But that is pessimistic for extract from element 0 because that corresponds to x86 movd/movq instructions. These generally have >1 cycle latency, but they are probably implemented as single uop instructions. Note that no vectorization tests are affected by this change. Also, no targets besides SLM are affected because those are falling through to the default cost of 1 anyway. But this will become visible/important if we add more specializations via cost tables. Differential Revision: https://reviews.llvm.org/D71023
* [x86] make SLM extract vector element more expensive than defaultSanjay Patel2019-11-274-255/+1197
| | | | | | | | | | | | | | | | | | | I'm not sure what the effect of this change will be on all of the affected tests or a larger benchmark, but it fixes the horizontal add/sub problems noted here: https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc The costs are based on reciprocal throughput numbers in Agner's tables for PEXTR*; these appear to be very slow ops on Silvermont. This is a small step towards the larger motivation discussed in PR43605: https://bugs.llvm.org/show_bug.cgi?id=43605 Also, it seems likely that insert/extract is the source of perf regressions on other CPUs (up to 30%) that were cited as part of the reason to revert D59710, so maybe we'll extend the table-based approach to other subtargets. Differential Revision: https://reviews.llvm.org/D70607
* [X86] Remove setOperationAction for FP_TO_SINT v8i16.Craig Topper2019-11-121-12/+12
| | | | | | | | This is no longer needed after widening legalization as we custom legalize v8i8 ourselves. Added entries to the cost model, but bumped the cost slightly to account for the truncate shuffle that wasn't costed before.
* [CostModel] Fixed isExtractSubvectorMask for undef index off endTim Renouf2019-11-081-0/+5
| | | | | | | | | | | | | | | | | | | ShuffleVectorInst::isExtractSubvectorMask, introduced in [CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368) erroneously thought that %340 = shufflevector <4 x float> %339, <4 x float> undef, <3 x i32> <i32 2, i32 3, i32 undef> is a subvector extract, even though it goes off the end of the parent vector with the undef index. That then caused an assert in BasicTTIImplBase::getExtractSubvectorOverhead. This commit fixes that, by not considering the above a subvector extract. Differential Revision: https://reviews.llvm.org/D70005 Change-Id: I87b8b00b24bef19ffc9a1b82ef4eca3b8a246eaf
* [CostModel][X86] Improve add vXi64 + fadd vXf64 reduction tests for SLMSimon Pilgrim2019-11-062-12/+12
| | | | As noted on D59710 we weren't handling the high costs of these operations on SLM.
* [CostModel][X86] Add add/fadd reduction tests for SLMSimon Pilgrim2019-11-062-0/+256
|
* [X86] Lower the cost of avx512 horizontal bool and/or reductions to ↵Craig Topper2019-11-042-42/+42
| | | | | | | | | | | | | | | | | 2*log2(bitwidth)+1 for legal types. This better represents the kshift+binop we'd get for each stage before the final extract. Its likely we'll do even better by doing a kmov and a cmp with a GPR, but this is a good start. The default handling was costing a worst case single source permute shuffle of the vector before the binop. This worst case assumes the shuffle might have to be emulated with extracts and inserts. But since we know we're doing a reduction we can assume we'll get kshift lowering. There's still some room for improvement here, but this is much better than it was.
* [CostModel][X86] Add CTLZ scalar costsSimon Pilgrim2019-10-141-31/+64
| | | | | | | | Add specific scalar costs for CTLZ instructions, we can't discriminate between CTLZ and CTLZ_ZERO_UNDEF so we have to assume the worst. Given how BSR is often a microcoded nightmare on some older targets we might still be underestimating it. For targets supporting LZCNT (Intel Haswell+ or AMD Fam10+), we provide overrides that assume 1cy costs. llvm-svn: 374786
* [CostModel][X86] Add CTPOP scalar costs (PR43656)Simon Pilgrim2019-10-141-4/+4
| | | | | | | | Add specific scalar costs for ctpop instructions, these are based on the llvm-mca's SLM throughput numbers (the oldest model we have). For targets supporting POPCNT, we provide overrides that assume 1cy costs. llvm-svn: 374775
* [CostModel][X86] Improve sum reduction costs.Simon Pilgrim2019-10-122-336/+141
| | | | | | | | I can't see any notable differences in costs between SSE2 and SSE42 arches for FADD/ADD reduction, so I've lowered the target to just SSE2. I've also added vXi8 sum reduction costs in line with the PSADBW codegen and discussions on PR42674. llvm-svn: 374655
* [CostModel][X86] Add tests for insertelement to non-immediate vector element ↵Simon Pilgrim2019-10-091-0/+74
| | | | | | indices llvm-svn: 374161
* [CostModel][X86] Add tests for extractelement from non-immediate vector ↵Simon Pilgrim2019-10-091-0/+74
| | | | | | element indices llvm-svn: 374160
* [CostModel][X86] Fix SLM <2 x i64> icmp costsSimon Pilgrim2019-09-266-55/+285
| | | | | | | | SLM is 2 x slower for <2 x i64> comparison ops than other vector types, we should account for this like we do for SLM <2 x i64> add/sub/mul costs. This should remove some of the SLM codegen diffs in D43582 llvm-svn: 372954
* [Cost][X86] Add more missing vector truncation costsSimon Pilgrim2019-09-221-49/+49
| | | | | | The AVX512 cases still need some work to correct recognise the PMOV truncation cases. llvm-svn: 372514
* [Cost][X86] Add v2i64 truncation costsSimon Pilgrim2019-09-224-105/+105
| | | | | | | | We are missing costs for a lot of truncation cases, I'm hoping to address all the 'zero cost' cases in trunc.ll I thought this was a vector widening side effect, but even before this we had some interesting LV decisions (notably over indvars) being made due to these zero costs. llvm-svn: 372498
* [CostModel][X86] Add scalar sext/zext cost testsSimon Pilgrim2019-09-021-0/+158
| | | | llvm-svn: 370684
* [CostModel] Model all `extractvalue`s as free.Roman Lebedev2019-08-291-24/+24
| | | | | | | | | | | | | | | | | | | Summary: As disscussed in https://reviews.llvm.org/D65148#1606412, `extractvalue` don't actually generate any code, so we should treat them as free. Reviewers: craig.topper, RKSimon, jnspaulsson, greened, asb, t.p.northover, jmolloy, dmgreen Reviewed By: jmolloy Subscribers: javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66098 llvm-svn: 370339
* [X86] Lower the cost of v2i32->v2f64 sint_to_fp under vector widening ↵Craig Topper2019-08-222-8/+8
| | | | | | | | | | | | | | legalization. I don't really understand the costs we're using for fp_to_sint, but prior to widening legalization we used 20 as the cost for this via the v2i64->v2f64 entry. That number seems better than the 40 we got with widening legalization. So now we need either a v2i32->v2f64 entry or a v4i32->v2f64 entry depending on whether AVX is enabled or not since we skip the first SSE2 table look up under AVX. llvm-svn: 369628
* [X86] Improve cost model for subvector extraction of less than 128-bit vectorsCraig Topper2019-08-151-615/+853
| | | | | | | | Now that we're using widening legalization. We need to improve our extract_subvector cost model for these types. This patch begins by modeling these as a subvector extract followed by a permute. I've left FIXMEs in the code for future improvements. Differential Revision: https://reviews.llvm.org/D65892 llvm-svn: 369022
* [X86][CostModel] Adjust the costs of ZERO_EXTEND/SIGN_EXTEND with less than ↵Craig Topper2019-08-143-58/+58
| | | | | | | | | | | | 128-bit inputs Now that we legalize by widening, the element types here won't change. Previously these were modeled as the elements being widened and then the instruction might become an AND or SHL/ASHR pair. But now they'll become something like a ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG. For AVX2, when the destination type is legal its clear the cost should be 1 since we have extend instructions that can produce 256 bit vectors from less than 128 bit vectors. I'm a little less sure about AVX1 costs, but I think the ones I changed were definitely too high, but they might still be too high. Differential Revision: https://reviews.llvm.org/D66169 llvm-svn: 368858
* [X86] Add missing regular 512-bit vXi8 extract subvector cost model testsSimon Pilgrim2019-08-141-73/+421
| | | | | | These tests don't cover many cases where the subvectors don't start on aligned indices, but that can be added later. llvm-svn: 368839
* [X86] Add some vXi8 extract subvector cost model testsSimon Pilgrim2019-08-131-0/+367
| | | | | | We don't have full 512-bit test coverage yet - but there's enough to help test D65892 llvm-svn: 368716
* [CostModel][X86][AArch64] Check all 3 cost kinds in aggregates.llRoman Lebedev2019-08-121-25/+91
| | | | llvm-svn: 368595
* [CostModel][X86][AArch64] Add some tests for extractvalueRoman Lebedev2019-08-121-0/+76
| | | | | | | In https://reviews.llvm.org/D65148 it is suggested that it should have zero cost, always. llvm-svn: 368548
* Recommit r368081 "[X86] Add more extract subvector cost model tests for ↵Craig Topper2019-08-071-7/+488
| | | | | | smaller element sizes and smaller than 128-bit vectors." llvm-svn: 368185
* Recommit r367901 "[X86] Enable ↵Craig Topper2019-08-0723-828/+470
| | | | | | | | | | | | | | | | | | | | | | | | | | | -x86-experimental-vector-widening-legalization by default." The assert that caused this to be reverted should be fixed now. Original commit message: This patch changes our defualt legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the elements widths the same and pads with undef elements. We believe this is a better legalization strategy. But it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors. This has the potential to cause regressions so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them. Next steps will be to merge tests that explicitly test the command line option. And then we can remove the option and its associated code. llvm-svn: 368183
* Revert "[X86] Add more extract subvector cost model tests for smaller ↵Mitch Phillips2019-08-061-488/+7
| | | | | | | | | | | element sizes and smaller than 128-bit vectors." This reverts commit fc33e33776b7a7ce22e539f0ec2e3bfdb09ad361. This commit depends on the rolled back commit rL367901, and thus needs to be rolled back. llvm-svn: 368109
* Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default."Mitch Phillips2019-08-0623-470/+828
| | | | | | | | | This reverts commit 3de33245d2c992c9e0af60372043540b60f3a810. This commit broke the MSan buildbots. See https://reviews.llvm.org/rL367901 for more information. llvm-svn: 368107
* [X86] Add more extract subvector cost model tests for smaller element sizes ↵Craig Topper2019-08-061-7/+488
| | | | | | | | | and smaller than 128-bit vectors. With the switch to widening legalization, we need to a better job of costing extractions of less than 128-bits. llvm-svn: 368081
* [X86] Remove tests for -x86-experimental-vector-widening-legalization from ↵Craig Topper2019-08-0618-8109/+0
| | | | | | | | | test/Analysis/CostModel/X86/ This flag is now the default behavior so we don't need separate tests. llvm-svn: 368080
* [X86] Enable -x86-experimental-vector-widening-legalization by default.Craig Topper2019-08-0523-536/+485
| | | | | | | | | | | | | | | | | | | | | This patch changes our defualt legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the elements widths the same and pads with undef elements. We believe this is a better legalization strategy. But it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors. This has the potential to cause regressions so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them. Next steps will be to merge tests that explicitly test the command line option. And then we can remove the option and its associated code. llvm-svn: 367901
* [lit] Delete empty lines at the end of lit.local.cfg NFCFangrui Song2019-06-171-1/+0
| | | | llvm-svn: 363538
* Improve reduction intrinsics by overloading result value.Sander de Smalen2019-06-1318-4057/+4057
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch uses the mechanism from D62995 to strengthen the definitions of the reduction intrinsics by letting the scalar result/accumulator type be overloaded from the vector element type. For example: ; The LLVM LangRef specifies that the scalar result must equal the ; vector element type, but this is not checked/enforced by LLVM. declare i32 @llvm.experimental.vector.reduce.or.i32.v4i32(<4 x i32> %a) This patch changes that into: declare i32 @llvm.experimental.vector.reduce.or.v4i32(<4 x i32> %a) Which has the type-constraint more explicit and causes LLVM to check the result type with the vector element type. Reviewers: RKSimon, arsenm, rnk, greened, aemerson Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D62996 llvm-svn: 363240
* [CostModel][X86] Improve masked load/store AVX1/AVX2 costsSimon Pilgrim2019-06-022-76/+76
| | | | | | | | | | | | | | | | | | | | A mixture of internal tests and review of the scheduler models indicates we're overestimating the cost of a masked load, which we're estimating at 4x regular memory ops - more realistic values indicates that its closer to 2x. Masked stores costs are a lot more diverse but 8x is roughly in the middle of the range. e.g. SandyBridge defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>; defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>; defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; e.g. Btver2 defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>; defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>; defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>; defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>; Differential Revision: https://reviews.llvm.org/D61257 llvm-svn: 362338
* [CostModel][X86] Add bool vector and/or/xor cost testsSimon Pilgrim2019-05-301-0/+192
| | | | llvm-svn: 362083
* [CostModel] Add really basic support for being able to query the cost of the ↵Craig Topper2019-05-281-34/+78
| | | | | | | | | | | | | | | | | | | | | | | FNeg instruction. Summary: This reuses the getArithmeticInstrCost, but passes dummy values of the second operand flags. The X86 costs are wrong and can be improved in a follow up. I just wanted to stop it from reporting an unknown cost first. Reviewers: RKSimon, spatel, andrew.w.kaylor, cameron.mcinally Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62444 llvm-svn: 361788
* [X86] Add test cases for D62444. NFCCraig Topper2019-05-271-0/+171
| | | | llvm-svn: 361745
* [CostModel][X86] Add min/max reduction costs for all SSE targetsSimon Pilgrim2019-05-118-559/+559
| | | | | | | | The original costs stopped at SSE42, I've added conservative estimates for everything down to SSE1/SSE2 and moved some of the SSE42 costs to SSE41 (really only the addition of PCMPGT makes any difference). I've also added missing vXi8 costs (we use PHMINPOSUW for i8/i16 for scarily quick results) and 256-bit vector costs for AVX1. llvm-svn: 360528
* [CostModel][X86] Add bool anyof/allof reduction costsSimon Pilgrim2019-04-174-184/+96
| | | | | | | | On pre-AVX512 targets we can use MOVMSK to extract reduced boolean results. This is properly optimized, annoyingly AVX512 isn't and produces code that is almost as bad as the (unchanged) costs suggest...... Differential Revision: https://reviews.llvm.org/D60403 llvm-svn: 358574
* [CostModel][X86] Add more exhaustive masked ↵Simon Pilgrim2019-04-062-72/+2050
| | | | | | load/store/gather/scatter/expand/compress cost tests llvm-svn: 357838
* [TTI] Add generic cost model for smul/umul overflow intrinsicsSimon Pilgrim2019-02-251-36/+378
| | | | | | Based off smul/umul fixed costs and the implementation in TargetLowering::expandMULO. llvm-svn: 354784
* [TTI] Add generic cost model for fixed point smul/umulSimon Pilgrim2019-02-251-36/+378
| | | | | | | | Based on an IR equivalent of target lowering's generic expansion - target specific costs will typically be lower (IR doesn't have a good mull/mulh equivalent) but we need a baseline. Differential Revision: https://reviews.llvm.org/D57925 llvm-svn: 354774
* [CostModel][X86] Add UMUL fixed point cost tests Simon Pilgrim2019-02-051-0/+63
| | | | llvm-svn: 353153
* [CodeGen][X86] Expand UADDSAT to NOT+UMIN+ADDNikita Popov2019-01-281-4/+4
| | | | | | | | | Followup to D56636, this time handling the UADDSAT case by expanding uadd.sat(a, b) to umin(a, ~b) + b. Differential Revision: https://reviews.llvm.org/D56869 llvm-svn: 352409
* [TTI] Add generic SADDSAT/SSUBSAT costsSimon Pilgrim2019-01-271-196/+234
| | | | | | | | | | Add generic costs calculation for SADDSAT/SSUBSAT intrinsics, this uses generic costs for sadd_with_overflow/ssub_with_overflow, an extra sign comparison + a selects based on the sign/overflow. This completes PR40316 Differential Revision: https://reviews.llvm.org/D57239 llvm-svn: 352315
* [CostModel][X86] Add SMUL fixed point cost tests Simon Pilgrim2019-01-241-0/+75
| | | | llvm-svn: 352046
* [TTI] Add generic SADDO/SSUBO costsSimon Pilgrim2019-01-241-36/+378
| | | | | | Added x86 scalar sadd_with_overflow/ssub_with_overflow costs. llvm-svn: 352045
* [TTI] Add generic UADDSAT/USUBSAT costsSimon Pilgrim2019-01-241-162/+181
| | | | | | | | Add generic costs calculation for UADDSAT/USUBSAT intrinsics, this fallbacks to using generic costs for uadd_with_overflow/usub_with_overflow + a select. Differential Revision: https://reviews.llvm.org/D56907 llvm-svn: 352044
* [TTI] Add generic UADDO/USUBO costsSimon Pilgrim2019-01-241-36/+378
| | | | | | | | Added x86 scalar uadd_with_overflow/usub_with_overflow costs. Differential Revision: https://reviews.llvm.org/D56907 llvm-svn: 352043
OpenPOWER on IntegriCloud