path: root/llvm/test/Analysis/CostModel
Commit message | Author | Age | Files | Lines
...
* [SystemZ] Improve handling and cost estimates of vector integer div/rem | Jonas Paulsson | 2018-10-25 | 6 | -355/+974

  Enable the DAG optimization that converts vector div/rem with constants into multiply+shift sequences by expanding them early. This is needed since ISD::SMUL_LOHI is 'Custom' lowered on SystemZ, and will therefore not be available to BuildSDIV after legalization.

  Better cost values for these instructions based on how they will be implemented (a constant divisor is cheaper).

  Review: Ulrich Weigand
  https://reviews.llvm.org/D53196
  llvm-svn: 345321
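To illustrate the multiply+shift idea mentioned in the commit above, here is a minimal sketch for the unsigned 32-bit case. It uses the textbook round-up magic number and is not the exact algorithm in BuildSDIV/BuildUDIV; the function names are hypothetical. Python integers are arbitrary precision, so the multiplier may be 33 bits wide here, whereas real lowering needs an extra fix-up step for that case.

```python
def magic_for_udiv(d, bits=32):
    """Return (m, shift) such that x // d == (x * m) >> shift for 0 <= x < 2**bits.

    Hypothetical helper: round-up method, m = ceil(2**(bits + p) / d)
    with p = ceil(log2(d)).
    """
    if d <= 0:
        raise ValueError("divisor must be positive")
    p = (d - 1).bit_length()        # ceil(log2(d)), 0 for d == 1
    shift = bits + p
    m = -(-(1 << shift) // d)       # ceiling division
    return m, shift


def udiv_by_const(x, m, shift):
    """Divide x by the constant encoded in (m, shift) using multiply + shift."""
    return (x * m) >> shift


m, s = magic_for_udiv(7)
print(udiv_by_const(100, m, s))     # same value as 100 // 7
```

The constant divisor is cheaper because the multiplier and shift are computed once at compile time, leaving only a multiply and a shift per element at run time.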
* [CostModel][X86] Add realistic vXi64 uitofp vXf64 costs | Simon Pilgrim | 2018-10-25 | 1 | -14/+14

  Match codegen improvements from D53649/rL345256

  llvm-svn: 345263
* [CostModel][X86] Add realistic i64 uitofp f64 scalar costs | Simon Pilgrim | 2018-10-25 | 1 | -3/+3

  llvm-svn: 345261
* [CostModel][X86] Add vXi8 vector division by constants costs. | Simon Pilgrim | 2018-10-24 | 3 | -193/+193

  ISD::MULHS/ISD::MULHU lowering of vXi8 types means we expand these in TargetLowering BuildSDIV/BuildUDIV.

  llvm-svn: 345175
* [CostModel][X86] Enable non-uniform vector division by constants costs. | Simon Pilgrim | 2018-10-24 | 3 | -102/+598

  Non-uniform division/remainder handling was added back at D49248/D50765 - so share the 'mul+sub' costs that already exist for uniform cases.

  llvm-svn: 345164
* [TTI][X86] Treat SK_Transpose shuffles as SK_PermuteTwoSrc - there's no difference in lowering. | Simon Pilgrim | 2018-10-23 | 1 | -61/+185

  llvm-svn: 345048
* [CostModel][X86] Add transpose shuffle cost tests | Simon Pilgrim | 2018-10-23 | 1 | -0/+164

  llvm-svn: 345045
* Add BROADCAST shuffle cost tests. | Simon Pilgrim | 2018-10-23 | 1 | -0/+35

  Part of a lot of cleanup necessary before PR39368.

  llvm-svn: 345025
* Add BROADCAST shuffle cost tests. | Simon Pilgrim | 2018-10-23 | 1 | -0/+33

  Part of a lot of cleanup necessary before PR39368.

  llvm-svn: 345023
* [ARM] Regenerate reverse shuffle costs | Simon Pilgrim | 2018-10-22 | 1 | -19/+17

  Came about while cleaning up general shuffle costs for PR39368

  llvm-svn: 344966
* [CostModel][X86] Add some initial extract/insert subvector shuffle cost tests | Simon Pilgrim | 2018-10-20 | 2 | -0/+252

  Just f64/i64 tests initially to demonstrate PR39368

  llvm-svn: 344857
* [CostModel][X86] Add integer vector reduction cost tests | Simon Pilgrim | 2018-10-20 | 9 | -0/+2561

  llvm-svn: 344846
* [SystemZ] Temporarily disable high VFs with integer div/rem. | Jonas Paulsson | 2018-10-10 | 1 | -32/+35

  Until the mischeduler is clever enough to avoid spilling in a vectorized loop with many (scalar) DLRs, it is better to avoid high vectorization factors (8 and above).

  llvm-svn: 344129
* [SystemZ] Take better care when computing needed vector registers in TTI. | Jonas Paulsson | 2018-10-10 | 2 | -0/+21

  A new function getNumVectorRegs() is better to use for the number of needed vector registers instead of getNumberOfParts(). This is to make sure that the number of vector registers (and typically operations) required for a vector type is accurate.

  getNumberOfParts(), which was previously used, works by splitting the vector type until it is legal, and gives incorrect results for types with a non-power-of-two number of elements (rare).

  A new static function getScalarSizeInBits() also checks for a pointer type and returns 64U for it (since otherwise it gets a value of 0). It is used in a few places where Ty may be a pointer.

  Review: Ulrich Weigand
  llvm-svn: 344115
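The register count described in the commit above can be sketched as a ceiling division of the total vector width by the 128-bit SystemZ vector register width. This is an illustrative reading of the commit message, not the LLVM source; the Python names and signatures below are hypothetical stand-ins for getNumVectorRegs() and getScalarSizeInBits().

```python
VECTOR_REG_BITS = 128  # SystemZ vector registers are 128 bits wide


def scalar_size_in_bits(elem_bits, is_pointer=False):
    """Element width in bits; pointers are treated as 64 bits (assumption
    taken from the commit message, which returns 64U for pointer types)."""
    return 64 if is_pointer else elem_bits


def num_vector_regs(num_elements, elem_bits, is_pointer=False):
    """Number of 128-bit registers needed, rounding up for partial registers."""
    wide_bits = scalar_size_in_bits(elem_bits, is_pointer) * num_elements
    if wide_bits <= 0:
        raise ValueError("could not compute size of vector")
    return -(-wide_bits // VECTOR_REG_BITS)  # ceiling division


print(num_vector_regs(3, 64))  # <3 x i64> is 192 bits wide
```

Rounding up on the total bit width handles element counts that are not a power of two without relying on how the type would be split during legalization.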
* [SystemZ] Adjust cost functions for subtargets that use LI + LOC instead of IPM | Jonas Paulsson | 2018-09-14 | 3 | -84/+606

  After recent improvements which make better use of LOC instead of IPM, the TTI cost functions also need to be updated to reflect this. This involves sext, zext and xor of i1.

  The tests were updated so that for z13 the new costs are expected, while the old costs are still checked for on zEC12.

  Review: Ulrich Weigand
  https://reviews.llvm.org/D51339
  llvm-svn: 342207
* [X86] Correct the cost of (v4i32 (fptoui (v4f64))) under AVX512F. | Craig Topper | 2018-08-26 | 1 | -2/+2

  Summary: This was inheriting the cost from the AVX table, but should be legal under AVX512.

  Reviewers: RKSimon
  Reviewed By: RKSimon
  Subscribers: llvm-commits

  Differential Revision: https://reviews.llvm.org/D51267
  llvm-svn: 340708
* [TargetTransformInfo] Add pow2 analysis for scalar constants | Simon Pilgrim | 2018-07-11 | 2 | -144/+144

  Add ConstantInt analysis to getOperandInfo so we get more realistic div/rem expansion costs comparable to the vector costs.

  llvm-svn: 336827
* [CostModel][X86] Add SREM/UREM general and constant costs (PR38056) | Simon Pilgrim | 2018-07-07 | 1 | -216/+748

  We penalize general SDIV/UDIV costs but don't do the same for SREM/UREM.

  This patch makes general vector SREM/UREM 20x as costly as scalar, the same approach as we do for SDIV/UDIV. The patch also extends the existing SDIV/UDIV constant costs for SREM/UREM - at the moment this means the additional cost of a MUL+SUB (see D48975).

  Differential Revision: https://reviews.llvm.org/D48980
  llvm-svn: 336486
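The "additional cost of a MUL+SUB" in the commit above follows from the identity rem = x - (x / d) * d. A minimal sketch of the signed case, with hypothetical helper names; the division rounds toward zero, as LLVM's sdiv does, so the remainder takes the sign of the dividend.

```python
def sdiv_trunc(x, d):
    """Signed division rounding toward zero (Python's // rounds toward -inf)."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q


def srem_via_mul_sub(x, d):
    """Remainder built from a division followed by one multiply and one subtract."""
    return x - sdiv_trunc(x, d) * d


print(srem_via_mul_sub(-7, 3))  # remainder carries the sign of the dividend
```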
* [CostModel][X86] Add UDIV/UREM by pow2 costs | Simon Pilgrim | 2018-07-05 | 2 | -174/+421

  Normally InstCombine would have simplified these to SRL/AND instructions but we may still see these during SLP vectorization etc.

  llvm-svn: 336371
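The SRL/AND simplification mentioned above rests on two identities for an unsigned value x and a divisor 2**k. A small sketch with hypothetical function names:

```python
def udiv_pow2(x, k):
    """Unsigned x / 2**k as a logical shift right (SRL)."""
    return x >> k


def urem_pow2(x, k):
    """Unsigned x % 2**k as a bitwise AND with the low-bit mask."""
    return x & ((1 << k) - 1)


print(udiv_pow2(1000, 3), urem_pow2(1000, 3))  # matches 1000 // 8 and 1000 % 8
```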
* [CostModel][X86] Add cost tests for fp rounding intrinsics | Simon Pilgrim | 2018-07-02 | 1 | -0/+519

  Add cost tests for fp ceil, floor, nearbyint, rint and trunc.

  llvm-svn: 336122
* [AArch64] Add custom lowering for v4i8 trunc store | Adhemerval Zanella | 2018-06-27 | 1 | -1/+1

  This patch adds a custom trunc store lowering for v4i8 vector types. Since there is no v.4b register, the v4i8 is promoted to v4i16 (v.4h), and the default action for v4i8 is to extract each element and issue 4 byte stores.

  A better strategy is to extend the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvector. The construction:

    define void @foo(<4 x i16> %x, i8* nocapture %p) {
      %0 = trunc <4 x i16> %x to <4 x i8>
      %1 = bitcast i8* %p to <4 x i8>*
      store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
      ret void
    }

  can be optimized from:

    umov w8, v0.h[3]
    umov w9, v0.h[2]
    umov w10, v0.h[1]
    umov w11, v0.h[0]
    strb w8, [x0, #3]
    strb w9, [x0, #2]
    strb w10, [x0, #1]
    strb w11, [x0]
    ret

  to:

    xtn v0.8b, v0.8h
    str s0, [x0]
    ret

  The patch also adjusts the memory cost for autovectorization, so the C code:

    void foo (const int *src, int width, unsigned char *dst)
    {
      for (int i = 0; i < width; i++)
        *dst++ = *src++;
    }

  can be vectorized to:

    .LBB0_4: // %vector.body
             // =>This Inner Loop Header: Depth=1
      ldr q0, [x0], #16
      subs x12, x12, #4 // =4
      xtn v0.4h, v0.4s
      xtn v0.8b, v0.8h
      st1 { v0.s }[0], [x2], #4
      b.ne .LBB0_4

  instead of byte operations.

  llvm-svn: 335735
* [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc | Simon Pilgrim | 2018-06-22 | 1 | -6/+6

  AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) were being severely overestimated by the default shuffle expansion.

  This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174.

  I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more.

  Differential Revision: https://reviews.llvm.org/D48172
  llvm-svn: 335329
* [X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882) | Simon Pilgrim | 2018-06-21 | 3 | -61/+47

  These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does.

  llvm-svn: 335216
* [CostModel][AArch64] Add cost tests for ALTERNATE/SELECT style shuffle masks | Simon Pilgrim | 2018-06-14 | 1 | -0/+95

  Precursor to fixing a regression with SLP vectorizer for supporting SELECT shuffles (vs the current ALTERNATE)

  llvm-svn: 334714
* [CostModel] Recognise REVERSE shuffle mask if the elements come from the second src | Simon Pilgrim | 2018-06-14 | 1 | -7/+7

  llvm-svn: 334698
* [CostModel][X86] Test showing failure to recognise REVERSE shuffle mask if the elements come from the second src | Simon Pilgrim | 2018-06-13 | 1 | -0/+47

  llvm-svn: 334623
* [CostModel] Recognise BROADCAST shuffle mask if the elements come from the second src | Simon Pilgrim | 2018-06-13 | 1 | -7/+7

  llvm-svn: 334620
* [CostModel][X86] Test showing failure to recognise BROADCAST shuffle mask if the elements come from the second src | Simon Pilgrim | 2018-06-13 | 1 | -0/+47

  llvm-svn: 334616
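The four commits above concern REVERSE and BROADCAST masks whose elements all come from the second source. In a two-source shuffle of n-element vectors, indices 0..n-1 select from the first source and n..2n-1 from the second. A rough sketch of the recognition, assuming fully defined masks (undef elements are not modelled) and using hypothetical function names rather than the cost model's own helpers:

```python
def is_reverse_mask(mask):
    """True if the mask reverses the first source or the second source."""
    n = len(mask)
    if n == 0:
        return False
    return any(
        all(m == base + n - 1 - i for i, m in enumerate(mask))
        for base in (0, n)  # base 0: first source, base n: second source
    )


def is_broadcast_mask(mask):
    """True if every lane reads element 0 of one and the same source."""
    n = len(mask)
    if n == 0:
        return False
    return any(all(m == base for m in mask) for base in (0, n))


print(is_reverse_mask([7, 6, 5, 4]), is_broadcast_mask([4, 4, 4, 4]))
```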
* [CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744) | Simon Pilgrim | 2018-06-12 | 1 | -128/+96

  As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate, which requires shuffle masks to only match an alternating pattern from its 2 sources:

  e.g. v4f32: <0,5,2,7> or <4,1,6,3>

  This seems far too restrictive, as most SIMD hardware will implement it using a general blend/bit-select instruction, so this patch replaces it with SK_Select, permitting elements from either source as long as they are inline:

  e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.

  This initial patch just updates the name and cost model shuffle mask analysis; later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns.

  Differential Revision: https://reviews.llvm.org/D47985
  llvm-svn: 334513
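The difference between the two shuffle kinds can be checked directly against the example masks in the commit above. This is a simplified sketch with hypothetical function names: it ignores undef elements and does not exclude single-source masks, which a real implementation may classify separately.

```python
def is_select_mask(mask):
    """Each lane i takes element i of either source: index i or i + n."""
    n = len(mask)
    return all(m in (i, i + n) for i, m in enumerate(mask))


def is_alternate_mask(mask):
    """A select mask whose lanes strictly alternate between the two sources."""
    n = len(mask)
    from_second = [m >= n for m in mask]
    return is_select_mask(mask) and all(
        from_second[i] != from_second[i + 1] for i in range(n - 1)
    )


for mask in ([0, 5, 2, 7], [4, 1, 6, 3], [0, 1, 6, 7], [4, 1, 2, 3]):
    print(mask, is_select_mask(mask), is_alternate_mask(mask))
```

All four masks qualify as select masks, but only the first two satisfy the stricter alternating pattern.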
* [CostModel] Treat Identity shuffle masks as zero cost | Simon Pilgrim | 2018-06-12 | 2 | -92/+56

  As discussed on D47985, identity shuffle masks should probably be free.

  I've limited this to the case where the input and output types all match - but we could probably accept all cases.

  Differential Revision: https://reviews.llvm.org/D47986
  llvm-svn: 334506
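One way to read the restriction in the commit above: a mask is treated as identity only when it has as many lanes as the source vector and lane i reads element i. The sketch below models "types match" by comparing lane counts, which is an assumption, and the function name is hypothetical.

```python
def is_identity_mask(mask, num_src_elts):
    """True if the shuffle returns its first source unchanged.

    Requires the result to have the same number of lanes as the source,
    mirroring the 'input and output types all match' restriction.
    """
    return len(mask) == num_src_elts and all(m == i for i, m in enumerate(mask))


print(is_identity_mask([0, 1, 2, 3], 4))
```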
* [CostModel][X86] Add extra Identity shuffle mask cost tests (D47986) | Simon Pilgrim | 2018-06-12 | 1 | -0/+59

  llvm-svn: 334486
* [CostModel][X86] Add 'select' style shuffle costs tests (PR33744) | Simon Pilgrim | 2018-06-09 | 1 | -2/+324

  llvm-svn: 334351
* [X86][SSE] Use multiplication scale factors for v8i16 SHL on pre-AVX2 targets. | Simon Pilgrim | 2018-06-05 | 1 | -4/+4

  Similar to v4i32 SHL, convert v8i16 shift amounts to scale factors instead to improve performance and reduce instruction count. We were already doing this for constant shifts; this adds variable shift support.

  Reduces the serial nature of the codegen, which relies on chains of pblendvb/pand+pandn+por shifts.

  This is a step towards adding support for vXi16 vector rotates.

  Differential Revision: https://reviews.llvm.org/D47546
  llvm-svn: 334023
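The scale-factor approach above relies on x << s being equal to x * 2**s modulo the lane width. A small sketch for 16-bit lanes, assuming in-range shift amounts (0 to 15); the function names are hypothetical and this models only the arithmetic, not the X86 instruction selection.

```python
MASK16 = 0xFFFF  # keep results within a 16-bit lane


def shl_v8i16(values, amounts):
    """Reference per-lane left shift, truncated to 16 bits."""
    return [(v << s) & MASK16 for v, s in zip(values, amounts)]


def shl_v8i16_via_mul(values, amounts):
    """Same result using a per-lane multiply by the scale factor 2**s."""
    scales = [1 << s for s in amounts]
    return [(v * k) & MASK16 for v, k in zip(values, scales)]


vals = [1, 2, 3, 0x8001, 0xFFFF, 0x1234, 7, 0]
amts = [0, 1, 4, 15, 8, 3, 15, 5]
print(shl_v8i16_via_mul(vals, amts) == shl_v8i16(vals, amts))
```

With the scale factors in a vector, all lanes can be shifted by a single vector multiply instead of a serial chain of conditional shifts.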
* [TTI] Add uniform/non-uniform constant Pow2 detection to TargetTransformInfo::getInstructionThroughput | Simon Pilgrim | 2018-05-22 | 1 | -158/+253

  This enables us to detect more fast path sdiv cases under cost analysis.

  This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs.

  Found while working on D46276

  Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases.

  Differential Revision: https://reviews.llvm.org/D46637
  llvm-svn: 332969
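The detection described above amounts to two independent questions about a constant vector operand: are all lanes the same value, and is every lane a power of two? The sketch below is a simplified illustration; the returned strings and function names are hypothetical labels, not the TargetTransformInfo enumerators.

```python
def is_pow2(x):
    """True for 1, 2, 4, 8, ...: exactly one bit set."""
    return x > 0 and (x & (x - 1)) == 0


def classify_constant_operand(lanes):
    """Return (kind, property) for a list of constant lane values."""
    if not lanes:
        raise ValueError("expected at least one lane")
    uniform = all(v == lanes[0] for v in lanes)
    kind = "uniform constant" if uniform else "non-uniform constant"
    prop = "power of 2" if all(is_pow2(v) for v in lanes) else "none"
    return kind, prop


print(classify_constant_operand([2, 4, 8, 16]))
```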
* [AArch64] Improve cost of vector division by constant | Adhemerval Zanella | 2018-05-09 | 1 | -0/+45

  With custom lowering for vector MULLH{S,U}, it is now profitable to vectorize a divide by constant loop for the custom types (v16i8, v8i16, and v4i32).

  The cost is based on TargetLowering::Build{S,U}DIV, which uses a multiply by constant plus adjustment to express a divide by constant.

  Both {u,s}mull{2} are expressed as Instruction::Mul and shifts by Instruction::AShr.

  llvm-svn: 331873
* [CostModel][X86] Split off SLM checks | Simon Pilgrim | 2018-05-09 | 1 | -1/+77

  A future patch will require this and the diff is much better if we perform the split separately.

  llvm-svn: 331867
* [TTI, AArch64] Add transpose shuffle kind | Matthew Simpson | 2018-04-26 | 1 | -20/+20

  This patch adds a new shuffle kind useful for transposing a 2xn matrix. These transpose shuffle masks read corresponding even- or odd-numbered vector elements from two n-dimensional source vectors and write each result into consecutive elements of an n-dimensional destination vector. The transpose shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such, this patch also considers transpose shuffles in the AArch64 implementation of getShuffleCost.

  Differential Revision: https://reviews.llvm.org/D45982
  llvm-svn: 330941
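The transpose masks described above can be written out explicitly. The sketch below builds the even (TRN1-style) and odd (TRN2-style) masks for two n-element sources and applies them; it assumes n is even, and the helper names are hypothetical.

```python
def transpose_mask(n, odd=False):
    """Mask pairing element i of the first source with element i of the
    second, for even i (odd=False) or odd i (odd=True)."""
    start = 1 if odd else 0
    mask = []
    for i in range(start, n, 2):
        mask += [i, i + n]  # i + n indexes element i of the second source
    return mask


def shuffle(a, b, mask):
    """Apply a two-source shuffle mask to lists a and b."""
    src = list(a) + list(b)
    return [src[m] for m in mask]


a = ["a0", "a1", "a2", "a3"]
b = ["b0", "b1", "b2", "b3"]
print(shuffle(a, b, transpose_mask(4)))
print(shuffle(a, b, transpose_mask(4, odd=True)))
```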
* [CostModel][X86] Add div/rem tests for non-uniform constant divisors | Simon Pilgrim | 2018-04-25 | 2 | -0/+496

  llvm-svn: 330852
* [AArch64] Add cost model test case for transpose | Matthew Simpson | 2018-04-23 | 1 | -0/+182

  This patch adds a cost model test case for vector shuffles having transpose masks. The given costs are inaccurate and will be updated in a follow-on patch.

  llvm-svn: 330625
* [CostModel][X86] Add vector element insert/extract cost tests | Simon Pilgrim | 2018-04-20 | 2 | -0/+718

  llvm-svn: 330439
* [CostModel][X86] Add srem/urem constant cost tests | Simon Pilgrim | 2018-04-20 | 1 | -0/+248

  llvm-svn: 330436
* [CostModel][X86] Add SLM/GLM/BtVer2 compare + division/remainder cost tests | Simon Pilgrim | 2018-04-20 | 3 | -0/+194

  llvm-svn: 330435
* [CostModel][X86] Split off BtVer2 cost checks | Simon Pilgrim | 2018-04-20 | 15 | -101/+1454

  llvm-svn: 330433
* [CostModel][X86] Add GoldmontPlus cost tests | Simon Pilgrim | 2018-04-20 | 1 | -0/+1

  Just reuses goldmont costs atm

  llvm-svn: 330432
* [CostModel][X86] Add some specific cpu targets to the cost models | Simon Pilgrim | 2018-04-13 | 15 | -313/+371

  We're mostly testing with generic isa attributes, but PR36550 will require testing of specific target's scheduler models as well.

  llvm-svn: 330056
* [CostModel][X86] Split fma arith costs tests from other fp tests | Simon Pilgrim | 2018-04-13 | 2 | -10/+73

  Was proving cumbersome to test with/without fma

  llvm-svn: 330054
* [CostModel][X86] Regenerate latency/codesize cost tests | Simon Pilgrim | 2018-04-13 | 1 | -33/+27

  llvm-svn: 330052
* [CostModel][X86] Regenerate cast conversion cost tests | Simon Pilgrim | 2018-04-13 | 1 | -146/+378

  llvm-svn: 330051
* [CostModel][X86] Regenerate masked intrinsic cost tests | Simon Pilgrim | 2018-04-13 | 1 | -160/+429

  llvm-svn: 330050
* [CostModel][X86] Regenerate vector reduction cost tests with update_analyze_test_checks.py | Simon Pilgrim | 2018-04-07 | 1 | -104/+987

  NOTE: We're only really interested in the extractelement cost (which represents the entire reduction).

  llvm-svn: 329504