path: root/llvm/test/Analysis/CostModel/X86
Commit message | Author | Age | Files | Lines
...
* [IR] Match intrinsic parameter by scalar/vectorwidth | Simon Pilgrim | 2019-01-23 | 1 | -0/+414
  This patch replaces the existing LLVMVectorSameWidth matcher with LLVMScalarOrSameVectorWidth.
  The matching args must be either scalars or vectors with the same number of elements, but in either case the scalar/element type can differ, specified by LLVMScalarOrSameVectorWidth.
  I've updated the _overflow intrinsics to demonstrate this - allowing it to return an i1 or <N x i1> overflow result, matching the scalar/vectorwidth of the other (add/sub/mul) result type.
  The masked load/store/gather/scatter intrinsics have also been updated to use this, although as we specify the reference type to be llvm_anyvector_ty we guarantee the mask will be <N x i1>, so no change in behaviour.
  Differential Revision: https://reviews.llvm.org/D57090
  llvm-svn: 351957
* [CostModel][X86] Add ICMP Predicate specific costs | Simon Pilgrim | 2019-01-22 | 1 | -1036/+1036
  First step towards PR40376, this patch adds support for getCmpSelInstrCost to use the (optional) Instruction CmpInst predicate to indicate the type of integer comparison we're performing and alter the costs accordingly.
  Differential Revision: https://reviews.llvm.org/D57013
  llvm-svn: 351810
* [CostModel][X86] Add XOP icmp cost tests (PR40376) | Simon Pilgrim | 2019-01-21 | 1 | -0/+462
  llvm-svn: 351741
* [CostModel][X86] Add explicit vector select costs | Simon Pilgrim | 2019-01-20 | 11 | -689/+866
  Prior to SSE41 (and sometimes on AVX1), vector select has to be performed as a ((X & C)|(Y & ~C)) bit select.
  Exposes a couple of issues with the min/max reduction costs (which only go down to SSE42 for some reason).
  The increased pre-SSE41 selection costs also prevent a couple of tests from firing any longer, so I've either tweaked the target or added AVX tests as well to the existing SSE2 tests.
  llvm-svn: 351685
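The bit select mentioned in the commit above can be sketched in a few lines. This is only an illustration of the ((X & C)|(Y & ~C)) formula on 32-bit lanes; the names `bit_select` and `vector_select` are hypothetical and do not come from LLVM.

```python
MASK32 = 0xFFFFFFFF  # model a single 32-bit vector lane


def bit_select(c, x, y):
    """Take bits of x where c has a 1 bit and bits of y elsewhere."""
    return ((x & c) | (y & ~c)) & MASK32


def vector_select(cond, xs, ys):
    # Assumption: every condition lane is all-ones or all-zeros,
    # which is what a vector compare would produce.
    return [bit_select(c, x, y) for c, x, y in zip(cond, xs, ys)]
```

With an all-ones lane the result is the lane from the first operand, and with an all-zeros lane it is the lane from the second, which is why the pattern needs an AND, an ANDN and an OR rather than a single blend instruction.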
* [CostModel][X86] Add explicit fcmp costs for pre-SSE42 targets | Simon Pilgrim | 2019-01-20 | 1 | -512/+512
  Typical throughputs: cmpss/cmpps = 1cy and cmpsd/cmppd = 2cy before the Core2 era
  llvm-svn: 351684
* [CostModel][X86] Split icmp/fcmp costs tests and test all comparison codes | Simon Pilgrim | 2019-01-20 | 3 | -330/+4529
  llvm-svn: 351682
* [CostModel][X86] Add masked load/store/gather/scatter tests for SSE2/SSE42/AVX1 targets | Simon Pilgrim | 2019-01-20 | 2 | -458/+660
  llvm-svn: 351681
* [CostModel][X86] Add non-constant vselect cost tests | Simon Pilgrim | 2019-01-20 | 1 | -1/+121
  Also add AVX512 costs at the same time
  llvm-svn: 351680
* Reapply "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors" | Nikita Popov | 2019-01-15 | 1 | -51/+70
  Related to https://bugs.llvm.org/show_bug.cgi?id=40123.
  Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB, which produces much better code for X86.
  Reapplying with updated SLPVectorizer tests.
  Differential Revision: https://reviews.llvm.org/D56636
  llvm-svn: 351219
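The UMAX+SUB expansion relies on the identity usubsat(a, b) = umax(a, b) - b. A minimal sketch of that identity on unsigned scalars follows; the function names are hypothetical and model one lane only, not the actual DAG lowering.

```python
def umax(a, b):
    """Unsigned maximum of two non-negative integers."""
    return a if a >= b else b


def usubsat_reference(a, b):
    """Saturating unsigned subtract: clamp at zero instead of wrapping."""
    return a - b if a >= b else 0


def usubsat_expanded(a, b):
    # If a >= b this is a - b; otherwise umax picks b and the result is 0.
    return umax(a, b) - b
```

Because both UMAX and SUB exist as vector operations, the expanded form can stay in vector registers instead of being scalarized lane by lane.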
* Revert "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors" | Nikita Popov | 2019-01-14 | 1 | -70/+51
  This reverts commit r351125.
  I missed test changes in an SLPVectorizer test, due to the cost model changes. Reverting for now.
  llvm-svn: 351129
* [CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors | Nikita Popov | 2019-01-14 | 1 | -51/+70
  Related to https://bugs.llvm.org/show_bug.cgi?id=40123.
  Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB, which produces much better code for X86.
  Differential Revision: https://reviews.llvm.org/D56636
  llvm-svn: 351125
* [CostModel][X86] Fix SSE1 FADD/FSUB costs | Simon Pilgrim | 2019-01-04 | 1 | -2/+2
  Noticed in D56011 - handle the case that scalar fp ops are quicker on P3 than P4.
  Add the other costs so that we're not relying on the default "is legal/custom" cost logic.
  llvm-svn: 350403
* [CostModel][X86] Add SSE1 fp cost tests | Simon Pilgrim | 2019-01-04 | 1 | -40/+184
  llvm-svn: 350401
* [CostModel][X86] Add truncate cost tests to cover all legal destination types | Simon Pilgrim | 2019-01-03 | 1 | -9/+137
  We were only testing costs for legal source vector element counts
  llvm-svn: 350323
* [X86] Add ADD/SUB SSAT/USAT vector costs (PR40123) | Simon Pilgrim | 2019-01-03 | 2 | -144/+524
  Costs for real SSE2 instructions
  llvm-svn: 350295
* [X86] Add ADD/SUB SSAT/USAT cost tests (PR40123) | Simon Pilgrim | 2019-01-03 | 2 | -0/+510
  llvm-svn: 350293
* [CostModel][X86] Don't count 2 shuffles on the last level of a pairwise arithmetic or min/max reduction | Craig Topper | 2018-12-13 | 1 | -51/+31
  This is split from D55452 with the correct patch this time.
  Pairwise reductions require two shuffles on every level but the last. On the last level the two shuffles are <1, u, u, u...> and <0, u, u, u...>, but <0, u, u, u...> will be dropped by InstCombine/DAGCombine as being an identity shuffle.
  Differential Revision: https://reviews.llvm.org/D55615
  llvm-svn: 349072
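The shuffle counting rule in the commit above can be written as a small formula. The sketch below is a simplified model, not the TTI code: it assumes a power-of-two element count, ignores type legalization, and the function name and the `count_last_level_twice` flag are hypothetical.

```python
def pairwise_shuffle_count(num_elts, count_last_level_twice=False):
    """Number of shuffles in a pairwise reduction of num_elts elements.

    Assumes num_elts is a power of two. A reduction has log2(num_elts)
    levels; each level needs two shuffles, except that on the last level
    one of the two is an identity shuffle and gets dropped.
    """
    levels = num_elts.bit_length() - 1  # log2 for powers of two
    if levels == 0:
        return 0  # a single element needs no reduction
    if count_last_level_twice:
        return 2 * levels  # the accounting before this change
    return 2 * levels - 1  # the accounting after this change
```

Under this model the change removes exactly one shuffle from the total, regardless of vector width.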
* [CostModel][X86][AArch64] Adjust cost of the scalarization part of min/max reduction. | Craig Topper | 2018-12-10 | 8 | -646/+646
  Summary: The comment says we need 3 extracts and a select at the end. But didn't we just account for the select in the vector cost above? Aren't we just extracting the single element after taking the min/max in the vector register?
  Reviewers: RKSimon, spatel, ABataev
  Reviewed By: RKSimon
  Subscribers: javed.absar, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D55480
  llvm-svn: 348739
* [CostModel][X86] Fix overcounting arithmetic cost in illegal types in getArithmeticReductionCost/getMinMaxReductionCost | Craig Topper | 2018-12-07 | 19 | -1021/+965
  We were overcounting the number of arithmetic operations needed at each level before we reach a legal type. We were using the full vector type for that level, but we are going to split the input vector at that level in half. So the effective arithmetic operation cost at that level is half the width.
  For example, for 8i32 on an SSE target we were calculating the cost of an 8i32 op, which is likely 2 for basic integer. Then after the loop we count 2 more v4i32 ops, for a total arith cost of 4. But if you look at the assembly there would only be 3 arithmetic ops.
  There are still more bugs in this code that I'm going to work on next. The non-pairwise code shouldn't count extract subvectors in the loop. There are no extracts; the types are split in registers. For pairwise we need to use 2 two-source permute shuffles.
  Differential Revision: https://reviews.llvm.org/D55397
  llvm-svn: 348621
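The 8i32 arithmetic in the commit above can be reproduced with a toy model. This is a hedged sketch, not the code in getArithmeticReductionCost: it assumes power-of-two element counts, that one operation on a legal-width vector costs 1, and that a wider operation costs one unit per legal-width register. All function names are hypothetical.

```python
def ops_for_width(num_elts, legal_elts):
    """Cost of one vector op: one unit per legal-width register."""
    return max(1, num_elts // legal_elts)


def arith_cost_before(num_elts, legal_elts):
    cost = 0
    while num_elts > legal_elts:
        # Old accounting: charge the op at the full width of this level.
        cost += ops_for_width(num_elts, legal_elts)
        num_elts //= 2
    levels = num_elts.bit_length() - 1  # remaining levels at the legal type
    return cost + levels * ops_for_width(num_elts, legal_elts)


def arith_cost_after(num_elts, legal_elts):
    cost = 0
    while num_elts > legal_elts:
        # New accounting: the input is split in half first, so the op
        # is charged at half the width of this level.
        num_elts //= 2
        cost += ops_for_width(num_elts, legal_elts)
    levels = num_elts.bit_length() - 1
    return cost + levels * ops_for_width(num_elts, legal_elts)
```

For 8 elements with a legal width of 4 (8i32 on SSE), the old accounting gives 2 + 2 = 4 and the new accounting gives 1 + 2 = 3, matching the figures quoted in the commit message.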
* [X86] Remove -costmodel-reduxcost=true from the experimental vector reduction intrinsic tests as it appears to be unnecessary. NFC | Craig Topper | 2018-12-05 | 18 | -144/+144
  I think this has something to do with matching reductions from extractelement, binops, and shuffles. But we're not matching here.
  llvm-svn: 348340
* [X86] Add more cost model tests for vector reductions with narrow vector types. NFC | Craig Topper | 2018-12-05 | 18 | -0/+531
  llvm-svn: 348339
* [TTI] Reduction costs only need to include a single extract element cost (REAPPLIED) | Simon Pilgrim | 2018-12-01 | 19 | -2028/+2028
  We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.
  For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.
  Fixes PR37731
  Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997
  Differential Revision: https://reviews.llvm.org/D54585
  llvm-svn: 348076
* [X86] Replace '-mcpu=skx' with -mattr=avx512f or -mattr=avx512bw in interleave/strided load/store cost model tests. | Craig Topper | 2018-12-01 | 6 | -6/+6
  llvm-svn: 348056
* [X86] Make X86TTIImpl::getCastInstrCost properly handle the case where AVX512 is enabled, but 512-bit vectors aren't legal. | Craig Topper | 2018-11-28 | 1 | -32/+15
  Unlike most cost model functions this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal.
  This patch adds a check around the two tables that contain 512-bit types to make sure that neither of the types would be split by type legalization (which would mean 512-bit types are illegal).
  I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if preferred, I can switch to just using is512BitVector and the subtarget feature.
  Differential Revision: https://reviews.llvm.org/D54984
  llvm-svn: 347786
* [X86] Add some cost model entries for sext/zext for avx512bw | Craig Topper | 2018-11-28 | 2 | -22/+22
  This fixes some of the scalarization costs reported for sext/zext using avx512bw. This does not fix all scalarization costs being reported, just the worst.
  I've restricted this only to combinations of types that are legal with avx512bw like v32i1/v64i1/v32i16/v64i8 and conversions between vXi1 and vXi8/vXi16 with legal vXi8/vXi16 result types.
  Differential Revision: https://reviews.llvm.org/D54979
  llvm-svn: 347785
* [X86] Add a combine for back to back VSRAI instructions | Craig Topper | 2018-11-28 | 1 | -3/+3
  Expansion of SIGN_EXTEND_INREG can create a VSRAI instruction. If there is already a VSRAI after it, we should combine them into a larger VSRAI.
  Differential Revision: https://reviews.llvm.org/D54959
  llvm-svn: 347784
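Two consecutive arithmetic right shifts by immediates can be folded into one. The sketch below models a single lane and checks the fold; clamping the combined amount to the element width minus one is an assumption made here (the commit message only says the shifts are combined into a larger one), and the names `sra` and `combined_sra` are hypothetical.

```python
def sra(x, amt, bits=32):
    """Arithmetic shift right of a bits-wide value held as an unsigned int."""
    amt = min(amt, bits - 1)  # assumption: oversized counts fill with the sign bit
    signed = x - (1 << bits) if x & (1 << (bits - 1)) else x
    return (signed >> amt) & ((1 << bits) - 1)


def combined_sra(x, c1, c2, bits=32):
    # One shift by the summed amount, clamped so it never exceeds bits - 1.
    return sra(x, min(c1 + c2, bits - 1), bits)
```

For every 8-bit value and every pair of shift amounts in range, `sra(sra(x, c1, 8), c2, 8)` agrees with `combined_sra(x, c1, c2, 8)`, which is the property the combine depends on.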
* [X86] Add test cases to show that we don't properly take -mprefer-vector-width=256 and -min-legal-vector-width=256 into account when costing sext/zext. | Craig Topper | 2018-11-28 | 1 | -0/+144
  The check lines marked AVX256 in the zext256/sext256 functions should be closer to the AVX values which would take into account a splitting cost.
  llvm-svn: 347722
* [X86] Add exhaustive cost model testing for sext/zext for all vector types we reasonably support. Add cost model tests for truncating to vXi1. | Craig Topper | 2018-11-27 | 2 | -0/+958
  Our sext/zext cost modeling was somewhat incomplete, and had no coverage for the fact that avx512bw v32i16/v64i8 types return a scalarization cost.
  Truncates are a whole different mess because isTruncateFree is returning true for vectors when it shouldn't, and that's the fallback for anything not in the tables.
  llvm-svn: 347719
* [X86] Add cost model tests for experimental.vector.reduce.* with -x86-experimental-vector-widening-legalization | Craig Topper | 2018-11-27 | 9 | -0/+2852
  llvm-svn: 347697
* [X86] Add cost model test for masked load and store with -x86-experimental-vector-widening-legalization | Craig Topper | 2018-11-27 | 1 | -0/+606
  llvm-svn: 347696
* [X86] Add cost model tests for fp_to_int/int_to_fp with -x86-experimental-vector-widening-legalization | Craig Topper | 2018-11-27 | 5 | -0/+1765
  llvm-svn: 347695
* [X86] Add cost model tests for shifts with -x86-experimental-vector-widening-legalization. | Craig Topper | 2018-11-27 | 3 | -0/+1589
  llvm-svn: 347694
* Revert "[TTI] Reduction costs only need to include a single extract element cost" | Fedor Sergeev | 2018-11-26 | 10 | -1027/+1027
  This reverts commit r346970.
  It was causing PR39774, a crash in slp-vectorizer on a rather simple loop with just a bunch of 'and's in the body.
  llvm-svn: 347541
* [TTI] Reduction costs only need to include a single extract element cost | Simon Pilgrim | 2018-11-15 | 10 | -1027/+1027
  We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.
  For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.
  Fixes PR37731
  Differential Revision: https://reviews.llvm.org/D54585
  llvm-svn: 346970
* [TTI] getOperandInfo - a broadcast shuffle means the result is OK_UniformValue | Simon Pilgrim | 2018-11-14 | 2 | -340/+268
  llvm-svn: 346868
* [CostModel] Add generic expansion funnel shift cost support | Simon Pilgrim | 2018-11-14 | 2 | -1274/+2930
  Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost.
  This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs.
  llvm-svn: 346854
* [CostModel][X86] Fix constant vector XOP right shifts | Simon Pilgrim | 2018-11-13 | 2 | -87/+71
  We'll constant fold these cases so they are as cheap as vector left shift cases.
  Noticed while improving funnel shift costs.
  llvm-svn: 346760
* [CostModel][X86] Add more cost tests for funnel shifts | Simon Pilgrim | 2018-11-13 | 2 | -8/+2853
  Added full uniform/constant coverage for funnel shifts + rotates
  llvm-svn: 346754
* [CostModel][X86] Add funnel shift rotation special case costs | Simon Pilgrim | 2018-11-12 | 2 | -84/+140
  When the 2 shifting operands are the same, this is a bit rotation - annoyingly this has to be done in a different getIntrinsicInstrCost overload from most intrinsics, as we need to check the operands are the same.
  llvm-svn: 346688
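The rotation special case follows from the definition of a funnel shift: concatenate the two operands, shift, and keep the high half. A minimal sketch of that definition and of the rotate equivalence is shown below; it models one scalar lane, and the Python function names are illustrative rather than LLVM APIs.

```python
def fshl(a, b, s, bits=32):
    """Funnel shift left: shift the concatenation a:b left by s mod bits
    and return the most significant bits-wide half."""
    mask = (1 << bits) - 1
    s %= bits
    concat = ((a & mask) << bits) | (b & mask)
    return ((concat << s) >> bits) & mask


def rotl(x, s, bits=32):
    """Rotate x left by s mod bits."""
    mask = (1 << bits) - 1
    s %= bits
    return ((x << s) | ((x & mask) >> (bits - s))) & mask
```

When both data operands are the same value, `fshl(x, x, s)` equals `rotl(x, s)`, so the cost model can charge the rotate cost instead of the generic funnel shift cost.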
* [CostModel][X86] Add SHLD/SHRD scalar funnel shift costs | Simon Pilgrim | 2018-11-12 | 2 | -320/+320
  The costs match the typical reg-reg cases - the RMW case can be a lot slower but we don't model that at this level
  llvm-svn: 346683
* [CostModel][X86] Add some initial cost tests for funnel shifts | Simon Pilgrim | 2018-11-12 | 2 | -0/+780
  Still need to add full uniform/constant coverage but this is enough to check basic fshl/fshr cost handling
  llvm-svn: 346670
* [CostModel][X86] SK_ExtractSubvector is cheap if the (legal) subvector is aligned within the source vector | Simon Pilgrim | 2018-11-12 | 1 | -22/+22
  llvm-svn: 346664
* [CostModel] Add more realistic SK_InsertSubvector generic costs. | Simon Pilgrim | 2018-11-12 | 1 | -8/+8
  Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles.
  llvm-svn: 346662
* [CostModel] Add more realistic SK_ExtractSubvector generic costs. | Simon Pilgrim | 2018-11-12 | 1 | -28/+28
  Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles.
  This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type.
  llvm-svn: 346656
* [CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector | Simon Pilgrim | 2018-11-09 | 11 | -515/+563
  llvm-svn: 346538
* [CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368) | Simon Pilgrim | 2018-11-09 | 1 | -36/+36
  Add ShuffleVectorInst::isExtractSubvectorMask helper to match shuffle masks.
  llvm-svn: 346510
* [TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368) | Simon Pilgrim | 2018-10-30 | 10 | -456/+763
  Correct costing of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector.
  Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type, not the destination!
  I've done my best to fix a number of vectorizer uses:
  SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment.
  LV - I'm not clear on what the SK_ExtractSubvector should represent for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta.
  Differential Revision: https://reviews.llvm.org/D53573
  llvm-svn: 345617
* [X86] Add -LABEL to some FileCheck checks. NFC | Craig Topper | 2018-10-26 | 3 | -240/+240
  llvm-svn: 345407
* [CostModel][X86] Add realistic vXi64 uitofp vXf64 costs | Simon Pilgrim | 2018-10-25 | 1 | -14/+14
  Match codegen improvements from D53649/rL345256
  llvm-svn: 345263
* [CostModel][X86] Add realistic i64 uitofp f64 scalar costs | Simon Pilgrim | 2018-10-25 | 1 | -3/+3
  llvm-svn: 345261