path: root/llvm/test/Analysis/CostModel
Commit message | Author | Age | Files | Lines
...
* [X86] Enable -x86-experimental-vector-widening-legalization by default. | Craig Topper | 2019-08-05 | 23 files | -536/+485
  This patch changes our default legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the element widths the same and pads with undef elements.
  We believe this is a better legalization strategy, but it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors.
  This has the potential to cause regressions, so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them.
  Next steps will be to merge tests that explicitly test the command line option. Then we can remove the option and its associated code.
  llvm-svn: 367901
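The difference between the two legalization strategies can be sketched as follows. This is a minimal illustration, not LLVM code: the helper names `promote` and `widen` are hypothetical, and a 128-bit legal vector register is assumed.

```python
def promote(num_elts, elt_bits, legal_bits=128):
    """Promotion: keep the element count and grow each element to fill the register."""
    return num_elts, legal_bits // num_elts


def widen(num_elts, elt_bits, legal_bits=128):
    """Widening: keep the element width and pad with extra (undef) elements."""
    return legal_bits // elt_bits, elt_bits


# v8i8 under each strategy, as (element count, element bits)
print(promote(8, 8))  # v8i16
print(widen(8, 8))    # v16i8
```

Widening keeps every element at its original width, which is why i8 shifts and multiplies (which lack native vector instructions on X86) may later need to be promoted or split again.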
* [SystemZ] Add support for new cpu architecture - arch13 | Ulrich Weigand | 2019-07-12 | 3 files | -18/+212
  This patch series adds support for the next-generation arch13 CPU architecture to the SystemZ backend. This includes:
    - Basic support for the new processor and its features.
    - Assembler/disassembler support for new instructions.
    - CodeGen for new instructions, including new LLVM intrinsics.
    - Scheduler description for the new processor.
    - Detection of arch13 as host processor.
  Note: No currently available Z system supports the arch13 architecture. Once new systems become available, the official system name will be added as supported -march name.
  llvm-svn: 365932
* Revert Recommit [PowerPC] Update P9 vector costs for insert/extract element | Jordan Rupprecht | 2019-07-01 | 1 file | -24/+24
  This reverts r364557 (git commit 9f7f5858fe46b8e706e87a83e2fd0a2678be619e).
  This crashes as reported on the commit thread. Repro instructions TBD.
  llvm-svn: 364876
* Recommit [PowerPC] Update P9 vector costs for insert/extract element | Roland Froese | 2019-06-27 | 1 file | -24/+24
  Recommit patch D60160 after regression fix patch D63463.
  llvm-svn: 364557
* [lit] Delete empty lines at the end of lit.local.cfg NFC | Fangrui Song | 2019-06-17 | 4 files | -4/+0
  llvm-svn: 363538
* Improve reduction intrinsics by overloading result value. | Sander de Smalen | 2019-06-13 | 19 files | -4144/+4144
  This patch uses the mechanism from D62995 to strengthen the definitions of the reduction intrinsics by letting the scalar result/accumulator type be overloaded from the vector element type.
  For example:
    ; The LLVM LangRef specifies that the scalar result must equal the
    ; vector element type, but this is not checked/enforced by LLVM.
    declare i32 @llvm.experimental.vector.reduce.or.i32.v4i32(<4 x i32> %a)
  This patch changes that into:
    declare i32 @llvm.experimental.vector.reduce.or.v4i32(<4 x i32> %a)
  Which has the type-constraint more explicit and causes LLVM to check the result type with the vector element type.
  Reviewers: RKSimon, arsenm, rnk, greened, aemerson
  Reviewed By: arsenm
  Differential Revision: https://reviews.llvm.org/D62996
  llvm-svn: 363240
* [ARM] Adjust isLegalT1AddressImmediate for non-legal types | David Green | 2019-06-08 | 1 file | -17/+17
  Types such as float and i64 do not have legal loads in Thumb1, but will still be loaded with an LDR (or potentially multiple LDRs). As such we can treat the cost of addressing mode calculations the same as an i32 and get some optimisation benefits.
  Differential Revision: https://reviews.llvm.org/D62968
  llvm-svn: 362874
* [ARM] Add MVE addressing to isLegalT2AddressImmediate | David Green | 2019-06-08 | 1 file | -30/+30
  Now with MVE being added, we can add the vector addressing mode costs for it. These are generally imm7 multiplied by the size of the type being loaded / stored.
  Differential Revision: https://reviews.llvm.org/D62967
  llvm-svn: 362873
* [ARM] Add fp16 addressing to isLegalT2AddressImmediate | David Green | 2019-06-08 | 1 file | -11/+11
  The fp16 version of VLDR takes an imm8 multiplied by 2. This updates the costs to account for those, and adds extra testing. It is dependent upon hasFPRegs16 as this is what the load/store instructions require.
  Differential Revision: https://reviews.llvm.org/D62966
  llvm-svn: 362872
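A scaled-immediate legality check of the kind described here could look roughly like the sketch below. It is written in Python for illustration only; the function names are hypothetical, and it assumes the sign of the offset is encoded separately so that only the magnitude has to fit in the immediate field.

```python
def is_shifted_uint(value, bits, shift):
    """True if value is a non-negative multiple of 2**shift whose
    shifted-down form fits in an unsigned field of `bits` bits."""
    return (
        value >= 0
        and value % (1 << shift) == 0
        and (value >> shift) < (1 << bits)
    )


def is_legal_fp16_vldr_offset(offset):
    # Assumption: add/subtract is a separate encoding bit, so check |offset| only.
    # imm8 multiplied by 2 means 8 bits of magnitude with a shift of 1.
    return is_shifted_uint(abs(offset), bits=8, shift=1)


print(is_legal_fp16_vldr_offset(510))  # largest multiple of 2 that fits
print(is_legal_fp16_vldr_offset(512))  # one step too far
print(is_legal_fp16_vldr_offset(3))    # not a multiple of 2
```

The same helper with `bits=7` and a shift matching the element size would model the "imm7 multiplied by the size of the type" rule mentioned for MVE.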
* [ARM] Add extra gep costmodel tests for MVE and half float. NFC | David Green | 2019-06-08 | 1 file | -73/+553
  llvm-svn: 362871
* [RISCV] Disable test/Analysis/CostModel/RISCV tests if RISCV backend not built | Luis Marques | 2019-06-06 | 1 file | -0/+3
  Adds missing lit.local.cfg. Fixes rL362691.
  llvm-svn: 362693
* [RISCV] Add CostModel GEP tests | Luis Marques | 2019-06-06 | 1 file | -0/+189
  Differential Revision: https://reviews.llvm.org/D61185
  llvm-svn: 362691
* TTI: Improve default costs for addrspacecast | Matt Arsenault | 2019-06-03 | 1 file | -6/+27
  For some reason multiple places need to do this, and the variant the loop unroller and inliner use was not handling it.
  Also, introduce a new wrapper to be slightly more precise, since on AMDGPU some addrspacecasts are free, but not no-ops.
  llvm-svn: 362436
* [CostModel][X86] Improve masked load/store AVX1/AVX2 costs | Simon Pilgrim | 2019-06-02 | 2 files | -76/+76
  A mixture of internal tests and review of the scheduler models indicates we're overestimating the cost of a masked load, which we're estimating at 4x regular memory ops - more realistic values indicate that it's closer to 2x.
  Masked store costs are a lot more diverse but 8x is roughly in the middle of the range.
  e.g. SandyBridge
    defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>;
    defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>;
    defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;
    defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;
  e.g. Btver2
    defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>;
    defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>;
    defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>;
    defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>;
  Differential Revision: https://reviews.llvm.org/D61257
  llvm-svn: 362338
* [CostModel][X86] Add bool vector and/or/xor cost tests | Simon Pilgrim | 2019-05-30 | 1 file | -0/+192
  llvm-svn: 362083
* [CostModel] Add really basic support for being able to query the cost of the FNeg instruction. | Craig Topper | 2019-05-28 | 1 file | -34/+78
  Summary: This reuses the getArithmeticInstrCost, but passes dummy values of the second operand flags.
  The X86 costs are wrong and can be improved in a follow up. I just wanted to stop it from reporting an unknown cost first.
  Reviewers: RKSimon, spatel, andrew.w.kaylor, cameron.mcinally
  Reviewed By: spatel
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62444
  llvm-svn: 361788
* [X86] Add test cases for D62444. NFC | Craig Topper | 2019-05-27 | 1 file | -0/+171
  llvm-svn: 361745
* [CodeMetrics] Don't let extends of i1 be free. | Jonas Paulsson | 2019-05-17 | 1 file | -0/+53
  getUserCost() currently returns TCC_Free for any extend of a compare (i1) result. It seems this is only true in a limited number of cases where for example two compares are chained. Even in those types of cases it seems unlikely that they are generally free, while they may be in some cases.
  This patch therefore removes this special handling of cast of i1. No tests are failing because of this. If some target wants the old behavior, it could override getUserCost().
  Review: Hal Finkel, Chandler Carruth, Evgeny Astigeevich, Simon Pilgrim, Ulrich Weigand
  https://reviews.llvm.org/D54742/new/
  llvm-svn: 360970
* [CostModel][X86] Add min/max reduction costs for all SSE targets | Simon Pilgrim | 2019-05-11 | 8 files | -559/+559
  The original costs stopped at SSE42; I've added conservative estimates for everything down to SSE1/SSE2 and moved some of the SSE42 costs to SSE41 (really only the addition of PCMPGT makes any difference).
  I've also added missing vXi8 costs (we use PHMINPOSUW for i8/i16 for scarily quick results) and 256-bit vector costs for AVX1.
  llvm-svn: 360528
* Revert "[llvm] r359313 - [PowerPC] Update P9 vector costs for insert/extract element" | David L. Jones | 2019-05-01 | 1 file | -24/+24
  This causes segfaults during optimized builds. More details, including a reproducer, are on the llvm-commits thread for r359313.
  llvm-svn: 359648
* [ARM] Implement TTI::getMemcpyCost | Sjoerd Meijer | 2019-04-30 | 1 file | -4/+662
  This implements the TargetTransformInfo method getMemcpyCost, which estimates the number of instructions to which a memcpy instruction expands.
  Differential Revision: https://reviews.llvm.org/D59787
  llvm-svn: 359547
* [PowerPC] Update P9 vector costs for insert/extract element | Roland Froese | 2019-04-26 | 1 file | -24/+24
  The PPC vector cost model values for insert/extract element reflect older processors that lacked vector insert/extract and move-to/move-from VSR instructions. Update getVectorInstrCost to give appropriate values for when the newer instructions are present.
  Differential Revision: https://reviews.llvm.org/D60160
  llvm-svn: 359313
* [ARM] Rewrite isLegalT2AddressImmediate | David Green | 2019-04-21 | 1 file | -65/+594
  This does two main things: firstly it adds at least some basic addressing modes for i64 types, and secondly it treats floats and doubles sensibly when there is no fpu. The floating point change can help codesize in some cases, especially with D60294.
  Most backends seem to not consider the exact VT in isLegalAddressingMode, instead switching on type size. That is now what this does when the target does not have an fpu (as the float data will be loaded using LDRs). i64s currently use the address range of an LDRD (even though they may be legalised and loaded with an LDR). This is at least better than marking them all as illegal addressing modes.
  I have not attempted to do much with vectors yet. That will need changing once MVE is added.
  Differential Revision: https://reviews.llvm.org/D60677
  llvm-svn: 358845
* [PowerPC] Add some PPC vec cost tests to prep for D60160 NFC | Roland Froese | 2019-04-18 | 2 files | -16/+172
  llvm-svn: 358699
* [CostModel][X86] Add bool anyof/allof reduction costs | Simon Pilgrim | 2019-04-17 | 4 files | -184/+96
  On pre-AVX512 targets we can use MOVMSK to extract reduced boolean results. This is properly optimized; annoyingly, AVX512 isn't, and produces code that is almost as bad as the (unchanged) costs suggest.
  Differential Revision: https://reviews.llvm.org/D60403
  llvm-svn: 358574
* [CostModel][X86] Add more exhaustive masked load/store/gather/scatter/expand/compress cost tests | Simon Pilgrim | 2019-04-06 | 2 files | -72/+2050
  llvm-svn: 357838
* [TTI] getMemcpyCost | Sjoerd Meijer | 2019-03-20 | 1 file | -0/+13
  This adds a new function getMemcpyCost to TTI so that the cost of a memcpy can be modeled and queried. The default implementation returns Expensive, but targets can override this function to model the cost more accurately.
  Differential Revision: https://reviews.llvm.org/D59252
  llvm-svn: 356555
* AMDGPU: Partially fix default device for HSA | Matt Arsenault | 2019-03-17 | 1 file | -2/+2
  There are a few different issues, mostly stemming from using generation based checks for anything instead of subtarget features.
  Stop adding flat-address-space as a feature for HSA, as it should only be a device property. This was incorrectly allowing flat instructions to select for SI.
  Increase the default generation for HSA to avoid the encoding error when emitting objects. This has some other side effects from various checks which probably should be separate subtarget features (in the cost model and for dealing with the DS offset folding issue).
  Partial fix for bug 41070. It should probably be an error to try using amdhsa without flat support.
  llvm-svn: 356347
* [AMDGPU] Prepare for introduction of v3 and v5 MVTs | Tim Renouf | 2019-03-17 | 8 files | -7/+105
  AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This commit does not add them, but makes preparatory changes:
    * Fixed assumptions of power-of-2 vector type in kernel arg handling, and added v5 kernel arg tests and v3/v5 shader arg tests.
    * Added v5 tests for cost analysis.
    * Added vec3/vec5 arg test cases.
  Some of this patch is from Matt Arsenault, also of AMD.
  Differential Revision: https://reviews.llvm.org/D58928
  Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd
  llvm-svn: 356342
* [TTI] Add generic cost model for smul/umul overflow intrinsics | Simon Pilgrim | 2019-02-25 | 1 file | -36/+378
  Based off smul/umul fixed costs and the implementation in TargetLowering::expandMULO.
  llvm-svn: 354784
* [TTI] Add generic cost model for fixed point smul/umul | Simon Pilgrim | 2019-02-25 | 1 file | -36/+378
  Based on an IR equivalent of target lowering's generic expansion - target specific costs will typically be lower (IR doesn't have a good mull/mulh equivalent) but we need a baseline.
  Differential Revision: https://reviews.llvm.org/D57925
  llvm-svn: 354774
* [CostModel][X86] Add UMUL fixed point cost tests | Simon Pilgrim | 2019-02-05 | 1 file | -0/+63
  llvm-svn: 353153
* [CodeGen][X86] Expand UADDSAT to NOT+UMIN+ADD | Nikita Popov | 2019-01-28 | 1 file | -4/+4
  Followup to D56636, this time handling the UADDSAT case by expanding uadd.sat(a, b) to umin(a, ~b) + b.
  Differential Revision: https://reviews.llvm.org/D56869
  llvm-svn: 352409
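The identity uadd.sat(a, b) = umin(a, ~b) + b can be checked exhaustively for a small bit width. The following is a minimal Python sketch, not LLVM code; the 8-bit width and the function names are choices made for illustration.

```python
BITS = 8
MASK = (1 << BITS) - 1  # 0xFF, the largest unsigned 8-bit value


def uadd_sat_reference(a, b):
    """Saturating unsigned add: clamp the exact sum to the maximum value."""
    return min(a + b, MASK)


def uadd_sat_expanded(a, b):
    """NOT + UMIN + ADD expansion: umin(a, ~b) + b."""
    not_b = ~b & MASK             # bitwise NOT within the 8-bit width
    return min(a, not_b) + b      # cannot exceed MASK, because ~b == MASK - b


mismatches = [
    (a, b)
    for a in range(MASK + 1)
    for b in range(MASK + 1)
    if uadd_sat_reference(a, b) != uadd_sat_expanded(a, b)
]
print(len(mismatches))
```

The expansion works because ~b equals MASK - b, so clamping a to ~b leaves exactly enough headroom for adding b without wrapping.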
* [TTI] Add generic SADDSAT/SSUBSAT costs | Simon Pilgrim | 2019-01-27 | 1 file | -196/+234
  Add generic cost calculation for SADDSAT/SSUBSAT intrinsics. This uses generic costs for sadd_with_overflow/ssub_with_overflow, plus an extra sign comparison and selects based on the sign/overflow.
  This completes PR40316.
  Differential Revision: https://reviews.llvm.org/D57239
  llvm-svn: 352315
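The shape of the expansion being costed (overflow-checked add, a sign comparison, then selects) can be sketched as below. This is an illustrative Python model at 8 bits with hypothetical helper names; it is not the TTI implementation and says nothing about the actual cost values.

```python
BITS = 8
INT_MIN = -(1 << (BITS - 1))
INT_MAX = (1 << (BITS - 1)) - 1


def to_signed(x):
    """Wrap an arbitrary integer into the signed BITS-bit range."""
    x &= (1 << BITS) - 1
    return x - (1 << BITS) if x > INT_MAX else x


def sadd_with_overflow(a, b):
    """Return (wrapped sum, overflow flag), like llvm.sadd.with.overflow."""
    wrapped = to_signed(a + b)
    return wrapped, wrapped != a + b


def sadd_sat(a, b):
    wrapped, overflow = sadd_with_overflow(a, b)
    # Sign comparison: a positive overflow wraps to a negative value, so a
    # negative wrapped result means the true sum exceeded INT_MAX.
    bound = INT_MAX if wrapped < 0 else INT_MIN   # first select
    return bound if overflow else wrapped         # second select


print(sadd_sat(100, 100), sadd_sat(-100, -100), sadd_sat(5, -7))
```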
* [PowerPC] Update Vector Costs for P9 | Nemanja Ivanovic | 2019-01-26 | 1 file | -0/+68
  For the power9 CPU, vector operations consume a pair of execution units rather than one execution unit like a scalar operation. Update the target transform cost functions to reflect the higher cost of vector operations when targeting Power9.
  Patch by RolandF.
  Differential revision: https://reviews.llvm.org/D55461
  llvm-svn: 352261
* [CostModel][X86] Add SMUL fixed point cost tests | Simon Pilgrim | 2019-01-24 | 1 file | -0/+75
  llvm-svn: 352046
* [TTI] Add generic SADDO/SSUBO costs | Simon Pilgrim | 2019-01-24 | 1 file | -36/+378
  Added x86 scalar sadd_with_overflow/ssub_with_overflow costs.
  llvm-svn: 352045
* [TTI] Add generic UADDSAT/USUBSAT costs | Simon Pilgrim | 2019-01-24 | 1 file | -162/+181
  Add generic cost calculation for UADDSAT/USUBSAT intrinsics. This falls back to using generic costs for uadd_with_overflow/usub_with_overflow + a select.
  Differential Revision: https://reviews.llvm.org/D56907
  llvm-svn: 352044
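The fallback shape mentioned here (an overflow-checked operation followed by one select) can be modelled as below. This is a hedged Python sketch at 8 bits with hypothetical function names; it illustrates the operation sequence only, not the cost numbers LLVM assigns.

```python
BITS = 8
MASK = (1 << BITS) - 1


def uadd_with_overflow(a, b):
    """Return (wrapped sum, carry-out flag), like llvm.uadd.with.overflow."""
    total = a + b
    return total & MASK, total > MASK


def usub_with_overflow(a, b):
    """Return (wrapped difference, borrow flag), like llvm.usub.with.overflow."""
    return (a - b) & MASK, a < b


def uadd_sat(a, b):
    result, overflow = uadd_with_overflow(a, b)
    return MASK if overflow else result   # select: saturate to all-ones


def usub_sat(a, b):
    result, overflow = usub_with_overflow(a, b)
    return 0 if overflow else result      # select: saturate to zero


print(uadd_sat(250, 10), usub_sat(10, 250))
```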
* [TTI] Add generic UADDO/USUBO costs | Simon Pilgrim | 2019-01-24 | 1 file | -36/+378
  Added x86 scalar uadd_with_overflow/usub_with_overflow costs.
  Differential Revision: https://reviews.llvm.org/D56907
  llvm-svn: 352043
* [IR] Match intrinsic parameter by scalar/vectorwidth | Simon Pilgrim | 2019-01-23 | 1 file | -0/+414
  This patch replaces the existing LLVMVectorSameWidth matcher with LLVMScalarOrSameVectorWidth.
  The matching args must be either scalars or vectors with the same number of elements, but in either case the scalar/element type can differ, specified by LLVMScalarOrSameVectorWidth.
  I've updated the _overflow intrinsics to demonstrate this - allowing them to return an i1 or <N x i1> overflow result, matching the scalar/vectorwidth of the other (add/sub/mul) result type.
  The masked load/store/gather/scatter intrinsics have also been updated to use this, although as we specify the reference type to be llvm_anyvector_ty we guarantee the mask will be <N x i1>, so there is no change in behaviour.
  Differential Revision: https://reviews.llvm.org/D57090
  llvm-svn: 351957
* [CostModel][X86] Add ICMP Predicate specific costs | Simon Pilgrim | 2019-01-22 | 1 file | -1036/+1036
  First step towards PR40376, this patch adds support for getCmpSelInstrCost to use the (optional) Instruction CmpInst predicate to indicate the type of integer comparison we're performing and alter the costs accordingly.
  Differential Revision: https://reviews.llvm.org/D57013
  llvm-svn: 351810
* [CostModel][X86] Add XOP icmp cost tests (PR40376) | Simon Pilgrim | 2019-01-21 | 1 file | -0/+462
  llvm-svn: 351741
* [CostModel][X86] Add explicit vector select costs | Simon Pilgrim | 2019-01-20 | 11 files | -689/+866
  Prior to SSE41 (and sometimes on AVX1), vector select has to be performed as a ((X & C)|(Y & ~C)) bit select.
  Exposes a couple of issues with the min/max reduction costs (which only go down to SSE42 for some reason).
  The increased pre-SSE41 selection costs also prevent a couple of tests from firing any longer, so I've either tweaked the target or added AVX tests as well to the existing SSE2 tests.
  llvm-svn: 351685
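The ((X & C)|(Y & ~C)) bit select can be illustrated with plain integers standing in for vector lanes. This is a minimal Python sketch with hypothetical names, assuming each lane of the condition C is either all-ones or all-zeros, as a vector compare would produce.

```python
def bit_select(x, y, c, bits=32):
    """Per-lane select without a blend instruction: (x & c) | (y & ~c)."""
    mask = (1 << bits) - 1
    return ((x & c) | (y & ~c & mask)) & mask


def vselect(xs, ys, conds, bits=32):
    """Apply bit_select lane by lane; a true condition becomes an all-ones lane mask."""
    all_ones = (1 << bits) - 1
    return [
        bit_select(x, y, all_ones if cond else 0, bits)
        for x, y, cond in zip(xs, ys, conds)
    ]


print(vselect([1, 2, 3, 4], [10, 20, 30, 40], [True, False, True, False]))
```

Without a blend instruction this takes an AND, an ANDN and an OR per vector, which is consistent with the higher pre-SSE41 select costs the commit describes.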
* [CostModel][X86] Add explicit fcmp costs for pre-SSE42 targets | Simon Pilgrim | 2019-01-20 | 1 file | -512/+512
  Typical throughputs: cmpss/cmpps = 1cy and cmpsd/cmppd = 2cy before the Core2 era
  llvm-svn: 351684
* [CostModel][X86] Split icmp/fcmp costs tests and test all comparison codes | Simon Pilgrim | 2019-01-20 | 3 files | -330/+4529
  llvm-svn: 351682
* [CostModel][X86] Add masked load/store/gather/scatter tests for SSE2/SSE42/AVX1 targets | Simon Pilgrim | 2019-01-20 | 2 files | -458/+660
  llvm-svn: 351681
* [CostModel][X86] Add non-constant vselect cost tests | Simon Pilgrim | 2019-01-20 | 1 file | -1/+121
  Also add AVX512 costs at the same time
  llvm-svn: 351680
* Reapply "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors" | Nikita Popov | 2019-01-15 | 1 file | -51/+70
  Related to https://bugs.llvm.org/show_bug.cgi?id=40123.
  Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB, which produces much better code for X86.
  Reapplying with updated SLPVectorizer tests.
  Differential Revision: https://reviews.llvm.org/D56636
  llvm-svn: 351219
* Revert "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors" | Nikita Popov | 2019-01-14 | 1 file | -70/+51
  This reverts commit r351125.
  I missed test changes in an SLPVectorizer test, due to the cost model changes. Reverting for now.
  llvm-svn: 351129
* [CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors | Nikita Popov | 2019-01-14 | 1 file | -51/+70
  Related to https://bugs.llvm.org/show_bug.cgi?id=40123.
  Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB, which produces much better code for X86.
  Differential Revision: https://reviews.llvm.org/D56636
  llvm-svn: 351125
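The UMAX+SUB expansion relies on the identity usub.sat(a, b) = umax(a, b) - b. A small Python sketch (8-bit width and function names chosen for illustration, not taken from LLVM) checks it exhaustively:

```python
BITS = 8
MASK = (1 << BITS) - 1


def usub_sat_reference(a, b):
    """Saturating unsigned subtract: clamp the exact difference at zero."""
    return max(a - b, 0)


def usub_sat_expanded(a, b):
    """UMAX + SUB expansion: umax(a, b) - b never goes below zero."""
    return max(a, b) - b


mismatches = [
    (a, b)
    for a in range(MASK + 1)
    for b in range(MASK + 1)
    if usub_sat_reference(a, b) != usub_sat_expanded(a, b)
]
print(len(mismatches))
```

When a < b the maximum is b itself, so the subtraction yields zero; otherwise it is the ordinary difference. Both operations have vector forms, which is why the expansion avoids scalarizing.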