path: root/llvm/test/Analysis/CostModel
Commit message | Author | Age | Files | Lines
...
* [CostModel][X86] Regenerate vector select cost tests with update_analyze_test_checks.py
  Author: Simon Pilgrim | Age: 2018-04-07 | Files: 1 | Lines: -59/+73
  llvm-svn: 329502
* [CostModel][X86] Regenerate vector integer truncation cost tests with update_analyze_test_checks.py
  Author: Simon Pilgrim | Age: 2018-04-07 | Files: 1 | Lines: -118/+191
  llvm-svn: 329500
* [CostModel][X86] Regenerate silvermont (and added goldmont) cost tests with update_analyze_test_checks.py
  Author: Simon Pilgrim | Age: 2018-04-07 | Files: 1 | Lines: -46/+357
  llvm-svn: 329499
* [CostModel][X86] Fix v32i16/v64i8 SETCC costs on AVX512BW targets
  Author: Simon Pilgrim | Age: 2018-04-07 | Files: 1 | Lines: -4/+4
  llvm-svn: 329498
* [CostModel][X86] Regenerate vector comparison cost tests with update_analyze_test_checks.py
  Author: Simon Pilgrim | Age: 2018-04-07 | Files: 1 | Lines: -137/+286
  llvm-svn: 329497
* [CostModel][X86] Regenerate bit count cost tests with update_analyze_test_checks.py
  Author: Simon Pilgrim | Age: 2018-04-06 | Files: 3 | Lines: -447/+1415
  llvm-svn: 329413
* [CostModel][X86] Regenerate vector shuffle cost tests with update_analyze_test_checks.py
  Author: Simon Pilgrim | Age: 2018-04-06 | Files: 5 | Lines: -856/+1288
  llvm-svn: 329410
* [CostModel][X86] Regenerate bswap/bitreverse cost tests with update_analyze_test_checks.py
  Author: Simon Pilgrim | Age: 2018-04-06 | Files: 2 | Lines: -160/+442
  llvm-svn: 329407
* [CostModel][X86] Regenerate integer extension/truncation cost tests with update_analyze_test_checks.py
  Author: Simon Pilgrim | Age: 2018-04-06 | Files: 1 | Lines: -167/+336
  llvm-svn: 329402
* [CostModel][X86] Regenerate integer division/remainder tests with update_analyze_test_checks.py
  Author: Simon Pilgrim | Age: 2018-04-06 | Files: 3 | Lines: -367/+673
  llvm-svn: 329401
* [CostModel][X86] Regenerate vector shift cost tests with update_analyze_test_checks.py
  Author: Simon Pilgrim | Age: 2018-04-06 | Files: 3 | Lines: -1229/+4331
  llvm-svn: 329400
* [CostModel][X86] Regenerate int<->fp cost tests with update_analyze_test_checks.py
  Author: Simon Pilgrim | Age: 2018-04-06 | Files: 4 | Lines: -763/+798
  llvm-svn: 329398
* [UpdateTestChecks] Add update_analyze_test_checks.py for cost model analysis generation
  Author: Simon Pilgrim | Age: 2018-04-06 | Files: 2 | Lines: -1097/+1529
  The script allows the auto-generation of checks for cost model tests to speed up their creation and help improve coverage, which will help a lot with PR36550.
  If the need arises we can add support for other analyze passes as well, but the cost model was the one I needed to get done; at the moment it just warns that any other analysis mode is unsupported.
  I've regenerated a couple of x86 test files to show the effect.
  Differential Revision: https://reviews.llvm.org/D45272
  llvm-svn: 329390
* [X86][CostModel] Use generic SSE levels instead of particular CPUs for shuffle costs
  Author: Simon Pilgrim | Age: 2018-04-04 | Files: 1 | Lines: -5/+5
  llvm-svn: 329168
* [X86] Update cost model for Goldmont. Add fsqrt costs for Silvermont
  Author: Craig Topper | Age: 2018-03-25 | Files: 2 | Lines: -8/+229
  Add fdiv costs for Goldmont using table 16-17 of the Intel Optimization Manual. Also add overrides for FSQRT for Goldmont and Silvermont.
  Reviewers: RKSimon
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D44644
  llvm-svn: 328451
* [AArch64] Implement getArithmeticReductionCost
  Author: Matthew Simpson | Age: 2018-03-16 | Files: 1 | Lines: -5/+5
  This patch provides an implementation of getArithmeticReductionCost for AArch64. We can specialize the cost of add reductions since they are computed using the 'addv' instruction.
  Differential Revision: https://reviews.llvm.org/D44490
  llvm-svn: 327702
* [TTI, AArch64] Allow the cost model analysis to test vector reduce intrinsics
  Author: Matthew Simpson | Age: 2018-03-16 | Files: 1 | Lines: -0/+279
  This patch considers the experimental vector reduce intrinsics in the default implementation of getIntrinsicInstrCost. The cost of these intrinsics is computed with getArithmeticReductionCost and getMinMaxReductionCost.
  This patch also adds a test case for AArch64 that indicates the costs we currently compute for vector reduce intrinsics. These costs are inaccurate and will be updated in a follow-on patch.
  Differential Revision: https://reviews.llvm.org/D44489
  llvm-svn: 327698
* [AArch64] Adjust the cost of integer vector division
  Author: Evandro Menezes | Age: 2018-03-07 | Files: 1 | Lines: -0/+38
  Since there is no instruction for integer vector division, factor in the cost of singling out each element to be used with the scalar division instruction.
  Differential revision: https://reviews.llvm.org/D43974
  llvm-svn: 326955
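The scalarization idea in the commit above can be illustrated with a small sketch. This is a toy model, not the AArch64 TTI code: the function name `scalarizedDivCost` and the per-operation costs passed to it are hypothetical, and the formula (two extracts, one scalar division, and one insert per lane) is an assumption chosen only to show how per-element overhead adds to the scalar division cost.

```cpp
#include <cassert>

// Toy model (hypothetical, not LLVM's implementation): a vector integer
// division with no native instruction is costed as one scalar division per
// lane, plus extracting both operands of that lane and inserting the result.
int scalarizedDivCost(int NumElts, int ScalarDivCost, int ExtractCost,
                      int InsertCost) {
  assert(NumElts > 0 && "vector must have at least one element");
  // Per-lane work: extract lhs, extract rhs, divide, insert the quotient.
  int PerLane = 2 * ExtractCost + ScalarDivCost + InsertCost;
  return NumElts * PerLane;
}
```

With unit costs for every operation, a four-element vector would cost 4 * (2 + 1 + 1) = 16 in this model, which is why a scalarized division is so much more expensive than a native vector operation.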
* [X86] Add silvermont fp arithmetic cost model tests
  Author: Simon Pilgrim | Age: 2018-03-05 | Files: 1 | Lines: -0/+73
  Add silvermont to existing high coverage tests instead of repeating in slm-arith-costs.ll
  llvm-svn: 326747
* [X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280)
  Author: Simon Pilgrim | Age: 2018-02-26 | Files: 2 | Lines: -85/+85
  Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark.
  Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch.
  Differential Revision: https://reviews.llvm.org/D43733
  llvm-svn: 326133
* revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280)
  Author: Sanjay Patel | Age: 2018-02-21 | Files: 3 | Lines: -168/+168
  There are too many perf regressions resulting from this, so we need to investigate (and add tests for) targets like ARM and AArch64 before trying to reinstate.
  llvm-svn: 325658
* [TTI CostModel] change default cost of FP ops to 1 (PR36280)
  Author: Sanjay Patel | Age: 2018-02-19 | Files: 3 | Lines: -168/+168
  This change was mentioned at least as far back as:
  https://bugs.llvm.org/show_bug.cgi?id=26837#c26
  ...and I found a real program that is harmed by this: Himeno running on AMD Jaguar gets 6% slower with SLP vectorization:
  https://bugs.llvm.org/show_bug.cgi?id=36280
  ...but the change here appears to solve that bug only accidentally.
  The div/rem costs for x86 look very wrong in some cases, but that's already true, so we can fix those in follow-up patches. There's also evidence that more cost model changes are needed to solve SLP problems as shown in D42981, but that's an independent problem (though the solution may be adjusted after this change is made).
  Differential Revision: https://reviews.llvm.org/D43079
  llvm-svn: 325515
* [X86][SSE] Increase PMULLD costs to better match hardware
  Author: Simon Pilgrim | Age: 2018-02-10 | Files: 2 | Lines: -18/+18
  Until Skylake, most hardware could only issue a PMULLD op every other cycle
  llvm-svn: 324823
* [AMDGPU] Switch to the new addr space mapping by default
  Author: Yaxun Liu | Age: 2018-02-02 | Files: 1 | Lines: -24/+24
  This requires corresponding clang change.
  Differential Revision: https://reviews.llvm.org/D40955
  llvm-svn: 324101
* [X86] Make v2i1 and v4i1 legal types without VLX
  Author: Craig Topper | Age: 2018-01-07 | Files: 1 | Lines: -3/+9
  Summary: There are a few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX, particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since we need to fill the upper bits of the mask, we have to fill with 0s at the promoted type.
  It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway. Everything is done on a larger register anyway.
  This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly.
  We now have to support widening more compares to 512-bit to get a mask result out, so new tablegen patterns got added.
  I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try to create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all.
  There's definitely room for improvement with some follow up patches.
  Reviewers: RKSimon, zvi, guyblank
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D41560
  llvm-svn: 321967
* [X86] Use mattr instead of mcpu in some of the cost model tests.
  Author: Craig Topper | Age: 2017-12-18 | Files: 4 | Lines: -39/+33
  Based on the names of the check lines, features seem more appropriate than cpu.
  Spotted while prototyping my patch to make 512-bit vectors illegal on SKX sometimes.
  llvm-svn: 320959
* [X86] Promote fp_to_sint v16f32->v16i16/v16i8 to avoid scalarization.
  Author: Craig Topper | Age: 2017-11-29 | Files: 1 | Lines: -2/+2
  llvm-svn: 319266
* [LV][X86] Support of AVX2 Gathers code generation and update the LV with this
  Author: Mohammed Agabaria | Age: 2017-11-20 | Files: 1 | Lines: -0/+37
  This patch depends on: https://reviews.llvm.org/D35348
  Support of pattern selection of masked gathers of AVX2 (X86\AVX2 code gen).
  Update LoopVectorize to generate gathers for AVX2 processors.
  Reviewers: delena, zvi, RKSimon, craig.topper, aaboud, igorb
  Reviewed By: delena, RKSimon
  Differential Revision: https://reviews.llvm.org/D35772
  llvm-svn: 318641
* [TTI][X86] update costs of interleaved load\store of i64\double
  Author: Mohammed Agabaria | Age: 2017-11-16 | Files: 2 | Lines: -0/+80
  This patch contains a more accurate cost of interleaved load\store of stride 2 for the types int64\double on AVX2.
  Reviewers: delena, RKSimon, craig.topper, dorit
  Reviewed By: dorit
  Differential Revision: https://reviews.llvm.org/D40008
  llvm-svn: 318385
* [LV][X86] update the cost of interleaving mem. access of floats
  Author: Mohammed Agabaria | Age: 2017-11-06 | Files: 1 | Lines: -0/+141
  Recommit: This patch contains an update of the costs of interleaved loads of v8f32 of stride 3 and 8.
  Fixed the location of the lit test; it works with make check-all.
  Differential Revision: https://reviews.llvm.org/D39403
  llvm-svn: 317471
* [REVERT][LV][X86] update the cost of interleaving mem. access of floats
  Author: Mohammed Agabaria | Age: 2017-11-05 | Files: 1 | Lines: -141/+0
  Reverted my changes; they will be committed later after fixing the failure.
  This patch contains an update of the costs of interleaved loads of v8f32 of stride 3 and 8.
  Differential Revision: https://reviews.llvm.org/D39403
  llvm-svn: 317433
* [LV][X86] update the cost of interleaving mem. access of floats
  Author: Mohammed Agabaria | Age: 2017-11-05 | Files: 1 | Lines: -0/+141
  This patch contains an update of the costs of interleaved loads of v8f32 of stride 3 and 8.
  Differential Revision: https://reviews.llvm.org/D39403
  llvm-svn: 317432
* [AVX512][AVX2] Cost calculation for interleave load/store patterns {v8i8,v16i8,v32i8,v64i8}
  Author: Michael Zuckerman | Age: 2017-10-18 | Files: 3 | Lines: -10/+10
  This patch adds accurate instruction costs. The formula covers two cases (stride 3 and stride 4) and calculates the cost according to the VF and stride.
  Reviewers: 1. delena 2. Farhana 3. zvi 4. dorit 5. Ayal
  Differential Revision: https://reviews.llvm.org/D38762
  Change-Id: If4cfbd4ac0e63694e8144cb78c7fa34850647ff7
  llvm-svn: 316072
* Revert r314923: "Recommit : Use the basic cost if a GEP is not used as addressing mode"
  Author: Daniel Jasper | Age: 2017-10-13 | Files: 2 | Lines: -47/+1
  Significantly reduces performance (~30%) of gipfeli (https://github.com/google/gipfeli).
  I have not yet managed to reproduce this regression with the open-source version of the benchmark on github, but will work with others to get a reproducer to you later today.
  llvm-svn: 315680
* Convert an APInt to int64_t properly in TTI::getGEPCost().
  Author: Justin Lebar | Age: 2017-10-04 | Files: 1 | Lines: -0/+3
  Summary: If the pointer width is 32 bits and the calculated GEP offset is negative, we call APInt::getLimitedValue(), which does a *zero*-extension of the offset. That's wrong -- we should do an sext.
  Fixes a bug introduced in rL314362 and found by Evgeny Astigeevich.
  Reviewers: efriedma
  Subscribers: sanjoy, javed.absar, llvm-commits, eastig
  Differential Revision: https://reviews.llvm.org/D38557
  llvm-svn: 314935
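The difference between the two extensions described in the commit above can be shown with plain integers. This is a minimal sketch that does not use LLVM's APInt; the helper names `zeroExtend32` and `signExtend32` are hypothetical and only model a 32-bit offset stored as raw bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not LLVM API): treat Bits as a 32-bit GEP offset.

// Zero-extension keeps the raw bit pattern, so a negative 32-bit offset
// turns into a large positive 64-bit value.
constexpr int64_t zeroExtend32(uint32_t Bits) {
  return static_cast<int64_t>(Bits);
}

// Sign-extension interprets bit 31 as the sign, so the offset keeps its
// negative value when widened to 64 bits.
constexpr int64_t signExtend32(uint32_t Bits) {
  return (Bits & 0x80000000u)
             ? static_cast<int64_t>(Bits) - (int64_t{1} << 32)
             : static_cast<int64_t>(Bits);
}
```

For the bit pattern 0xFFFFFFFC (an offset of -4 on a 32-bit target), zero-extension yields 4294967292 while sign-extension yields -4, which is the value a GEP cost query should see.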
* [TargetTransformInfo] Check if function pointer is valid before calling isLoweredToCall
  Author: Guozhi Wei | Age: 2017-10-04 | Files: 1 | Lines: -0/+4
  Function isLoweredToCall can only accept non-null function pointer, but a function pointer can be null for indirect function call. So check it before calling isLoweredToCall from getInstructionLatency.
  Differential Revision: https://reviews.llvm.org/D38204
  llvm-svn: 314927
* Recommit : Use the basic cost if a GEP is not used as addressing mode
  Author: Jun Bum Lim | Age: 2017-10-04 | Files: 2 | Lines: -1/+47
  Recommitting r314517 with the fix for handling ConstantExpr.
  Original commit message:
  Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target. However, since it doesn't check its actual users, it will return FREE even in cases where the GEP cannot be folded away as a part of actual addressing mode. For example, if a user of the GEP is a call instruction taking the GEP as a parameter, then the GEP may not be folded in isel.
  llvm-svn: 314923
* [X86] Add AVX512 check lines to the cost model truncate test.
  Author: Craig Topper | Age: 2017-10-03 | Files: 1 | Lines: -0/+13
  llvm-svn: 314758
* Revert "Use the basic cost if a GEP is not used as addressing mode"
  Author: Alex Shlyapnikov | Age: 2017-09-29 | Files: 2 | Lines: -47/+1
  This reverts commit r314517.
  This commit crashes sanitizer bots, for example:
  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/4167
  Stack snippet:
  ...
  /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Support/Casting.h:255:0
  llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getGEPCost(llvm::GEPOperator const*, llvm::ArrayRef<llvm::Value const*>) /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:742:0
  llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getUserCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>) /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:782:0
  /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/lib/Analysis/TargetTransformInfo.cpp:116:0
  /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:116:0
  /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:343:0
  /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:864:0
  /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfo.h:285:0
  ...
  llvm-svn: 314560
* Use the basic cost if a GEP is not used as addressing mode
  Author: Jun Bum Lim | Age: 2017-09-29 | Files: 2 | Lines: -1/+47
  Summary: Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target. However, since it doesn't check its actual users, it will return FREE even in cases where the GEP cannot be folded away as a part of actual addressing mode. For example, if a user of the GEP is a call instruction taking the GEP as a parameter, then the GEP may not be folded in isel.
  Reviewers: hfinkel, efriedma, mcrosier, jingyue, haicheng
  Reviewed By: hfinkel
  Subscribers: javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D38085
  llvm-svn: 314517
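The user-sensitive GEP cost described in the commit above might be sketched as follows. This is a simplified illustration, not the code from the patch: `ToyUserKind` and `gepCost` are hypothetical names, and the sketch assumes that every load or store user consumes the GEP as its address operand. Only the cost names `TCC_Free` and `TCC_Basic` are borrowed from TargetTransformInfo.

```cpp
#include <cassert>
#include <vector>

// Cost constants named after TargetTransformInfo's TCC_Free and TCC_Basic.
enum ToyCost { TCC_Free = 0, TCC_Basic = 1 };

// Hypothetical classification of the instructions that use a GEP result.
enum class ToyUserKind { Load, Store, Call, Other };

// A GEP is free only if it forms a legal addressing mode AND every user is
// a memory access that can fold it; any other user (a call, for example)
// forces the address to be materialized, so the basic cost applies.
ToyCost gepCost(bool IsLegalAddressingMode,
                const std::vector<ToyUserKind> &Users) {
  if (!IsLegalAddressingMode)
    return TCC_Basic;
  for (ToyUserKind U : Users)
    if (U != ToyUserKind::Load && U != ToyUserKind::Store)
      return TCC_Basic;
  return TCC_Free;
}
```

In this model a GEP used by a load and a store stays free, while the same GEP also passed to a call is charged the basic cost, matching the example given in the commit message.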
* Check for overflows when calculating the offset in GetGEPCost.
  Author: Justin Lebar | Age: 2017-09-27 | Files: 1 | Lines: -0/+11
  Summary: This avoids C++ UB if the GEP is weird and the calculation overflows int64_t, and it's also observable in the cost model's results.
  Such GEPs are almost surely not valid pointers, but LLVM nonetheless generates them sometimes.
  Reviewers: sanjoy
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D38337
  llvm-svn: 314362
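One way to avoid the signed-overflow UB mentioned in the commit above is to test each step before performing it. The sketch below is an assumption about the general technique, not the code from the patch: `checkedAdd`, `checkedMul`, and `checkedGepOffset` are hypothetical helpers that compute `Base + Index * Stride` and report failure instead of overflowing.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: Out = A + B, or returns false if that would overflow.
bool checkedAdd(int64_t A, int64_t B, int64_t &Out) {
  const int64_t Max = std::numeric_limits<int64_t>::max();
  const int64_t Min = std::numeric_limits<int64_t>::min();
  if ((B > 0 && A > Max - B) || (B < 0 && A < Min - B))
    return false;
  Out = A + B;
  return true;
}

// Hypothetical helper: Out = A * B, or returns false if that would overflow.
// Each sign combination is checked with a division that cannot itself trap.
bool checkedMul(int64_t A, int64_t B, int64_t &Out) {
  const int64_t Max = std::numeric_limits<int64_t>::max();
  const int64_t Min = std::numeric_limits<int64_t>::min();
  if (A > 0) {
    if (B > 0 ? A > Max / B : B < Min / A)
      return false;
  } else {
    if (B > 0 ? A < Min / B : (A != 0 && B < Max / A))
      return false;
  }
  Out = A * B;
  return true;
}

// Offset of one scaled index: Base + Index * Stride, with overflow detection.
bool checkedGepOffset(int64_t Base, int64_t Index, int64_t Stride,
                      int64_t &Out) {
  int64_t Scaled = 0;
  return checkedMul(Index, Stride, Scaled) && checkedAdd(Base, Scaled, Out);
}
```

A caller that gets `false` back can fall back to a conservative cost rather than using a wrapped offset, which is the observable difference the commit message refers to.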
* [TargetTransformInfo] Handle intrinsic call in getInstructionLatency()
  Author: Guozhi Wei | Age: 2017-09-22 | Files: 1 | Lines: -0/+6
  Usually an intrinsic is a simple target instruction, so it should have a small latency. A real function call has much larger latency. So handle the intrinsic call in function getInstructionLatency().
  Differential Revision: https://reviews.llvm.org/D38104
  llvm-svn: 314003
* [TargetTransformInfo] Static alloca has 0 cost
  Author: Guozhi Wei | Age: 2017-09-15 | Files: 1 | Lines: -0/+8
  Static alloca usually doesn't generate any machine instructions, so it has 0 cost.
  Differential Revision: https://reviews.llvm.org/D37879
  llvm-svn: 313410
* [TargetTransformInfo] Detect 0 latency instructions
  Author: Guozhi Wei | Age: 2017-09-14 | Files: 1 | Lines: -0/+18
  Instructions that are unlikely to generate machine instructions should also have 0 latency.
  Differential Revision: https://reviews.llvm.org/D37833
  llvm-svn: 313288
* [TargetTransformInfo] Add a new public interface getInstructionCost
  Author: Guozhi Wei | Age: 2017-09-08 | Files: 1 | Lines: -0/+19
  Current TargetTransformInfo can support throughput cost model and code size model, but sometimes we also need instruction latency cost model in different optimizations. Hal suggested we need a single public interface to query the different cost of an instruction. So I proposed the following interface:

      enum TargetCostKind {
        TCK_RecipThroughput, ///< Reciprocal throughput.
        TCK_Latency,         ///< The latency of instruction.
        TCK_CodeSize         ///< Instruction code size.
      };

      int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const;

  All clients should mainly use this function to query the cost of an instruction; parameter <kind> specifies the desired cost model.
  This patch also provides a simple default implementation of getInstructionLatency.
  The default getInstructionLatency provides latency numbers for only a small number of instruction classes; those latency numbers are only reasonable for modern OOO processors. It can be extended in the following ways:
  Add more detail into this function.
  Add getXXXLatency function and call it from here.
  Implement target specific getInstructionLatency function.
  Differential Revision: https://reviews.llvm.org/D37170
  llvm-svn: 312832
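The single-entry-point design in the commit above can be sketched as a dispatch on the cost kind. The `TargetCostKind` enum is copied from the commit message; everything else is hypothetical: `ToyOp` stands in for `llvm::Instruction`, the three `toy...` helpers stand in for the per-model queries, and all returned numbers are made-up placeholders rather than LLVM's defaults.

```cpp
#include <cassert>

// Enum taken from the commit message.
enum TargetCostKind {
  TCK_RecipThroughput, ///< Reciprocal throughput.
  TCK_Latency,         ///< The latency of instruction.
  TCK_CodeSize         ///< Instruction code size.
};

// Hypothetical stand-in for llvm::Instruction: just an opcode class.
enum class ToyOp { Add, Load, Call, Bitcast };

// Placeholder per-model costs; the values are illustrative only.
int toyThroughput(ToyOp Op) { return Op == ToyOp::Bitcast ? 0 : 1; }
int toyCodeSize(ToyOp Op) { return Op == ToyOp::Bitcast ? 0 : 1; }
int toyLatency(ToyOp Op) {
  switch (Op) {
  case ToyOp::Bitcast: return 0;  // unlikely to produce a machine instruction
  case ToyOp::Add:     return 1;
  case ToyOp::Load:    return 4;
  case ToyOp::Call:    return 40; // a real call is far slower than an ALU op
  }
  return 1;
}

// Single public query: the caller picks the cost model through Kind.
int getInstructionCost(ToyOp Op, TargetCostKind Kind) {
  switch (Kind) {
  case TCK_RecipThroughput: return toyThroughput(Op);
  case TCK_Latency:         return toyLatency(Op);
  case TCK_CodeSize:        return toyCodeSize(Op);
  }
  return -1; // unreachable for valid kinds
}
```

Keeping one public function means clients never need to know which per-model helper exists for a target; a target can refine a single model (latency, for instance) without changing any call sites.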
* X86: Improve AVX512 fptoui lowering
  Author: Zvi Rackover | Age: 2017-09-07 | Files: 1 | Lines: -4/+4
  Summary: Add patterns for
  fptoui <16 x float> to <16 x i8>
  fptoui <16 x float> to <16 x i16>
  Reviewers: igorb, delena, craig.topper
  Reviewed By: craig.topper
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D37505
  llvm-svn: 312704
* AMDGPU: Don't assert in TTI with fp32 denorms enabled
  Author: Matt Arsenault | Age: 2017-08-31 | Files: 1 | Lines: -11/+78
  Also refine for f16 and rcp cases.
  llvm-svn: 312213
* [CostModel][X86][XOP] Improve costs for XOP shuffles
  Author: Simon Pilgrim | Age: 2017-08-16 | Files: 2 | Lines: -0/+46
  VPPERM/VPERMIL2PD/VPERMIL2PS all provide more effective 2-input shuffles than regular AVX instructions
  llvm-svn: 311005
* [CostModel][X86] Add SSE2 two-src shuffle costs
  Author: Simon Pilgrim | Age: 2017-08-10 | Files: 2 | Lines: -12/+12
  llvm-svn: 310654
* [CostModel][X86] Add avx1 two-src shuffle costs
  Author: Simon Pilgrim | Age: 2017-08-10 | Files: 2 | Lines: -26/+26
  llvm-svn: 310650