path: root/llvm/test/Analysis/CostModel
Commit message | Author | Age | Files | Lines
...
* [CostModel][X86] Add avx2 two-src shuffle costs | Simon Pilgrim | 2017-08-10 | 2 | -34/+34
  llvm-svn: 310645
* [CostModel][X86] Extend two src shuffle cost tests | Simon Pilgrim | 2017-08-10 | 1 | -17/+195
  Cover most 128/256/512/1024-bit cases for vXf64/vXi64, vXf32/vXi32, vXi16 + vXi8
  llvm-svn: 310641
* [CostModel][X86] Add avx512vbmi broadcast/reverse/single-src shuffle cost tests | Simon Pilgrim | 2017-08-10 | 3 | -6/+18
  llvm-svn: 310633
* [CostModel][X86] Improve single src shuffle costs | Simon Pilgrim | 2017-08-10 | 1 | -60/+60
  Add missing SK_PermuteSingleSrc costs for AVX2 targets and earlier, also added some of the simpler SK_PermuteTwoSrc costs to support splitting of SK_PermuteSingleSrc shuffles
  llvm-svn: 310632
* [CostModel][X86] Added v2f64/v2i64 single src shuffle model tests | Simon Pilgrim | 2017-08-10 | 1 | -4/+21
  Fixed label checks for all prefixes
  llvm-svn: 310606
* [SystemZ] Add support for IBM z14 processor (2/3) | Ulrich Weigand | 2017-07-17 | 1 | -17/+36
  This adds support for the new 32-bit vector float instructions of z14. This includes:
  - Enabling the instructions for the assembler/disassembler.
  - CodeGen for the instructions, including new LLVM intrinsics.
  - Scheduler description support for the instructions.
  - Update to the vector cost function calculations.

  In general, CodeGen support for the new v4f32 instructions closely matches support for the existing v2f64 instructions.
  llvm-svn: 308195
* [X86][CM] update add\sub costs of vectors of 64 in X86\SLM arch | Mohammed Agabaria | 2017-07-02 | 1 | -7/+21
  This patch updates the cost of addq\subq (add\subtract of vectors of 64 bits) based on the performance numbers of the SLM arch.
  Differential Revision: https://reviews.llvm.org/D33983
  llvm-svn: 306974
* [AVX2] [TTI CostModel] Add cost of interleaved loads/stores for AVX2 | Dorit Nuzman | 2017-06-25 | 2 | -0/+183
  The cost of an interleaved access was only implemented for AVX512. For other X86 targets an overly conservative Base cost was returned, resulting in avoiding vectorization where it is actually profitable to vectorize. This patch starts to add costs for AVX2 for most prominent cases of interleaved accesses (stride 3,4 chars, for now).

  Note 1: Improvements of up to ~4x were observed in some of EEMBC's rgb workloads. There is also a known issue of 15-30% degradations on some of these workloads, associated with an interleaved access followed by type promotion/widening; the resulting shuffle sequence is currently inefficient and will be improved by a series of patches that extend the X86InterleavedAccess pass (such as D34601 and more to follow).

  Note 2: The costs in this patch do not reflect port pressure penalties which can be very dominant in the case of interleaved accesses since most of the shuffle operations are restricted to a single port. Further tuning, that may incorporate these considerations, will be done on top of the upcoming improved shuffle sequences (that is, along with the abovementioned work to extend X86InterleavedAccess pass).

  Differential Revision: https://reviews.llvm.org/D34023
  llvm-svn: 306238
* [CostModel][X86] Add scalar arithmetic cost tests | Simon Pilgrim | 2017-06-20 | 1 | -7/+55
  llvm-svn: 305810
* [CostModel][X86] Declare costs variables based on type | Simon Pilgrim | 2017-06-20 | 1 | -470/+470
  The alphabetical progression isn't that useful
  llvm-svn: 305808
* Revert r291254: [AArch64] Reduce vector insert/extract cost for Falkor | Matthew Simpson | 2017-05-24 | 1 | -26/+0
  The default vector insert/extract cost is more profitable on Falkor than the reduced cost.
  llvm-svn: 303771
* Fix line-endings. | Simon Pilgrim | 2017-05-19 | 1 | -1/+1
  llvm-svn: 303448
* [X86][AVX512] Add 512-bit vector ctpop costs + tests | Simon Pilgrim | 2017-05-18 | 1 | -0/+63
  llvm-svn: 303342
* [X86][AVX512] Add 512-bit vector ctlz costs + tests | Simon Pilgrim | 2017-05-17 | 1 | -6/+150
  llvm-svn: 303300
* [X86][AVX512] Add 512-bit vector cttz costs + tests | Simon Pilgrim | 2017-05-17 | 1 | -6/+125
  llvm-svn: 303293
* [X86] Split ctpop/ctlz/cttz cost tests | Simon Pilgrim | 2017-05-17 | 4 | -587/+599
  This will make things a lot easier to test all the permutations of avx512
  llvm-svn: 303290
* [X86][AVX512] Add 512-bit vector bitreverse costs + tests | Simon Pilgrim | 2017-05-17 | 1 | -0/+69
  llvm-svn: 303283
* [SystemZ] Modelling of costs of divisions with a constant power of 2. | Jonas Paulsson | 2017-05-17 | 1 | -0/+154
  Such divisions will eventually be implemented with shifts which should be reflected in the cost function.
  Review: Ulrich Weigand
  llvm-svn: 303254
* [X86][AVX1] Account for cost of extract/insert of 256-bit shifts | Simon Pilgrim | 2017-05-14 | 3 | -52/+52
  llvm-svn: 303023
* [X86][AVX2] Fix costs for v4i64 ashr by splat | Simon Pilgrim | 2017-05-14 | 1 | -2/+2
  llvm-svn: 303022
* [X86][AVX1] Account for cost of extract/insert of 256-bit shifts by splat | Simon Pilgrim | 2017-05-14 | 3 | -38/+38
  llvm-svn: 303021
* [X86][AVX1] Account for cost of extract/insert of 256-bit SDIV/UDIV by mul sequences | Simon Pilgrim | 2017-05-14 | 1 | -16/+16
  llvm-svn: 303017
* [X86][XOP] XOP's general v16i8 shifts will be used instead of v8i16 shift + mask. | Simon Pilgrim | 2017-05-14 | 2 | -6/+6
  Tweak cost model to match what lowering actually does.
  llvm-svn: 303013
* [X86][SSE] Account for cost of extract/insert of v32i8 vector shifts | Simon Pilgrim | 2017-05-14 | 3 | -12/+12
  llvm-svn: 303012
* [X86][XOP] Account for cost of extract/insert of 256-bit vector shifts | Simon Pilgrim | 2017-05-14 | 3 | -98/+98
  llvm-svn: 303010
* AMDGPU: Make some packed shuffles free | Matt Arsenault | 2017-05-10 | 3 | -41/+119
  VOP3P instructions can encode access to either half of the register.
  llvm-svn: 302730
* [AArch64] Consider widening instructions in cost calculations | Matthew Simpson | 2017-05-09 | 1 | -0/+622
  The AArch64 instruction set has a few "widening" instructions (e.g., uaddl, saddl, uaddw, etc.) that take one or more doubleword operands and produce quadword results. The operands are automatically sign- or zero-extended as appropriate. However, in LLVM IR, these extends are explicit.

  This patch updates TTI to consider these widening instructions as single operations whose cost is attached to the arithmetic instruction. It marks extends that are part of a widening operation "free" and applies a sub-target specified overhead (zero by default) to the arithmetic instructions.

  Differential Revision: https://reviews.llvm.org/D32706
  llvm-svn: 302582
* [X86][AVX1] Improve 256-bit vector costs for integer unary intrinsics. | Simon Pilgrim | 2017-05-07 | 2 | -24/+24
  Account for subvector extraction/insertion, helps prevent the vectorizers from selecting 256-bit vectors that will have to be split anyhow on AVX1 targets.
  llvm-svn: 302378
* Support arbitrary address space pointers in masked gather/scatter intrinsics. | Elad Cohen | 2017-05-03 | 2 | -21/+21
  Fixes PR31789 - When loop-vectorize tries to use these intrinsics for a non-default address space pointer we fail with a "Calling a function with a bad signature!" assertion. This patch solves this by adding the 'vector of pointers' argument as an overloaded type which will determine the address space.
  Differential revision: https://reviews.llvm.org/D31490
  llvm-svn: 302018
* [SystemZ] Fix target specific tests | Renato Golin | 2017-04-12 | 1 | -0/+2
  llvm-svn: 300078
* [SystemZ] Updated test fp-cast.ll | Jonas Paulsson | 2017-04-12 | 1 | -58/+58
  This did not get included in the previous commit for SystemZ cost functions.
  llvm-svn: 300053
* [SystemZ] TargetTransformInfo cost functions implemented. | Jonas Paulsson | 2017-04-12 | 13 | -0/+8096
  getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(), getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(), getInterleavedMemoryOpCost() implemented.

  Interleaved access vectorization enabled.

  BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads, in which case the cost of the z/sext instruction becomes 0.

  Review: Ulrich Weigand, Renato Golin.
  https://reviews.llvm.org/D29631
  llvm-svn: 300052
* AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel | Matt Arsenault | 2017-03-21 | 12 | -99/+99
  Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel.

  Converted with
    perl -pi -e 's/define void/define amdgpu_kernel void/'
  on the relevant test directories (and undoing in one place that actually wanted a non-kernel).
  llvm-svn: 298444
* [BasicTTIImpl] Bugfix in getIntrinsicInstrCost() | Jonas Paulsson | 2017-03-16 | 1 | -0/+66
  Don't call getScalarizationOverhead(RetTy, true, false) if RetTy is void type.
  Review: Hal Finkel
  https://reviews.llvm.org/D31024
  llvm-svn: 297954
* [X86] Add missing BITREVERSE costs for SSE2 vectors and i8/i16/i32/i64 scalars | Simon Pilgrim | 2017-03-15 | 1 | -28/+24
  Prep work for PR31810
  llvm-svn: 297876
* [TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved | Jonas Paulsson | 2017-03-14 | 1 | -12/+12
  getIntrinsicInstrCost() used to only compute scalarization cost based on types. This patch improves this so that the actual arguments are checked when they are available, in order to handle only unique non-constant operands.

  Tests updates:
  - Analysis/CostModel/X86/arith-fp.ll
  - Transforms/LoopVectorize/AArch64/interleaved_cost.ll
  - Transforms/LoopVectorize/ARM/interleaved_cost.ll

  The improvement in getOperandsScalarizationOverhead() to differentiate on constants made it necessary to update the interleaved_cost.ll tests even though they do not relate to intrinsics.

  Review: Hal Finkel
  https://reviews.llvm.org/D29540
  llvm-svn: 297705
* [PPC] Give unaligned memory access lower cost on processor that supports it | Guozhi Wei | 2017-02-17 | 2 | -1/+27
  Newer ppc supports unaligned memory access, which reduces the cost of unaligned memory access significantly. This patch handles this case in PPCTTIImpl::getMemoryOpCost.

  This patch fixes pr31492.

  Differential Revision: https://reviews.llvm.org/D28630

  This is a resubmit of r292680, which was reverted by r293092. The internal application failures were actually caused by a source code bug.
  llvm-svn: 295506
* [X86] Add costs for non-AVX512 single-source permutation integer shuffles | Michael Kuperstein | 2017-02-02 | 1 | -20/+20
  Differential Revision: https://reviews.llvm.org/D29416
  llvm-svn: 293932
* [X86] Extend single-source shuffle cost test to test more arches. NFC. | Michael Kuperstein | 2017-02-01 | 1 | -22/+129
  llvm-svn: 293793
* Revert "[PPC] Give unaligned memory access lower cost on processor that supports it" | Daniel Jasper | 2017-01-25 | 2 | -27/+1
  This reverts commit r292680. It is causing significantly worse performance and test timeouts in our internal builds. I have already routed reproduction instructions your way.
  llvm-svn: 293092
* [PPC] Give unaligned memory access lower cost on processor that supports it | Guozhi Wei | 2017-01-20 | 2 | -1/+27
  Newer ppc supports unaligned memory access, which reduces the cost of unaligned memory access significantly. This patch handles this case in PPCTTIImpl::getMemoryOpCost.

  This patch fixes pr31492.

  Differential Revision: https://reviews.llvm.org/D28630
  llvm-svn: 292680
* [CostModel][X86] Fix AVX512BW vector shift costs for vXi16 types | Simon Pilgrim | 2017-01-15 | 3 | -10/+20
  We already have patterns in place to support 128/256-bit shifts without AVX512VL
  llvm-svn: 292077
* [CostModel][X86] Drop separate AVX512VL checks - they match existing AVX512 costs | Simon Pilgrim | 2017-01-15 | 3 | -51/+9
  Keep the tests though.
  llvm-svn: 292076
* [CostModel][X86] Update vector shift tests to correctly check by non-constant uniform values. | Simon Pilgrim | 2017-01-15 | 3 | -199/+232
  Use shuffle(scalar_to_vector, zeroinitializer) pattern instead of shuffle(vec, zeroinitializer)
  llvm-svn: 292075
* [CostModel][X86] Updated vXi64 ASHR costs on AVX512 targets now that D28604 has landed | Simon Pilgrim | 2017-01-14 | 1 | -8/+8
  llvm-svn: 292023
* [X86][AVX512BW] Vectorize v64i8 vector shifts | Simon Pilgrim | 2017-01-11 | 3 | -18/+18
  Differential Revision: https://reviews.llvm.org/D28447
  llvm-svn: 291665
* Fix line endings | Simon Pilgrim | 2017-01-11 | 4 | -421/+421
  llvm-svn: 291663
* [X86] updating TTI costs for arithmetic instructions on X86\SLM arch. | Mohammed Agabaria | 2017-01-11 | 1 | -0/+317
  Updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.

  Special optimization case: replaces pmulld with a pmullw\pmulhw\pshuf sequence when the real operand bitwidth is <= 16.

  Differential Revision: https://reviews.llvm.org/D28104
  llvm-svn: 291657
* [AArch64] Consider all vector types for FeatureSlowMisaligned128Store | Evandro Menezes | 2017-01-10 | 1 | -8/+50
  The original code considered only v2i64 as slow for this feature. This patch considers all 128-bit long vector types as slow candidates.

  In internal tests, extending this feature to all 128-bit vector types resulted in an overall improvement of 1% on Exynos M1.

  Differential revision: https://reviews.llvm.org/D27998
  llvm-svn: 291616
* [CostModel][X86] Add AVX512VL vector shift cost tests. | Simon Pilgrim | 2017-01-10 | 3 | -0/+57
  llvm-svn: 291585