summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [X86] Add costs for non-AVX512 single-source permutation integer shufflesMichael Kuperstein2017-02-021-3/+16
| | | | | | Differential Revision: https://reviews.llvm.org/D29416 llvm-svn: 293932
* [TargetTransformInfo] Refactor and improve getScalarizationOverhead()Jonas Paulsson2017-01-261-14/+0
| | | | | | | | | | | | | | | | | Refactoring to remove duplications of this method. New method getOperandsScalarizationOverhead() that looks at the present unique operands and add extract costs for them. Old behaviour was to just add extract costs for one operand of the type always, which still happens in getArithmeticInstrCost() if no operands are provided by the caller. This is a good start of improving on this, but there are more places that can be improved by using getOperandsScalarizationOverhead(). Review: Hal Finkel https://reviews.llvm.org/D29017 llvm-svn: 293155
* [X86] enable memory interleaving for X86\SLM arch. Mohammed Agabaria2017-01-251-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D28547 llvm-svn: 293040
* Remove trailing whitespace. NFCI.Simon Pilgrim2017-01-201-1/+1
| | | | llvm-svn: 292613
* [CostModel][X86] Removed unused cost. NFCI.Simon Pilgrim2017-01-201-1/+0
| | | | | | SHL v8i32 is already handled in the SSE41 cost table llvm-svn: 292612
* [CostModel][X86] Fix AVX512BW vector shift costs for vXi16 typesSimon Pilgrim2017-01-151-0/+8
| | | | | | We already have patterns in place to support 128/256-bit shifts without AVX512VL llvm-svn: 292077
* [CostModel][X86] Updated vXi64 ASHR costs on AVX512 targets now that D28604 ↵Simon Pilgrim2017-01-141-0/+8
| | | | | | has landed llvm-svn: 292023
* [X86][AVX512BW] Vectorize v64i8 vector shiftsSimon Pilgrim2017-01-111-0/+4
| | | | | | Differential Revision: https://reviews.llvm.org/D28447 llvm-svn: 291665
* [X86] updating TTI costs for arithmetic instructions on X86\SLM arch.Mohammed Agabaria2017-01-111-3/+50
| | | | | | | | | | | | updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. In case if the real operands bitwidth <= 16. Differential Revision: https://reviews.llvm.org/D28104 llvm-svn: 291657
* [CostModel][X86] Fixed vXi8 uniform shift costs.Simon Pilgrim2017-01-081-6/+16
| | | | | | | | | | The 'fast' costs should only work for shifts by uniform constants (uniform non-constant are lowered using the slow default implementation). Logical shifts were not taking into account that we must mask the psrlw result, so the costs needed to be doubled. Added missing AVX2/AVX512BW costs as well. llvm-svn: 291391
* [CostModel][X86] Moved legal uniform shift costs earlier.Simon Pilgrim2017-01-081-24/+39
| | | | | | XOP was prematurely matching, doubling the cost of ashr/lshr uniform shifts. llvm-svn: 291390
* [CostModel][X86] Update SSE41/AVX1 vXi32 SHL costsSimon Pilgrim2017-01-071-0/+2
| | | | | | SSE41 provides pmulld which allows the simpler pslld/paddd/cvttps2dq/pmulld pattern than SSE2's use of pmuludq. llvm-svn: 291372
* [CostModel][X86] Fix AVX2 v16i16 shift 'splat' costs.Simon Pilgrim2017-01-071-2/+15
| | | | llvm-svn: 291366
* [CostModel][X86] Match 256-bit vector shift 'splat' costs for AVX2 and aboveSimon Pilgrim2017-01-071-45/+44
| | | | | | We were matching against general vector shift costs before the uniform splat costs llvm-svn: 291365
* [CostModel][X86] Generalized cost calculation of SHL by constant -> MUL ↵Simon Pilgrim2017-01-071-21/+10
| | | | | | conversion. llvm-svn: 291364
* [CostModel][X86] Merge separate AVX1 cost LUTs. NFCI.Simon Pilgrim2017-01-071-38/+30
| | | | llvm-svn: 291355
* [CostModel][AVX512BW] Add v32i16 vector shift costs for avx512bw targets.Simon Pilgrim2017-01-071-0/+4
| | | | llvm-svn: 291354
* [CostModel][X86] Added missing AVX2 arithmetic costs.Simon Pilgrim2017-01-071-23/+33
| | | | | | Allows us to correctly fall through to the lower AVX1 costs if look up failed. llvm-svn: 291353
* [CostModel][X86] Reordered AVX1 arithmetic cost LUT into descending target ↵Simon Pilgrim2017-01-071-27/+27
| | | | | | order. NFCI. llvm-svn: 291352
* [X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI ↵Simon Pilgrim2017-01-071-2/+1
| | | | | | v64i8 shuffles (PR31470) llvm-svn: 291347
* [CostModel][X86] Fix 512-bit SDIV/UDIV 'big' costs.Simon Pilgrim2017-01-061-16/+18
| | | | | | Set the costs on the lowest target that supports the type. llvm-svn: 291229
* [CostModel][X86] Tidyup arithmetic costs code. NFCI.Simon Pilgrim2017-01-051-28/+15
| | | | | | Remove unnecessary braces, remove one use variables and keep LUTs to similar naming convention. llvm-svn: 291187
* [CostModel][X86] Move vXi32 MUL costs into existing tables. NFCI.Simon Pilgrim2017-01-051-6/+5
| | | | llvm-svn: 291165
* Remove trailing whitespace. NFCI.Simon Pilgrim2017-01-051-3/+3
| | | | llvm-svn: 291163
* [CostModel][X86] Reordered SSE42 arithmetic cost LUT into descending order. ↵Simon Pilgrim2017-01-051-13/+11
| | | | | | NFCI. llvm-svn: 291162
* [CostModel][X86] Move vXi64 MUL costs into existing tables. NFCI.Simon Pilgrim2017-01-051-11/+3
| | | | | | Removes need for yet another LUT. llvm-svn: 291158
* [CostModel][X86] Strip unused 256-bit vector shift costs. NFCI.Simon Pilgrim2017-01-051-8/+0
| | | | | | Remove SSE2 256-bit entries - AVX targets will have used the SSE42 costs instead. llvm-svn: 291152
* [CostModel][X86] Include the cost of 256-bit upper subvector ↵Simon Pilgrim2017-01-051-2/+2
| | | | | | | | extract/insertion in AVX1 v4i64 MUL Matches other MUL/ADD/SUB 256-bit case on AVX1 llvm-svn: 291149
* [CostModel][X86] Merged SK_PermuteSingleSrc/SK_PermuteTwoSrc into common ↵Simon Pilgrim2017-01-051-272/+227
| | | | | | shuffle cost LUTs. NFCI. llvm-svn: 291146
* [CostModel][X86] Add support for broadcast shuffle costsSimon Pilgrim2017-01-051-9/+48
| | | | | | | | Currently only for broadcasts with input and output of the same width. Differential Revision: https://reviews.llvm.org/D27811 llvm-svn: 291122
* [CostModel][X86] Pulled out common type legalization codeSimon Pilgrim2017-01-051-7/+4
| | | | llvm-svn: 291109
* Currently isLikelyComplexAddressComputation tries to figure out if the given ↵Mohammed Agabaria2017-01-051-4/+16
| | | | | | | | | | | | | stride seems to be 'complex' and need some extra cost for address computation handling. This code seems to be target dependent which may not be the same for all targets. Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'. Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general. Differential Revision: https://reviews.llvm.org/D27518 llvm-svn: 291106
* [Test Commit] fixing some format issue in X86TTI to match clang-format output.Mohammed Agabaria2017-01-051-3/+6
| | | | llvm-svn: 291095
* [CostModel][X86] Updated vXi8 and vXi16 Reverse/Alternate shuffle costsSimon Pilgrim2017-01-041-11/+9
| | | | | | Actual codegen is much better than the extract+insert patterns that was assumed. llvm-svn: 290962
* [X86] Merged Reverse/Alternate shuffle cost tables. NFCI.Simon Pilgrim2017-01-041-141/+81
| | | | | | As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode. llvm-svn: 290956
* Fixed shuffle-reverse cost on AVX-512.Elena Demikhovsky2017-01-021-0/+1
| | | | | | (This changed was approved in https://reviews.llvm.org/D28118, but Simon asked to submit it separately). llvm-svn: 290812
* AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.Elena Demikhovsky2017-01-021-9/+243
| | | | | | | | | | | | X86 target does not provide any target specific cost calculation for interleave patterns.It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost. In this patch I calculate the cost on AVX-512. It will allow to compare interleave pattern with gather/scatter and choose a better solution (PR31426). * Shiffle-broadcast cost will be changed in Simon's upcoming patch. Differential Revision: https://reviews.llvm.org/D28118 llvm-svn: 290810
* [X86][SSE] Improve lowering of vXi64 multiplies Simon Pilgrim2016-12-211-8/+8
| | | | | | | | | | | | | | | | | | | | | | As mentioned on PR30845, we were performing our vXi64 multiplication as: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi, 32)+ psllqi(AhiBlo, 32); when we could avoid one of the upper shifts with: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi + AhiBlo, 32); This matches the lowering on gcc/icc. Differential Revision: https://reviews.llvm.org/D27756 llvm-svn: 290267
* [CostModel][X86] Updated reverse shuffle costsSimon Pilgrim2016-12-151-5/+95
| | | | llvm-svn: 289819
* [X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on ↵Simon Pilgrim2016-11-241-0/+4
| | | | | | | | AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287882
* [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on ↵Simon Pilgrim2016-11-231-0/+4
| | | | | | | | AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287762
* [CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costsSimon Pilgrim2016-11-231-6/+12
| | | | llvm-svn: 287760
* [CostModel][X86] Added mul costs for vXi8 vectorsSimon Pilgrim2016-11-141-5/+21
| | | | | | More realistic v16i8/v32i8/v64i8 MUL costs - we have to extend to vXi16, use PMULLW and then truncate the result llvm-svn: 286838
* [X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargetsSimon Pilgrim2016-11-141-0/+4
| | | | | | | | Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason. This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit) llvm-svn: 286832
* [VectorLegalizer] Expansion of CTLZ using CTPOP when possibleSimon Pilgrim2016-11-081-1/+4
| | | | | | | | | | This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available. This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful. Differential Revision: https://reviews.llvm.org/D25910 llvm-svn: 286233
* Improved cost model for FDIV and FSQRT, by Andrew TischenkoAlexey Bataev2016-10-311-2/+74
| | | | | | | | | | There is a bug describing poor cost model for floating point operations: Bug 29083 - [X86][SSE] Improve costs for floating point operations. This patch is the second one in series of patches dealing with cost model. Differential Revision: https://reviews.llvm.org/D25722 llvm-svn: 285564
* [X86][AVX512] Fix MUL v8i64 costs on non-AVX512DQ targetsSimon Pilgrim2016-10-271-0/+1
| | | | llvm-svn: 285329
* [X86][AVX512DQ] Improve lowering of MUL v2i64 and v4i64Simon Pilgrim2016-10-271-0/+13
| | | | | | | | | | With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq). Updated cost table accordingly. Differential Revision: https://reviews.llvm.org/D26011 llvm-svn: 285304
* [X86][SSE] Add SSE41/AVX1 costs for vector shifts.Simon Pilgrim2016-10-231-0/+26
| | | | | | We were defaulting to SSE2 costs which weren't taking into account the availability of PBLENDW/PBLENDVB to improve merging of per-element shift results. llvm-svn: 284939
* [X86] Enable interleaved memory access by defaultMichael Kuperstein2016-10-201-0/+7
| | | | | | | | This lets the loop vectorizer generate interleaved memory accesses on x86. Differential Revision: https://reviews.llvm.org/D25350 llvm-svn: 284779
OpenPOWER on IntegriCloud