summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] Merged Reverse/Alternate shuffle cost tables. NFCI.Simon Pilgrim2017-01-041-141/+81
| | | | | | As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode. llvm-svn: 290956
* Fixed shuffle-reverse cost on AVX-512.Elena Demikhovsky2017-01-021-0/+1
| | | | | | (This changed was approved in https://reviews.llvm.org/D28118, but Simon asked to submit it separately). llvm-svn: 290812
* AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.Elena Demikhovsky2017-01-021-9/+243
| | | | | | | | | | | | X86 target does not provide any target specific cost calculation for interleave patterns.It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost. In this patch I calculate the cost on AVX-512. It will allow to compare interleave pattern with gather/scatter and choose a better solution (PR31426). * Shiffle-broadcast cost will be changed in Simon's upcoming patch. Differential Revision: https://reviews.llvm.org/D28118 llvm-svn: 290810
* [X86][SSE] Improve lowering of vXi64 multiplies Simon Pilgrim2016-12-211-8/+8
| | | | | | | | | | | | | | | | | | | | | | As mentioned on PR30845, we were performing our vXi64 multiplication as: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi, 32)+ psllqi(AhiBlo, 32); when we could avoid one of the upper shifts with: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi + AhiBlo, 32); This matches the lowering on gcc/icc. Differential Revision: https://reviews.llvm.org/D27756 llvm-svn: 290267
* [CostModel][X86] Updated reverse shuffle costsSimon Pilgrim2016-12-151-5/+95
| | | | llvm-svn: 289819
* [X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on ↵Simon Pilgrim2016-11-241-0/+4
| | | | | | | | AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287882
* [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on ↵Simon Pilgrim2016-11-231-0/+4
| | | | | | | | AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287762
* [CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costsSimon Pilgrim2016-11-231-6/+12
| | | | llvm-svn: 287760
* [CostModel][X86] Added mul costs for vXi8 vectorsSimon Pilgrim2016-11-141-5/+21
| | | | | | More realistic v16i8/v32i8/v64i8 MUL costs - we have to extend to vXi16, use PMULLW and then truncate the result llvm-svn: 286838
* [X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargetsSimon Pilgrim2016-11-141-0/+4
| | | | | | | | Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason. This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit) llvm-svn: 286832
* [VectorLegalizer] Expansion of CTLZ using CTPOP when possibleSimon Pilgrim2016-11-081-1/+4
| | | | | | | | | | This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available. This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful. Differential Revision: https://reviews.llvm.org/D25910 llvm-svn: 286233
* Improved cost model for FDIV and FSQRT, by Andrew TischenkoAlexey Bataev2016-10-311-2/+74
| | | | | | | | | | There is a bug describing poor cost model for floating point operations: Bug 29083 - [X86][SSE] Improve costs for floating point operations. This patch is the second one in series of patches dealing with cost model. Differential Revision: https://reviews.llvm.org/D25722 llvm-svn: 285564
* [X86][AVX512] Fix MUL v8i64 costs on non-AVX512DQ targetsSimon Pilgrim2016-10-271-0/+1
| | | | llvm-svn: 285329
* [X86][AVX512DQ] Improve lowering of MUL v2i64 and v4i64Simon Pilgrim2016-10-271-0/+13
| | | | | | | | | | With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq). Updated cost table accordingly. Differential Revision: https://reviews.llvm.org/D26011 llvm-svn: 285304
* [X86][SSE] Add SSE41/AVX1 costs for vector shifts.Simon Pilgrim2016-10-231-0/+26
| | | | | | We were defaulting to SSE2 costs which weren't taking into account the availability of PBLENDW/PBLENDVB to improve merging of per-element shift results. llvm-svn: 284939
* [X86] Enable interleaved memory access by defaultMichael Kuperstein2016-10-201-0/+7
| | | | | | | | This lets the loop vectorizer generate interleaved memory accesses on x86. Differential Revision: https://reviews.llvm.org/D25350 llvm-svn: 284779
* [CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv uniformconst costs for 256/512 ↵Simon Pilgrim2016-10-201-17/+48
| | | | | | | | bit integer vectors We weren't checking for uniform const costs before the general cost, resulting in very high estimates. llvm-svn: 284755
* [CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv general costs for 256/512 bit ↵Simon Pilgrim2016-10-201-2/+30
| | | | | | | | | | integer vectors We weren't accounting for legal types on every subtarget, meaning that many of the costs were using defaults. We still don't correctly cost (or test) the 512-bit sdiv/udiv by uniform const cases, nor the power-of-2 cases. llvm-svn: 284744
* [X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32Simon Pilgrim2016-10-181-0/+2
| | | | | | | | | | | | As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types. This patch adds support for fptosi to 2i32 from both 2f64 and 2f32. It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch. Differential Revision: https://reviews.llvm.org/D23808 llvm-svn: 284459
* NFC: The Cost Model specialization, by Andrey TischenkoAlexey Bataev2016-10-121-0/+25
| | | | | | | | | | | | | | The current Cost Model implementation is very inaccurate and has to be updated, improved, re-implemented to be able to take into account the concrete CPU models and the concrete targets where this Cost Model is being used. For example, the Latency Cost Model should be differ from Code Size Cost Model, etc. This patch is the first step to launch the developing and implementation of a new Cost Model generation. Differential Revision: https://reviews.llvm.org/D25186 llvm-svn: 284012
* Replace "fallthrough" comments with LLVM_FALLTHROUGHJustin Bogner2016-08-171-1/+1
| | | | | | | This is a mechanical change of comments in switches like fallthrough, fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead. llvm-svn: 278902
* Revert "[X86] Support the "ms-hotpatch" attribute."Charles Davis2016-08-081-15/+0
| | | | | | | | This reverts commit r278048. Something changed between the last time I built this--it takes awhile on my ridiculously slow and ancient computer--and now that broke this. llvm-svn: 278053
* [X86] Support the "ms-hotpatch" attribute.Charles Davis2016-08-081-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Based on two patches by Michael Mueller. This is a target attribute that causes a function marked with it to be emitted as "hotpatchable". This particular mechanism was originally devised by Microsoft for patching their binaries (which they are constantly updating to stay ahead of crackers, script kiddies, and other ne'er-do-wells on the Internet), but is now commonly abused by Windows programs to hook API functions. This mechanism is target-specific. For x86, a two-byte no-op instruction is emitted at the function's entry point; the entry point must be immediately preceded by 64 (32-bit) or 128 (64-bit) bytes of padding. This padding is where the patch code is written. The two byte no-op is then overwritten with a short jump into this code. The no-op is usually a `movl %edi, %edi` instruction; this is used as a magic value indicating that this is a hotpatchable function. Reviewers: majnemer, sanjoy, rnk Subscribers: dberris, llvm-commits Differential Revision: https://reviews.llvm.org/D19908 llvm-svn: 278048
* [LV, X86] Be more optimistic about vectorizing shifts.Michael Kuperstein2016-08-041-21/+22
| | | | | | | | | | | | | | | Shifts with a uniform but non-constant count were considered very expensive to vectorize, because the splat of the uniform count and the shift would tend to appear in different blocks. That made the splat invisible to ISel, and we'd scalarize the shift at codegen time. Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we are able to select the appropriate vector shifts. This updates the cost model to to take this into account by making shifts by a uniform cheap again. Differential Revision: https://reviews.llvm.org/D23049 llvm-svn: 277782
* [X86][SSE] Add initial costs for vector CTTZ/CTLZSimon Pilgrim2016-08-041-4/+41
| | | | llvm-svn: 277716
* [AVX512] Don't use i128 masked gather/scatter/load/store. Do more accurately ↵Igor Breger2016-08-021-3/+3
| | | | | | | | dataWidth check. Differential Revision: http://reviews.llvm.org/D23055 llvm-svn: 277435
* [X86][SSE] Add cost model values for CTPOP of vectorsSimon Pilgrim2016-07-201-4/+27
| | | | | | | | This patch adds costs for the vectorized implementations of CTPOP, the default values were seriously underestimating the cost of these and was encouraging vectorization on targets where serialized use of POPCNT would be much better. Differential Revision: https://reviews.llvm.org/D22456 llvm-svn: 276104
* Strip trailing whitespaceSimon Pilgrim2016-07-171-6/+6
| | | | llvm-svn: 275726
* [X86] Make some cast costs more preciseMichael Kuperstein2016-07-111-3/+16
| | | | | | | | | Make some AVX and AVX512 cast costs more precise. Based on part of a patch by Elena Demikhovsky (D15604). Differential Revision: http://reviews.llvm.org/D22064 llvm-svn: 275106
* [x86] fix cost of SINT_TO_FP for i32 --> float (PR21356, PR28434)Sanjay Patel2016-07-061-1/+1
| | | | | | | | | | | | | This is "cvtdq2ps" which does not appear to be particularly slow on any CPU according to Agner's tables. Choosing "5" as a cost here as suggested in: https://llvm.org/bugs/show_bug.cgi?id=21356 ...but it seems very conservative given that the instruction is fully pipelined, and I think these costs are supposed to model throughput. Note that related costs are also most likely too high, but this fixes PR21356 and partly fixes PR28434. llvm-svn: 274658
* [X86] Sort cast cost tables. NFC.Michael Kuperstein2016-07-061-124/+123
| | | | | | | Cast cost tables are now sorted, for each cast type, lexicographically on [source base type, source vector width, dest base type, base vector width]. llvm-svn: 274653
* [X86][SSE] Add cost model for BSWAP of vectorsSimon Pilgrim2016-06-201-3/+24
| | | | | | | | The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets, we should reflect the typical cost of this to encourage vectorization. Differential Revision: http://reviews.llvm.org/D21521 llvm-svn: 273217
* [CostModel][X86][SSE] Updated costs for vector BITREVERSE ops on SSSE3+ targetsSimon Pilgrim2016-06-111-0/+30
| | | | | | To account for the fast PSHUFB implementation now available llvm-svn: 272484
* [X86] Add costs for SSE zext/sext to v4i64 to TTIMichael Kuperstein2016-06-101-0/+14
| | | | | | | | | The costs are somewhat hand-wavy, but should be much closer to the truth than what we get from BasicTTI. Differential Revision: http://reviews.llvm.org/D21156 llvm-svn: 272406
* [x86] avoid code explosion from LoopVectorizer for gather loop (PR27826) Sanjay Patel2016-05-251-2/+10
| | | | | | | | | | | | | | By making pointer extraction from a vector more expensive in the cost model, we avoid the vectorization of a loop that is very likely to be memory-bound: https://llvm.org/bugs/show_bug.cgi?id=27826 There are still bugs related to this, so we may need a more general solution to avoid vectorizing obviously memory-bound loops when we don't have HW gather support. Differential Revision: http://reviews.llvm.org/D20601 llvm-svn: 270729
* [CostModel][X86][XOP] Added XOP costmodel for BITREVERSE Simon Pilgrim2016-05-241-1/+44
| | | | | | Now that we have a nice fast VPPERM solution. Added framework for future intrinsic costs as well. llvm-svn: 270537
* [X86][SSE] Improve cost model for i64 vector comparisons on pre-SSE42 targetsSimon Pilgrim2016-05-091-3/+11
| | | | | | | | | | | | As discussed on PR24888, until SSE42 we don't have access to PCMPGTQ for v2i64 comparisons, but the cost models don't reflect this, resulting in over-optimistic vectorizaton. This patch adds SSE2 'base level' costs that match what a typical target is capable of and only reduces the v2i64 costs at SSE42. Technically SSE41 provides a PCMPEQQ v2i64 equality test, but as getCmpSelInstrCost doesn't give us a way to discriminate between comparison test types we can't easily make use of this, otherwise we could split the cost of integer equality and greater-than tests to give better costings of each. Differential Revision: http://reviews.llvm.org/D20057 llvm-svn: 268972
* [X86]: Changing cost for “TRUNCATE v16i32 to v16i8” in SSE4.1 mode.Ashutosh Nema2016-04-221-2/+0
| | | | | | | | | | | | | | Summary: rL256194 transforms truncations between vectors of integers into PACKUS/PACKSS operations during DAG combine. This generates better code for truncate, so cost of truncate needs to be changed but looks like it got changed only in SSE2 table Whereas this change is also applicable for SSE4.1, so the cost of truncate needs to be changed for that as well. Cost of “TRUNCATE v16i32 to v16i8” & “TRUNCATE v16i16 to v16i8” should be same in SSE4.1 & SSE2 table. Removing their cost from SSE4.1, so it will fall back to SSE2. Reviewers: Simon Pilgrim llvm-svn: 267123
* Do not use getGlobalContext()... ever.Mehdi Amini2016-04-141-5/+5
| | | | | | | | This code was creating a new type in the global context, regardless of which context the user is sitting in, what can possibly go wrong? From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266275
* fix typo; NFCSanjay Patel2016-04-051-1/+1
| | | | llvm-svn: 265442
* [x86] fix cost model inaccuracy for vector memory opsSanjay Patel2016-03-091-4/+4
| | | | | | | | | | | The irony of this patch is that one CPU that is affected is AMD Jaguar, and Jaguar has a completely double-pumped AVX implementation. But getting the cost model to reflect that is a much bigger problem. The small goal here is simply to improve on the lie that !AVX2 == SandyBridge. Differential Revision: http://reviews.llvm.org/D18000 llvm-svn: 263069
* AVX512BW: Support llvm intrinsic masked vector load/store for i8/i16 element ↵Igor Breger2016-03-061-1/+2
| | | | | | | | types on SKX Differential Revision: http://reviews.llvm.org/D17913 llvm-svn: 262803
* AVX1 : Enable vector masked_load/store to AVX1.Igor Breger2016-01-251-1/+1
| | | | | | | | Use AVX1 FP instructions (vmaskmovps/pd) in place of the AVX2 int instructions (vpmaskmovd/q). Differential Revision: http://reviews.llvm.org/D16528 llvm-svn: 258675
* Implemented cost model for masked gather and scatter operationsElena Demikhovsky2015-12-281-0/+136
| | | | | | | | | The cost is calculated for all X86 targets. When gather/scatter instruction is not supported we calculate the cost of scalar sequence. Differential revision: http://reviews.llvm.org/D15677 llvm-svn: 256519
* [X86][SSE] Transform truncations between vectors of integers into ↵Cong Hou2015-12-211-3/+3
| | | | | | | | | | | | | | | | | | | X86ISD::PACKUS/PACKSS operations during DAG combine. This patch transforms truncation between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine. We don't do it in lowering phase because after type legalization, the original truncation will be turned into a BUILD_VECTOR with each element that is extracted from a vector and then truncated, and from them it is difficult to do this optimization. This greatly improves the performance of truncations on some specific types. Cost table is updated accordingly. Differential revision: http://reviews.llvm.org/D14588 llvm-svn: 256194
* [X86] Prevent constant hoisting for a couple compare immediates that the ↵Craig Topper2015-12-201-1/+13
| | | | | | | | | | selection DAG knows how to optimize into a shift. This allows "icmp ugt %a, 4294967295" and "icmp uge %a, 4294967296" to be optimized into right shifts by 32 which can fold the immediate into the shift instruction. These patterns show up with some regularity in real code. Unfortunately, since getImmCost can't see the icmp predicate we can't be tell if we're only catching these specific cases. llvm-svn: 256126
* [X86][SSE] Update the cost table for integer-integer conversions on SSE2/SSE4.1.Cong Hou2015-12-111-2/+79
| | | | | | | | | | | | Previously in the conversion cost table there are no entries for integer-integer conversions on SSE2. This will result in imprecise costs for certain vectorized operations. This patch adds those entries for SSE2 and SSE4.1. The cost numbers are counted from the result of running llc on the new test case in this patch. Differential revision: http://reviews.llvm.org/D15132 llvm-svn: 255315
* AVX-512: Updated cost of FP/SINT/UINT conversion operationsElena Demikhovsky2015-12-021-11/+61
| | | | | | | | | I checked and updated the cost of AVX-512 conversion operations. Added cost of conversion operations in DQ mode. Conversion of illegal types that requires vector split is not calculated right now (like for other X86 targets). Differential Revision: http://reviews.llvm.org/D15074 llvm-svn: 254494
* Pointers in Masked Load, Store, Gather, Scatter intrinsicsElena Demikhovsky2015-11-191-8/+4
| | | | | | | | | | The masked intrinsics support all integer and floating point data types. I added the pointer type to this list. Added tests for CodeGen and for Loop Vectorizer. Updated the Language Reference. Differential Revision: http://reviews.llvm.org/D14150 llvm-svn: 253544
* [X86] A small fix in X86/X86TargetTransformInfo.cpp: check a value type is ↵Cong Hou2015-10-281-1/+2
| | | | | | simple before calling getSimpleVT(). llvm-svn: 251538
OpenPOWER on IntegriCloud