summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32Simon Pilgrim2016-10-181-0/+2
| | | | | | | | | | | | As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types. This patch adds support for fptosi to 2i32 from both 2f64 and 2f32. It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch. Differential Revision: https://reviews.llvm.org/D23808 llvm-svn: 284459
* NFC: The Cost Model specialization, by Andrey TischenkoAlexey Bataev2016-10-121-0/+25
| | | | | | | | | | | | | | The current Cost Model implementation is very inaccurate and has to be updated, improved, re-implemented to be able to take into account the concrete CPU models and the concrete targets where this Cost Model is being used. For example, the Latency Cost Model should be differ from Code Size Cost Model, etc. This patch is the first step to launch the developing and implementation of a new Cost Model generation. Differential Revision: https://reviews.llvm.org/D25186 llvm-svn: 284012
* Replace "fallthrough" comments with LLVM_FALLTHROUGHJustin Bogner2016-08-171-1/+1
| | | | | | | This is a mechanical change of comments in switches like fallthrough, fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead. llvm-svn: 278902
* Revert "[X86] Support the "ms-hotpatch" attribute."Charles Davis2016-08-081-15/+0
| | | | | | | | This reverts commit r278048. Something changed between the last time I built this--it takes awhile on my ridiculously slow and ancient computer--and now that broke this. llvm-svn: 278053
* [X86] Support the "ms-hotpatch" attribute.Charles Davis2016-08-081-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Based on two patches by Michael Mueller. This is a target attribute that causes a function marked with it to be emitted as "hotpatchable". This particular mechanism was originally devised by Microsoft for patching their binaries (which they are constantly updating to stay ahead of crackers, script kiddies, and other ne'er-do-wells on the Internet), but is now commonly abused by Windows programs to hook API functions. This mechanism is target-specific. For x86, a two-byte no-op instruction is emitted at the function's entry point; the entry point must be immediately preceded by 64 (32-bit) or 128 (64-bit) bytes of padding. This padding is where the patch code is written. The two byte no-op is then overwritten with a short jump into this code. The no-op is usually a `movl %edi, %edi` instruction; this is used as a magic value indicating that this is a hotpatchable function. Reviewers: majnemer, sanjoy, rnk Subscribers: dberris, llvm-commits Differential Revision: https://reviews.llvm.org/D19908 llvm-svn: 278048
* [LV, X86] Be more optimistic about vectorizing shifts.Michael Kuperstein2016-08-041-21/+22
| | | | | | | | | | | | | | | Shifts with a uniform but non-constant count were considered very expensive to vectorize, because the splat of the uniform count and the shift would tend to appear in different blocks. That made the splat invisible to ISel, and we'd scalarize the shift at codegen time. Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we are able to select the appropriate vector shifts. This updates the cost model to to take this into account by making shifts by a uniform cheap again. Differential Revision: https://reviews.llvm.org/D23049 llvm-svn: 277782
* [X86][SSE] Add initial costs for vector CTTZ/CTLZSimon Pilgrim2016-08-041-4/+41
| | | | llvm-svn: 277716
* [AVX512] Don't use i128 masked gather/scatter/load/store. Do more accurately ↵Igor Breger2016-08-021-3/+3
| | | | | | | | dataWidth check. Differential Revision: http://reviews.llvm.org/D23055 llvm-svn: 277435
* [X86][SSE] Add cost model values for CTPOP of vectorsSimon Pilgrim2016-07-201-4/+27
| | | | | | | | This patch adds costs for the vectorized implementations of CTPOP, the default values were seriously underestimating the cost of these and was encouraging vectorization on targets where serialized use of POPCNT would be much better. Differential Revision: https://reviews.llvm.org/D22456 llvm-svn: 276104
* Strip trailing whitespaceSimon Pilgrim2016-07-171-6/+6
| | | | llvm-svn: 275726
* [X86] Make some cast costs more preciseMichael Kuperstein2016-07-111-3/+16
| | | | | | | | | Make some AVX and AVX512 cast costs more precise. Based on part of a patch by Elena Demikhovsky (D15604). Differential Revision: http://reviews.llvm.org/D22064 llvm-svn: 275106
* [x86] fix cost of SINT_TO_FP for i32 --> float (PR21356, PR28434)Sanjay Patel2016-07-061-1/+1
| | | | | | | | | | | | | This is "cvtdq2ps" which does not appear to be particularly slow on any CPU according to Agner's tables. Choosing "5" as a cost here as suggested in: https://llvm.org/bugs/show_bug.cgi?id=21356 ...but it seems very conservative given that the instruction is fully pipelined, and I think these costs are supposed to model throughput. Note that related costs are also most likely too high, but this fixes PR21356 and partly fixes PR28434. llvm-svn: 274658
* [X86] Sort cast cost tables. NFC.Michael Kuperstein2016-07-061-124/+123
| | | | | | | Cast cost tables are now sorted, for each cast type, lexicographically on [source base type, source vector width, dest base type, base vector width]. llvm-svn: 274653
* [X86][SSE] Add cost model for BSWAP of vectorsSimon Pilgrim2016-06-201-3/+24
| | | | | | | | The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets, we should reflect the typical cost of this to encourage vectorization. Differential Revision: http://reviews.llvm.org/D21521 llvm-svn: 273217
* [CostModel][X86][SSE] Updated costs for vector BITREVERSE ops on SSSE3+ targetsSimon Pilgrim2016-06-111-0/+30
| | | | | | To account for the fast PSHUFB implementation now available llvm-svn: 272484
* [X86] Add costs for SSE zext/sext to v4i64 to TTIMichael Kuperstein2016-06-101-0/+14
| | | | | | | | | The costs are somewhat hand-wavy, but should be much closer to the truth than what we get from BasicTTI. Differential Revision: http://reviews.llvm.org/D21156 llvm-svn: 272406
* [x86] avoid code explosion from LoopVectorizer for gather loop (PR27826) Sanjay Patel2016-05-251-2/+10
| | | | | | | | | | | | | | By making pointer extraction from a vector more expensive in the cost model, we avoid the vectorization of a loop that is very likely to be memory-bound: https://llvm.org/bugs/show_bug.cgi?id=27826 There are still bugs related to this, so we may need a more general solution to avoid vectorizing obviously memory-bound loops when we don't have HW gather support. Differential Revision: http://reviews.llvm.org/D20601 llvm-svn: 270729
* [CostModel][X86][XOP] Added XOP costmodel for BITREVERSE Simon Pilgrim2016-05-241-1/+44
| | | | | | Now that we have a nice fast VPPERM solution. Added framework for future intrinsic costs as well. llvm-svn: 270537
* [X86][SSE] Improve cost model for i64 vector comparisons on pre-SSE42 targetsSimon Pilgrim2016-05-091-3/+11
| | | | | | | | | | | | As discussed on PR24888, until SSE42 we don't have access to PCMPGTQ for v2i64 comparisons, but the cost models don't reflect this, resulting in over-optimistic vectorizaton. This patch adds SSE2 'base level' costs that match what a typical target is capable of and only reduces the v2i64 costs at SSE42. Technically SSE41 provides a PCMPEQQ v2i64 equality test, but as getCmpSelInstrCost doesn't give us a way to discriminate between comparison test types we can't easily make use of this, otherwise we could split the cost of integer equality and greater-than tests to give better costings of each. Differential Revision: http://reviews.llvm.org/D20057 llvm-svn: 268972
* [X86]: Changing cost for “TRUNCATE v16i32 to v16i8” in SSE4.1 mode.Ashutosh Nema2016-04-221-2/+0
| | | | | | | | | | | | | | Summary: rL256194 transforms truncations between vectors of integers into PACKUS/PACKSS operations during DAG combine. This generates better code for truncate, so cost of truncate needs to be changed but looks like it got changed only in SSE2 table Whereas this change is also applicable for SSE4.1, so the cost of truncate needs to be changed for that as well. Cost of “TRUNCATE v16i32 to v16i8” & “TRUNCATE v16i16 to v16i8” should be same in SSE4.1 & SSE2 table. Removing their cost from SSE4.1, so it will fall back to SSE2. Reviewers: Simon Pilgrim llvm-svn: 267123
* Do not use getGlobalContext()... ever.Mehdi Amini2016-04-141-5/+5
| | | | | | | | This code was creating a new type in the global context, regardless of which context the user is sitting in, what can possibly go wrong? From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266275
* fix typo; NFCSanjay Patel2016-04-051-1/+1
| | | | llvm-svn: 265442
* [x86] fix cost model inaccuracy for vector memory opsSanjay Patel2016-03-091-4/+4
| | | | | | | | | | | The irony of this patch is that one CPU that is affected is AMD Jaguar, and Jaguar has a completely double-pumped AVX implementation. But getting the cost model to reflect that is a much bigger problem. The small goal here is simply to improve on the lie that !AVX2 == SandyBridge. Differential Revision: http://reviews.llvm.org/D18000 llvm-svn: 263069
* AVX512BW: Support llvm intrinsic masked vector load/store for i8/i16 element ↵Igor Breger2016-03-061-1/+2
| | | | | | | | types on SKX Differential Revision: http://reviews.llvm.org/D17913 llvm-svn: 262803
* AVX1 : Enable vector masked_load/store to AVX1.Igor Breger2016-01-251-1/+1
| | | | | | | | Use AVX1 FP instructions (vmaskmovps/pd) in place of the AVX2 int instructions (vpmaskmovd/q). Differential Revision: http://reviews.llvm.org/D16528 llvm-svn: 258675
* Implemented cost model for masked gather and scatter operationsElena Demikhovsky2015-12-281-0/+136
| | | | | | | | | The cost is calculated for all X86 targets. When gather/scatter instruction is not supported we calculate the cost of scalar sequence. Differential revision: http://reviews.llvm.org/D15677 llvm-svn: 256519
* [X86][SSE] Transform truncations between vectors of integers into ↵Cong Hou2015-12-211-3/+3
| | | | | | | | | | | | | | | | | | | X86ISD::PACKUS/PACKSS operations during DAG combine. This patch transforms truncation between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine. We don't do it in lowering phase because after type legalization, the original truncation will be turned into a BUILD_VECTOR with each element that is extracted from a vector and then truncated, and from them it is difficult to do this optimization. This greatly improves the performance of truncations on some specific types. Cost table is updated accordingly. Differential revision: http://reviews.llvm.org/D14588 llvm-svn: 256194
* [X86] Prevent constant hoisting for a couple compare immediates that the ↵Craig Topper2015-12-201-1/+13
| | | | | | | | | | selection DAG knows how to optimize into a shift. This allows "icmp ugt %a, 4294967295" and "icmp uge %a, 4294967296" to be optimized into right shifts by 32 which can fold the immediate into the shift instruction. These patterns show up with some regularity in real code. Unfortunately, since getImmCost can't see the icmp predicate we can't be tell if we're only catching these specific cases. llvm-svn: 256126
* [X86][SSE] Update the cost table for integer-integer conversions on SSE2/SSE4.1.Cong Hou2015-12-111-2/+79
| | | | | | | | | | | | Previously in the conversion cost table there are no entries for integer-integer conversions on SSE2. This will result in imprecise costs for certain vectorized operations. This patch adds those entries for SSE2 and SSE4.1. The cost numbers are counted from the result of running llc on the new test case in this patch. Differential revision: http://reviews.llvm.org/D15132 llvm-svn: 255315
* AVX-512: Updated cost of FP/SINT/UINT conversion operationsElena Demikhovsky2015-12-021-11/+61
| | | | | | | | | I checked and updated the cost of AVX-512 conversion operations. Added cost of conversion operations in DQ mode. Conversion of illegal types that requires vector split is not calculated right now (like for other X86 targets). Differential Revision: http://reviews.llvm.org/D15074 llvm-svn: 254494
* Pointers in Masked Load, Store, Gather, Scatter intrinsicsElena Demikhovsky2015-11-191-8/+4
| | | | | | | | | | The masked intrinsics support all integer and floating point data types. I added the pointer type to this list. Added tests for CodeGen and for Loop Vectorizer. Updated the Language Reference. Differential Revision: http://reviews.llvm.org/D14150 llvm-svn: 253544
* [X86] A small fix in X86/X86TargetTransformInfo.cpp: check a value type is ↵Cong Hou2015-10-281-1/+2
| | | | | | simple before calling getSimpleVT(). llvm-svn: 251538
* Remove templates from CostTableLookup functions. All instantiations had the ↵Craig Topper2015-10-281-30/+25
| | | | | | | | same type. This also lets us remove the versions of the functions that took a statically sized array as we can rely on ArrayRef implicit conversion now. llvm-svn: 251490
* Convert cost table lookup functions to return a pointer to the entry or ↵Craig Topper2015-10-271-101/+74
| | | | | | | | | | nullptr instead of the index. This avoid mentioning the table name an extra time and allows the lookup to be done directly in the ifs by relying on the bool conversion of the pointer. While there make use of ArrayRef and std::find_if. llvm-svn: 251382
* Scalarizer for masked.gather and masked.scatter intrinsics.Elena Demikhovsky2015-10-251-0/+27
| | | | | | | | | | When the target does not support these intrinsics they should be converted to a chain of scalar load or store operations. If the mask is not constant, the scalarizer will build a chain of conditional basic blocks. I added isLegalMaskedGather() isLegalMaskedScatter() APIs. Differential Revision: http://reviews.llvm.org/D13722 llvm-svn: 251237
* Remove two unnecessary conversions from MVT to EVT. NFCCraig Topper2015-10-251-2/+2
| | | | llvm-svn: 251219
* Partially reverted changes from r250686Elena Demikhovsky2015-10-221-2/+4
| | | | | | | | Clang runtime failure was reported. Assertion failed: (isExtended() && "Type is not extended!"), function getTypeForEVT I'll need to add a proper handling for PointerType in masked load/store intrinsics. llvm-svn: 250995
* Removed parameter "Consecutive" from isLegalMaskedLoad() / isLegalMaskedStore().Elena Demikhovsky2015-10-191-12/+9
| | | | | | | | | | Originally I planned to use the same interface for masked gather/scatter and set isConsecutive to "false" in this case. Now I'm implementing masked gather/scatter and see that the interface is inconvenient. I want to add interfaces isLegalMaskedGather() / isLegalMaskedScatter() instead of using the "Consecutive" parameter in the existing interfaces. Differential Revision: http://reviews.llvm.org/D13850 llvm-svn: 250686
* [CostModel] Fixed AVX integer shift costsSimon Pilgrim2015-10-171-12/+36
| | | | | | Targets with AVX but without AVX2 were incorrectly reporting costs of 256-bit integer shifts. llvm-svn: 250611
* Fix Clang-tidy modernize-use-nullptr warnings in source directories and ↵Hans Wennborg2015-10-061-4/+5
| | | | | | | | | | generated files; other minor cleanups. Patch by Eugene Zelenko! Differential Revision: http://reviews.llvm.org/D13321 llvm-svn: 249482
* [X86] Teach constant hoisting that ANDs with 64-bit immediates in the range ↵Craig Topper2015-10-061-1/+7
| | | | | | | | 0x80000000-0xffffffff can be handled cheaply and don't need to be hoisted. Most importantly, this keeps constant hoisting from preventing instruction selections ability to turn an AND with 0xffffffff into a move into a 32-bit subregister. llvm-svn: 249370
* [X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP ↵Simon Pilgrim2015-09-301-15/+61
| | | | | | | | | | | | shift instructions The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes. Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases. Differential Revision: http://reviews.llvm.org/D8690 llvm-svn: 248878
* [TTI] Make the cost APIs in TargetTransformInfo consistently use 'int'Chandler Carruth2015-08-051-64/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | rather than 'unsigned' for their costs. For something like costs in particular there is a natural "negative" value, that of savings or saved cost. As a consequence, there is a lot of code that subtracts or creates negative values based on cost, all of which is prone to awkwardness or bugs when dealing with an unsigned type. Similarly, we *never* want these values to wrap, as that would cause Very Bad code generation (likely percieved as an infinite loop as we try to emit over 2^32 instructions or some such insanity). All around 'int' seems a much better fit for these basic metrics. I've added asserts to ensure that at least the TTI interface never returns negative numbers here. If we ever have a use case for negative numbers, we can remove this, but this way a bug where someone used '-1' to produce a 'very large' cost will be caught by the assert. This passes all tests, and is also UBSan clean. No functional change intended. Differential Revision: http://reviews.llvm.org/D11741 llvm-svn: 244080
* Rename hasCompatibleFunctionAttributes->areInlineCompatible basedEric Christopher2015-07-291-2/+2
| | | | | | | on suggestions. Currently the function is only used for inline purposes and this is more descriptive for the use. llvm-svn: 243578
* [X86][SSE] Vectorize i64 ASHR operationsSimon Pilgrim2015-07-291-2/+3
| | | | | | | | This patch vectorizes the v2i64/v4i64 ASHR shift operations - the last remaining integer vector shifts that are still being transferred to/from the scalar unit to be completed. Differential Revision: http://reviews.llvm.org/D11439 llvm-svn: 243569
* [X86][SSE] Reordered cast vectorization costs. NFCI.Simon Pilgrim2015-07-191-47/+48
| | | | | | Reordered the data tables at the top and placed the lookups after. The first stage in the yak shaving necessary to get more accurate costs for a variety of targets given the recent improvements to SINT_TO_FP/UINT_TO_FP/SIGN_EXTEND vector lowering. llvm-svn: 242643
* [X86][SSE] Updated SHL/LSHR i64 vectorization costs.Simon Pilgrim2015-07-181-3/+3
| | | | | | This was missed in D8416. llvm-svn: 242621
* Prune trailing whitespaces and CRs.NAKAMURA Takumi2015-07-141-23/+23
| | | | llvm-svn: 242117
* [X86][SSE] Vectorized v4i32 non-uniform shifts.Simon Pilgrim2015-07-121-23/+23
| | | | | | | | | | While the v4i32 shl operation is already vectorized using a cvttps2dq/pmulld pattern, the lshr/ashr opeations are still scalarized. This patch adds vectorization support for non-uniform v4i32 shift operations - it splats constant shift amounts to allow them to use the immediate sse shift instructions, or extracts/zero-extends non-constant shift amounts. The individual results are then blended together. Differential Revision: http://reviews.llvm.org/D11063 llvm-svn: 241989
* Make TargetLowering::getPointerTy() taking DataLayout as an argumentMehdi Amini2015-07-091-13/+13
| | | | | | | | | | | | | | | | Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits Differential Revision: http://reviews.llvm.org/D11028 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241775
OpenPOWER on IntegriCloud