path: root/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
Commit message (Author, Date; Files, Lines)
...
* [X86] Improve the type checking in isLegalMaskedLoad and isLegalMaskedGather. (Craig Topper, 2019-03-08; 1 file, -11/+31)
  We were only checking the pointer size and the type's primitive size, which let unintended types such as vectors of half through masked load/store. For FP we now explicitly accept only double and float. For pointers we let any pointer through, trusting that only 32-bit and 64-bit pointers will be used to generate assembly. Bit width is now only checked after confirming that the type is an integer.
  llvm-svn: 355667
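  A rough sketch of the shape of that check, not the exact LLVM source (the helper name and the accepted integer widths are illustrative):

      #include "llvm/IR/Type.h"

      // FP is whitelisted explicitly, pointers pass through, and bit width
      // is only consulted once we know the type is an integer.
      static bool isLegalMaskedElementType(llvm::Type *Ty) {
        if (Ty->isPointerTy())
          return true; // trust that only 32/64-bit pointers reach codegen
        if (Ty->isFloatTy() || Ty->isDoubleTy())
          return true; // rejects half, x86_fp80, fp128, ...
        if (!Ty->isIntegerTy())
          return false;
        unsigned W = Ty->getIntegerBitWidth();
        return W == 8 || W == 16 || W == 32 || W == 64;
      }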
* Temporarily Revert "[X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)" (Eric Christopher, 2019-02-20; 1 file, -7/+0)
  This had broken the LTO bootstrap build for 3 days and was showing a significant regression on the Dither_benchmark results (from the LLVM benchmark suite), specifically on BENCHMARK_FLOYD_DITHER_128, BENCHMARK_FLOYD_DITHER_256, and BENCHMARK_FLOYD_DITHER_512; the others are unchanged. These regressed by about 28% on Skylake, 34% on Haswell, and over 40% on Sandybridge.
  This reverts commit r353923.
  llvm-svn: 354434
* [X86] Don't consider functions ABI compatible for ArgumentPromotion pass if they view 512-bit vectors differently. (Craig Topper, 2019-02-19; 1 file, -0/+16)
  The use of the -mprefer-vector-width=256 command line option mixed with functions using vector intrinsics can create situations where one function thinks 512-bit vectors are legal but another function does not.
  If a 512-bit vector is passed between them via a pointer, it's possible ArgumentPromotion might try to pass it by value instead. This results in type legalization handling the 512-bit vector differently in the two functions, leading to runtime failures. Had the 512-bit vector been passed by value from clang codegen, both functions would have been tagged with a min-legal-vector-width=512 function attribute, making them legalize the same way.
  I observed this issue in 32-bit mode, where a union containing a 512-bit vector was being passed by a function that used intrinsics to one that did not. The caller ended up passing in zmm0 and the callee tried to read it from ymm0 and ymm1.
  The fix implemented here is simply to consider it a mismatch if two functions would handle 512-bit vectors differently, without looking at the types actually being passed. This is the easiest and safest fix, but it can be improved in the future.
  Differential Revision: https://reviews.llvm.org/D58390
  llvm-svn: 354376
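  A hypothetical C++ illustration of the mismatch (the function names are invented): compiled with -mprefer-vector-width=256, the intrinsic user below gets tagged min-legal-vector-width=512 by clang while the other function does not, so a by-value __m512 promoted between them would be legalized differently (zmm0 on one side, ymm0+ymm1 on the other).

      #include <immintrin.h>

      // Touches 512-bit intrinsics: clang records min-legal-vector-width=512.
      __m512 uses_intrinsics(const __m512 *p) { return _mm512_add_ps(*p, *p); }

      // No intrinsics: under -mprefer-vector-width=256 this function still
      // treats 512-bit vectors as illegal and would split them.
      float no_intrinsics(const __m512 *p) { return ((const float *)p)[0]; }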
* [X86] Filter out tuning feature flags and a few ISA feature flags when checking for function inline compatibility. (Craig Topper, 2019-02-19; 1 file, -4/+3)
  Tuning flags don't have any effect on the available instructions, so they aren't a good reason to prevent inlining.
  There are also some ISA flags that don't have any intrinsics or ABI requirements, which we can exclude. I've included only the most basic ones, like cmpxchg16b and lahfsahf. These are interesting because they aren't present in all 64-bit CPUs, but we have codegen workarounds for when they aren't present.
  Loosening these checks can help with scenarios where a caller has a more specific CPU than a callee. The default tuning flags on our generic 'x86-64' CPU can currently make it inline compatible with other CPUs. I've also added an example test for 'nocona' and 'prescott', where 'nocona' is just a 64-bit-capable version of 'prescott', so in 32-bit mode they should be completely compatible. A sketch of the check appears below.
  I've based the implementation here on the similar code in AMDGPU.
  Differential Revision: https://reviews.llvm.org/D58371
  llvm-svn: 354355
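  A minimal sketch of the idea (the bit masks and function name are hypothetical, not the LLVM source): strip the tuning-only bits from both feature sets before asking whether the caller's ISA features are a superset of the callee's.

      #include <cstdint>

      static bool areFeaturesInlineCompatible(uint64_t CallerBits,
                                              uint64_t CalleeBits,
                                              uint64_t TuningOnlyBits) {
        CallerBits &= ~TuningOnlyBits; // tuning never changes available instructions
        CalleeBits &= ~TuningOnlyBits;
        // Safe to inline if every ISA feature the callee relies on is also
        // present in the caller.
        return (CallerBits & CalleeBits) == CalleeBits;
      }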
* [X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub) (Anton Afanasyev, 2019-02-13; 1 file, -0/+7)
  Try to use 64-bit SLP vectorization. In addition to horizontal instructions, this change triggers optimizations for partial vector operations (for instance, using the low halves of 128-bit registers xmm0 and xmm1 to multiply <2 x float> by <2 x float>).
  Fixes llvm.org/PR32433
  llvm-svn: 353923
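  The kind of partial vector operation this enables, as a sketch: two adjacent scalar multiplies that SLP can now merge into a single <2 x float> multiply using the low halves of two xmm registers.

      // Candidates for one <2 x float> multiply in the low half of xmm0/xmm1.
      void mul2(float *r, const float *a, const float *b) {
        r[0] = a[0] * b[0];
        r[1] = a[1] * b[1];
      }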
* [CodeGen][X86] Expand UADDSAT to NOT+UMIN+ADD (Nikita Popov, 2019-01-28; 1 file, -0/+7)
  Follow-up to D56636, this time handling the UADDSAT case by expanding uadd.sat(a, b) to umin(a, ~b) + b.
  Differential Revision: https://reviews.llvm.org/D56869
  llvm-svn: 352409
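  The expansion in scalar form (a sketch, not the DAG code): ~b is the largest addend that cannot overflow, so when a + b would overflow, umin clamps a to ~b and the sum becomes all-ones.

      #include <cstdint>

      // uadd.sat(a, b) == umin(a, ~b) + b
      uint32_t uadd_sat(uint32_t a, uint32_t b) {
        uint32_t nb = ~b;             // largest addend that cannot overflow
        return (a < nb ? a : nb) + b; // saturates at UINT32_MAX
      }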
* [TTI] Add generic SADDO/SSUBO costs (Simon Pilgrim, 2019-01-24; 1 file, -0/+10)
  Added x86 scalar sadd_with_overflow/ssub_with_overflow costs.
  llvm-svn: 352045
* [TTI] Add generic UADDO/USUBO costs (Simon Pilgrim, 2019-01-24; 1 file, -3/+14)
  Added x86 scalar uadd_with_overflow/usub_with_overflow costs.
  Differential Revision: https://reviews.llvm.org/D56907
  llvm-svn: 352043
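  What uadd.with.overflow computes, in scalar form: on x86 this is one add plus a setcc/branch on the carry flag, which is what these cost entries model.

      #include <cstdint>

      // Returns the carry-out flag; the wrapped sum goes to `sum`.
      bool uadd_overflow(uint32_t a, uint32_t b, uint32_t &sum) {
        sum = a + b;
        return sum < a; // wrapped around iff the sum is below an operand
      }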
* [CostModel][X86] Add ICMP Predicate specific costs (Simon Pilgrim, 2019-01-22; 1 file, -8/+49)
  First step towards PR40376: this patch adds support for getCmpSelInstrCost to use the (optional) Instruction CmpInst predicate to indicate the type of integer comparison we're performing, and to alter the costs accordingly.
  Differential Revision: https://reviews.llvm.org/D57013
  llvm-svn: 351810
* [CostModel][X86] Add explicit vector select costs (Simon Pilgrim, 2019-01-20; 1 file, -0/+41)
  Prior to SSE41 (and sometimes on AVX1), vector select has to be performed as a ((X & C)|(Y & ~C)) bit select.
  This exposes a couple of issues with the min/max reduction costs (which only go down to SSE42 for some reason). The increased pre-SSE41 select costs also prevent a couple of tests from firing any longer, so I've either tweaked the target or added AVX tests as well to the existing SSE2 tests.
  llvm-svn: 351685
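  The pre-SSE41 lowering being costed, shown per lane: with C an all-ones or all-zeros lane mask, the select becomes an and/andnot/or sequence.

      #include <cstdint>

      // select(C, X, Y) as a bit select: (X & C) | (Y & ~C)
      uint32_t bitselect(uint32_t C, uint32_t X, uint32_t Y) {
        return (X & C) | (Y & ~C);
      }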
* [CostModel][X86] Add explicit fcmp costs for pre-SSE42 targets (Simon Pilgrim, 2019-01-20; 1 file, -0/+11)
  Typical throughputs: cmpss/cmpps = 1cy and cmpsd/cmppd = 2cy before the Core2 era.
  llvm-svn: 351684
* [TTI][X86] Reordered getCmpSelInstrCost cost tables in descending ISA order. NFCI. (Simon Pilgrim, 2019-01-20; 1 file, -24/+24)
  Minor tidy-up to make it clearer what's going on before adding additional costs.
  llvm-svn: 351683
* Update the file headers across all of the LLVM projects in the monorepo to reflect the new license (Chandler Carruth, 2019-01-19; 1 file, -4/+3)
  We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
  Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
  llvm-svn: 351636
* Reapply "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors" (Nikita Popov, 2019-01-15; 1 file, -0/+7)
  Related to https://bugs.llvm.org/show_bug.cgi?id=40123.
  Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB, which produces much better code for X86.
  Reapplying with updated SLPVectorizer tests.
  Differential Revision: https://reviews.llvm.org/D56636
  llvm-svn: 351219
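  The expansion in scalar form (a sketch): umax clamps a up to b, so the subtraction saturates at zero whenever b > a.

      #include <cstdint>

      // usub.sat(a, b) == umax(a, b) - b
      uint32_t usub_sat(uint32_t a, uint32_t b) {
        return (a > b ? a : b) - b;
      }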
* Revert "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"Nikita Popov2019-01-141-7/+0
| | | | | | | | | This reverts commit r351125. I missed test changes in an SLPVectorizer test, due to the cost model changes. Reverting for now. llvm-svn: 351129
* [CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors (Nikita Popov, 2019-01-14; 1 file, -0/+7)
  Related to https://bugs.llvm.org/show_bug.cgi?id=40123.
  Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB, which produces much better code for X86.
  Differential Revision: https://reviews.llvm.org/D56636
  llvm-svn: 351125
* [x86] explicitly set cost of integer add/sub (Sanjay Patel, 2019-01-06; 1 file, -0/+8)
  There are no test changes here in the existing cost model regression tests because integer add/sub already have a default legal cost of 1. This would break, however, if we custom-lowered those ops, because the default cost model assumes that custom-lowered ops are more expensive.
  This is similar to the change in rL350403. See the discussion in D56011 for more details. When we enhance that patch to handle integer ops, we need this cost model change to avoid unintended diffs from the custom lowering.
  llvm-svn: 350496
* [CostModel][X86] Fix SSE1 FADD/FSUB costs (Simon Pilgrim, 2019-01-04; 1 file, -0/+12)
  Noticed in D56011 - handle the case where scalar fp ops are quicker on P3 than P4.
  Add the other costs so that we're not relying on the default "is legal/custom" cost logic.
  llvm-svn: 350403
* [X86] Add ADD/SUB SSAT/USAT vector costs (PR40123) (Simon Pilgrim, 2019-01-03; 1 file, -0/+44)
  Costs for real SSE2 instructions.
  llvm-svn: 350295
* Re-land r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads." (Clement Courbet, 2018-12-20; 1 file, -1/+4)
  Update PPC IR following the GEP->bitcast to bitcast->GEP->bitcast change.
  llvm-svn: 349747
* Revert r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads." (Clement Courbet, 2018-12-20; 1 file, -4/+1)
  Forgot to update PowerPC tests for the GEP->bitcast change.
  llvm-svn: 349733
* [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads. (Clement Courbet, 2018-12-20; 1 file, -1/+4)
  Summary: This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp in just two loads on X86. These were previously calling memcmp.
  Reviewers: spatel, gchatelet
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D55263
  llvm-svn: 349731
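  A sketch of the overlapping-load trick for the equality case (the helper name is invented): a 7-byte compare becomes two 4-byte loads covering bytes [0,4) and [3,7), overlapping in byte 3, instead of a 4+2+1-byte load sequence.

      #include <cstdint>
      #include <cstring>

      bool equal7(const void *p, const void *q) {
        uint32_t a0, a1, b0, b1;
        std::memcpy(&a0, p, 4);                                // bytes 0..3
        std::memcpy(&a1, static_cast<const char *>(p) + 3, 4); // bytes 3..6
        std::memcpy(&b0, q, 4);
        std::memcpy(&b1, static_cast<const char *>(q) + 3, 4);
        return ((a0 ^ b0) | (a1 ^ b1)) == 0; // zero iff all 7 bytes match
      }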
* [SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467) (Simon Pilgrim, 2018-12-05; 1 file, -7/+7)
  This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions.
  Some partial legalization code has been added to handle the 'SlowSHLD' case, where we want to expand instead, and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.
  Differential Revision: https://reviews.llvm.org/D54698
  llvm-svn: 348353
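  Funnel-shift-left semantics in scalar form (a sketch): conceptually concatenate a:b into a double-width value, shift left by c modulo the width, and keep the high half; x86's SHLD computes exactly this.

      #include <cstdint>

      uint32_t fshl32(uint32_t a, uint32_t b, uint32_t c) {
        c &= 31;
        if (c == 0)
          return a;                        // avoid the undefined b >> 32
        return (a << c) | (b >> (32 - c));
      }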
* [X86] Make X86TTIImpl::getCastInstrCost properly handle the case where AVX512 is enabled, but 512-bit vectors aren't legal. (Craig Topper, 2018-11-28; 1 file, -23/+25)
  Unlike most cost model functions, this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal.
  This patch adds a check around the two tables that contain 512-bit types: if either type would be split by type legalization (meaning 512-bit types are illegal), the tables are skipped.
  I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if preferred, I can switch to just using is512BitVector and the subtarget feature.
  Differential Revision: https://reviews.llvm.org/D54984
  llvm-svn: 347786
* [X86] Add some cost model entries for sext/zext for avx512bw (Craig Topper, 2018-11-28; 1 file, -0/+27)
  This fixes some of the scalarization costs reported for sext/zext using avx512bw. It does not fix all scalarization costs being reported, just the worst.
  I've restricted this to combinations of types that are legal with avx512bw, like v32i1/v64i1/v32i16/v64i8, and conversions between vXi1 and vXi8/vXi16 with legal vXi8/vXi16 result types.
  Differential Revision: https://reviews.llvm.org/D54979
  llvm-svn: 347785
* [X86][CostModel] Don't lookup intrinsic cost tables if the intrinsic isn't one we care about (Craig Topper, 2018-11-19; 1 file, -60/+64)
  We're seeing some issues internally where we sent some intrinsics into the cost model that the getTypeLegalizationCost call fails on, but which the X86-specific tables don't care about. Our base class implementation takes care of them; we'd just like the X86 backend to ignore them.
  This patch makes sure the switch returned something X86 cares about, and skips the table lookups and type legalization call if not. Probably more efficient too, since we don't go scanning the tables for every intrinsic we could possibly see.
  Differential Revision: https://reviews.llvm.org/D54711
  llvm-svn: 347248
* [CostModel] Add generic expansion funnel shift cost support (Simon Pilgrim, 2018-11-14; 1 file, -13/+11)
  Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost. This also required us to move the X86 fshl/fshr costs to the same place as the rotates, to avoid expansion and get correct scalarization vs. vectorization costs.
  llvm-svn: 346854
* [CostModel][X86] Fix constant vector XOP right shifts (Simon Pilgrim, 2018-11-13; 1 file, -2/+11)
  We'll constant fold these cases, so they are as cheap as the vector left shift cases. Noticed while improving funnel shift costs.
  llvm-svn: 346760
* Fix comment for XOP rotates. NFCI. (Simon Pilgrim, 2018-11-13; 1 file, -1/+1)
  llvm-svn: 346753
* [CostModel][X86] Add funnel shift rotation special case costs (Simon Pilgrim, 2018-11-12; 1 file, -1/+82)
  When the two shifted operands are the same value, the funnel shift is a bit rotation. Annoyingly, this has to be handled in the other getIntrinsicInstrCost overload than most intrinsics use, as we need to check that the operands are the same.
  llvm-svn: 346688
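  The special case in scalar form: when both data operands are the same value, the funnel shift degenerates to a rotate left.

      #include <cstdint>

      // rotl(x, c) == fshl(x, x, c)
      uint32_t rotl32(uint32_t x, uint32_t c) {
        c &= 31;
        return (x << c) | (x >> ((32 - c) & 31));
      }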
* [CostModel][X86] Add SHLD/SHRD scalar funnel shift costs (Simon Pilgrim, 2018-11-12; 1 file, -2/+11)
  The costs match the typical reg-reg cases. The RMW case can be a lot slower, but we don't model that at this level.
  llvm-svn: 346683
* [CostModel][X86] SK_ExtractSubvector is cheap if the (legal) subvector is aligned within the source vector (Simon Pilgrim, 2018-11-12; 1 file, -5/+13)
  llvm-svn: 346664
* [CostModel][X86] SK_ExtractSubvector costs must only be tested for vector types (PR39615) (Simon Pilgrim, 2018-11-10; 1 file, -1/+1)
  llvm-svn: 346589
* [CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector (Simon Pilgrim, 2018-11-09; 1 file, -181/+187)
  llvm-svn: 346538
* [LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads (Dorit Nuzman, 2018-10-31; 1 file, -10/+18)
  Under opt-for-size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]), because that implies we'd require a scalar epilogue, which is not allowed under opt-for-size.
  This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads. Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they can now use masked wide loads and shuffles, if that's what the cost model selects.
  Reviewers: Ayal, hsaito, dcaballe, fhahn
  Reviewed By: Ayal
  Differential Revision: https://reviews.llvm.org/D53668
  llvm-svn: 345705
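  The even-elements example from the message, for concreteness: a stride-2 load group with a gap at the odd offsets, which previously implied a scalar epilogue and so was rejected under opt-for-size.

      // Only the even elements are read; the odd lane of each group is a gap.
      int sumEven(const int *a, int n) {
        int sum = 0;
        for (int i = 0; i < n; ++i)
          sum += a[2 * i];
        return sum;
      }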
* [CostModel][X86] Add realistic vXi64 uitofp vXf64 costs (Simon Pilgrim, 2018-10-25; 1 file, -7/+6)
  Match codegen improvements from D53649/rL345256.
  llvm-svn: 345263
* [CostModel][X86] Add realistic i64 uitofp f64 scalar costs (Simon Pilgrim, 2018-10-25; 1 file, -0/+5)
  llvm-svn: 345261
* [CostModel][X86] Add vXi8 vector division by constants costs. (Simon Pilgrim, 2018-10-24; 1 file, -0/+16)
  ISD::MULHS/ISD::MULHU lowering of vXi8 types means we expand these in TargetLowering BuildSDIV/BuildUDIV.
  llvm-svn: 345175
* [CostModel][X86] Enable non-uniform vector division by constants costs. (Simon Pilgrim, 2018-10-24; 1 file, -26/+62)
  Non-uniform division/remainder handling was added back at D49248/D50765, so share the 'mul+sub' costs that already exist for the uniform cases.
  llvm-svn: 345164
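  For reference, the kind of mul-based expansion these costs model, shown for a scalar: an unsigned divide by the constant 3 becomes a multiply-high and a shift (0xAAAAAAAB == ceil(2^33 / 3)).

      #include <cstdint>

      uint32_t udiv3(uint32_t x) {
        return static_cast<uint32_t>((uint64_t(x) * 0xAAAAAAABu) >> 33);
      }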
* [TTI][X86] Treat SK_Transpose shuffles as SK_PermuteTwoSrc - there's no difference in lowering. (Simon Pilgrim, 2018-10-23; 1 file, -0/+4)
  llvm-svn: 345048
* Recommit r344472 after fixing build failure on ARM and PPC. (Dorit Nuzman, 2018-10-14; 1 file, -6/+17)
  llvm-svn: 344475
* Revert r344472 due to failures. (Dorit Nuzman, 2018-10-14; 1 file, -17/+6)
  llvm-svn: 344473
* [IAI,LV] Add support for vectorizing predicated strided accesses using masked interleave-group (Dorit Nuzman, 2018-10-14; 1 file, -6/+17)
  The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization.
  This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop Vectorizer's analysis, and adds the proper support for masked-interleave-groups to the Loop Vectorizer's planning and transformation stages.
  The patch also extends the TTI API to allow querying the cost of masked interleave-groups (which each target can control). Targets that support masked vector loads/stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles.
  Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar
  Reviewed By: Ayal
  Differential Revision: https://reviews.llvm.org/D53011
  llvm-svn: 344472
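  The shape of a predicated strided access, for concreteness: with the masked-interleave feature enabled, the conditional stride-2 loads below can become one masked wide load plus a shuffle instead of a gather or scalarized loads.

      int sumEvenIf(const int *a, const bool *cond, int n) {
        int sum = 0;
        for (int i = 0; i < n; ++i)
          if (cond[i])         // predicated access
            sum += a[2 * i];   // stride-2 load
        return sum;
      }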
* X86/TargetTransformInfo: Report div/rem constant immediate costs as TCC_Free (Matthias Braun, 2018-10-11; 1 file, -1/+5)
  DIV/REM by constants should always be expanded into mul/shift/etc. patterns. Unfortunately, the ConstantHoisting pass runs too early, at a point where the pattern isn't expanded yet; and once ConstantHoisting has hoisted some immediate, the result may no longer expand. The hoisting also typically doesn't make sense, because it operates on immediates that will change completely during the expansion.
  Report DIV/REM as TCC_Free so ConstantHoisting will not touch them.
  Differential Revision: https://reviews.llvm.org/D53174
  llvm-svn: 344315
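  An illustration of the failure mode (the constant here is just an example): left in place as an immediate, each division below lowers to a multiply/shift sequence; if ConstantHoisting rewrote the divisor into a shared register value first, codegen would be stuck emitting real div instructions.

      #include <cstdint>

      uint32_t scale(uint32_t x, uint32_t y) {
        return x / 1000 + y / 1000; // both expand via a magic-number multiply
      }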
* [X86] Correct the cost of (v4i32 (fptoui (v4f64))) under AVX512F. (Craig Topper, 2018-08-26; 1 file, -0/+1)
  Summary: This was inheriting the cost from the AVX table, but should be legal under AVX512.
  Reviewers: RKSimon
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D51267
  llvm-svn: 340708
* Recommit r338204 "[X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'." (Craig Topper, 2018-07-30; 1 file, -1/+7)
  This checks in a more direct way without triggering a UBSAN error.
  llvm-svn: 338273
* Revert "[X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'."Dean Michael Berris2018-07-301-7/+1
| | | | | | This reverts commit r338204. llvm-svn: 338236
* [X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'. (Craig Topper, 2018-07-28; 1 file, -1/+7)
  X86 normally requires immediates to fit in a signed 32-bit value, which would exclude i64 0x80000000. But for add/sub we can negate the constant and use the opposite instruction.
  llvm-svn: 338204
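  The case in question, for concreteness: +0x80000000 does not fit the sign-extended imm32 field, but its negation does, so the add can be emitted as a subtract of -2147483648.

      #include <cstdint>

      uint64_t addBig(uint64_t x) {
        return x + 0x80000000ULL; // == x - (-2147483648); the negated immediate fits imm32
      }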
* [X86] Use alignTo and divideCeil to make some code more readable. NFC (Craig Topper, 2018-07-28; 1 file, -3/+3)
  llvm-svn: 338203
* [CostModel][X86] Add SREM/UREM general and constant costs (PR38056) (Simon Pilgrim, 2018-07-07; 1 file, -3/+31)
  We penalize general SDIV/UDIV costs but don't do the same for SREM/UREM. This patch makes general vector SREM/UREM 20x as costly as scalar, the same approach as we take for SDIV/UDIV. The patch also extends the existing SDIV/UDIV constant costs to SREM/UREM; at the moment this means the additional cost of a MUL+SUB (see D48975).
  Differential Revision: https://reviews.llvm.org/D48980
  llvm-svn: 336486
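  The MUL+SUB extension in scalar form (reusing the divide-by-3 multiply-high expansion shown earlier): remainder by a constant is the quotient multiplied back and subtracted.

      #include <cstdint>

      // x % 3 == x - (x / 3) * 3
      uint32_t urem3(uint32_t x) {
        uint32_t q = static_cast<uint32_t>((uint64_t(x) * 0xAAAAAAABu) >> 33); // x / 3
        return x - q * 3;
      }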