summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [CostModel] Add generic expansion funnel shift cost supportSimon Pilgrim2018-11-141-13/+11
| | | | | | | | Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost. This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs. llvm-svn: 346854
* [CostModel][X86] Fix constant vector XOP rights shiftsSimon Pilgrim2018-11-131-2/+11
| | | | | | | | We'll constant fold these cases so they are as cheap as vector left shift cases. Noticed while improving funnel shift costs. llvm-svn: 346760
* Fix comment for XOP rotates. NFCI.Simon Pilgrim2018-11-131-1/+1
| | | | llvm-svn: 346753
* [CostModel][X86] Add funnel shift rotation special case costsSimon Pilgrim2018-11-121-1/+82
| | | | | | When we repeat the 2 shifting operands then this is a bit rotation - annoyingly this has to be done in the other getIntrinsicInstrCost than most intrinsics as we need to check the operands are the same. llvm-svn: 346688
* [CostModel][X86] Add SHLD/SHRD scalar funnel shift costsSimon Pilgrim2018-11-121-2/+11
| | | | | | The costs match the typical reg-reg cases - the RMW case can be a lot slower but we don't model that at this level llvm-svn: 346683
* [CostModel][X86] SK_ExtractSubvector is cheap if the (legal) subvector is ↵Simon Pilgrim2018-11-121-5/+13
| | | | | | aligned within the source vector llvm-svn: 346664
* [CostModel][X86] SK_ExtractSubvector costs must only be tested for vector ↵Simon Pilgrim2018-11-101-1/+1
| | | | | | types (PR39615) llvm-svn: 346589
* [CostModel][X86] SK_ExtractSubvector is free if the subvector is at the ↵Simon Pilgrim2018-11-091-181/+187
| | | | | | start of the source vector llvm-svn: 346538
* [LV] Support vectorization of interleave-groups that require an epilog underDorit Nuzman2018-10-311-10/+18
| | | | | | | | | | | | | | | | | | | | | | optsize using masked wide loads Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53668 llvm-svn: 345705
* [CostModel][X86] Add realistic vXi64 uitofp vXf64 costsSimon Pilgrim2018-10-251-7/+6
| | | | | | Match codegen improvements from D53649/rL345256 llvm-svn: 345263
* [CostModel][X86] Add realistic i64 uitofp f64 scalar costsSimon Pilgrim2018-10-251-0/+5
| | | | llvm-svn: 345261
* [CostModel][X86] Add vXi8 vector division by constants costs.Simon Pilgrim2018-10-241-0/+16
| | | | | | ISD::MULHS/ISD::MULHU lowering of vXi8 types means we expand these in TargetLowering BuildSDIV/BuildUDIV. llvm-svn: 345175
* [CostModel][X86] Enable non-uniform vector division by constants costs.Simon Pilgrim2018-10-241-26/+62
| | | | | | Non-uniform division/remainder handling was added back at D49248/D50765 - so share the 'mul+sub' costs that already exist for uniform cases. llvm-svn: 345164
* [TTI][X86] Treat SK_Transpose shuffles as SK_PermuteTwoSrc - there's no ↵Simon Pilgrim2018-10-231-0/+4
| | | | | | difference in lowering. llvm-svn: 345048
* recommit 344472 after fixing build failure on ARM and PPC.Dorit Nuzman2018-10-141-6/+17
| | | | llvm-svn: 344475
* revert 344472 due to failures.Dorit Nuzman2018-10-141-17/+6
| | | | llvm-svn: 344473
* [IAI,LV] Add support for vectorizing predicated strided accesses using maskedDorit Nuzman2018-10-141-6/+17
| | | | | | | | | | | | | | | | | | | | | | | interleave-group The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization. This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop-Vectorizer's analysis, and adds the proper support for masked-interleave- groups to the Loop-Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave groups (which each target can control); Targets that support masked vector loads/ stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles. Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53011 llvm-svn: 344472
* X86/TargetTransformInfo: Report div/rem constant immediate costs as TCC_FreeMatthias Braun2018-10-111-1/+5
| | | | | | | | | | | | | | | DIV/REM by constants should always be expanded into mul/shift/etc. patterns. Unfortunately the ConstantHoisting pass runs too early at a point where the pattern isn't expanded yet. However after ConstantHoisting hoisted some immediate the result may not expand anymore. Also the hoisting typically doesn't make sense because it operates on immediates that will change completely during the expansion. Report DIV/REM as TCC_Free so ConstantHoisting will not touch them. Differential Revision: https://reviews.llvm.org/D53174 llvm-svn: 344315
* [X86] Correct the cost of (v4i32 (fptoui (v4f64))) under AVX512F.Craig Topper2018-08-261-0/+1
| | | | | | | | | | | | | | Summary: This was inheriting the cost from the AVX table, but should be legal under AVX512. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51267 llvm-svn: 340708
* Recommit r338204 "[X86] Correct the immediate cost for 'add/sub i64 %x, ↵Craig Topper2018-07-301-1/+7
| | | | | | | | 0x80000000'." This checks in a more direct way without triggering a UBSAN error. llvm-svn: 338273
* Revert "[X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'."Dean Michael Berris2018-07-301-7/+1
| | | | | | This reverts commit r338204. llvm-svn: 338236
* [X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'.Craig Topper2018-07-281-1/+7
| | | | | | X86 normally requires immediates to be a signed 32-bit value which would exclude i64 0x80000000. But for add/sub we can negate the constant and use the opposite instruction. llvm-svn: 338204
* [X86] Use alignTo and divideCeil to make some code more readable. NFCCraig Topper2018-07-281-3/+3
| | | | llvm-svn: 338203
* [CostModel][X86] Add SREM/UREM general and constant costs (PR38056)Simon Pilgrim2018-07-071-3/+31
| | | | | | | | | | We penalize general SDIV/UDIV costs but don't do the same for SREM/UREM. This patch makes general vector SREM/UREM x20 as costly as scalar, the same approach as we do for SDIV/UDIV. The patch also extends the existing SDIV/UDIV constant costs for SREM/UREM - at the moment this means the additional cost of a MUL+SUB (see D48975). Differential Revision: https://reviews.llvm.org/D48980 llvm-svn: 336486
* [CostModel][X86] Add UDIV/UREM by pow2 costsSimon Pilgrim2018-07-051-15/+29
| | | | | | Normally InstCombine would have simplified these to SRL/AND instructions but we may still see these during SLP vectorization etc. llvm-svn: 336371
* [X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882)Simon Pilgrim2018-06-211-4/+4
| | | | | | These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does. llvm-svn: 335216
* [CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select ↵Simon Pilgrim2018-06-121-24/+24
| | | | | | | | | | | | | | | | | | (PR33744) As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources: e.g. v4f32: <0,5,2,7> or <4,1,6,3> This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline: e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc. This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns. Differential Revision: https://reviews.llvm.org/D47985 llvm-svn: 334513
* [TTI] Add uniform/non-uniform constant Pow2 detection to ↵Simon Pilgrim2018-05-221-3/+4
| | | | | | | | | | | | | | | | TargetTransformInfo::getInstructionThroughput This enables us to detect more fast path sdiv cases under cost analysis. This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs. Found while working on D46276 Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases. Differential Revision: https://reviews.llvm.org/D46637 llvm-svn: 332969
* Remove \brief commands from doxygen comments.Adrian Prantl2018-05-011-1/+1
| | | | | | | | | | | | | | | | We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272
* [CostModel][X86] Remove hard coded SDIV/UDIV vector costsSimon Pilgrim2018-04-251-37/+13
| | | | | | Algorithmically compute the 'x20' SDIV/UDIV vector costs - this is necessary for PR36550 when DIV costs will be driven from the scheduler models. llvm-svn: 330870
* [CostModel][X86] Recursive call for cost of imul for packed v16i16 constant ↵Simon Pilgrim2018-04-251-1/+3
| | | | | | | | shift left. Don't just assume cost = 1. llvm-svn: 330834
* [CostModel][X86] Fix v32i16/v64i8 SETCC costs on AVX512BW targetsSimon Pilgrim2018-04-071-0/+9
| | | | llvm-svn: 329498
* [X86] Update cost model for Goldmont. Add fsqrt costs for SilvermontCraig Topper2018-03-251-15/+48
| | | | | | | | | | | | | | Add fdiv costs for Goldmont using table 16-17 of the Intel Optimization Manual. Also add overrides for FSQRT for Goldmont and Silvermont. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44644 llvm-svn: 328451
* [X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280)Simon Pilgrim2018-02-261-0/+30
| | | | | | | | | | Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark. Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch. Differential Revision: https://reviews.llvm.org/D43733 llvm-svn: 326133
* [X86][SSE] Increase PMULLD costs to better match hardwareSimon Pilgrim2018-02-101-3/+5
| | | | | | Until Skylake, most hardware could only issue a PMULLD op every other cycle llvm-svn: 324823
* [LoopStrengthReduce, x86] don't add cost for a cmp that will be macro-fused ↵Sanjay Patel2018-02-051-0/+4
| | | | | | | | | | | | | | | (PR35681) In the motivating case from PR35681 and represented by the macro-fuse-cmp test: https://bugs.llvm.org/show_bug.cgi?id=35681 ...there's a 37 -> 31 byte size win for the loop because we eliminate the big base address offsets. SPEC2017 on Ryzen shows no significant perf difference. Differential Revision: https://reviews.llvm.org/D42607 llvm-svn: 324289
* Spelling mistake in comment. NFCI.Simon Pilgrim2018-01-301-1/+1
| | | | llvm-svn: 323752
* [X86] Add support for passing 'prefer-vector-width' function attribute into ↵Craig Topper2018-01-201-4/+5
| | | | | | | | | | | | X86Subtarget and exposing via X86's getRegisterWidth TTI interface. This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit. There are reasons I know of that the loop vectorizer will generate larger vectors for. I've written this in such a way that the interface will only return a properly supported width(0/128/256/512) even if the attribute says something funny like 384 or 10. This has been split from D41895 with the remainder in a follow up commit. llvm-svn: 323015
* [COST]Fix PR35865: Fix cost model evaluation for shuffle on X86.Alexey Bataev2018-01-091-1/+2
| | | | | | | | | | | | | | Summary: If the vector type is transformed to non-vector single type, the compile may crash trying to get vector information about non-vector type. Reviewers: RKSimon, spatel, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41862 llvm-svn: 322106
* [X86] Simplify the TTI code for getInterleavedMemoryOpCost around for ↵Craig Topper2017-12-061-9/+4
| | | | | | | | | | AVX512BW. NFCI Previously the lambda for AVX512 passed out a flag that indicated whether AVX512BW was required and that was checked against the AVX512BW subtarget flag outside. This patch changes the interface to pass the AVX512BW subtarget bit in and return its value if we detect 16 or 8 bit types. llvm-svn: 319919
* [PartiallyInlineLibCalls][x86] add TTI hook to allow sqrt inlining to depend ↵Sanjay Patel2017-11-271-0/+4
| | | | | | | | | | | on arg rather than result This should fix PR31455: https://bugs.llvm.org/show_bug.cgi?id=31455 Differential Revision: https://reviews.llvm.org/D28314 llvm-svn: 319094
* [X86] Don't report gather is legal on Skylake CPUs when AVX2/AVX512 is ↵Craig Topper2017-11-251-3/+5
| | | | | | | | | | | | | | | | | | | disabled. Allow gather on SKX/CNL/ICL when AVX512 is disabled by using AVX2 instructions. Summary: This adds a new fast gather feature bit to cover all CPUs that support fast gather that we can use independent of whether the AVX512 feature is enabled. I'm only using this new bit to qualify AVX2 codegen. AVX512 is still implicitly assuming fast gather to keep tests working and to match the scatter behavior. Test command lines have been added for these two cases. Reviewers: magabari, delena, RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40282 llvm-svn: 318983
* [X86] Spell penryn correctly in some comments. NFCCraig Topper2017-11-221-3/+3
| | | | llvm-svn: 318855
* [LV][X86] Support of AVX2 Gathers code generation and update the LV with thisMohammed Agabaria2017-11-201-6/+13
| | | | | | | | | | | | | | | This patch depends on: https://reviews.llvm.org/D35348 Support of pattern selection of masked gathers of AVX2 (X86\AVX2 code gen) Update LoopVectorize to generate gathers for AVX2 processors. Reviewers: delena, zvi, RKSimon, craig.topper, aaboud, igorb Reviewed By: delena, RKSimon Differential Revision: https://reviews.llvm.org/D35772 llvm-svn: 318641
* Fix a bunch more layering of CodeGen headers that are in TargetDavid Blaikie2017-11-171-2/+2
| | | | | | | | All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490
* [TTI][X86] update costs of interleaved load\store of i64\doubleMohammed Agabaria2017-11-161-0/+6
| | | | | | | | | | | | This patch contains more accurate cost of interelaved load\store of stride 2 for the types int64\double on AVX2. Reviewers: delena, RKSimon, craig.topper, dorit Reviewed By: dorit Differential Revision: https://reviews.llvm.org/D40008 llvm-svn: 318385
* [X86] Update TTI to report that v1iX/v1fX types aren't legal for masked ↵Craig Topper2017-11-161-2/+10
| | | | | | | | gather/scatter/load/store. The type legalizer will try to scalarize these operations if it sees them, but there is no handling for scalarizing them. This leads to a fatal error. With this change they will now be scalarized by the mem intrinsic scalarizing pass before SelectionDAG. llvm-svn: 318380
* [SLP] Fix PR35047: Fix default cost model for cast op in X86.Alexey Bataev2017-11-071-1/+1
| | | | | | | | | | | | | | | Summary: The cost calculation for default case on X86 target does not always follow correct wayt because of missing 4-th argument in `BaseT::getCastInstrCost()` call. Added this missing parameter. Reviewers: hfinkel, mkuper, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39687 llvm-svn: 317576
* [LV][X86] update the cost of interleaving mem. access of floatsMohammed Agabaria2017-11-061-1/+4
| | | | | | | | | | Recommit: This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. fixed the location of the lit test it works with make check-all. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317471
* [REVERT][LV][X86] update the cost of interleaving mem. access of floatsMohammed Agabaria2017-11-051-4/+1
| | | | | | | | | reverted my changes will be committed later after fixing the failure This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317433
OpenPOWER on IntegriCloud