summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Analysis/TargetTransformInfo.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU] Tune inlining parameters for AMDGPU targetDaniil Fukalov2019-07-171-0/+4
| | | | | | | | | | | | | | | | | | | Summary: Since the target has no significant advantage of vectorization, vector instructions bous threshold bonus should be optional. amdgpu-inline-arg-alloca-cost parameter default value and the target InliningThresholdMultiplier value tuned then respectively. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64642 llvm-svn: 366348
* Revert "[HardwareLoops] NFC - move hardware loop checking code to ↵Jinsong Ji2019-07-091-32/+1
| | | | | | | | isHardwareLoopProfitable()" This reverts commit d95557306585404893d610784edb3e32f1bfce18. llvm-svn: 365520
* [HardwareLoops] NFC - move hardware loop checking code to ↵Chen Zheng2019-07-091-1/+32
| | | | | | | | isHardwareLoopProfitable() Differential Revision: https://reviews.llvm.org/D64197 llvm-svn: 365497
* [PowerPC] exclude ICmpZero in LSR if icmp can be replaced in later hardware ↵Chen Zheng2019-07-031-0/+7
| | | | | | | | | loop. Differential Revision: https://reviews.llvm.org/D63477 llvm-svn: 364993
* [HardwareLoops] NFC - move loop with irreducible control flow checking logic ↵Chen Zheng2019-06-261-7/+11
| | | | | | to HarewareLoopInfo. llvm-svn: 364415
* [HardwareLoops] NFC - move loop with irreducible control flow checking logic ↵Chen Zheng2019-06-261-1/+9
| | | | | | to isHardwareLoopProfitable() llvm-svn: 364397
* [ExpandMemCmp] Move all options to TargetTransformInfo.Clement Courbet2019-06-251-3/+3
| | | | | | Split off from D60318. llvm-svn: 364281
* [NFC] move some hardware loop checking code to a common place for other using.Chen Zheng2019-06-191-0/+85
| | | | | | Differential Revision: https://reviews.llvm.org/D63478 llvm-svn: 363758
* [GlobalISel][Localizer] Rewrite localizer to run in 2 phases, inter & intra ↵Amara Emerson2019-06-171-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | block. Inter-block localization is the same as what currently happens, except now it only runs on the entry block because that's where the problematic constants with long live ranges come from. The second phase is a new intra-block localization phase which attempts to re-sink the already localized instructions further right before one of the multiple uses. One additional change is to also localize G_GLOBAL_VALUE as they're constants too. However, on some targets like arm64 it takes multiple instructions to materialize the value, so some additional heuristics with a TTI hook have been introduced attempt to prevent code size regressions when localizing these. Overall, these changes improve CTMark code size on arm64 by 1.2%. Full code size results: Program baseline new diff ------------------------------------------------------------------------------ test-suite...-typeset/consumer-typeset.test 1249984 1217216 -2.6% test-suite...:: CTMark/ClamAV/clamscan.test 1264928 1232152 -2.6% test-suite :: CTMark/SPASS/SPASS.test 1394092 1361316 -2.4% test-suite...Mark/mafft/pairlocalalign.test 731320 714928 -2.2% test-suite :: CTMark/lencod/lencod.test 1340592 1324200 -1.2% test-suite :: CTMark/kimwitu++/kc.test 3853512 3820420 -0.9% test-suite :: CTMark/Bullet/bullet.test 3406036 3389652 -0.5% test-suite...ark/tramp3d-v4/tramp3d-v4.test 8017000 8016992 -0.0% test-suite...TMark/7zip/7zip-benchmark.test 2856588 2856588 0.0% test-suite...:: CTMark/sqlite3/sqlite3.test 765704 765704 0.0% Geomean difference -1.2% Differential Revision: https://reviews.llvm.org/D63303 llvm-svn: 363632
* [LV] Suppress vectorization in some nontemporal casesWarren Ristow2019-06-171-0/+10
| | | | | | | | | | | | | | | | | | | | | When considering a loop containing nontemporal stores or loads for vectorization, suppress the vectorization if the corresponding vectorized store or load with the aligment of the original scaler memory op is not supported with the nontemporal hint on the target. This adds two new functions: bool isLegalNTStore(Type *DataType, unsigned Alignment) const; bool isLegalNTLoad(Type *DataType, unsigned Alignment) const; to TTI, leaving the target independent default implementation as returning true, but with overriding implementations for X86 that check the legality based on available Subtarget features. This fixes https://llvm.org/PR40759 Differential Revision: https://reviews.llvm.org/D61764 llvm-svn: 363581
* [CodeGen] Generic Hardware Loop SupportSam Parker2019-06-071-0/+6
| | | | | | | | | | | | | | | | | | | | | | | Patch which introduces a target-independent framework for generating hardware loops at the IR level. Most of the code has been taken from PowerPC CTRLoops and PowerPC has been ported over to use this generic pass. The target dependent parts have been moved into TargetTransformInfo, via isHardwareLoopProfitable, with HardwareLoopInfo introduced to transfer information from the backend. Three generic intrinsics have been introduced: - void @llvm.set_loop_iterations Takes as a single operand, the number of iterations to be executed. - i1 @llvm.loop_decrement(anyint) Takes the maximum number of elements processed in an iteration of the loop body and subtracts this from the total count. Returns false when the loop should exit. - anyint @llvm.loop_decrement_reg(anyint, anyint) Takes the number of elements remaining to be processed as well as the maximum numbe of elements processed in an iteration of the loop body. Returns the updated number of elements remaining. llvm-svn: 362774
* [CostModel] Add really basic support for being able to query the cost of the ↵Craig Topper2019-05-281-0/+10
| | | | | | | | | | | | | | | | | | | | | | | FNeg instruction. Summary: This reuses the getArithmeticInstrCost, but passes dummy values of the second operand flags. The X86 costs are wrong and can be improved in a follow up. I just wanted to stop it from reporting an unknown cost first. Reviewers: RKSimon, spatel, andrew.w.kaylor, cameron.mcinally Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62444 llvm-svn: 361788
* [ARM] Implement TTI::getMemcpyCostSjoerd Meijer2019-04-301-0/+6
| | | | | | | | | This implements TargetTransformInfo method getMemcpyCost, which estimates the number of instructions to which a memcpy instruction expands to. Differential Revision: https://reviews.llvm.org/D59787 llvm-svn: 359547
* [ScalarizeMaskedMemIntrin] Add support for scalarizing expandload and ↵Craig Topper2019-03-211-0/+8
| | | | | | | | | | | | | | compressstore intrinsics. This adds support for scalarizing these intrinsics as well the X86TargetTransformInfo support to avoid scalarizing them in the cases X86 can handle. I've omitted handling special cases for constant masks for this first pass. Though CodeGenPrepare can constant fold the branch conditions and remove some of the control flow anyway. Fixes PR40994 and is covers most of PR3666. Might want to implement constant masks to close that. Differential Revision: https://reviews.llvm.org/D59180 llvm-svn: 356687
* [TTI] Enable analysis of clib functions in getIntrinsicCosts. NFCI.Sjoerd Meijer2019-03-121-6/+9
| | | | | | | | | | | | | | | | | This is addressing the issue that we're not modeling the cost of clib functions in TTI::getIntrinsicCosts and thus we're basically addressing this fixme: // FIXME: This is wrong for libc intrinsics. To enable analysis of clib functions, we not only need an intrinsic ID and formal arguments, but also the actual user of that function so that we can e.g. look at alignment and values of arguments. So, this is the initial plumbing to pass the user of an intrinsinsic on to getCallCosts, which queries getIntrinsicCosts. Differential Revision: https://reviews.llvm.org/D59014 llvm-svn: 355901
* [LSR] Generate cross iteration indexesSam Parker2019-02-071-0/+4
| | | | | | | | | | | | | | Modify GenerateConstantOffsetsImpl to create offsets that can be used by indexed addressing modes. If formulae can be generated which result in the constant offset being the same size as the recurrence, we can generate a pre-indexed access. This allows the pointer to be updated via the single pre-indexed access so that (hopefully) no add/subs are required to update it for the next iteration. For small cores, this can significantly improve performance DSP-like loops. Differential Revision: https://reviews.llvm.org/D55373 llvm-svn: 353403
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* Only promote args when function attributes are compatibleTom Stellard2019-01-161-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Check to make sure that the caller and the callee have compatible function arguments before promoting arguments. This uses the same TargetTransformInfo queries that are used to determine if attributes are compatible for inlining. The goal here is to avoid breaking ABI when a called function's ABI depends on a target feature that is not enabled in the caller. This is a very conservative fix for PR37358. Ideally we would have a more sophisticated check for ABI compatiblity rather than checking if the attributes are compatible for inlining. Reviewers: echristo, chandlerc, eli.friedman, craig.topper Reviewed By: echristo, chandlerc Subscribers: nikic, xbolva00, rkruppe, alexcrichton, llvm-commits Differential Revision: https://reviews.llvm.org/D53554 llvm-svn: 351296
* [TTI] getOperandInfo - a broadcast shuffle means the result is OK_UniformValue Simon Pilgrim2018-11-141-0/+7
| | | | llvm-svn: 346868
* [TTI] Make TargetTransformInfo::getOperandInfo static. NFCI.Simon Pilgrim2018-11-131-2/+1
| | | | | | It has no member dependencies and this makes it easier to reuse in other cost analysis code. llvm-svn: 346755
* [TTI] Flip vector types in getShuffleCost SK_ExtractSubvector callSimon Pilgrim2018-11-091-1/+1
| | | | | | | | For SK_ExtractSubvector, the default 'Ty' type is the source operand type and 'SubTy' is the destination subvector type I got this the wrong way around when I added rL346510 llvm-svn: 346534
* [CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput ↵Simon Pilgrim2018-11-091-2/+8
| | | | | | | | (PR39368) Add ShuffleVectorInst::isExtractSubvectorMask helper to match shuffle masks. llvm-svn: 346510
* [LV] Support vectorization of interleave-groups that require an epilog underDorit Nuzman2018-10-311-3/+6
| | | | | | | | | | | | | | | | | | | | | | optsize using masked wide loads Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53668 llvm-svn: 345705
* recommit 344472 after fixing build failure on ARM and PPC.Dorit Nuzman2018-10-141-3/+7
| | | | llvm-svn: 344475
* revert 344472 due to failures.Dorit Nuzman2018-10-141-7/+3
| | | | llvm-svn: 344473
* [IAI,LV] Add support for vectorizing predicated strided accesses using maskedDorit Nuzman2018-10-141-3/+7
| | | | | | | | | | | | | | | | | | | | | | | interleave-group The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization. This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop-Vectorizer's analysis, and adds the proper support for masked-interleave- groups to the Loop-Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave groups (which each target can control); Targets that support masked vector loads/ stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles. Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53011 llvm-svn: 344472
* [LoopVectorizer] Use TTI.getOperandInfo()Jonas Paulsson2018-10-051-43/+43
| | | | | | | | | | | | | | | | Call getOperandInfo() instead of using (near) duplicated code in LoopVectorizationCostModel::getInstructionCost(). This gets the OperandValueKind and OperandValueProperties values for a Value passed as operand to an arithmetic instruction. getOperandInfo() used to be a static method in TargetTransformInfo.cpp, but is now instead a public member. Review: Florian Hahn https://reviews.llvm.org/D52883 llvm-svn: 343852
* Remove trailing spaceFangrui Song2018-07-301-9/+9
| | | | | | sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h} llvm-svn: 338293
* [TargetTransformInfo] Add pow2 analysis for scalar constantsSimon Pilgrim2018-07-111-0/+6
| | | | | | Add ConstantInt analysis to getOperandInfo so we get more realistic div/rem expansion costs comparable to the vector costs. llvm-svn: 336827
* [IR] move shuffle mask queries from TTI to ShuffleVectorInstSanjay Patel2018-06-191-168/+18
| | | | | | | | | | | | | | | | The optimizer is getting smarter (eg, D47986) about differentiating shuffles based on its mask values, so we should make queries on the mask constant operand generally available to avoid code duplication. We'll probably use this soon in the vectorizers and instcombine (D48023 and https://bugs.llvm.org/show_bug.cgi?id=37806). We might clean up TTI a bit more once all of its current 'SK_*' options are covered. Differential Revision: https://reviews.llvm.org/D48236 llvm-svn: 335067
* Fix namespaces. No functionality change.Benjamin Kramer2018-06-161-1/+1
| | | | llvm-svn: 334890
* [CostModel] Cleanup isSingleSourceVectorMask to match other shuffle ↵Simon Pilgrim2018-06-141-10/+12
| | | | | | matchers. NFCI. llvm-svn: 334699
* [CostModel] Recognise REVERSE shuffle mask if the elements come from the ↵Simon Pilgrim2018-06-141-4/+11
| | | | | | second src llvm-svn: 334698
* [CostModel] Recognise BROADCAST shuffle mask if the elements come from the ↵Simon Pilgrim2018-06-131-4/+11
| | | | | | second src llvm-svn: 334620
* [CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select ↵Simon Pilgrim2018-06-121-19/+15
| | | | | | | | | | | | | | | | | | (PR33744) As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources: e.g. v4f32: <0,5,2,7> or <4,1,6,3> This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline: e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc. This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns. Differential Revision: https://reviews.llvm.org/D47985 llvm-svn: 334513
* Fix signed/unsigned warning. NFCI.Simon Pilgrim2018-06-121-2/+2
| | | | llvm-svn: 334509
* [CostModel] Treat Identity shuffle masks as zero costSimon Pilgrim2018-06-121-0/+20
| | | | | | | | | | As discussed on D47985, identity shuffle masks should probably be free. I've limited this to the case where the input and output types all match - but we could probably accept all cases. Differential Revision: https://reviews.llvm.org/D47986 llvm-svn: 334506
* [TTI] Add uniform/non-uniform constant Pow2 detection to ↵Simon Pilgrim2018-05-221-13/+28
| | | | | | | | | | | | | | | | TargetTransformInfo::getInstructionThroughput This enables us to detect more fast path sdiv cases under cost analysis. This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs. Found while working on D46276 Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases. Differential Revision: https://reviews.llvm.org/D46637 llvm-svn: 332969
* Remove \brief commands from doxygen comments.Adrian Prantl2018-05-011-1/+1
| | | | | | | | | | | | | | | | We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272
* [TTI, AArch64] Add transpose shuffle kindMatthew Simpson2018-04-261-10/+74
| | | | | | | | | | | | | | This patch adds a new shuffle kind useful for transposing a 2xn matrix. These transpose shuffle masks read corresponding even- or odd-numbered vector elements from two n-dimensional source vectors and write each result into consecutive elements of an n-dimensional destination vector. The transpose shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such, this patch also considers transpose shuffles in the AArch64 implementation of getShuffleCost. Differential Revision: https://reviews.llvm.org/D45982 llvm-svn: 330941
* [LV] Introduce TTI::getMinimumVFKrzysztof Parzyszek2018-04-131-0/+4
| | | | | | | | | | | The function getMinimumVF(ElemWidth) will return the minimum VF for a vector with elements of size ElemWidth bits. This value will only apply to targets for which TTI::shouldMaximizeVectorBandwidth returns true. The value of 0 indicates that there is no minimum VF. Differential Revision: https://reviews.llvm.org/D45271 llvm-svn: 330062
* Plumb useAA through TargetTransformInfo to remove Transforms->CodeGen header ↵David Blaikie2018-03-281-0/+2
| | | | | | | | dependency Thanks to echristo for the pointers on direction. llvm-svn: 328737
* [LV] Add TTI::shouldMaximizeVectorBandwidth to allow enabling it per targetKrzysztof Parzyszek2018-03-271-0/+4
| | | | | | | | The default implementation returns false and keeps the current behavior. Differential Revision: https://reviews.llvm.org/D44735 llvm-svn: 328632
* [LSR] Allow giving priority to post-incrementing addressing modesKrzysztof Parzyszek2018-03-261-0/+14
| | | | | | | | | | | Implement TTI interface for targets to indicate that the LSR should give priority to post-incrementing addressing modes. Combination of patches by Sebastian Pop and Brendon Cahoon. Differential Revision: https://reviews.llvm.org/D44758 llvm-svn: 328490
* [LoopStrengthReduce, x86] don't add cost for a cmp that will be macro-fused ↵Sanjay Patel2018-02-051-0/+4
| | | | | | | | | | | | | | | (PR35681) In the motivating case from PR35681 and represented by the macro-fuse-cmp test: https://bugs.llvm.org/show_bug.cgi?id=35681 ...there's a 37 -> 31 byte size win for the loop because we eliminate the big base address offsets. SPEC2017 on Ryzen shows no significant perf difference. Differential Revision: https://reviews.llvm.org/D42607 llvm-svn: 324289
* Re-commit : [PowerPC] Add handling for ColdCC calling convention and a pass ↵Zaara Syeda2018-01-301-0/+4
| | | | | | | | | | | | | | | | | | | | | to mark candidates with coldcc attribute. This recommits r322721 reverted due to sanitizer memory leak build bot failures. Original commit message: This patch adds support for the coldcc calling convention for Power. This changes the set of non-volatile registers. It includes a pass to stress test the implementation by marking all static directly called functions with the coldcc attribute through the option -enable-coldcc-stress-test. It also includes an option, -ppc-enable-coldcc, to add the coldcc attribute to functions which are cold at all call sites based on BlockFrequencyInfo when the containing function does not call any non cold functions. Differential Revision: https://reviews.llvm.org/D38413 llvm-svn: 323778
* Revert [PowerPC] This reverts commit rL322721Zaara Syeda2018-01-171-4/+0
| | | | | | Failing build bots. Revert the commit now. llvm-svn: 322748
* [PowerPC] Add handling for ColdCC calling convention and a pass to markZaara Syeda2018-01-171-0/+4
| | | | | | | | | | | | | | | | candidates with coldcc attribute. This patch adds support for the coldcc calling convention for Power. This changes the set of non-volatile registers. It includes a pass to stress test the implementation by marking all static directly called functions with the coldcc attribute through the option -enable-coldcc-stress-test. It also includes an option, -ppc-enable-coldcc, to add the coldcc attribute to functions which are cold at all call sites based on BlockFrequencyInfo when the containing function does not call any non cold functions. Differential Revision: https://reviews.llvm.org/D38413 llvm-svn: 322721
* Revert r321377, it causes regression to https://reviews.llvm.org/P8055.Guozhi Wei2017-12-281-4/+0
| | | | llvm-svn: 321528
* [SimplifyCFG] Don't do if-conversion if there is a long dependence chainGuozhi Wei2017-12-221-0/+4
| | | | | | | | | | If after if-conversion, most of the instructions in this new BB construct a long and slow dependence chain, it may be slower than cmp/branch, even if the branch has a high miss rate, because the control dependence is transformed into data dependence, and control dependence can be speculated, and thus, the second part can execute in parallel with the first part on modern OOO processor. This patch checks for the long dependence chain, and give up if-conversion if find one. Differential Revision: https://reviews.llvm.org/D39352 llvm-svn: 321377
OpenPOWER on IntegriCloud