path: root/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
* [ARM][MVE] Don't unroll intrinsic loops. (Sam Parker, 2020-01-09; 1 file, -4/+5)
  We don't unroll vector loops for MVE targets, but we miss the case when
  loops only contain intrinsic calls. So just move the logic a bit to catch
  this case.

  Differential Revision: https://reviews.llvm.org/D72440

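  As an illustration of the check this implies, here is a hedged sketch (not
  the patch's code; the helper name is invented): an instruction counts as
  vector code if its result or, for calls, any argument is a vector, which
  also catches intrinsics such as masked stores that return void.

    #include "llvm/IR/Instructions.h"

    using namespace llvm;

    // Hedged sketch: treat an instruction as vector code if its result or,
    // for calls, any argument is a vector. This catches intrinsic-only
    // loops whose calls return void (e.g. masked stores).
    static bool looksLikeVectorCode(const Instruction &I) {
      if (I.getType()->isVectorTy())
        return true;
      if (const auto *Call = dyn_cast<CallInst>(&I))
        for (const Value *Arg : Call->args())
          if (Arg->getType()->isVectorTy())
            return true;
      return false;
    }
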
* [ARM][MVE] Enable masked gathers from vector of pointers (Anna Welker, 2020-01-08; 1 file, -0/+24)
  Adds a pass to the ARM backend that takes a v4i32 gather and transforms it
  into a call to MVE's masked gather intrinsics.

  Differential Revision: https://reviews.llvm.org/D71743

* Rename TTI::getIntImmCost for instructions and intrinsics (Reid Kleckner, 2019-12-11; 1 file, -1/+1)
  Soon Intrinsic::ID will be a plain integer, so this overload will not be
  possible. Rename both overloads to ensure that downstream targets observe
  this as a build failure instead of a runtime failure.

  Split off from D71320

  Reviewers: efriedma

  Differential Revision: https://reviews.llvm.org/D71381

* [ARM] Enable MVE masked loads and stores (David Green, 2019-12-09; 1 file, -1/+1)
  With the extra optimisations we have done, these should now be fine to
  enable by default, which is what this patch does.

  Differential Revision: https://reviews.llvm.org/D70968

* [ARM] Teach the Arm cost model that a Shift can be folded into other instructions (David Green, 2019-12-09; 1 file, -5/+34)
  This attempts to teach the cost model in Arm that code such as:

    %s = shl i32 %a, 3
    %c = and i32 %s, %b

  can under Arm or Thumb2 become:

    and r0, r1, r2, lsl #3

  So the cost of the shift can essentially be free. To do this without
  trying to artificially adjust the cost of the "and" instruction, it needs
  to get the users of the shl and check if they are a type of instruction
  that the shift can be folded into. And so it needs to have access to the
  actual instruction in getArithmeticInstrCost, which, if available, is
  added as an extra parameter much like getCastInstrCost. We otherwise limit
  it to shifts with a single user, which should hopefully handle most of the
  cases.

  The list of instructions that the shift can be folded into includes ADC,
  ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates
  to Add, Sub, And, Or, Xor and ICmp.

  Differential Revision: https://reviews.llvm.org/D70966

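  A hedged sketch of the single-user scan described above (the function name
  is invented; the real change threads the instruction through
  getArithmeticInstrCost rather than using a standalone helper):

    #include "llvm/IR/Instructions.h"

    using namespace llvm;

    // Hedged sketch: a shift is potentially free when its single user is
    // one of the IR opcodes that maps onto a shifted-operand instruction.
    static bool shiftFoldsIntoUser(const Instruction *Shift) {
      if (!Shift || !Shift->hasOneUse())
        return false; // limit to single-user shifts, as the message says
      const auto *U = dyn_cast<Instruction>(*Shift->user_begin());
      if (!U)
        return false;
      switch (U->getOpcode()) {
      case Instruction::Add:
      case Instruction::Sub:
      case Instruction::And:
      case Instruction::Or:
      case Instruction::Xor:
      case Instruction::ICmp:
        return true; // e.g. and r0, r1, r2, lsl #3
      default:
        return false;
      }
    }
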
* [ARM] Additional tests and minor formatting. NFC (David Green, 2019-12-09; 1 file, -43/+43)
  This adds some extra cost model tests for shifts, and makes some minor
  adjustments to some Neon code to make clear what it applies to. Both NFC.

* [ARM] MVE interleaving loads and stores. (David Green, 2019-11-19; 1 file, -9/+16)
  Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering for
  MVE. This works the same way as Neon, recognising the load/shuffle
  combination and converting it into intrinsics in a pre-isel pass, which
  just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad and
  lowerInterleavedStore.

  The main difference to Neon is that we do not have a VLD3 instruction.
  Otherwise most of the code works very similarly, with just some minor
  differences in the form of the intrinsics to work around. VLD3 is disabled
  by making isLegalInterleavedAccessType return false for those cases.

  We may need some other future adjustments, such as VLD4 taking up half the
  available registers, so it should maybe cost more. This patch should get
  the basics in though.

  Differential Revision: https://reviews.llvm.org/D69392

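  A hedged illustration of the factor check (invented helper name; the real
  hook also validates the vector type and alignment):

    // Hedged sketch: MVE provides VLD2/VST2 and VLD4/VST4 but no VLD3, so
    // only interleave factors 2 and 4 are accepted.
    static bool isSupportedMVEInterleaveFactor(unsigned Factor) {
      return Factor == 2 || Factor == 4;
    }
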
* [ARM][MVE] tail-predication (Sjoerd Meijer, 2019-11-15; 1 file, -0/+3)
  This is a follow up of d90804d, to also flag fcmp instructions as
  instructions that we do not support in tail-predicated vector loops.

  Differential Revision: https://reviews.llvm.org/D70295

* [ARM][MVE] canTailPredicateLoop (Sjoerd Meijer, 2019-11-13; 1 file, -8/+99)
  This implements TTI hook 'preferPredicateOverEpilogue' for MVE. This is a
  first version and it operates on single block loops only. With this
  change, the vectoriser will now determine if tail-folding scalar remainder
  loops is possible/desired, which is the first step to generate MVE
  tail-predicated vector loops.

  This is disabled by default for now, i.e. it depends on option
  -disable-mve-tail-predication, which is off by default.

  I will follow up on this soon with a patch for the vectoriser to respect
  loop hint 'vectorize.predicate.enable'. I.e., with this loop hint set to
  Disabled, we don't want to tail-fold and we shouldn't query this TTI hook,
  which is done in D70125.

  Differential Revision: https://reviews.llvm.org/D69845

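  A hedged sketch of the shape of a first, single-block version of such a
  hook (invented name; the real implementation checks far more, e.g. types
  and memory access patterns):

    #include "llvm/Analysis/LoopInfo.h"
    #include "llvm/IR/IntrinsicInst.h"

    using namespace llvm;

    // Hedged sketch: only single-block loops, and bail on instructions we
    // cannot tail-predicate, such as arbitrary calls or fcmp (see above).
    static bool canTailPredicateLoopSketch(Loop *L) {
      if (L->getNumBlocks() != 1)
        return false; // first version: single-block loops only
      for (Instruction &I : *L->getHeader()) {
        if (isa<FCmpInst>(&I))
          return false;
        if (isa<CallInst>(&I) && !isa<IntrinsicInst>(&I))
          return false;
      }
      return true;
    }
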
* [TTI][LV] preferPredicateOverEpilogue (Sjoerd Meijer, 2019-11-06; 1 file, -0/+44)
  We have two ways to steer creating a predicated vector body over creating
  a scalar epilogue. To force this, we have 1) a command line option and
  2) a pragma available. This adds a third: a target hook to
  TargetTransformInfo that can be queried whether predication is preferred
  or not, which allows the vectoriser to make the decision without forcing
  it.

  While this change behaves as a non-functional change for now, it shows the
  required TTI plumbing, usage of this new hook in the vectoriser, and the
  beginning of an ARM MVE implementation. I will follow up on this with:
  - a complete MVE implementation, see D69845.
  - a patch to disable this, i.e. we should respect
    "vector_predicate(disable)" and its corresponding loop hint.

  Differential Revision: https://reviews.llvm.org/D69040

* [Alignment][NFC] getMemoryOpCost uses MaybeAlign (Guillaume Chatelet, 2019-10-25; 1 file, -3/+5)
  Summary: This patch is part of a series to introduce an Alignment type.
  See this thread for context:
  http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type:
  https://reviews.llvm.org/D64790

  Reviewers: courbet

  Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D69307

* [DAGCombine][ARM] Enable extending masked loads (Sam Parker, 2019-10-17; 1 file, -6/+11)
  Add a generic DAG combine for extending masked loads. This allows us to
  generate sext/zext masked loads which can access v4i8, v8i8 and v4i16
  memory to produce v4i32, v8i16 and v4i32 respectively.

  Differential Revision: https://reviews.llvm.org/D68337

  llvm-svn: 375085

* [NFC][TTI] Add Alignment for isLegalMasked[Load/Store] (Sam Parker, 2019-10-14; 1 file, -1/+1)
  Add an extra parameter so the backend can take the alignment into
  consideration.

  Differential Revision: https://reviews.llvm.org/D68400

  llvm-svn: 374763

* [ARM] Masked loads and stores (David Green, 2019-09-15; 1 file, -0/+20)
  Masked loads and stores fit naturally with MVE, the instructions being
  easily predicated. This adds lowering for the simple cases of masked loads
  and stores. It does not yet deal with widening/narrowing or pre/post inc,
  and so is currently behind an option.

  The llvm masked load intrinsic will accept a "passthru" value, dictating
  the values used for the zero masked lanes. In MVE the instructions write 0
  to the zero predicated lanes, so we need to match a passthru that isn't 0
  (or undef) with a select instruction to pull in the correct data after the
  load.

  Differential Revision: https://reviews.llvm.org/D67186

  llvm-svn: 371932

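  A hedged sketch of the passthru check implied here, assuming the generic
  llvm.masked.load operand order in which the passthru is the fourth
  argument (helper name invented):

    #include "llvm/IR/IntrinsicInst.h"
    #include "llvm/IR/PatternMatch.h"

    using namespace llvm;

    // Hedged sketch: MVE writes 0 to false lanes, so a masked load whose
    // passthru is neither zero nor undef must be followed by a select that
    // reinstates the passthru values in the masked-off lanes.
    static bool needsPassthruSelect(const IntrinsicInst *MaskedLoad) {
      Value *Passthru = MaskedLoad->getArgOperand(3);
      return !isa<UndefValue>(Passthru) &&
             !PatternMatch::match(Passthru, PatternMatch::m_Zero());
    }
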
* [ARM] Add support for MVE vmaxv and vminv (Sam Tebbs, 2019-09-13; 1 file, -2/+2)
  This patch adds vecreduce_smax, vecreduce_umax, vecreduce_smin,
  vecreduce_umin and selection for vmaxv and vminv.

  Differential Revision: https://reviews.llvm.org/D66413

  llvm-svn: 371827

* [ARM] Add support for MVE vaddv (Sam Tebbs, 2019-08-19; 1 file, -0/+25)
  This patch adds vecreduce_add and the relevant instruction selection for
  vaddv.

  Differential revision: https://reviews.llvm.org/D66085

  llvm-svn: 369245

* [ARM] MVE sext costs (David Green, 2019-08-19; 1 file, -0/+25)
  This adds some sext costs for MVE, taken from the length of assembly
  sequences that we currently generate.

  Differential Revision: https://reviews.llvm.org/D66010

  llvm-svn: 369244

* [ARM] MVE sext of a load is free (David Green, 2019-08-16; 1 file, -0/+15)
  MVE also has some sext of loads, which will be free just as scalar
  instructions are.

  Differential Revision: https://reviews.llvm.org/D66008

  llvm-svn: 369118

* [ARM] Add MVE beats vector cost model (David Green, 2019-08-13; 1 file, -21/+67)
  The MVE architecture has the idea of "beats", where a vector instruction
  can be executed over several ticks of the architecture. This adds a
  similar system into the Arm backend cost model, multiplying the cost of
  all vector instructions by a factor. This factor essentially becomes the
  expected difference between scalar code and vector code, on average.

  MVE vector instructions can also overlap, so their true cost is often
  lower. But equally, scalar instructions can in some situations be dual
  issued, or have other optimisations such as unrolling or make use of dsp
  instructions. The default is chosen as 2. This should not prevent
  vectorisation in most cases (as the vector instructions will still be
  doing at least 4 times the work), but it will help prevent
  over-vectorising in cases where the benefits are less likely.

  This adds things so far to the obvious places in ARMTargetTransformInfo,
  and updates a few related costs, like not treating float instructions as
  cost 2 just because they are floats.

  Differential Revision: https://reviews.llvm.org/D66005

  llvm-svn: 368733

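  A hedged sketch of the scaling (invented helper; the real patch applies
  the factor inside the various TTI cost hooks rather than in one place):

    #include "llvm/IR/Type.h"

    using namespace llvm;

    // Hedged sketch: scale every vector instruction's cost by a "beats"
    // factor, defaulting to 2 per the message, so vectorisation must beat
    // scalar code by roughly that margin to look profitable.
    static unsigned applyBeatsFactor(unsigned BaseCost, Type *Ty,
                                     unsigned BeatsFactor = 2) {
      return Ty->isVectorTy() ? BaseCost * BeatsFactor : BaseCost;
    }
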
* [ARM] sext of a load is free (David Green, 2019-08-12; 1 file, -0/+21)
  This teaches the cost model that the sext or zext of a load is going to be
  free.

  Differential Revision: https://reviews.llvm.org/D66006

  llvm-svn: 368593

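  A hedged sketch of the rule (invented name and base cost; the real change
  lives in the cast-cost hook):

    #include "llvm/IR/Instructions.h"

    using namespace llvm;

    // Hedged sketch: a sext/zext whose source is a load is free, since it
    // folds into the load itself (e.g. LDRSB/LDRSH on Arm).
    static int getExtendCostSketch(unsigned Opcode, const Instruction *I) {
      if ((Opcode == Instruction::SExt || Opcode == Instruction::ZExt) &&
          I && isa<LoadInst>(I->getOperand(0)))
        return 0;
      return 1; // illustrative base cost
    }
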
* [ARM] MVE shuffle broadcast costs (David Green, 2019-08-12; 1 file, -0/+17)
  A VDUP will perform a vector broadcast in a single instruction. Update the
  cost model for MVE accordingly.

  Code originally by David Sherwood.

  Differential Revision: https://reviews.llvm.org/D63448

  llvm-svn: 368589

* [ARM] Put some of the TTI costmodel behind hasNeon calls. (David Green, 2019-08-12; 1 file, -74/+75)
  This puts some of the calls in ARMTargetTransformInfo.cpp behind hasNeon()
  checks, now that we have MVE, and updates all the tests accordingly.

  Differential Revision: https://reviews.llvm.org/D63447

  llvm-svn: 368587

* [MVE] Don't try to unroll vectorised MVE loops (David Green, 2019-08-11; 1 file, -0/+5)
  Due to the nature of the beat system in the MVE architecture, along with
  tail predication and low-overhead loops, unrolling has less benefit
  compared to normal loops. You cannot, for example, hide the latency of a
  load with other instructions as you can for scalar code. Preventing
  unrolling also makes the code easier to read and reason about.

  So if a loop contains vector code, don't enable runtime unrolling. At
  least for the time being.

  Differential Revision: https://reviews.llvm.org/D65803

  llvm-svn: 368530

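  A hedged sketch of the check (invented shape; the real code sits in the
  target's unrolling-preferences hook):

    #include "llvm/Analysis/LoopInfo.h"
    #include "llvm/Analysis/TargetTransformInfo.h"

    using namespace llvm;

    // Hedged sketch: leave runtime unrolling off as soon as any
    // instruction in the loop produces a vector value.
    static void setUnrollPrefsSketch(
        Loop *L, TargetTransformInfo::UnrollingPreferences &UP) {
      for (BasicBlock *BB : L->blocks())
        for (Instruction &I : *BB)
          if (I.getType()->isVectorTy())
            return; // vector code: do not enable runtime unrolling
      UP.Runtime = true; // scalar-only loop: runtime unrolling can help
    }
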
* [ARM][LowOverheadLoops] Enable by default (Sam Parker, 2019-07-30; 1 file, -1/+1)
  The code is now in a good enough state to pass the bunch of tests that I
  have run (after fixing the bugs), so let's enable it by default.

  Differential Revision: https://reviews.llvm.org/D65277

  llvm-svn: 367297

* [ARM] WLS/LE Code Generation (Sam Parker, 2019-07-01; 1 file, -0/+2)
  Backend changes to enable WLS/LE low-overhead loops for armv8.1-m:
  1) Use TTI to communicate to the HardwareLoop pass that we should try to
     generate intrinsics that guard the loop entry, as well as setting the
     loop trip count.
  2) Lower the BRCOND that uses said intrinsic to an Arm specific node:
     ARMWLS.
  3) ISelDAGToDAG the node to a new pseudo instruction: t2WhileLoopStart.
  4) Add support in ArmLowOverheadLoops to handle the new pseudo
     instruction.

  Differential Revision: https://reviews.llvm.org/D63816

  llvm-svn: 364733

* [NFC] Move some hardware loop checking code to a common place for other uses. (Chen Zheng, 2019-06-19; 1 file, -1/+1)
  Differential Revision: https://reviews.llvm.org/D63478

  llvm-svn: 363758

* [CodeGen] Check for HardwareLoop Latch ExitBlock (Sam Parker, 2019-06-17; 1 file, -4/+0)
  The HardwareLoops pass finds exit blocks with a scevable exit count. If
  the target specifies to update the loop counter in a register, through a
  phi, we need to ensure that the exit block is a latch so that we can
  insert the phi with the correct value for the incoming edge.

  Differential Revision: https://reviews.llvm.org/D63336

  llvm-svn: 363556

* [NFC] Simplify Call query (Sam Parker, 2019-06-13; 1 file, -1/+1)
  Use getIntrinsicID() directly from IntrinsicInst.

  llvm-svn: 363235

* [ARM][TTI] Scan for existing loop intrinsics (Sam Parker, 2019-06-13; 1 file, -5/+33)
  TTI should report that it's not profitable to generate a hardware loop if
  it, or one of its child loops, has already been converted.

  Differential Revision: https://reviews.llvm.org/D63212

  llvm-svn: 363234

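  A hedged sketch of the scan, assuming the hardware-loop intrinsics
  llvm.set.loop.iterations and llvm.loop.decrement are the markers left
  behind by a previous conversion:

    #include "llvm/Analysis/LoopInfo.h"
    #include "llvm/IR/IntrinsicInst.h"

    using namespace llvm;

    // Hedged sketch: a loop already converted to a hardware loop contains
    // the conversion's own intrinsics, so finding one means generating
    // another hardware loop here is not profitable.
    static bool containsHardwareLoopIntrinsics(Loop *L) {
      for (BasicBlock *BB : L->blocks())
        for (Instruction &I : *BB)
          if (auto *II = dyn_cast<IntrinsicInst>(&I))
            if (II->getIntrinsicID() == Intrinsic::set_loop_iterations ||
                II->getIntrinsicID() == Intrinsic::loop_decrement)
              return true;
      return false;
    }
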
* [ARM] Implement TTI::isHardwareLoopProfitable (Sam Parker, 2019-06-12; 1 file, -0/+194)
  Implement the backend target hook to drive the HardwareLoops pass. The
  low-overhead branch extension for Arm M-class cores is flexible enough
  that we don't have to ensure correctness at this point, except checking
  that the loop counter variable can be stored in LR - a 32-bit register.
  For it to be profitable, we want to avoid loops that contain function
  calls, or any other instruction that alters the PC.

  This implementation uses TargetLoweringInfo to query type and operation
  actions, looks at intrinsic calls and also performs some manual checks for
  remainder/division and FP operations. I think this should be a good base
  to start, and extra details can be filled out later.

  Differential Revision: https://reviews.llvm.org/D62907

  llvm-svn: 363149

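  A hedged sketch of the profitability side (invented helper; the real hook
  also queries TargetLoweringInfo for type and operation actions):

    #include "llvm/Analysis/LoopInfo.h"
    #include "llvm/IR/IntrinsicInst.h"

    using namespace llvm;

    // Hedged sketch: reject loops containing instructions that alter the
    // PC or clobber LR, where the loop counter would live.
    static bool hasHardwareLoopBlockers(Loop *L) {
      for (BasicBlock *BB : L->blocks())
        for (Instruction &I : *BB) {
          if (auto *Call = dyn_cast<CallInst>(&I))
            if (!isa<IntrinsicInst>(Call))
              return true; // a real call clobbers LR and changes the PC
          if (isa<IndirectBrInst>(&I) || isa<InvokeInst>(&I))
            return true;
        }
      return false;
    }
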
* [ARM] Enable Unroll UpperBound (David Green, 2019-06-10; 1 file, -0/+1)
  This option allows loops with small max trip counts to be fully unrolled.
  This can help with code like the remainder loops from manually unrolled
  loops like those that appear in the cmsis dsp library. We would apparently
  previously runtime unroll them with the default unroll count (4).

  Differential Revision: https://reviews.llvm.org/D63064

  llvm-svn: 362928

* [ARM] Implement TTI::getMemcpyCost (Sjoerd Meijer, 2019-04-30; 1 file, -0/+35)
  This implements TargetTransformInfo method getMemcpyCost, which estimates
  the number of instructions to which a memcpy instruction expands.

  Differential Revision: https://reviews.llvm.org/D59787

  llvm-svn: 359547

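  A hedged sketch of one way such an estimate can be made; all the constants
  here are illustrative, not the values the patch uses:

    #include "llvm/IR/Constants.h"
    #include "llvm/IR/IntrinsicInst.h"

    using namespace llvm;

    // Hedged sketch: a memcpy of a small constant length lowers to a run
    // of load/store pairs; anything else is assumed to stay a libcall.
    static int getMemcpyCostSketch(const MemCpyInst *MC) {
      const auto *Len = dyn_cast<ConstantInt>(MC->getLength());
      if (!Len)
        return 16; // unknown length: assume a call (illustrative cost)
      uint64_t Bytes = Len->getZExtValue();
      const uint64_t AccessBytes = 4; // assume word-sized loads/stores
      uint64_t Chunks = (Bytes + AccessBytes - 1) / AccessBytes;
      return static_cast<int>(2 * Chunks); // one load + one store each
    }
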
* [IR] Refactor attribute methods in Function class (NFC) (Evandro Menezes, 2019-04-04; 1 file, -1/+1)
  Rename the functions that query the optimization kind attributes.

  Differential revision: https://reviews.llvm.org/D60287

  llvm-svn: 357731

* [ARM] Mark 255 and 65535 as cheap for Thumb1 "And" (David Green, 2019-02-04; 1 file, -3/+7)
  This prevents Constant Hoisting from pulling the constant out of the
  block, allowing us to still produce LDRH/UXTH nodes. LDRB/UXTB (255) is
  already cheap by the default getIntImmCost, but I've added it for clarity.

  Differential Revision: https://reviews.llvm.org/D57671

  llvm-svn: 353040

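  A hedged sketch of the special case (invented name and default cost):

    #include "llvm/ADT/APInt.h"
    #include "llvm/IR/Instruction.h"

    using namespace llvm;

    // Hedged sketch: report 255 and 65535 as cheap for And so that
    // ConstantHoisting leaves the constant in place and LDRB/UXTB or
    // LDRH/UXTH can still be formed.
    static int getAndImmCostSketch(unsigned Opcode, const APInt &Imm) {
      if (Opcode == Instruction::And && (Imm == 255 || Imm == 65535))
        return 1;
      return 2; // illustrative default for constants worth hoisting
    }
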
* Update the file headers across all of the LLVM projects in the monorepo to reflect the new license (Chandler Carruth, 2019-01-19; 1 file, -4/+3)
  We understand that people may be surprised that we're moving the header
  entirely to discuss the new license. We checked this carefully with the
  Foundation's lawyer and we believe this is the correct approach.

  Essentially, all code in the project is now made available by the LLVM
  project under our new license, so you will see that the license headers
  include that license only. Some of our contributors have contributed code
  under our old license, and accordingly, we have retained a copy of our old
  license notice in the top-level files in each project and repository.

  llvm-svn: 351636

* [LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads (Dorit Nuzman, 2018-10-31; 1 file, -3/+5)
  Under Opt for Size, the vectorizer does not vectorize interleave-groups
  that have gaps at the end of the group (such as a loop that reads only the
  even elements: a[2*i]) because that implies that we'll require a scalar
  epilogue (which is not allowed under Opt for Size).

  This patch extends the support for masked-interleave-groups (introduced by
  D53011 for conditional accesses) to also cover the case of gaps in a group
  of loads; targets that enable the masked-interleave-group feature don't
  have to invalidate interleave-groups of loads with gaps; they could now
  use masked wide-loads and shuffles (if that's what the cost model
  selects).

  Reviewers: Ayal, hsaito, dcaballe, fhahn

  Reviewed By: Ayal

  Differential Revision: https://reviews.llvm.org/D53668

  llvm-svn: 345705

* [TTI] Add generic SK_Broadcast shuffle costs (Simon Pilgrim, 2018-10-25; 1 file, -3/+22)
  I noticed while fixing PR39368 that we don't have generic shuffle costs
  for broadcast style shuffles. This patch adds SK_Broadcast handling, but
  exposes ARM/AArch64's lack of handling of this type, which I've added a
  fix for at the same time.

  Differential Revision: https://reviews.llvm.org/D53570

  llvm-svn: 345253

* [TTI] Add generic cost handling of SK_Reverse shuffles (Simon Pilgrim, 2018-10-23; 1 file, -0/+2)
  These can be treated as a general permute. This required a fix for missing
  reverse patterns on ARM.

  llvm-svn: 345015

* Recommit 344472 after fixing build failure on ARM and PPC. (Dorit Nuzman, 2018-10-14; 1 file, -3/+5)
  llvm-svn: 344475

* Revert 344472 due to failures. (Dorit Nuzman, 2018-10-14; 1 file, -5/+3)
  llvm-svn: 344473

* [IAI,LV] Add support for vectorizing predicated strided accesses using masked interleave-group (Dorit Nuzman, 2018-10-14; 1 file, -3/+5)
  The vectorizer currently does not attempt to create interleave-groups that
  contain predicated loads/stores; predicated strided accesses can currently
  be vectorized only using masked gather/scatter or scalarization.

  This patch makes predicated loads/stores candidates for forming
  interleave-groups during the Loop-Vectorizer's analysis, and adds the
  proper support for masked-interleave-groups to the Loop-Vectorizer's
  planning and transformation stages. The patch also extends the TTI API to
  allow querying the cost of masked interleave groups (which each target can
  control); targets that support masked vector loads/stores may choose to
  enable this feature and allow vectorizing predicated strided loads/stores
  using masked wide loads/stores and shuffles.

  Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar

  Reviewed By: Ayal

  Differential Revision: https://reviews.llvm.org/D53011

  llvm-svn: 344472

* [Thumb1] Any imm8 should have cost of 1 (Zhaoshi Zheng, 2018-09-24; 1 file, -2/+2)
  A simple MOVS rd, imm8 can materialize [-128, 127] in signed i8 type or
  [0, 255] in unsigned i8 type on Thumb1.

  Differential Revision: https://reviews.llvm.org/D52257

  llvm-svn: 342898

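  A hedged sketch of the corresponding cost rule; the observation is that
  any 8-bit bit pattern is reachable with a single MOVS:

    #include "llvm/ADT/APInt.h"

    using namespace llvm;

    // Hedged sketch: an i8 immediate, signed or unsigned, always fits a
    // single MOVS rd, #imm8 (e.g. i8 -1 is the bit pattern 0xFF), so give
    // it the minimum cost.
    static int getThumb1I8ImmCostSketch(const APInt &Imm) {
      if (Imm.getBitWidth() <= 8)
        return 1;
      return 2; // illustrative cost for wider immediates
    }
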
* Remove trailing space (Fangrui Song, 2018-07-30; 1 file, -1/+1)
  sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

  llvm-svn: 338293

* [UnrollAndJam] New Unroll and Jam pass (David Green, 2018-07-01; 1 file, -0/+2)
  This is a simple implementation of the unroll-and-jam classical loop
  optimisation. The basic idea is that we take an outer loop of the form:

    for i..
      ForeBlocks(i)
      for j..
        SubLoopBlocks(i, j)
      AftBlocks(i)

  Instead of doing normal inner or outer unrolling, we unroll as follows:

    for i... i+=2
      ForeBlocks(i)
      ForeBlocks(i+1)
      for j..
        SubLoopBlocks(i, j)
        SubLoopBlocks(i+1, j)
      AftBlocks(i)
      AftBlocks(i+1)
    Remainder Loop

  So we have unrolled the outer loop, then jammed the two inner loops into
  one. This can lead to a simpler inner loop if memory accesses can be
  shared between the now jammed loops. To do this we have to prove that this
  is all safe, both for the memory accesses (using dependence analysis) and
  that ForeBlocks(i+1) can move before AftBlocks(i) and SubLoopBlocks(i, j).

  Differential Revision: https://reviews.llvm.org/D41953

  llvm-svn: 336062

* [CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744) (Simon Pilgrim, 2018-06-12; 1 file, -6/+6)
  As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate,
  which requires shuffle masks to only match an alternating pattern from its
  2 sources:

    e.g. v4f32: <0,5,2,7> or <4,1,6,3>

  This seems far too restrictive, as most SIMD hardware will implement it
  using a general blend/bit-select instruction, so it is replaced with
  SK_Select, permitting elements from either source as long as they are
  inline:

    e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.

  This initial patch just updates the name and cost model shuffle mask
  analysis; later patch reviews will update SLP to better utilise this - it
  still limits itself to SK_Alternate style patterns.

  Differential Revision: https://reviews.llvm.org/D47985

  llvm-svn: 334513

* Revert 333358 as it's failing on some builders. (David Green, 2018-05-27; 1 file, -2/+0)
  I'm guessing the tests rely on the ARM backend being built.

  llvm-svn: 333359

* [UnrollAndJam] Add a new Unroll and Jam pass (David Green, 2018-05-27; 1 file, -0/+2)
  This is a simple implementation of the unroll-and-jam classical loop
  optimisation. The basic idea is that we take an outer loop of the form:

    for i..
      ForeBlocks(i)
      for j..
        SubLoopBlocks(i, j)
      AftBlocks(i)

  Instead of doing normal inner or outer unrolling, we unroll as follows:

    for i... i+=2
      ForeBlocks(i)
      ForeBlocks(i+1)
      for j..
        SubLoopBlocks(i, j)
        SubLoopBlocks(i+1, j)
      AftBlocks(i)
      AftBlocks(i+1)
    Remainder

  So we have unrolled the outer loop, then jammed the two inner loops into
  one. This can lead to a simpler inner loop if memory accesses can be
  shared between the now-jammed loops. To do this we have to prove that this
  is all safe, both for the memory accesses (using dependence analysis) and
  that ForeBlocks(i+1) can move before AftBlocks(i) and SubLoopBlocks(i, j).

  Differential Revision: https://reviews.llvm.org/D41953

  llvm-svn: 333358

* Rename DEBUG macro to LLVM_DEBUG. (Nicola Zaghen, 2018-05-14; 1 file, -4/+4)
  The DEBUG() macro is very generic so it might clash with other projects.
  The renaming was done as follows:
  - git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
  - git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py
    -i -p1 -style LLVM
  - Manual change to APInt
  - Manually change DOCS as the regex doesn't match it.

  In the transition period the DEBUG() macro is still present and aliased to
  the LLVM_DEBUG() one.

  Differential Revision: https://reviews.llvm.org/D43624

  llvm-svn: 332240

* [IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer. (Craig Topper, 2018-03-29; 1 file, -1/+1)
  Currently EVT is in the IR layer only because Function.cpp needs a very
  small piece of the functionality of EVT::getEVTString(). The rest of EVT
  is used in codegen, making CodeGen a better place for it.

  The previous code converted a Type* to EVT and then called getEVTString.
  This was only expected to handle the primitive types from Type*. Since
  there are only a few primitive types, we can just print them as strings
  directly.

  Differential Revision: https://reviews.llvm.org/D45017

  llvm-svn: 328806

* Fix layering by moving ValueTypes.h from CodeGen to IR (David Blaikie, 2018-03-23; 1 file, -1/+1)
  ValueTypes.h is implemented in IR already.

  llvm-svn: 328397