path: root/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
* [IR] Split out target specific intrinsic enums into separate headers
  Reid Kleckner, 2019-12-11 (1 file, -0/+1)

  This has two main effects:

  - Optimizes debug info size by saving 221.86 MB of obj file size in a
    Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7 MB of
    object file size.
  - Incremental step towards decoupling target intrinsics.

  The enums are still compact, so adding and removing a single
  target-specific intrinsic will trigger a rebuild of all of LLVM.
  Assigning distinct target id spaces is potential future work.

  Part of PR34259

  Reviewers: efriedma, echristo, MaskRay
  Reviewed By: echristo, MaskRay
  Differential Revision: https://reviews.llvm.org/D71320
* Rename TTI::getIntImmCost for instructions and intrinsics
  Reid Kleckner, 2019-12-11 (1 file, -4/+4)

  Soon Intrinsic::ID will be a plain integer, so this overload will not be
  possible. Rename both overloads to ensure that downstream targets
  observe this as a build failure instead of a runtime failure.

  Split off from D71320

  Reviewers: efriedma
  Differential Revision: https://reviews.llvm.org/D71381
* [ARM] Teach the Arm cost model that a Shift can be folded into other
  instructions
  David Green, 2019-12-09 (1 file, -1/+2)

  This attempts to teach the cost model in Arm that code such as:

    %s = shl i32 %a, 3
    %a = and i32 %s, %b

  can under Arm or Thumb2 become:

    and r0, r1, r2, lsl #3

  so the cost of the shift can essentially be free. To do this without
  trying to artificially adjust the cost of the "and" instruction, it
  needs to get the users of the shl and check whether they are a type of
  instruction that the shift can be folded into. It therefore needs access
  to the actual instruction in getArithmeticInstrCost, which, if
  available, is added as an extra parameter, much like getCastInstrCost.
  We otherwise limit it to shifts with a single user, which should
  hopefully handle most of the cases.

  The list of instructions that the shift can be folded into includes ADC,
  ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This
  translates to Add, Sub, And, Or, Xor and ICmp.

  Differential Revision: https://reviews.llvm.org/D70966
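  A minimal C++ sketch of that single-user check, using stand-in types
  rather than the actual LLVM classes (illustrative only):

    #include <vector>

    // A shift is treated as free only if it has exactly one user and that
    // user is an opcode that can absorb a shifted operand on Arm/Thumb2.
    enum class Opcode { Add, Sub, And, Or, Xor, ICmp, Shl, Other };

    struct Instr {
      Opcode Op;
      std::vector<const Instr *> Users;
    };

    bool shiftIsFoldable(const Instr &Shl) {
      if (Shl.Op != Opcode::Shl || Shl.Users.size() != 1)
        return false;
      switch (Shl.Users.front()->Op) {
      case Opcode::Add:
      case Opcode::Sub:
      case Opcode::And:
      case Opcode::Or:
      case Opcode::Xor:
      case Opcode::ICmp:
        return true; // maps to ADD/SUB/AND/ORR/EOR/CMP with an lsl operand
      default:
        return false;
      }
    }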
* [AArch64TTI] Compute imm materialization cost for AArch64 intrinsics
  Florian Hahn, 2019-12-04 (1 file, -0/+6)

  Currently, getIntImmCost returns TCC_Free for almost all intrinsics. For
  most AArch64-specific intrinsics, however, it looks like integer
  constants cannot be folded into them (at least the ones I checked).
  Unless we know that we can fold integer operands with the intrinsic, we
  handle more cases correctly by returning the cost to materialize the
  immediate rather than TCC_Free.

  Reviewers: SjoerdMeijer, dmgreen, t.p.northover, ributzka
  Reviewed By: SjoerdMeijer
  Differential Revision: https://reviews.llvm.org/D70669
* [Alignment][NFC] getMemoryOpCost uses MaybeAlign
  Guillaume Chatelet, 2019-10-25 (1 file, -4/+4)

  Summary: This patch is part of a series to introduce an Alignment type.
  See this thread for context:
  http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type:
  https://reviews.llvm.org/D64790

  Reviewers: courbet
  Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D69307
* [System Model] [TTI] Update cache and prefetch TTI interfaces
  David Greene, 2019-10-09 (1 file, -16/+0)

  Re-apply 9fdfb045ae8b/r365676 with fixes for PPC and Hexagon. This
  involved moving defaults from TargetTransformInfoImplBase to
  MCSubtargetInfo.

  Rework the TTI cache and software prefetching APIs to prepare for the
  introduction of a general system model. Changes include:

  - Marking existing interfaces const and/or override as appropriate
  - Adding comments
  - Adding BasicTTIImpl interfaces that delegate to a subtarget
    implementation
  - Moving the default TargetTransformInfoImplBase implementation to a
    default MCSubtarget implementation

  Only a handful of targets use these interfaces currently: AArch64,
  Hexagon, PPC and SystemZ. AArch64 already has a custom subtarget
  implementation, so its custom TTI implementation is migrated to use the
  new facilities in BasicTTIImpl to invoke its custom subtarget
  implementation. The custom TTI implementations continue to exist for the
  other targets with this change. They are not moved over to
  subtarget-based implementations.

  The end goal is to have the default subtarget implementation defer to
  the system model defined by the target. With this change, the default
  MCSubtargetInfo implementation essentially returns the defaults that
  TargetTransformInfoImplBase used to return. Existing users of TTI
  defaults will hit the defaults now in MCSubtargetInfo. Targets that
  define their own custom TTI implementations won't use the BasicTTIImpl
  implementations that route to the subtarget.

  Once system models are in place for the targets that use these
  interfaces, their custom TTI implementations can be removed.

  Differential Revision: https://reviews.llvm.org/D63614

  llvm-svn: 374205
* [AArch64] Expand bcmp() for small block lengths
  Evandro Menezes, 2019-08-05 (1 file, -0/+13)

  Patch D56593 by @courbet results in calls to `bcmp()` in some cases,
  should the target support it, unless `TTI::MemCmpExpansionOptions()` is
  overridden by the target.

  In a proprietary benchmark we see a performance drop of about 12% on PNG
  compression before this patch, though it passes all tests.

  This patch mirrors X86 for AArch64 and initializes
  `TTI::MemCmpExpansionOptions()` to then expand calls to `bcmp()` when
  appropriate. No tuning of the parameters was performed, but, at this
  point, it's enough to recover the performance drop above.

  This problem also exists on ARM. Once a consensus is reached for
  AArch64, we can work to fix ARM as well.

  Authors:
  - Evandro Menezes (@evandro) <e.menezes@samsung.com>
  - Brian Rzycki (@brzycki) <b.rzycki@samsung.com>

  Differential revision: https://reviews.llvm.org/D64805

  llvm-svn: 367898
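  A rough sketch of what initializing the expansion options amounts to;
  the field names loosely mirror LLVM's TTI::MemCmpExpansionOptions, and
  the numeric budgets here are illustrative, not the tuned AArch64 values:

    #include <vector>

    struct MemCmpExpansionOptions {
      unsigned MaxNumLoads = 0;        // 0 disables the expansion entirely
      std::vector<unsigned> LoadSizes; // allowed load widths, in bytes
    };

    // Tell the generic expansion how many load/compare blocks it may emit
    // and which load widths are legal, so small bcmp()/memcmp() calls are
    // inlined instead of left as library calls.
    MemCmpExpansionOptions enableMemCmpExpansion(bool OptForSize) {
      MemCmpExpansionOptions Options;
      Options.MaxNumLoads = OptForSize ? 4 : 8; // illustrative budgets
      Options.LoadSizes = {8, 4, 2, 1};         // widest loads first
      return Options;
    }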
* Revert "[System Model] [TTI] Update cache and prefetch TTI interfaces"
  David Greene, 2019-07-10 (1 file, -0/+16)

  This broke some PPC prefetching tests.

  This reverts commit 9fdfb045ae8bb643ab0d0455dcf9ecaea3b1eb3c.

  llvm-svn: 365680
* [System Model] [TTI] Update cache and prefetch TTI interfaces
  David Greene, 2019-07-10 (1 file, -16/+0)

  Rework the TTI cache and software prefetching APIs to prepare for the
  introduction of a general system model. Changes include:

  - Marking existing interfaces const and/or override as appropriate
  - Adding comments
  - Adding BasicTTIImpl interfaces that delegate to a subtarget
    implementation
  - Adding a default "no information" subtarget implementation

  Only a handful of targets use these interfaces currently: AArch64,
  Hexagon, PPC and SystemZ. AArch64 already has a custom subtarget
  implementation, so its custom TTI implementation is migrated to use the
  new facilities in BasicTTIImpl to invoke its custom subtarget
  implementation. The custom TTI implementations continue to exist for the
  other targets with this change. They are not moved over to
  subtarget-based implementations.

  The end goal is to have the default subtarget implementation defer to
  the system model defined by the target. With this change, the default
  subtarget implementation essentially returns "no information" for these
  interfaces. None of the existing users of TTI will hit that
  implementation because they define their own custom TTI implementations
  and won't use the BasicTTIImpl implementations.

  Once system models are in place for the targets that use these
  interfaces, their custom TTI implementations can be removed.

  Differential Revision: https://reviews.llvm.org/D63614

  llvm-svn: 365676
* [NFC] Test commit, delete trailing whitespace
  Graham Hunter, 2019-05-28 (1 file, -1/+1)

  llvm-svn: 361813
* [AArch64] Small fix for getIntImmCost
  Adhemerval Zanella, 2019-03-18 (1 file, -2/+4)

  It uses the generic AArch64_IMM::expandMOVImm to get the correct number
  of instructions used in immediate materialization.

  Reviewers: efriedma
  Differential Revision: https://reviews.llvm.org/D58461

  llvm-svn: 356391
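  The idea, reduced to a standalone sketch: the cost of materializing a
  64-bit immediate is the number of 16-bit chunks that need a MOVZ/MOVK,
  rather than a fixed guess. This ignores the ORR-with-logical-immediate
  shortcuts the real expansion also knows about:

    #include <cstdint>

    int movImmCost(uint64_t Imm) {
      int Cost = 0;
      for (int Shift = 0; Shift < 64; Shift += 16)
        if ((Imm >> Shift) & 0xFFFF)
          ++Cost; // one MOVZ for the first chunk, one MOVK for each other
      return Cost == 0 ? 1 : Cost; // zero still takes one instruction
    }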
* Update the file headers across all of the LLVM projects in the monorepo
  to reflect the new license
  Chandler Carruth, 2019-01-19 (1 file, -4/+3)

  We understand that people may be surprised that we're moving the header
  entirely to discuss the new license. We checked this carefully with the
  Foundation's lawyer and we believe this is the correct approach.

  Essentially, all code in the project is now made available by the LLVM
  project under our new license, so you will see that the license headers
  include that license only. Some of our contributors have contributed
  code under our old license, and accordingly, we have retained a copy of
  our old license notice in the top-level files in each project and
  repository.

  llvm-svn: 351636
* [LV] Support vectorization of interleave-groups that require an epilog
  under optsize using masked wide loads
  Dorit Nuzman, 2018-10-31 (1 file, -3/+6)

  Under Opt for Size, the vectorizer does not vectorize interleave-groups
  that have gaps at the end of the group (such as a loop that reads only
  the even elements: a[2*i]) because that implies that we'll require a
  scalar epilogue (which is not allowed under Opt for Size).

  This patch extends the support for masked-interleave-groups (introduced
  by D53011 for conditional accesses) to also cover the case of gaps in a
  group of loads; targets that enable the masked-interleave-group feature
  don't have to invalidate interleave-groups of loads with gaps; they
  could now use masked wide-loads and shuffles (if that's what the cost
  model selects).

  Reviewers: Ayal, hsaito, dcaballe, fhahn
  Reviewed By: Ayal
  Differential Revision: https://reviews.llvm.org/D53668

  llvm-svn: 345705
* [TTI] Add generic SK_Broadcast shuffle costs
  Simon Pilgrim, 2018-10-25 (1 file, -2/+13)

  I noticed while fixing PR39368 that we don't have generic shuffle costs
  for broadcast style shuffles. This patch adds SK_BROADCAST handling, but
  exposes ARM/AArch64's lack of handling for this shuffle type, which I've
  added a fix for at the same time.

  Differential Revision: https://reviews.llvm.org/D53570

  llvm-svn: 345253
* recommit 344472 after fixing build failure on ARM and PPC.
  Dorit Nuzman, 2018-10-14 (1 file, -3/+4)

  llvm-svn: 344475
* revert 344472 due to failures.
  Dorit Nuzman, 2018-10-14 (1 file, -4/+3)

  llvm-svn: 344473
* [IAI,LV] Add support for vectorizing predicated strided accesses using
  masked interleave-group
  Dorit Nuzman, 2018-10-14 (1 file, -3/+4)

  The vectorizer currently does not attempt to create interleave-groups
  that contain predicated loads/stores; predicated strided accesses can
  currently be vectorized only using masked gather/scatter or
  scalarization.

  This patch makes predicated loads/stores candidates for forming
  interleave-groups during the Loop-Vectorizer's analysis, and adds the
  proper support for masked-interleave-groups to the Loop-Vectorizer's
  planning and transformation stages.

  The patch also extends the TTI API to allow querying the cost of masked
  interleave groups (which each target can control); targets that support
  masked vector loads/stores may choose to enable this feature and allow
  vectorizing predicated strided loads/stores using masked wide
  loads/stores and shuffles.

  Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar
  Reviewed By: Ayal
  Differential Revision: https://reviews.llvm.org/D53011

  llvm-svn: 344472
* Remove trailing space
  Fangrui Song, 2018-07-30 (1 file, -1/+1)

  sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

  llvm-svn: 338293
* [AArch64] Add custom lowering for v4i8 trunc store
  Adhemerval Zanella, 2018-06-27 (1 file, -8/+16)

  This patch adds a custom trunc store lowering for v4i8 vector types.
  Since there is no v.4b register, the v4i8 is promoted to v4i16 (v.4h)
  and the default action for v4i8 is to extract each element and issue 4
  byte stores. A better strategy would be to extend the promoted v4i16 to
  v8i16 (with undef elements) and extract and store the word lane which
  represents the v4i8 subvectors.

  The construction:

    define void @foo(<4 x i16> %x, i8* nocapture %p) {
      %0 = trunc <4 x i16> %x to <4 x i8>
      %1 = bitcast i8* %p to <4 x i8>*
      store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
      ret void
    }

  can be optimized from:

    umov w8, v0.h[3]
    umov w9, v0.h[2]
    umov w10, v0.h[1]
    umov w11, v0.h[0]
    strb w8, [x0, #3]
    strb w9, [x0, #2]
    strb w10, [x0, #1]
    strb w11, [x0]
    ret

  to:

    xtn v0.8b, v0.8h
    str s0, [x0]
    ret

  The patch also adjusts the memory cost for autovectorization, so the C
  code:

    void foo (const int *src, int width, unsigned char *dst)
    {
      for (int i = 0; i < width; i++)
        *dst++ = *src++;
    }

  can be vectorized to:

    .LBB0_4:                       // %vector.body
                                   // =>This Inner Loop Header: Depth=1
      ldr  q0, [x0], #16
      subs x12, x12, #4            // =4
      xtn  v0.4h, v0.4s
      xtn  v0.8b, v0.8h
      st1  { v0.s }[0], [x2], #4
      b.ne .LBB0_4

  instead of byte operations.

  llvm-svn: 335735
* [CostModel][AArch64] Add some initial costs for SK_Select and
  SK_PermuteSingleSrc
  Simon Pilgrim, 2018-06-22 (1 file, -17/+32)

  AArch64 was only setting costs for SK_Transpose, which meant that many
  of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for
  larger vector elements) were being severely overestimated by the default
  shuffle expansion.

  This patch adds costs to help improve SLP performance and avoid a
  regression in reductions introduced by D48174.

  I'm not very knowledgeable about AArch64 shuffle lowering so I've kept
  the extra costs to a minimum - someone who knows this code can add extra
  costs which should improve vectorization a lot more.

  Differential Revision: https://reviews.llvm.org/D48172

  llvm-svn: 335329
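  In simplified form, per-kind shuffle costs are just a small table
  consulted before falling back to the generic expansion. The kind names
  below mirror TTI's shuffle kinds, but every cost value is illustrative,
  not the number the patch actually installs:

    enum ShuffleKind { SK_Select, SK_Transpose, SK_PermuteSingleSrc,
                       SK_Other };

    int getShuffleCost(ShuffleKind Kind, bool LegalVectorType) {
      if (!LegalVectorType)
        return 16; // scalarized fallback is expensive
      switch (Kind) {
      case SK_Select:           return 1; // single blend-style instruction
      case SK_Transpose:        return 1; // TRN1/TRN2
      case SK_PermuteSingleSrc: return 1; // single-source permute
      default:                  return 8; // generic estimate
      }
    }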
* Rename DEBUG macro to LLVM_DEBUG.
  Nicola Zaghen, 2018-05-14 (1 file, -4/+4)

  The DEBUG() macro is very generic so it might clash with other projects.

  The renaming was done as follows:
  - git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
  - git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
  - Manual change to APInt
  - Manually change DOCS as the regex doesn't match it.

  In the transition period the DEBUG() macro is still present and aliased
  to the LLVM_DEBUG() one.

  Differential Revision: https://reviews.llvm.org/D43624

  llvm-svn: 332240
* [AArch64] Improve cost of vector division by constant
  Adhemerval Zanella, 2018-05-09 (1 file, -0/+22)

  With custom lowering for vector MULLH{S,U}, it is now profitable to
  vectorize a divide by constant loop for the custom types (v16i8, v8i16,
  and v4i32). The cost is based on TargetLowering::Build{S,U}DIV, which
  uses a multiply by constant plus adjustment to express a divide by
  constant. Both {u,s}mull{2} are expressed as Instruction::Mul and shifts
  by Instruction::AShr.

  llvm-svn: 331873
* Remove \brief commands from doxygen comments.
  Adrian Prantl, 2018-05-01 (1 file, -2/+2)

  We've been running doxygen with the autobrief option for a couple of
  years now. This makes the \brief markers in our comments redundant.
  Since they are a visual distraction and we don't want to encourage more
  \brief markers in new code either, this patch removes them all.

  Patch produced by

    for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

  Differential Revision: https://reviews.llvm.org/D46290

  llvm-svn: 331272
* [TTI, AArch64] Add transpose shuffle kind
  Matthew Simpson, 2018-04-26 (1 file, -0/+27)

  This patch adds a new shuffle kind useful for transposing a 2xn matrix.
  These transpose shuffle masks read corresponding even- or odd-numbered
  vector elements from two n-dimensional source vectors and write each
  result into consecutive elements of an n-dimensional destination vector.
  The transpose shuffle kind is meant to model the TRN1 and TRN2 AArch64
  instructions. As such, this patch also considers transpose shuffles in
  the AArch64 implementation of getShuffleCost.

  Differential Revision: https://reviews.llvm.org/D45982

  llvm-svn: 330941
* [AArch64] Implement getArithmeticReductionCost
  Matthew Simpson, 2018-03-16 (1 file, -0/+28)

  This patch provides an implementation of getArithmeticReductionCost for
  AArch64. We can specialize the cost of add reductions since they are
  computed using the 'addv' instruction.

  Differential Revision: https://reviews.llvm.org/D44490

  llvm-svn: 327702
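  A sketch of why add reductions can be special-cased, with stand-in
  parameters: a vector add reduction lowers to a single ADDV, so it need
  not be priced as the usual log2(N) shuffle-and-add tree:

    int getArithmeticReductionCost(bool IsAdd, unsigned NumElts,
                                   int ScalarOpCost) {
      if (IsAdd)
        return ScalarOpCost; // one addv
      int Steps = 0; // generic tree: log2(NumElts) shuffle+op stages
      for (unsigned N = NumElts; N > 1; N /= 2)
        ++Steps;
      return Steps * (ScalarOpCost + 1 /*shuffle*/);
    }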
* [AArch64] Adjust the cost of integer vector division
  Evandro Menezes, 2018-03-07 (1 file, -22/+38)

  Since there is no instruction for integer vector division, factor in the
  cost of singling out each element to be used with the scalar division
  instruction.

  Differential revision: https://reviews.llvm.org/D43974

  llvm-svn: 326955
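  The adjustment in simplified form: with no vector integer divide
  instruction, every lane must be extracted, divided with the scalar
  SDIV/UDIV, and inserted back, so the cost scales with the element count
  (parameter names are stand-ins, not the LLVM API):

    int vectorIntDivCost(unsigned NumElts, int ScalarDivCost,
                         int ExtractCost, int InsertCost) {
      return NumElts * (ScalarDivCost + ExtractCost + InsertCost);
    }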
* Fix -Wsign-compare warnings on Windows
  Reid Kleckner, 2018-01-05 (1 file, -1/+1)

  These arise because enums are 'int' by default.

  llvm-svn: 321887
* Fix a bunch more layering of CodeGen headers that are in Target
  David Blaikie, 2017-11-17 (1 file, -2/+2)

  All these headers already depend on CodeGen headers, so moving them into
  CodeGen fixes the layering (since CodeGen depends on Target, not the
  other way around).

  llvm-svn: 318490
* Sink some IntrinsicInst.h and Intrinsics.h out of llvm/include
  Reid Kleckner, 2017-09-07 (1 file, -0/+1)

  Many of these uses can get by with forward declarations. Hopefully this
  speeds up compilation after adding a single intrinsic.

  llvm-svn: 312759
* [AArch64][Falkor] Attempt to fix Windows buildbots
  Geoff Berry, 2017-06-28 (1 file, -1/+1)

  llvm-svn: 306588
* [AArch64][Falkor] Try to avoid exhausting HW prefetcher resources when
  unrolling
  Geoff Berry, 2017-06-28 (1 file, -0/+59)

  Reviewers: t.p.northover, mcrosier
  Subscribers: aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D34533

  llvm-svn: 306584
* [LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI.
  Geoff Berry, 2017-06-28 (1 file, -2/+2)

  Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper
  Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle,
  javed.absar, mcrosier, llvm-commits
  Differential Revision: https://reviews.llvm.org/D34531

  llvm-svn: 306554
* [AArch64] Inline callee if its target-features are a subset of the
  caller
  Florian Hahn, 2017-06-27 (1 file, -0/+14)

  Summary: Similar to X86, it should be safe to inline callees if their
  target-features are a subset of the caller's. This change matches GCC's
  inlining behavior with respect to attributes [1].

  [1] https://gcc.gnu.org/onlinedocs/gcc/AArch64-Function-Attributes.html#AArch64-Function-Attributes

  Reviewers: kristof.beyls, javed.absar, rengolin, t.p.northover
  Reviewed By: t.p.northover
  Subscribers: aemerson, eraman, llvm-commits
  Differential Revision: https://reviews.llvm.org/D34698

  llvm-svn: 306478
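  A minimal sketch of the subset rule: inlining is allowed only if every
  target feature the callee relies on is also enabled in the caller. The
  real check compares subtarget feature bitsets in much the same way; the
  bitset width here is arbitrary:

    #include <bitset>
    #include <cstddef>

    constexpr std::size_t NumFeatures = 64; // illustrative size

    bool areInlineCompatible(const std::bitset<NumFeatures> &CallerBits,
                             const std::bitset<NumFeatures> &CalleeBits) {
      // Callee features must be a subset of the caller's features.
      return (CallerBits & CalleeBits) == CalleeBits;
    }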
* Sort the remaining #include lines in include/... and lib/....
  Chandler Carruth, 2017-06-06 (1 file, -1/+1)

  I did this a long time ago with a janky python script, but now
  clang-format has built-in support for this. I fed clang-format every
  line with a #include and let it re-sort things according to the precise
  LLVM rules for include ordering baked into clang-format these days.

  I've reverted a number of files where the results of sorting includes
  isn't healthy. Either places where we have legacy code relying on
  particular include ordering (where possible, I'll fix these separately)
  or where we have particular formatting around #include lines that I
  didn't want to disturb in this patch.

  This patch is *entirely* mechanical. If you get merge conflicts or
  anything, just ignore the changes in this patch and run clang-format
  over your #include lines in the files.

  Sorry for any noise here, but it is important to keep these things
  stable. I was seeing an increasing number of patches with irrelevant
  re-ordering of #include lines because clang-format was used. This patch
  at least isolates that churn, makes it easy to skip when resolving
  conflicts, and gets us to a clean baseline (again).

  llvm-svn: 304787
* Re-commit r302678, fixing PR33053.
  Amara Emerson, 2017-05-16 (1 file, -0/+25)

  The issue was that the AArch64 TTI hook allowed unpacked integer cmp
  reductions which didn't have a lowering.

  llvm-svn: 303211
* Revert r302678 "[AArch64] Enable use of reduction intrinsics."
  Hans Wennborg, 2017-05-15 (1 file, -23/+0)

  This caused PR33053.

  Original commit message:
  > The new experimental reduction intrinsics can now be used, so I'm
  > enabling this for AArch64. We will need this for SVE anyway, so it
  > makes sense to do this for NEON reductions as well.
  >
  > The existing code to match shufflevector patterns is replaced with a
  > direct lowering of the reductions to AArch64-specific nodes. Tests
  > updated with the new, simpler, representation.
  >
  > Differential Revision: https://reviews.llvm.org/D32247

  llvm-svn: 303115
* [AArch64] Enable use of reduction intrinsics.
  Amara Emerson, 2017-05-10 (1 file, -0/+23)

  The new experimental reduction intrinsics can now be used, so I'm
  enabling this for AArch64. We will need this for SVE anyway, so it makes
  sense to do this for NEON reductions as well.

  The existing code to match shufflevector patterns is replaced with a
  direct lowering of the reductions to AArch64-specific nodes. Tests
  updated with the new, simpler, representation.

  Differential Revision: https://reviews.llvm.org/D32247

  llvm-svn: 302678
* [AArch64] Consider widening instructions in cost calculations
  Matthew Simpson, 2017-05-09 (1 file, -6/+100)

  The AArch64 instruction set has a few "widening" instructions (e.g.,
  uaddl, saddl, uaddw, etc.) that take one or more doubleword operands and
  produce quadword results. The operands are automatically sign- or
  zero-extended as appropriate. However, in LLVM IR, these extends are
  explicit. This patch updates TTI to consider these widening instructions
  as single operations whose cost is attached to the arithmetic
  instruction. It marks extends that are part of a widening operation
  "free" and applies a sub-target specified overhead (zero by default) to
  the arithmetic instructions.

  Differential Revision: https://reviews.llvm.org/D32706

  llvm-svn: 302582
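  The accounting described above, as a standalone sketch with stand-in
  types: when an extend feeds a widening instruction (uaddl, saddl, uaddw,
  ...), the extend is charged nothing and the arithmetic op carries a
  sub-target supplied overhead instead:

    struct Costs {
      int ExtendCost;
      int ArithCost;
    };

    Costs widenedOpCosts(bool ExtendFeedsWideningOp, int WideningOverhead) {
      if (ExtendFeedsWideningOp)
        return {0, 1 + WideningOverhead}; // free extend, overhead on the op
      return {1, 1}; // otherwise both pay their normal cost
    }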
* [SystemZ] TargetTransformInfo cost functions implemented.
  Jonas Paulsson, 2017-04-12 (1 file, -4/+6)

  getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
  getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
  getInterleavedMemoryOpCost() implemented.

  Interleaved access vectorization enabled.

  BasicTTIImpl::getCastInstrCost() improved to check for legal extending
  loads, in which case the cost of the z/sext instruction becomes 0.

  Review: Ulrich Weigand, Renato Golin.
  https://reviews.llvm.org/D29631

  llvm-svn: 300052
* [ARM/AArch64] Ensure valid vector element types for interleaved accesses
  Matthew Simpson, 2017-04-10 (1 file, -4/+4)

  This patch refactors and strengthens the type checks performed for
  interleaved accesses. The primary functional change is to ensure that
  the interleaved accesses have valid element types. The added test cases
  previously failed because the element type is f128.

  Differential Revision: https://reviews.llvm.org/D31817

  llvm-svn: 299864
* [CodeGenPrep] move aarch64-type-promotion to CGP
  Jun Bum Lim, 2017-04-03 (1 file, -0/+32)

  Summary: Move the aarch64-type-promotion pass within the existing type
  promotion framework in CGP. This change also supports forking sexts when
  a new sext is required for promotion. Note that this change is based on
  D27853 and I am submitting it early to provide a better idea on D27853.

  Reviewers: jmolloy, mcrosier, javed.absar, qcolombet
  Reviewed By: qcolombet
  Subscribers: llvm-commits, aemerson, rengolin, mcrosier
  Differential Revision: https://reviews.llvm.org/D28680

  llvm-svn: 299379
* TTI: Split IsSimple in MemIntrinsicInfo
  Matt Arsenault, 2017-03-24 (1 file, -4/+0)

  All this did before was assert in EarlyCSE.

  llvm-svn: 298724
* [ARM/AArch64] Update costs for interleaved accesses with wide types
  Matthew Simpson, 2017-03-02 (1 file, -2/+4)

  After r296750, we're able to match interleaved accesses having types
  wider than 128 bits. This patch updates the associated TTI costs.

  Differential Revision: https://reviews.llvm.org/D29675

  llvm-svn: 296751
* [X86] updating TTI costs for arithmetic instructions on X86/SLM arch.
  Mohammed Agabaria, 2017-01-11 (1 file, -1/+1)

  Updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd,
  divss, divps, divsd, divpd, addpd and subpd.

  A special optimization case replaces pmulld with a pmullw/pmulhw/pshuf
  sequence when the real operands' bitwidth is <= 16.

  Differential Revision: https://reviews.llvm.org/D28104

  llvm-svn: 291657
* [AArch64] Consider all vector types for FeatureSlowMisaligned128Store
  Evandro Menezes, 2017-01-10 (1 file, -12/+11)

  The original code considered only v2i64 as slow for this feature. This
  patch considers all 128-bit long vector types as slow candidates.

  In internal tests, extending this feature to all 128-bit vector types
  resulted in an overall improvement of 1% on Exynos M1.

  Differential revision: https://reviews.llvm.org/D27998

  llvm-svn: 291616
* Currently isLikelyComplexAddressComputation tries to figure out if the
  given stride seems to be 'complex' and needs some extra cost for address
  computation handling.
  Mohammed Agabaria, 2017-01-05 (1 file, -2/+5)

  This code seems to be target dependent and may not be the same for all
  targets. This patch passes the decision whether the given stride is
  complex or not to the target by sending stride information via SCEV to
  getAddressComputationCost instead of 'IsComplex'.

  Specifically, on X86 targets we don't see any significant address
  computation cost for strided accesses in general.

  Differential Revision: https://reviews.llvm.org/D27518

  llvm-svn: 291106
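  In outline, the target-side decision becomes something like the
  following, with the numeric penalty purely illustrative: charge an
  address-computation cost only when the stride is not a simple known
  constant.

    int getAddressComputationCost(bool StrideIsKnownConstant) {
      // A known small constant stride folds into the addressing mode.
      return StrideIsKnownConstant ? 0 : 10; // 10 ~ "complex" penalty
    }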
* [AArch64] Guard Misaligned 128-bit store penalty by subtarget feature
  Matthew Simpson, 2016-12-15 (1 file, -1/+2)

  This patch checks that the SlowMisaligned128Store subtarget feature is
  set when penalizing such stores in getMemoryOpCost.

  Differential Revision: https://reviews.llvm.org/D27677

  llvm-svn: 289845
* AArch64: Do not test for CPUs, use SubtargetFeatures
  Matthias Braun, 2016-06-02 (1 file, -21/+6)

  Testing for specific CPUs has a number of problems; better to use
  subtarget features:

  - When some tweak is added for a specific CPU, it is often desirable for
    the next version of that CPU as well, yet we often forget to add it.
  - It is hard to keep track of checks scattered around the target code;
    declaring all target specifics together with the CPU in the tablegen
    file is a clearer representation.
  - Subtarget features can be tweaked from the command line.

  To discourage people from using CPU checks in the future, I removed the
  isCortexXX(), isCyclone(), ... functions. I added a getProcFamily()
  function for exceptional circumstances but made it clear in the comment
  that usage is discouraged.

  Reformatted the feature list in AArch64.td to have one feature per line
  in alphabetical order to simplify merging and sorting for out-of-tree
  tweaks.

  No functional change intended.

  Differential Revision: http://reviews.llvm.org/D20762

  llvm-svn: 271555
* Add parentheses to silence buildbot warning
  Matthew Simpson, 2016-04-27 (1 file, -2/+2)

  llvm-svn: 267734
* [TTI] Add hook for vector extract with extension
  Matthew Simpson, 2016-04-27 (1 file, -0/+55)

  This change adds a new hook for estimating the cost of vector extracts
  followed by zero- and sign-extensions. The motivating example for this
  change is the SMOV and UMOV instructions on AArch64. These instructions
  move data from vector to general purpose registers while performing the
  corresponding extension (sign-extend for SMOV and zero-extend for UMOV)
  at the same time. For these operations, TargetTransformInfo can assume
  the extensions are free and only report the cost of the vector extract.
  The SLP vectorizer has been updated to make use of the new hook.

  Differential Revision: http://reviews.llvm.org/D18523

  llvm-svn: 267725
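  A sketch of the hook's intent with stand-in types: SMOV/UMOV move a lane
  to a general purpose register and sign- or zero-extend in the same
  instruction, so the extension can be reported as free and only the
  extract is charged:

    enum class ExtKind { SExt, ZExt };

    int getExtractWithExtendCost(ExtKind /*Kind*/, int VectorExtractCost) {
      // The extend folds into the SMOV/UMOV that performs the extract.
      return VectorExtractCost;
    }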