summaryrefslogtreecommitdiffstats
path: root/llvm/test/Analysis/CostModel/AArch64
Commit message (Collapse)AuthorAgeFilesLines
* [CostModel] Model all `extractvalue`s as free.Roman Lebedev2019-08-291-24/+24
| | | | | | | | | | | | | | | | | | | Summary: As disscussed in https://reviews.llvm.org/D65148#1606412, `extractvalue` don't actually generate any code, so we should treat them as free. Reviewers: craig.topper, RKSimon, jnspaulsson, greened, asb, t.p.northover, jmolloy, dmgreen Reviewed By: jmolloy Subscribers: javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66098 llvm-svn: 370339
* [CostModel][X86][AArch64] Check all 3 cost kinds in aggregates.llRoman Lebedev2019-08-121-25/+91
| | | | llvm-svn: 368595
* [CostModel][X86][AArch64] Add some tests for extractvalueRoman Lebedev2019-08-121-0/+76
| | | | | | | In https://reviews.llvm.org/D65148 it is suggested that it should have zero cost, always. llvm-svn: 368548
* Improve reduction intrinsics by overloading result value.Sander de Smalen2019-06-131-87/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch uses the mechanism from D62995 to strengthen the definitions of the reduction intrinsics by letting the scalar result/accumulator type be overloaded from the vector element type. For example: ; The LLVM LangRef specifies that the scalar result must equal the ; vector element type, but this is not checked/enforced by LLVM. declare i32 @llvm.experimental.vector.reduce.or.i32.v4i32(<4 x i32> %a) This patch changes that into: declare i32 @llvm.experimental.vector.reduce.or.v4i32(<4 x i32> %a) Which has the type-constraint more explicit and causes LLVM to check the result type with the vector element type. Reviewers: RKSimon, arsenm, rnk, greened, aemerson Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D62996 llvm-svn: 363240
* [CostModel][X86][AArch64] Adjust cost of the scalarization part of min/max ↵Craig Topper2018-12-101-22/+22
| | | | | | | | | | | | | | | | reduction. Summary: The comment says we need 3 extracts and a select at the end. But didn't we just account for the select in the vector cost above. Aren't we just extracting the single element after taking the min/max in the vector register? Reviewers: RKSimon, spatel, ABataev Reviewed By: RKSimon Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D55480 llvm-svn: 348739
* [TTI] Reduction costs only need to include a single extract element cost ↵Simon Pilgrim2018-12-011-22/+22
| | | | | | | | | | | | | | | | (REAPPLIED) We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type. For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed. Fixes PR37731 Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997 Differential Revision: https://reviews.llvm.org/D54585 llvm-svn: 348076
* Revert "[TTI] Reduction costs only need to include a single extract element ↵Fedor Sergeev2018-11-261-22/+22
| | | | | | | | | | cost" This reverts commit r346970. It was causing PR39774, a crash in slp-vectorizer on a rather simple loop with just a bunch of 'and's in the body. llvm-svn: 347541
* [TTI] Reduction costs only need to include a single extract element costSimon Pilgrim2018-11-151-22/+22
| | | | | | | | | | | | We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type. For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed. Fixes PR37731 Differential Revision: https://reviews.llvm.org/D54585 llvm-svn: 346970
* [TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368)Simon Pilgrim2018-10-301-22/+22
| | | | | | | | | | | | | | | | Correct costings of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector. Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination! I've done my best to fix a number of vectorizer uses: SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment. LV - I'm not clear on what the SK_ExtractSubvector should represents for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta. Differential Revision: https://reviews.llvm.org/D53573 llvm-svn: 345617
* Add BROADCAST shuffle cost tests.Simon Pilgrim2018-10-231-0/+35
| | | | | | Part of a lot of cleanup necessary before PR39368. llvm-svn: 345025
* [AArch64] Add custom lowering for v4i8 trunc storeAdhemerval Zanella2018-06-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a custom trunc store lowering for v4i8 vector types. Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h) and default action for v4i8 is to extract each element and issue 4 byte stores. A better strategy would be to extended the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvectores. The construction: define void @foo(<4 x i16> %x, i8* nocapture %p) { %0 = trunc <4 x i16> %x to <4 x i8> %1 = bitcast i8* %p to <4 x i8>* store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2 ret void } Can be optimized from: umov w8, v0.h[3] umov w9, v0.h[2] umov w10, v0.h[1] umov w11, v0.h[0] strb w8, [x0, #3] strb w9, [x0, #2] strb w10, [x0, #1] strb w11, [x0] ret To: xtn v0.8b, v0.8h str s0, [x0] ret The patch also adjust the memory cost for autovectorization, so the C code: void foo (const int *src, int width, unsigned char *dst) { for (int i = 0; i < width; i++) *dst++ = *src++; } can be vectorized to: .LBB0_4: // %vector.body // =>This Inner Loop Header: Depth=1 ldr q0, [x0], #16 subs x12, x12, #4 // =4 xtn v0.4h, v0.4s xtn v0.8b, v0.8h st1 { v0.s }[0], [x2], #4 b.ne .LBB0_4 Instead of byte operations. llvm-svn: 335735
* [CostModel][AArch64] Add some initial costs for SK_Select and ↵Simon Pilgrim2018-06-221-6/+6
| | | | | | | | | | | | | | SK_PermuteSingleSrc AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) was being severely overestimated by the default shuffle expansion. This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174. I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more. Differential Revision: https://reviews.llvm.org/D48172 llvm-svn: 335329
* [CostModel][AArch64] Add cost tests for ALTERNATE/SELECT style shuffle masksSimon Pilgrim2018-06-141-0/+95
| | | | | | Precursor to fixing a regression with SLP vectorizer for supporting SELECT shuffles (vs the current ALTERNATE) llvm-svn: 334714
* [AArch64] Improve cost of vector division by constantAdhemerval Zanella2018-05-091-0/+45
| | | | | | | | | | | | | With custom lowering for vector MULLH{S,U}, it is now profitable to vectorize a divide by constant loop for the custom types (v16i8, v8i16, and v4i32). The cost if based on TargetLowering::Build{S,U}DIV which uses a multiply by constant plus adjustment to express a divide by constant. Both {u,s}mull{2} are expressed as Instruction::Mul and shifts by Instruction::AShr. llvm-svn: 331873
* [TTI, AArch64] Add transpose shuffle kindMatthew Simpson2018-04-261-20/+20
| | | | | | | | | | | | | | This patch adds a new shuffle kind useful for transposing a 2xn matrix. These transpose shuffle masks read corresponding even- or odd-numbered vector elements from two n-dimensional source vectors and write each result into consecutive elements of an n-dimensional destination vector. The transpose shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such, this patch also considers transpose shuffles in the AArch64 implementation of getShuffleCost. Differential Revision: https://reviews.llvm.org/D45982 llvm-svn: 330941
* [AArch64] Add cost model test case for transposeMatthew Simpson2018-04-231-0/+182
| | | | | | | This patch adds a cost model test case for vector shuffles having transpose masks. The given costs are inaccurate and will be updated in a follow-on patch. llvm-svn: 330625
* [AArch64] Implement getArithmeticReductionCostMatthew Simpson2018-03-161-5/+5
| | | | | | | | | | This patch provides an implementation of getArithmeticReductionCost for AArch64. We can specialize the cost of add reductions since they are computed using the 'addv' instruction. Differential Revision: https://reviews.llvm.org/D44490 llvm-svn: 327702
* [TTI, AArch64] Allow the cost model analysis to test vector reduce intrinsicsMatthew Simpson2018-03-161-0/+279
| | | | | | | | | | | | | This patch considers the experimental vector reduce intrinsics in the default implementation of getIntrinsicInstrCost. The cost of these intrinsics is computed with getArithmeticReductionCost and getMinMaxReductionCost. This patch also adds a test case for AArch64 that indicates the costs we currently compute for vector reduce intrinsics. These costs are inaccurate and will be updated in a follow-on patch. Differential Revision: https://reviews.llvm.org/D44489 llvm-svn: 327698
* [AArch64] Adjust the cost of integer vector divisionEvandro Menezes2018-03-071-0/+38
| | | | | | | | | | Since there is no instruction for integer vector division, factor in the cost of singling out each element to be used with the scalar division instruction. Differential revision: https://reviews.llvm.org/D43974 llvm-svn: 326955
* Revert r314923: "Recommit : Use the basic cost if a GEP is not used as ↵Daniel Jasper2017-10-131-46/+0
| | | | | | | | | | | | | addressing mode" Significantly reduces performancei (~30%) of gipfeli (https://github.com/google/gipfeli) I have not yet managed to reproduce this regression with the open-source version of the benchmark on github, but will work with others to get a reproducer to you later today. llvm-svn: 315680
* Recommit : Use the basic cost if a GEP is not used as addressing modeJun Bum Lim2017-10-041-0/+46
| | | | | | | | | | | | | | Recommitting r314517 with the fix for handling ConstantExpr. Original commit message: Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target. However, since it doesn't check its actual users, it will return FREE even in cases where the GEP cannot be folded away as a part of actual addressing mode. For example, if an user of the GEP is a call instruction taking the GEP as a parameter, then the GEP may not be folded in isel. llvm-svn: 314923
* Revert "Use the basic cost if a GEP is not used as addressing mode"Alex Shlyapnikov2017-09-291-46/+0
| | | | | | | | | | | | | | | | | | | | | | | This reverts commit r314517. This commit crashes sanitizer bots, for example: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/4167 Stack snippet: ... /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Support/Casting.h:255:0 llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getGEPCost(llvm::GEPOperator const*, llvm::ArrayRef<llvm::Value const*>) /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:742:0 llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getUserCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>) /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:782:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/lib/Analysis/TargetTransformInfo.cpp:116:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:116:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:343:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:864:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfo.h:285:0 ... llvm-svn: 314560
* Use the basic cost if a GEP is not used as addressing modeJun Bum Lim2017-09-291-0/+46
| | | | | | | | | | | | | | | | | | | Summary: Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target. However, since it doesn't check its actual users, it will return FREE even in cases where the GEP cannot be folded away as a part of actual addressing mode. For example, if an user of the GEP is a call instruction taking the GEP as a parameter, then the GEP may not be folded in isel. Reviewers: hfinkel, efriedma, mcrosier, jingyue, haicheng Reviewed By: hfinkel Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D38085 llvm-svn: 314517
* Revert r291254: [AArch64] Reduce vector insert/extract cost for FalkorMatthew Simpson2017-05-241-26/+0
| | | | | | | The default vector insert/extract cost is more profitable on Falkor than the reduced cost. llvm-svn: 303771
* [AArch64] Consider widening instructions in cost calculationsMatthew Simpson2017-05-091-0/+622
| | | | | | | | | | | | | | | The AArch64 instruction set has a few "widening" instructions (e.g., uaddl, saddl, uaddw, etc.) that take one or more doubleword operands and produce quadword results. The operands are automatically sign- or zero-extended as appropriate. However, in LLVM IR, these extends are explicit. This patch updates TTI to consider these widening instructions as single operations whose cost is attached to the arithmetic instruction. It marks extends that are part of a widening operation "free" and applies a sub-target specified overhead (zero by default) to the arithmetic instructions. Differential Revision: https://reviews.llvm.org/D32706 llvm-svn: 302582
* [AArch64] Consider all vector types for FeatureSlowMisaligned128StoreEvandro Menezes2017-01-101-8/+50
| | | | | | | | | | | | The original code considered only v2i64 as slow for this feature. This patch consider all 128-bit long vector types as slow candidates. In internal tests, extending this feature to all 128-bit vector types resulted in an overall improvement of 1% on Exynos M1. Differential revision: https://reviews.llvm.org/D27998 llvm-svn: 291616
* [AArch64] Reduce vector insert/extract cost for Falkor.Chad Rosier2017-01-061-0/+26
| | | | | | Differential Revision: https://reviews.llvm.org/D28403 llvm-svn: 291254
* [AArch64][CostModel] Add coverage for bswap intrinsics.Chad Rosier2017-01-051-0/+70
| | | | llvm-svn: 291140
* [AArch64] Remove mcpu option as this test is not target specific. NFC.Chad Rosier2017-01-051-1/+1
| | | | llvm-svn: 291117
* [AArch64] Remove unused arguments from tests. NFC.Chad Rosier2017-01-051-32/+32
| | | | llvm-svn: 291112
* [AArch64] Guard Misaligned 128-bit store penalty by subtarget featureMatthew Simpson2016-12-151-6/+12
| | | | | | | | | This patch checks that the SlowMisaligned128Store subtarget feature is set when penalizing such stores in getMemoryOpCost. Differential Revision: https://reviews.llvm.org/D27677 llvm-svn: 289845
* [AArch64] Correct the check of signed 9-bit imm in isLegalAddressingMode()Haicheng Wu2016-12-071-1/+97
| | | | | | | | In the addressing mode, signed 9-bit imm is [-256, 255], not [-512, 511]. Differential Revision: https://reviews.llvm.org/D27480 llvm-svn: 288876
* [TTI/CostModel] Correct the way getGEPCost() calls isLegalAddressingMode()Haicheng Wu2016-12-031-0/+196
| | | | | | | | Fix a bug when we call isLegalAddressingMode() from getGEPCost(). Differential Revision: https://reviews.llvm.org/D27357 llvm-svn: 288569
* [AArch64] Reduce vector insert/extract cost for KryoMatthew Simpson2016-02-181-0/+26
| | | | | | Differential Revision: http://reviews.llvm.org/D17379 llvm-svn: 261237
* [CostModel][AArch64] Remove amortization factor for some of the vector ↵Silviu Baranga2015-09-091-6/+6
| | | | | | | | | | | | | | | | | select instructions Summary: We are not scalarizing the wide selects in codegen for i16 and i32 and therefore we can remove the amortization factor. We still have issues with i64 vectors in codegen though. Reviewers: mcrosier Subscribers: mcrosier, aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D12724 llvm-svn: 247156
* [opaque pointer type] Add textual IR support for explicit type parameter to ↵David Blaikie2015-02-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794
* Reduce verbiage of lit.local.cfg filesAlp Toker2014-06-091-2/+1
| | | | | | We can just split targets_to_build in one place and make it immutable. llvm-svn: 210496
* AArch64/ARM64: move ARM64 into AArch64's placeTim Northover2014-05-243-0/+63
This commit starts with a "git mv ARM64 AArch64" and continues out from there, renaming the C++ classes, intrinsics, and other target-local objects for consistency. "ARM64" test directories are also moved, and tests that began their life in ARM64 use an arm64 triple, those from AArch64 use an aarch64 triple. Both should be equivalent though. This finishes the AArch64 merge, and everyone should feel free to continue committing as normal now. llvm-svn: 209577
OpenPOWER on IntegriCloud