summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/SLPVectorizer
Commit message (Collapse)AuthorAgeFilesLines
...
* Revert "[SLPVectorizer] Add initial alternate opcode support for cast ↵Martin Storsjo2018-07-121-154/+76
| | | | | | | | | instructions. (REAPPLIED)" This reverts commit r336812, which broke compilation of a number of projects, see PR38154. llvm-svn: 336949
* [SLPVectorizer] Add initial alternate opcode support for cast instructions. ↵Simon Pilgrim2018-07-111-76/+154
| | | | | | | | | | | | | | | | | | | (REAPPLIED) We currently only support binary instructions in the alternate opcode shuffles. This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism: 1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly. 2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this. 3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc. 4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements. Reapplied with fix to only accept 2 different casts if they come from the same source type. Differential Revision: https://reviews.llvm.org/D49135 llvm-svn: 336812
* [SLPVectorizer] Ensure alternate/passthrough doesn't vectorize sdiv with ↵Simon Pilgrim2018-07-111-0/+81
| | | | | | undef elts llvm-svn: 336809
* [SLPVectorizer] Add some additional alternate cast testsSimon Pilgrim2018-07-111-0/+107
| | | | | | Initial attempt at D49135 failed as we weren't correctly handling casts with different source types. llvm-svn: 336808
* Revert rL336804: [SLPVectorizer] Add initial alternate opcode support for ↵Simon Pilgrim2018-07-111-151/+52
| | | | | | | | cast instructions. Reverting due to buildbot failures llvm-svn: 336806
* [SLPVectorizer] Add initial alternate opcode support for cast instructions.Simon Pilgrim2018-07-111-52/+151
| | | | | | | | | | | | | | | We currently only support binary instructions in the alternate opcode shuffles. This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism: 1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly. 2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this. 3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc. 4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements. Differential Revision: https://reviews.llvm.org/D49135 llvm-svn: 336804
* [SLP] Recognize min/max pattern using instructions producing same values.Farhana Aleen2018-07-021-65/+55
| | | | | | | | | | | | | | | | | | | Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization. %1 = extractelement <2 x i32> %a, i32 0 %2 = extractelement <2 x i32> %a, i32 1 %cond = icmp sgt i32 %1, %2 %3 = extractelement <2 x i32> %a, i32 0 %4 = extractelement <2 x i32> %a, i32 1 %select = select i1 %cond, i32 %3, i32 %4 Author: FarhanaAleen Reviewed By: ABataev, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D47608 llvm-svn: 336130
* [SLPVectorizer][X86] Begin adding alternate tests for call operatorsSimon Pilgrim2018-07-021-0/+65
| | | | | | Alternate opcode handling only supports binary operators, these tests demonstrate a missed opportunity to vectorize ceil/floor calls llvm-svn: 336125
* [SLPVectorizer] Fix alternate opcode + shuffle cost function to correct ↵Simon Pilgrim2018-07-021-3/+24
| | | | | | | | | | handle SK_Select patterns. We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case. This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now... llvm-svn: 336095
* [SLPVectorizer][X86] Add some alternate tests for cast operatorsSimon Pilgrim2018-07-011-0/+169
| | | | | | Alternate opcode handling only supports binary operators, these tests demonstrate missed opportunities to vectorize some sitofp/uitofp and fptosi/fptoui style casts as well as some (successful) float bits manipulations llvm-svn: 336060
* [SLPVectorizer] Recognise non uniform power of 2 constantsSimon Pilgrim2018-06-261-32/+53
| | | | | | | | | | Since D46637 we are better at handling uniform/non-uniform constant Pow2 detection; this patch tweaks the SLP argument handling to support them. As SLP works with arrays of values I don't think we can easily use the pattern match helpers here. Differential Revision: https://reviews.llvm.org/D48214 llvm-svn: 335621
* [SLPVectorizer] Support alternate opcodes in tryToVectorizeListSimon Pilgrim2018-06-224-503/+261
| | | | | | | | | | Enable tryToVectorizeList to support InstructionsState alternate opcode patterns at a root (build vector etc.) as well as further down the vectorization tree. NOTE: This patch reduces some of the debug reporting if there are opcode mismatches - I can try to add it back if it proves a problem. But it could get rather messy trying to provide equivalent verbose debug strings via getSameOpcode etc. Differential Revision: https://reviews.llvm.org/D48488 llvm-svn: 335364
* [SLPVectorizer] Relax alternate opcodes to accept any BinaryOperator pairSimon Pilgrim2018-06-221-19/+15
| | | | | | | | | | SLP currently only accepts (F)Add/(F)Sub alternate counterpart ops to be merged into an alternate shuffle. This patch relaxes this to accept any pair of BinaryOperator opcodes instead, assuming the target's cost model accepts the vectorization+shuffle. Differential Revision: https://reviews.llvm.org/D48477 llvm-svn: 335349
* [SLPVectorizer][X86] Add alternate opcode tests for simple build vector casesSimon Pilgrim2018-06-222-0/+840
| | | | llvm-svn: 335348
* [CostModel][AArch64] Add some initial costs for SK_Select and ↵Simon Pilgrim2018-06-221-83/+50
| | | | | | | | | | | | | | SK_PermuteSingleSrc AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) was being severely overestimated by the default shuffle expansion. This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174. I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more. Differential Revision: https://reviews.llvm.org/D48172 llvm-svn: 335329
* [X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882)Simon Pilgrim2018-06-212-60/+20
| | | | | | These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does. llvm-svn: 335216
* [SLPVectorizer][X86] Add horizontal add/sub testsSimon Pilgrim2018-06-212-0/+1094
| | | | | | Shows PR37882 perf regression llvm-svn: 335215
* [SLPVectorizer] Relax "alternate" opcode vectorisation to work with any ↵Simon Pilgrim2018-06-202-34/+36
| | | | | | | | | | | | | | SK_Select shuffle pattern D47985 saw the old SK_Alternate 'alternating' shuffle mask replaced with the SK_Select mask which accepts either input operand for each lane, equivalent to a vector select with a constant condition operand. This patch updates SLPVectorizer to make full use of this SK_Select shuffle pattern by removing the 'isOdd()' limitation. The AArch64 regression will be fixed by D48172. Differential Revision: https://reviews.llvm.org/D48174 llvm-svn: 335130
* [SLP][X86] Add AVX2 run to POW2 SDIV TestsSimon Pilgrim2018-06-151-1/+2
| | | | | | Non-uniform pow2 tests are only make sense on targets with fast (low cost) non-uniform shifts llvm-svn: 334821
* [SLP][X86] Regenerate POW2 SDIV TestsSimon Pilgrim2018-06-151-9/+91
| | | | | | Added non-uniform pow2 test as well llvm-svn: 334819
* [SLP] Add testcases of min/max reduction pattern for AMDGPU.Farhana Aleen2018-06-111-0/+260
| | | | | Author: FarhanaAleen llvm-svn: 334435
* AMDGPU: Make v2i16/v2f16 legal on VIMatt Arsenault2018-05-221-26/+17
| | | | | | | | | | | | This usually results in better code. Fixes using inline asm with short2, and also fixes having a different ABI for function parameters between VI and gfx9. Partially cleans up the mess used for lowering of the d16 operations. Making v4f16 legal will help clean this up more, but this requires additional work. llvm-svn: 332953
* [AMDGPU] Support horizontal vectorization of min/max.Farhana Aleen2018-05-091-16/+392
| | | | | | | | | | | | Author: FarhanaAleen Reviewed By: rampitec Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D46604 llvm-svn: 331920
* [DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.Shiva Chen2018-05-093-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to set breakpoints on labels and list source code around labels, we need collect debug information for labels, i.e., label name, the function label belong, line number in the file, and the address label located. In order to keep these information in LLVM IR and to allow backend to generate debug information correctly. We create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3) We hope to keep debug information as much as possible even the code is optimized. So, we create a new kind of intrinsic for label metadata to avoid the metadata is eliminated with basic block. The intrinsic will keep existing if we keep it from optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1) It has only one argument, that is the DILabel metadata. The intrinsic will follow the label immediately. Backend could get the label metadata through the intrinsic's parameter. We also create DIBuilder API for labels to be used by Frontend. Frontend could use createLabel() to allocate DILabel objects, and use insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. llvm-svn: 331841
* [AMDGPU] Support horizontal vectorization.Farhana Aleen2018-05-011-0/+346
| | | | | | | | | | | | Author: FarhanaAleen Reviewed By: rampitec, arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D46213 llvm-svn: 331313
* [SLP] Add additional test for transposable binary operations with reuseMatthew Simpson2018-05-011-2/+44
| | | | llvm-svn: 331274
* [SLPVectorizer] Debug info shouldn't impact spill cost computation.Davide Italiano2018-04-301-0/+93
| | | | | | | | | | <rdar://problem/39794738> (Also, PR32761). Differential Revision: https://reviews.llvm.org/D46199 llvm-svn: 331199
* [NVPTX] Turn on Loop/SLP vectorizationBenjamin Kramer2018-04-272-0/+42
| | | | | | | | | | | | | | | | | | | | Since PTX has grown a <2 x half> datatype vectorization has become more important. The late LoadStoreVectorizer intentionally only does loads and stores, but now arithmetic has to be vectorized for optimal throughput too. This is still very limited, SLP vectorization happily creates <2 x half> if it's a legal type but there's still a lot of register moving happening to get that fed into a vectorized store. Overall it's a small performance win by reducing the amount of arithmetic instructions. I haven't really checked what the loop vectorizer does to PTX code, the cost model there might need some more tweaks. I didn't see it causing harm though. Differential Revision: https://reviews.llvm.org/D46130 llvm-svn: 331035
* [SLP] Add tests for transposable binary operationsMatthew Simpson2018-04-261-0/+292
| | | | | | | These test cases are vectorizable, but we are currently unable to vectorize them effectively. llvm-svn: 330945
* [X86] Remove unnecessary -mattr to enable avx512bw when the -mcpu already ↵Craig Topper2018-04-161-1/+1
| | | | | | | | enabled it. NFC This makes the test similar to the arith-sub.ll and arith-mul.ll tests. llvm-svn: 330144
* [SLP] Use getExtractWithExtendCost() to compute the scalar cost of ↵Haicheng Wu2018-04-161-16/+69
| | | | | | | | | | | | | extractelement/ext pair We use getExtractWithExtendCost to calculate the cost of extractelement and s|zext together when computing the extract cost after vectorization, but we calculate the cost of extractelement and s|zext separately when computing the scalar cost which is larger than it should be. Differential Revision: https://reviews.llvm.org/D45469 llvm-svn: 330143
* [SLP] update a test case. NFC.Haicheng Wu2018-04-111-15/+17
| | | | llvm-svn: 329818
* [SLP] Additional tests for reorder reuse vectorization, NFC.Alexey Bataev2018-04-091-0/+230
| | | | llvm-svn: 329603
* [SLPVectorizer][X86] Regenerate some tests. NFCISimon Pilgrim2018-04-045-74/+187
| | | | llvm-svn: 329196
* [SLP] Fix PR36481: vectorize reassociated instructions.Alexey Bataev2018-04-039-186/+126
| | | | | | | | | | | | | | | | | | Summary: If the load/extractelement/extractvalue instructions are not originally consecutive, the SLP vectorizer is unable to vectorize them. Patch allows reordering of such instructions. Patch does not support reordering of the repeated instruction, this must be handled in the separate patch. Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43776 llvm-svn: 329085
* [SLP] Added tests for checks of reordering of the repeated instructions,Alexey Bataev2018-04-031-0/+129
| | | | | | NFC. llvm-svn: 329080
* Revert "[SLP] Fix PR36481: vectorize reassociated instructions."Benjamin Kramer2018-04-038-122/+183
| | | | | | This reverts commit r328980 and r329046. Makes the vectorizer crash. llvm-svn: 329071
* [SLP] Distinguish "demanded and shrinkable" from "demanded and not ↵Haicheng Wu2018-04-032-19/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | shrinkable" values when determining the minimum bitwidth We use two approaches for determining the minimum bitwidth. * Demanded bits * Value tracking If demanded bits doesn't result in a narrower type, we then try value tracking. We need this if we want to root SLP trees with the indices of getelementptr instructions since all the bits of the indices are demanded. But there is a missing piece though. We need to be able to distinguish "demanded and shrinkable" from "demanded and not shrinkable". For example, the bits of %i in %i = sext i32 %e1 to i64 %gep = getelementptr inbounds i64, i64* %p, i64 %i are demanded, but we can shrink %i's type to i32 because it won't change the result of the getelementptr. On the other hand, in %tmp15 = sext i32 %tmp14 to i64 %tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0 it doesn't make sense to shrink %tmp15 and we can skip the value tracking. Ideas are from Matthew Simpson! Differential Revision: https://reviews.llvm.org/D44868 llvm-svn: 329035
* [SLP] Fix PR36481: vectorize reassociated instructions.Alexey Bataev2018-04-028-183/+122
| | | | | | | | | | | | | | | Summary: If the load/extractelement/extractvalue instructions are not originally consecutive, the SLP vectorizer is unable to vectorize them. Patch allows reordering of such instructions. Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43776 llvm-svn: 328980
* [SLPVectorizer] Add tests related to PR30787, NFCI.Dinar Temirbulatov2018-03-294-0/+390
| | | | llvm-svn: 328813
* [SLP] Add more checks to a test case. NFC.Haicheng Wu2018-03-261-0/+14
| | | | llvm-svn: 328572
* [SLP] Add a test case. NFC.Haicheng Wu2018-03-261-0/+38
| | | | llvm-svn: 328546
* [SLP] Stop counting cost of gather sequences with multiple usesMatthew Simpson2018-03-232-34/+21
| | | | | | | | | | | | | | | When building the SLP tree, we look for reuse among the vectorized tree entries. However, each gather sequence is represented by a unique tree entry, even though the sequence may be identical to another one. This means, for example, that a gather sequence with two uses will be counted twice when computing the cost of the tree. We should only count the cost of the definition of a gather sequence rather than its uses. During code generation, the redundant gather sequences are emitted, but we optimize them away with CSE. So it looks like this problem just affects the cost model. Differential Revision: https://reviews.llvm.org/D44742 llvm-svn: 328316
* [SLP] Add test case for a gather sequence with multiple usesMatthew Simpson2018-03-211-0/+66
| | | | llvm-svn: 328133
* [AArch64] Implement getArithmeticReductionCostMatthew Simpson2018-03-161-3/+3
| | | | | | | | | | This patch provides an implementation of getArithmeticReductionCost for AArch64. We can specialize the cost of add reductions since they are computed using the 'addv' instruction. Differential Revision: https://reviews.llvm.org/D44490 llvm-svn: 327702
* [SLP] Additional tests for stores vectorization, NFC.Alexey Bataev2018-03-051-0/+179
| | | | llvm-svn: 326740
* [SLP] Added new tests and updated existing for jumbled load, NFC.Mohammad Shahid2018-02-284-6/+468
| | | | llvm-svn: 326303
* [AArch64] add SLP test based on TSVC; NFCSanjay Patel2018-02-271-0/+127
| | | | | | | | This is a slight reduction of one of the benchmarks that suffered with D43079. Cost model changes should not cause this test to remain scalarized. llvm-svn: 326217
* [X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280)Simon Pilgrim2018-02-265-146/+109
| | | | | | | | | | Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark. Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch. Differential Revision: https://reviews.llvm.org/D43733 llvm-svn: 326133
* [SLP] Added new test + fixed some checks, NFC.Alexey Bataev2018-02-262-13/+175
| | | | llvm-svn: 326117
OpenPOWER on IntegriCloud