path: root/llvm/test/Transforms/SLPVectorizer/X86
Commit message | Author | Age | Files | Lines
* [SLPVectorizer] Do not assume extractelement idx is a ConstantInt. | Florian Hahn | 2020-02-19 | 1 | -0/+150
  The index of an ExtractElementInst is not guaranteed to be a ConstantInt; it can be any integer value. Check explicitly for ConstantInts. The new test cases illustrate scenarios where we crash without this patch. I've also added another test case to check that the matching of extractelement vector ops works.
  Reviewers: RKSimon, ABataev, dtemirbulatov, vporpo
  Reviewed By: ABataev
  Differential Revision: https://reviews.llvm.org/D74758
  (cherry picked from commit e32522ca178acc42e26f21d64ef8fc180ad772bd)
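A minimal Python sketch of the guard, assuming a toy value model: `ConstantInt`, `Argument`, `ExtractElement`, and `constant_extract_index` are hypothetical stand-ins, not LLVM's C++ API. It only shows that the lane index is read after an explicit kind check (in LLVM C++ the usual idiom for this is a checked `dyn_cast` rather than an unchecked `cast`), and that a variable index makes the caller bail out instead of guessing.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ConstantInt:
    """Toy stand-in for a constant integer operand."""
    value: int


@dataclass
class Argument:
    """Toy stand-in for a non-constant integer value (e.g. a function argument)."""
    name: str


@dataclass
class ExtractElement:
    """Toy extractelement: a vector operand plus an arbitrary integer index."""
    vector: str
    index: Union[ConstantInt, Argument]


def constant_extract_index(inst: ExtractElement) -> Optional[int]:
    """Return the lane index only if it is a ConstantInt; otherwise None."""
    if isinstance(inst.index, ConstantInt):
        return inst.index.value
    # Variable index: the lane is unknown at compile time, so do not guess.
    return None


print(constant_extract_index(ExtractElement("%v", ConstantInt(1))))    # 1
print(constant_extract_index(ExtractElement("%v", Argument("%idx"))))  # None
```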
* [SLP] Don't allow Div/Rem as alternate opcodes | Andrei Elovikov | 2020-01-23 | 1 | -34/+32
  Summary: We cannot control or verify what the RHS of the division will be, so it might happen to be zero, causing UB.
  Reviewers: Vasilis, RKSimon, ABataev
  Reviewed By: ABataev
  Subscribers: vporpo, ABataev, hiraditya, llvm-commits, vdmitrie
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D72740
  (cherry picked from commit e1d6d368529322edc658c893c01eaadaf8053ea6)
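Why Div/Rem is unsafe as an alternate opcode, shown as a toy Python model (all names are hypothetical): an alternate-opcode bundle is lowered by computing *both* opcodes on every lane and then blending lanes, much like a shufflevector. For `add`/`sub` the extra lane computations are harmless, but for a division the divisor belonging to an `add` lane may be zero, so the vector division traps (UB in IR) even though the scalar code never divides by zero.

```python
from typing import Callable, Dict, List

DIV_REM = {"sdiv", "udiv", "srem", "urem"}


def sdiv(a: int, b: int) -> int:
    """Signed division truncating toward zero; raises ZeroDivisionError if b == 0."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


# Toy scalar semantics for the opcodes used below.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "sdiv": sdiv,
}


def can_be_alternate_pair(main: str, alt: str) -> bool:
    """Reject bundles where either opcode is a division or remainder."""
    return main not in DIV_REM and alt not in DIV_REM


def eval_alternate(lane_ops: List[str], lhs: List[int], rhs: List[int]) -> List[int]:
    """Lower a bundle with exactly two distinct opcodes: run both on all lanes, then blend."""
    main = lane_ops[0]
    alt = next(op for op in lane_ops if op != main)
    main_vec = [OPS[main](a, b) for a, b in zip(lhs, rhs)]
    alt_vec = [OPS[alt](a, b) for a, b in zip(lhs, rhs)]  # every lane, not just alt lanes
    return [m if op == main else x for op, m, x in zip(lane_ops, main_vec, alt_vec)]


print(eval_alternate(["add", "sub"], [5, 8], [1, 2]))  # [6, 6]
```

By contrast, `eval_alternate(["add", "sdiv"], [5, 8], [0, 2])` raises `ZeroDivisionError` in lane 0, although the scalar code only computes `5 + 0` and `8 / 2`.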
* [SLP] Add a test showing miscompilation in AltOpcode support | Andrei Elovikov | 2020-01-23 | 1 | -0/+131
  Reviewers: Vasilis, RKSimon, ABataev
  Reviewed By: RKSimon, ABataev
  Subscribers: ABataev, inglorion, dexonsmith, llvm-commits, vdmitrie
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D72739
  (cherry picked from commit 757fe53994c1792cbdc84526696a0e256345911f)
* [NFC] Fix trivial typos in comments | James Henderson | 2020-01-06 | 1 | -1/+1
  Reviewed By: jhenderson
  Differential Revision: https://reviews.llvm.org/D72143
  Patch by Kazuaki Ishizaki.
* Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351 | Fangrui Song | 2019-12-24 | 4 | -4/+4
* Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 | Fangrui Song | 2019-12-24 | 4 | -4/+4
* [SLP] Fix test arguments, NFC. | Alexey Bataev | 2019-12-19 | 1 | -9/+5
* [SLP] Added test for gathering reused extracts from narrow vector, NFC. | Alexey Bataev | 2019-12-19 | 1 | -0/+71
* [SLP] Enhance SLPVectorizer to vectorize different combinations of aggregates | Anton Afanasyev | 2019-12-03 | 1 | -48/+41
  Summary: Make SLPVectorizer recognize homogeneous aggregates like `{<2 x float>, <2 x float>}`, `{{float, float}, {float, float}}`, `[2 x {float, float}]`, and so on. This is a follow-up to https://reviews.llvm.org/D70068. Merged `findBuildVector()` and `findBuildAggregate()` into a single `findBuildAggregate()` function and made it recursive so that it recognizes multidimensional aggregates. Aggregates are required to be homogeneous.
  Reviewers: RKSimon, ABataev, dtemirbulatov, spatel, vporpo
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70587
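A Python sketch of a recursive homogeneity check, assuming a toy type encoding (scalar types as strings, `("vec", n, elt)`, `("array", n, elt)`, `("struct", [fields])`); `flatten_aggregate` is a hypothetical name and this is not the actual `findBuildAggregate()` logic. It only shows the shape of the recursion: an aggregate qualifies when every nested level is uniform, and it then maps to one flat vector, e.g. `{<2 x float>, <2 x float>}` to 4 floats.

```python
from typing import List, Optional, Tuple, Union

# Toy type encoding (hypothetical):
#   "float", "i32"             scalar types
#   ("vec", n, scalar)         <n x scalar>
#   ("array", n, elem)         [n x elem]
#   ("struct", [f0, f1, ...])  {f0, f1, ...}
Type = Union[str, tuple]


def flatten_aggregate(ty: Type) -> Optional[Tuple[str, int]]:
    """Return (scalar type, total element count) for a homogeneous aggregate, else None."""
    if isinstance(ty, str):
        return (ty, 1)
    kind = ty[0]
    if kind == "vec":
        _, n, scalar = ty
        return (scalar, n)
    if kind == "array":
        _, n, elem = ty
        inner = flatten_aggregate(elem)
        return None if inner is None else (inner[0], inner[1] * n)
    if kind == "struct":
        fields: List[Type] = ty[1]
        # Homogeneous: every field must have exactly the same type.
        if not fields or any(f != fields[0] for f in fields):
            return None
        inner = flatten_aggregate(fields[0])
        return None if inner is None else (inner[0], inner[1] * len(fields))
    return None


V2F = ("vec", 2, "float")
print(flatten_aggregate(("struct", [V2F, V2F])))  # ('float', 4)
```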
* [x86] make SLM extract vector element more expensive than default | Sanjay Patel | 2019-11-27 | 6 | -843/+1302
  I'm not sure what the effect of this change will be on all of the affected tests or on a larger benchmark, but it fixes the horizontal add/sub problems noted here: https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc
  The costs are based on reciprocal-throughput numbers in Agner's tables for PEXTR*; these appear to be very slow ops on Silvermont.
  This is a small step towards the larger motivation discussed in PR43605: https://bugs.llvm.org/show_bug.cgi?id=43605
  Also, it seems likely that insert/extract is the source of perf regressions on other CPUs (up to 30%) that were cited as part of the reason to revert D59710, so we may extend the table-based approach to other subtargets.
  Differential Revision: https://reviews.llvm.org/D70607
* [SLP] Enhance SLPVectorizer to vectorize vector aggregate | Anton Afanasyev | 2019-11-22 | 1 | -14/+11
  Summary: A vector aggregate is a homogeneous aggregate of vectors, like `{ <2 x float>, <2 x float> }`. This patch allows `findBuildAggregate()` to consider vector aggregates as well as scalar ones. For instance, `{ <2 x float>, <2 x float> }` maps to `<4 x float>`.
  Fixes the vector part of llvm.org/PR42022
  Reviewers: RKSimon
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70068
* [SLP][Test] Precommit tests for D70068 and D70587. NFC. | Anton Afanasyev | 2019-11-22 | 1 | -0/+288
* Temporarily Revert "[SLP] allow forming 2-way reduction patterns" and update testcases. | Eric Christopher | 2019-11-20 | 2 | -19/+19
  After speaking with Sanjay: we are seeing a number of miscompiles and are working on tracking down a testcase. None of the follow-on patches seem to have helped so far.
  This reverts commit 8a0aa5310bccbb42d16d11db090419fcefdd1376.
* Temporarily Revert "Temporarily Revert "[SLP] allow forming 2-way reduction patterns"" as there were later testcase changes that also need to be reverted. | Eric Christopher | 2019-11-20 | 1 | -10/+9
  This reverts commit cd8748a15f2d18861b3548eb26ed2b52e5ee50b4.
* Temporarily Revert "[SLP] allow forming 2-way reduction patterns" | Eric Christopher | 2019-11-20 | 1 | -9/+10
  After speaking with Sanjay: we are seeing a number of miscompiles and are working on tracking down a testcase. None of the follow-on patches seem to have helped so far.
  This reverts commit 7ff57705ba196ce649d6034614b3b9df57e1f84f.
* [SLP] reduce duplicate CHECK lines in tests; NFC | Sanjay Patel | 2019-11-20 | 1 | -392/+208
* [SLP] fix miscompile on min/max reductions with extra uses (PR43948) (2nd try) | Sanjay Patel | 2019-11-19 | 2 | -3/+3
  The 1st attempt was reverted because it revealed an existing bug where we could produce invalid IR (use of value before definition). That should be fixed with: rG39de82ecc9c2
  The bug manifests as replacing a reduction operand with an undef value. The problem appears to be limited to cases where a min/max reduction has extra uses of the compare operand to the select.
  In the general case, we are tracking "ExternallyUsedValues" and an "IgnoreList" of the reduction operations, but those may not apply to the final compare+select in a min/max reduction. For that, we use replaceAllUsesWith (RAUW) to ensure that the new vectorized reduction values are transferred to all subsequent users.
  Differential Revision: https://reviews.llvm.org/D70148
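A toy Python model of why RAUW helps here (the `Value` class and the `users_of`/`replace_all_uses_with` helpers are hypothetical, not LLVM's API): the compare feeding the final select has an extra user outside the reduction. Rewriting only the uses that the reduction bookkeeping knows about would leave that user pointing at a value that is about to go away; replacing *all* uses transfers the new value to every user.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # identity semantics, like IR values
class Value:
    name: str
    operands: List["Value"] = field(default_factory=list)


def users_of(function: List[Value], v: Value) -> List[Value]:
    """All instructions in the function that use v as an operand."""
    return [inst for inst in function if any(op is v for op in inst.operands)]


def replace_all_uses_with(function: List[Value], old: Value, new: Value) -> None:
    """Rewrite every operand slot that refers to old so it refers to new."""
    for inst in function:
        inst.operands = [new if op is old else op for op in inst.operands]


# Final compare+select of a min/max reduction, plus an extra use of the compare.
x, y = Value("x"), Value("y")
cmp = Value("cmp", [x, y])
sel = Value("sel", [cmp, x, y])
extra = Value("extra_use", [cmp])    # user outside the reduction
vec_cmp = Value("vec_rdx_cmp")       # stand-in for the vectorized replacement
function = [cmp, sel, extra, vec_cmp]

replace_all_uses_with(function, cmp, vec_cmp)
print([u.name for u in users_of(function, vec_cmp)])  # ['sel', 'extra_use']
```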
* [SLP] fix insertion point for min/max reduction | Sanjay Patel | 2019-11-19 | 1 | -1/+1
  As discussed in D70148 (and the cause of the revert of the original commit): if we insert at the select, we can produce invalid IR, because the replacement for the compare may have uses before the select.
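A toy Python check of the ordering problem (the instruction tuples and helper names are hypothetical): once the compare's uses are redirected to the replacement value, defining that value at the select breaks def-before-use if another user of the compare sits between the compare and the select. In this example, defining it at the compare's position instead keeps the block valid.

```python
from typing import List, Optional, Tuple

Inst = Tuple[str, List[str]]  # (name defined, names of operands)


def first_use_before_def(block: List[Inst]) -> Optional[str]:
    """Return the first block-local value used before its definition, or None."""
    local = {name for name, _ in block}
    defined = set()
    for name, ops in block:
        for op in ops:
            if op in local and op not in defined:
                return op
        defined.add(name)
    return None


def replace_uses(block: List[Inst], old: str, new: str) -> List[Inst]:
    return [(name, [new if op == old else op for op in ops]) for name, ops in block]


def insert_before(block: List[Inst], anchor: str, inst: Inst) -> List[Inst]:
    idx = next(i for i, (name, _) in enumerate(block) if name == anchor)
    return block[:idx] + [inst] + block[idx:]


block: List[Inst] = [
    ("cmp", ["x", "y"]),
    ("other_use", ["cmp"]),      # extra user of the compare, before the select
    ("sel", ["cmp", "x", "y"]),
]
rewired = replace_uses(block, "cmp", "rdx")
at_select = insert_before(rewired, "sel", ("rdx", ["vec"]))
at_cmp = insert_before(rewired, "cmp", ("rdx", ["vec"]))
print(first_use_before_def(at_select))  # rdx
print(first_use_before_def(at_cmp))     # None
```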
* [SLP] add test for reduction miscompile; NFC | Sanjay Patel | 2019-11-19 | 1 | -0/+30
  See D70148 for discussion.
* Temporarily revert "[SLP] fix miscompile on min/max reductions with extra uses (PR43948)" as it causes an ICE on valid code. | Eric Christopher | 2019-11-18 | 2 | -2/+2
  A testcase was posted as a follow-up on the original review thread.
  This reverts commit a3e61946c5bd7bdfab15af76b292e52d6ffa27f7.
* [SLP] reduce duplicated check lines in tests; NFC | Sanjay Patel | 2019-11-18 | 1 | -313/+163
* Revert "Temporarily Revert:" | Alexey Bataev | 2019-11-14 | 20 | -407/+875
  This reverts commit e511c4b0dff1692c267addf17dce3cebe8f97faa:
  Temporarily Revert:
  "[SLP] Generalization of stores vectorization."
  "[SLP] Fix -Wunused-variable. NFC"
  "[SLP] Vectorize jumbled stores."
  after fixing the problem with compile time.
* [SLP] fix miscompile on min/max reductions with extra uses (PR43948) | Sanjay Patel | 2019-11-13 | 2 | -2/+2
  The bug manifests as replacing a reduction operand with an undef value. The problem appears to be limited to cases where a min/max reduction has extra uses of the compare operand to the select.
  In the general case, we are tracking "ExternallyUsedValues" and an "IgnoreList" of the reduction operations, but those may not apply to the final compare+select in a min/max reduction. For that, we use replaceAllUsesWith (RAUW) to ensure that the new vectorized reduction values are transferred to all subsequent users.
  Differential Revision: https://reviews.llvm.org/D70148
* [SLP] improve test readability; NFC | Sanjay Patel | 2019-11-13 | 1 | -6/+6
* [SLP] add test for miscompile with reduction (PR43948); NFC | Sanjay Patel | 2019-11-12 | 1 | -0/+55
* [SLP] Look-ahead operand reordering heuristic. | Vasileios Porpodas | 2019-11-11 | 1 | -33/+223
  Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see the lit test for examples).
  Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk
  Reviewed By: RKSimon, dtemirbulatov
  Subscribers: xbolva00, Carrot, hiraditya, phosek, rnk, rcorcs, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60897
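A toy Python version of look-ahead scoring (the tree encoding, the 0/1 scores, and `choose_order` are hypothetical simplifications, not the heuristic's actual score values): when the immediate operands of two lanes all share one opcode, comparing only the roots ties, while looking one level deeper distinguishes `sub(load, load)` from `sub(arg, arg)` and picks the operand order that lines up matching subtrees.

```python
from typing import Tuple

Node = Tuple[str, tuple]  # (opcode, child nodes)


def score(a: Node, b: Node, depth: int) -> int:
    """Similarity of two expression trees, looking `depth` levels below the root."""
    op_a, kids_a = a
    op_b, kids_b = b
    if op_a != op_b:
        return 0
    if depth == 0 or not kids_a or len(kids_a) != len(kids_b):
        return 1
    return 1 + sum(score(x, y, depth - 1) for x, y in zip(kids_a, kids_b))


def choose_order(lane0: Tuple[Node, Node], lane1: Tuple[Node, Node], depth: int) -> Tuple[Node, Node]:
    """Order lane1's commutative operands to best match lane0's (keep on ties)."""
    lhs0, rhs0 = lane0
    x, y = lane1
    keep = score(lhs0, x, depth) + score(rhs0, y, depth)
    swap = score(lhs0, y, depth) + score(rhs0, x, depth)
    return (y, x) if swap > keep else (x, y)


LOAD: Node = ("load", ())
ARG: Node = ("arg", ())
sub_loads: Node = ("sub", (LOAD, LOAD))
sub_args: Node = ("sub", (ARG, ARG))

lane0 = (sub_loads, sub_args)
lane1 = (sub_args, sub_loads)
print(choose_order(lane0, lane1, depth=0) == lane1)                  # True: tie, no swap
print(choose_order(lane0, lane1, depth=1) == (sub_loads, sub_args))  # True: look-ahead swaps
```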
* [SLP] allow forming 2-way reduction patterns | Sanjay Patel | 2019-11-07 | 1 | -10/+9
  We have a vector compare reduction problem seen in PR39665 comment 2: https://bugs.llvm.org/show_bug.cgi?id=39665#c2
  Or, slightly reduced here:
      define i1 @cmp2(<2 x double> %a0) {
        %a = fcmp ogt <2 x double> %a0, <double 1.0, double 1.0>
        %b = extractelement <2 x i1> %a, i32 0
        %c = extractelement <2 x i1> %a, i32 1
        %d = and i1 %b, %c
        ret i1 %d
      }
  SLP would not attempt to turn this into a vector reduction because there is an artificial lower limit on that transform. We cannot completely remove that limit without inducing regressions, though, so this patch just hacks in an extra attempt at creating a 2-way reduction at the end of the analysis.
  As shown in the test file, we are still not getting some of the motivating cases, so follow-on patches will be needed to solve those cases.
  Differential Revision: https://reviews.llvm.org/D59710
* Temporarily Revert: | Eric Christopher | 2019-11-06 | 20 | -875/+407
  "[SLP] Generalization of stores vectorization."
  "[SLP] Fix -Wunused-variable. NFC"
  "[SLP] Vectorize jumbled stores."
  as they're causing significant (10-30x) compile-time regressions on vectorizable code. The primary cause of the compile-time regression is f228b5371647f471853c5fb3e6719823a42fe451.
  This reverts commits:
  f228b5371647f471853c5fb3e6719823a42fe451
  5503455ccb3f5fcedced158332c016c8d3a7fa81
  21d498c9c0f32dcab5bc89ac593aa813b533b43a
* [SLP] add tests for 2-wide reductions; NFC | Sanjay Patel | 2019-11-05 | 1 | -2/+115
* [SLP] Fix PR43799: Crash on different sizes of GEP indices. | Alexey Bataev | 2019-11-04 | 1 | -0/+23
  Summary: If the GEP instructions are going to be vectorized, the indices in those GEP instructions must be of the same type. Otherwise, the compiler may crash when trying to build the vector constant.
  Reviewers: RKSimon, spatel
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D69627
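A minimal Python sketch of the check, assuming each GEP in the bundle contributes one constant index as a `(type, value)` pair (`can_vectorize_gep_indices` and `build_index_vector` are hypothetical helpers): a vector constant has exactly one element type, so `i32` and `i64` indices cannot be packed together, and the bundle has to be rejected up front rather than failing while the constant is built.

```python
from typing import List, Tuple

Index = Tuple[str, int]  # (integer type, constant value), e.g. ("i64", 1)


def can_vectorize_gep_indices(indices: List[Index]) -> bool:
    """All lanes' indices must share one integer type to form a single vector."""
    return len({ty for ty, _ in indices}) == 1


def build_index_vector(indices: List[Index]) -> Tuple[str, List[int]]:
    """Build a toy vector constant such as ('<2 x i64>', [1, 2])."""
    if not can_vectorize_gep_indices(indices):
        raise ValueError("GEP indices have different types; cannot build one vector constant")
    ty = indices[0][0]
    return (f"<{len(indices)} x {ty}>", [value for _, value in indices])


print(build_index_vector([("i64", 1), ("i64", 2)]))         # ('<2 x i64>', [1, 2])
print(can_vectorize_gep_indices([("i32", 1), ("i64", 2)]))  # False
```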
* [SLP] avoid 'tmp' value name conflict with auto-generated CHECK script; NFC | Sanjay Patel | 2019-11-01 | 1 | -23/+23
  The script uses 'TMP#' as its substitute for nameless values, so if a test already contains 'tmp#' *named* values, then there could be trouble. We should probably just fix the script to avoid this problem going forward, but it's easy enough to change a test too (and explicitly naming variables 'tmp' is always a sad choice).
* [SLP] avoid 'tmp' value name conflict with auto-generated CHECK script; NFC | Sanjay Patel | 2019-11-01 | 1 | -50/+50
  The script uses 'TMP#' as its substitute for nameless values, so if a test already contains 'tmp#' *named* values, then there could be trouble. We should probably just fix the script to avoid this problem going forward, but it's easy enough to change a test too (and explicitly naming variables 'tmp' is always a sad choice).
* [SLP] avoid 'tmp' value name conflict with auto-generated CHECK script; NFC | Sanjay Patel | 2019-11-01 | 1 | -88/+87
  The script uses 'TMP#' as its substitute for nameless values, so if a test already contains 'tmp#' *named* values, then there could be trouble. We should probably just fix the script to avoid this problem going forward, but it's easy enough to change a test too (and explicitly naming variables 'tmp' is always a sad choice).
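A simplified Python model of the clash (the sequential numbering and the helper names are assumptions; the real CHECK-generation script, presumably `utils/update_test_checks.py`, may differ in detail): nameless values become `TMP<n>` FileCheck variables while named values keep their own name in upper case, so a value explicitly named `%tmp1` lands on the same variable as a nameless value.

```python
from typing import Dict, List, Set


def filecheck_var_names(values: List[str]) -> Dict[str, str]:
    """Map IR value names to FileCheck variable names (simplified model)."""
    mapping: Dict[str, str] = {}
    counter = 0
    for v in values:
        name = v.lstrip("%")
        if name.isdigit():             # nameless value such as %0
            counter += 1
            mapping[v] = f"TMP{counter}"
        else:                          # named value such as %tmp1 or %sum
            mapping[v] = name.upper()
    return mapping


def collisions(mapping: Dict[str, str]) -> Set[str]:
    """FileCheck variables that would be claimed by more than one IR value."""
    seen: Dict[str, int] = {}
    for var in mapping.values():
        seen[var] = seen.get(var, 0) + 1
    return {var for var, count in seen.items() if count > 1}


print(collisions(filecheck_var_names(["%tmp1", "%0"])))  # {'TMP1'}
print(collisions(filecheck_var_names(["%t1", "%0"])))    # set()
```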
* [SLP] Vectorize jumbled stores. | Alexey Bataev | 2019-10-31 | 3 | -9/+111
  Summary: Patch adds support for vectorization of jumbled stores. The value operands are vectorized and then shuffled into the right order before the store.
  Reviewers: RKSimon, spatel, hfinkel, mkuper
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D43339
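A Python sketch of the idea, assuming four same-sized stores to consecutive slots written in a jumbled order (the helper names are hypothetical): the value operands are gathered into a vector in program order, shuffled into address order with a mask derived by sorting the offsets, and written with one wide store. The resulting memory matches what the scalar stores produce.

```python
from typing import List, Optional


def jumbled_store_mask(offsets: List[int]) -> List[int]:
    """Shuffle mask that puts program-order values into memory order.

    offsets[i] is the element offset written by the i-th scalar store.
    Raises ValueError unless the offsets cover one consecutive range.
    """
    order = sorted(range(len(offsets)), key=lambda i: offsets[i])
    base = offsets[order[0]]
    if [offsets[i] - base for i in order] != list(range(len(offsets))):
        raise ValueError("stores are not to consecutive addresses")
    return order


def scalar_stores(memory: List[Optional[str]], values: List[str], offsets: List[int]) -> None:
    for value, offset in zip(values, offsets):
        memory[offset] = value


def vectorized_store(memory: List[Optional[str]], values: List[str], offsets: List[int]) -> None:
    mask = jumbled_store_mask(offsets)
    shuffled = [values[i] for i in mask]            # the shufflevector step
    start = offsets[mask[0]]
    memory[start:start + len(shuffled)] = shuffled  # one wide store


values, offsets = ["a", "b", "c", "d"], [2, 0, 3, 1]
mem_scalar: List[Optional[str]] = [None] * 4
mem_vector: List[Optional[str]] = [None] * 4
scalar_stores(mem_scalar, values, offsets)
vectorized_store(mem_vector, values, offsets)
print(jumbled_store_mask(offsets))  # [1, 3, 0, 2]
print(mem_vector)                   # ['b', 'd', 'a', 'c']
```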
* Revert "[SLP] Vectorize jumbled stores." | Haojian Wu | 2019-10-31 | 2 | -7/+9
  This reverts commit 21d498c9c0f32dcab5bc89ac593aa813b533b43a.
  This commit causes crashes on some targets.
* [SLP] Vectorize jumbled stores. | Alexey Bataev | 2019-10-30 | 2 | -9/+7
  Summary: Patch adds support for vectorization of jumbled stores. The value operands are vectorized and then shuffled into the right order before the store.
  Reviewers: RKSimon, spatel, hfinkel, mkuper
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D43339
* [SLP] Generalization of stores vectorization. | Alexey Bataev | 2019-10-29 | 18 | -403/+769
  Stores are vectorized with a maximum vectorization factor of 16. This patch tries to improve the situation by using the maximal vectorization factor.
  Reviewers: spatel, RKSimon, mkuper, hfinkel
  Differential Revision: https://reviews.llvm.org/D43582
* [SLP] avoid reduction transform on patterns that the backend can load-combine (2nd try) | Sanjay Patel | 2019-10-16 | 1 | -52/+104
  The 1st attempt at this modified the cost model in a bad way to avoid the vectorization, but that caused problems for other users of the cost model (the loop vectorizer).
  I don't see an ideal solution to these 2 related, potentially large, perf regressions:
  https://bugs.llvm.org/show_bug.cgi?id=42708
  https://bugs.llvm.org/show_bug.cgi?id=43146
  We decided that load combining was unsuitable for IR because it could obscure other optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend. Therefore, preventing SLP from destroying load-combine opportunities requires that it recognize patterns that could be combined later, but not do the optimization itself (it's not a vector combine anyway, so it's probably out of scope for SLP).
  Here, we add a cost-independent bailout with a conservative pattern match for a multi-instruction sequence that can probably be reduced later.
  In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining will produce a single instruction on these tests like:
      movbe rax, qword ptr [rdi]
  or:
      mov rax, qword ptr [rdi]
  Not some (half) vector monstrosity as we currently do using SLP:
      vpmovzxbq ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
      vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
      movzx eax, byte ptr [rdi]
      movzx ecx, byte ptr [rdi + 5]
      shl rcx, 40
      movzx edx, byte ptr [rdi + 6]
      shl rdx, 48
      or rdx, rcx
      movzx ecx, byte ptr [rdi + 7]
      shl rcx, 56
      or rcx, rdx
      or rcx, rax
      vextracti128 xmm1, ymm0, 1
      vpor xmm0, xmm0, xmm1
      vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1]
      vpor xmm0, xmm0, xmm1
      vmovq rax, xmm0
      or rax, rcx
      vzeroupper
      ret
  Differential Revision: https://reviews.llvm.org/D67841
  llvm-svn: 375025
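A small Python check of why this byte-assembly pattern collapses to one instruction (the function names are illustrative): ORing together zero-extended bytes shifted by `8*i` reconstructs exactly a 64-bit little-endian load, so x86 can use a single `mov`; with the shifts reversed it is a byte-swapped load, which is what `movbe` provides. Turning the shifts and ORs into a vector reduction loses that single-load form.

```python
def combine_bytes_le(mem: bytes) -> int:
    """The scalar idiom: zero-extend each byte, shift it by 8*i, and OR it in."""
    result = 0
    for i in range(8):
        result |= mem[i] << (8 * i)
    return result


def combine_bytes_be(mem: bytes) -> int:
    """Same idiom with reversed shifts (byte-swapped order)."""
    result = 0
    for i in range(8):
        result |= mem[i] << (8 * (7 - i))
    return result


data = bytes([1, 2, 3, 4, 5, 6, 7, 8])
# Equivalent to one 64-bit little-endian load (mov) ...
print(hex(combine_bytes_le(data)))  # 0x807060504030201
# ... or one byte-swapping load (movbe) for the reversed shifts.
print(hex(combine_bytes_be(data)))  # 0x102030405060708
```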
* [CostModel][X86] Add CTLZ scalar costs | Simon Pilgrim | 2019-10-14 | 1 | -142/+160
  Add specific scalar costs for CTLZ instructions. We can't discriminate between CTLZ and CTLZ_ZERO_UNDEF, so we have to assume the worst. Given how often BSR is a microcoded nightmare on older targets, we might still be underestimating it.
  For targets supporting LZCNT (Intel Haswell+ or AMD Fam10+), we provide overrides that assume 1cy costs.
  llvm-svn: 374786
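A Python model of the semantics involved (helper names are illustrative): BSR returns the index of the highest set bit and leaves its result undefined for a zero input, so a full CTLZ needs extra zero handling around it, while LZCNT defines the zero case itself. Because the cost model cannot tell which CTLZ flavor it is costing, charging for the zero-handling sequence is the conservative choice.

```python
def bsr32(x: int) -> int:
    """BSR: index of the highest set bit; undefined for x == 0."""
    assert 0 <= x < 2**32
    if x == 0:
        raise ValueError("BSR leaves its destination undefined for a zero input")
    return x.bit_length() - 1


def ctlz_zero_undef32(x: int) -> int:
    """CTLZ_ZERO_UNDEF: zero need not be handled, so BSR plus a subtraction suffices."""
    return 31 - bsr32(x)


def ctlz32(x: int) -> int:
    """CTLZ: zero must yield 32, which costs an extra compare/select around BSR."""
    return 32 if x == 0 else 31 - bsr32(x)


def lzcnt32(x: int) -> int:
    """LZCNT defines the zero case (returns 32), matching CTLZ in one instruction."""
    assert 0 <= x < 2**32
    return 32 - x.bit_length()


print(ctlz32(1), ctlz32(0), lzcnt32(0x10000))  # 31 32 15
```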
* [CostModel][X86] Add CTPOP scalar costs (PR43656) | Simon Pilgrim | 2019-10-14 | 1 | -22/+46
  Add specific scalar costs for ctpop instructions; these are based on llvm-mca's SLM throughput numbers (the oldest model we have).
  For targets supporting POPCNT, we provide overrides that assume 1cy costs.
  llvm-svn: 374775
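For context on why ctpop is expensive without POPCNT: it has to be expanded into a multi-instruction bit-twiddling sequence. A common expansion is the parallel (SWAR) bit count below, sketched in Python on 32-bit values; the exact sequence a given backend emits may differ.

```python
def popcount32_swar(x: int) -> int:
    """Parallel bit count of a 32-bit value using shifts, masks, adds, and one multiply."""
    assert 0 <= x < 2**32
    x = x - ((x >> 1) & 0x55555555)                 # 2-bit partial counts
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)  # 4-bit partial counts
    x = (x + (x >> 4)) & 0x0F0F0F0F                 # 8-bit partial counts
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24    # sum the four bytes into the top byte


print(popcount32_swar(0xFFFFFFFF))  # 32
```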
* [CostModel][X86] Improve sum reduction costs. | Simon Pilgrim | 2019-10-12 | 1 | -1/+1
  I can't see any notable differences in costs between SSE2 and SSE42 targets for FADD/ADD reduction, so I've lowered the target to just SSE2.
  I've also added vXi8 sum reduction costs in line with the PSADBW codegen and the discussions on PR42674.
  llvm-svn: 374655
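A Python model of the PSADBW trick (helper names are illustrative): computing sums of absolute differences against an all-zero vector yields the plain sum of each group of 8 unsigned bytes, so a `v16i8` add-reduction reduces to one PSADBW plus combining the two 64-bit partial sums and truncating to 8 bits.

```python
from typing import List


def psadbw(a: bytes, b: bytes) -> List[int]:
    """Model of PSADBW on 16 bytes: per 8-byte half, the sum of |a[i] - b[i]|."""
    assert len(a) == len(b) == 16
    return [sum(abs(x - y) for x, y in zip(a[h:h + 8], b[h:h + 8])) for h in (0, 8)]


def sum_reduce_v16i8(v: bytes) -> int:
    """i8 add-reduction: PSADBW against zero, add the two partial sums, truncate to 8 bits."""
    partial = psadbw(v, bytes(16))  # |x - 0| == x for unsigned bytes
    return (partial[0] + partial[1]) & 0xFF


print(psadbw(bytes(range(16)), bytes(16)))  # [28, 92]
print(sum_reduce_v16i8(bytes(range(16))))   # 120
```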
* [SLP] respect target register width for GEP vectorization (PR43578) | Sanjay Patel | 2019-10-09 | 1 | -13/+18
  We failed to account for the target register width (max vector factor) when vectorizing starting from GEPs. This causes vectorization to proceed to obviously illegal widths as in: https://bugs.llvm.org/show_bug.cgi?id=43578
  For x86, this also means that SLP can produce rogue AVX or AVX512 code even when the user specifies a narrower vector width.
  The AArch64 test in ext-trunc.ll appears to be better using the narrower width. I'm not exactly sure what getelementptr.ll is trying to do, but it's testing with "-slp-threshold=-18", so I'm not worried about those diffs.
  The x86 test is an over-reduction from SPEC h264; this patch appears to restore the perf loss caused by SLP when using -march=haswell.
  Differential Revision: https://reviews.llvm.org/D68667
  llvm-svn: 374183
* [SLP] add test with prefer-vector-width function attribute; NFC (PR43578) | Sanjay Patel | 2019-10-08 | 1 | -0/+59
  llvm-svn: 374090
* [SLP] add test with prefer-vector-width function attribute; NFC | Sanjay Patel | 2019-10-08 | 1 | -31/+73
  llvm-svn: 374039
* Revert "[SLP] avoid reduction transform on patterns that the backend can load-combine" | Martin Storsjo | 2019-10-07 | 1 | -104/+52
  This reverts SVN r373833, as it caused a failed assert "Non-zero loop cost expected" when building numerous projects; see PR43582 for details and reproduction samples.
  llvm-svn: 373882
* [SLP] avoid reduction transform on patterns that the backend can load-combine | Sanjay Patel | 2019-10-05 | 1 | -52/+104
  I don't see an ideal solution to these 2 related, potentially large, perf regressions:
  https://bugs.llvm.org/show_bug.cgi?id=42708
  https://bugs.llvm.org/show_bug.cgi?id=43146
  We decided that load combining was unsuitable for IR because it could obscure other optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend. Therefore, preventing SLP from destroying load-combine opportunities requires that it recognize patterns that could be combined later, but not do the optimization itself (it's not a vector combine anyway, so it's probably out of scope for SLP).
  Here, we add a scalar cost model adjustment with a conservative pattern match and cost summation for a multi-instruction sequence that can probably be reduced later. This should prevent SLP from creating a vector reduction unless that sequence is extremely cheap.
  In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining will produce a single instruction on these tests like:
      movbe rax, qword ptr [rdi]
  or:
      mov rax, qword ptr [rdi]
  Not some (half) vector monstrosity as we currently do using SLP:
      vpmovzxbq ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
      vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
      movzx eax, byte ptr [rdi]
      movzx ecx, byte ptr [rdi + 5]
      shl rcx, 40
      movzx edx, byte ptr [rdi + 6]
      shl rdx, 48
      or rdx, rcx
      movzx ecx, byte ptr [rdi + 7]
      shl rcx, 56
      or rcx, rdx
      or rcx, rax
      vextracti128 xmm1, ymm0, 1
      vpor xmm0, xmm0, xmm1
      vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1]
      vpor xmm0, xmm0, xmm1
      vmovq rax, xmm0
      or rax, rcx
      vzeroupper
      ret
  Differential Revision: https://reviews.llvm.org/D67841
  llvm-svn: 373833
* [SLP] add test for vectorization of different widths (PR28457); NFC | Sanjay Patel | 2019-10-02 | 1 | -0/+105
  llvm-svn: 373483
* [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!") | Alexey Bataev | 2019-09-29 | 18 | -1175/+795
  Initially, the SLP vectorizer replaced all going-to-be-vectorized instructions with Undef values. This may break ScalarEvolution and cause a crash.
  Reworked the SLP vectorizer so that it no longer replaces vectorized instructions with UndefValue. Instead, vectorized instructions are marked for deletion inside the BoUpSLP class and deleted upon class destruction.
  Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel
  Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy
  Differential Revision: https://reviews.llvm.org/D29641
  llvm-svn: 373166
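A Python sketch of deferred deletion, assuming a toy function modeled as a list of instruction names (`ToyFunction` and `ToyVectorizer` are hypothetical stand-ins for the IR and the `BoUpSLP` object): vectorized scalars are only marked while the vectorizer runs, so they stay intact for any analysis that still looks at them, and are erased in one place when the vectorizer goes away. Python's `with` statement plays the role of the C++ destructor here.

```python
from typing import List, Set


class ToyFunction:
    def __init__(self, insts: List[str]):
        self.insts = list(insts)


class ToyVectorizer:
    """Collects instructions to delete; erases them only when the object is torn down."""

    def __init__(self, fn: ToyFunction):
        self.fn = fn
        self.to_delete: Set[str] = set()

    def mark_for_deletion(self, inst: str) -> None:
        # Nothing is replaced or erased yet: the IR stays well-formed meanwhile.
        self.to_delete.add(inst)

    def __enter__(self) -> "ToyVectorizer":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # The "destructor": erase every marked instruction in one place.
        self.fn.insts = [i for i in self.fn.insts if i not in self.to_delete]
        return False


fn = ToyFunction(["%a", "%b", "%vec", "%use"])
with ToyVectorizer(fn) as slp:
    slp.mark_for_deletion("%a")
    slp.mark_for_deletion("%b")
    print(fn.insts)  # ['%a', '%b', '%vec', '%use']  (still present)
print(fn.insts)      # ['%vec', '%use']
```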
* [SLPVectorizer][X86] Regenerate arith-fp tests | Simon Pilgrim | 2019-09-27 | 1 | -0/+40
  llvm-svn: 373063
* Revert [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!") | Jordan Rupprecht | 2019-09-26 | 17 | -266/+1175
  This reverts r372626 (git commit 6a278d9073bdc158d31d4f4b15bbe34238f22c18)
  llvm-svn: 373019
OpenPOWER on IntegriCloud