summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/SLPVectorizer/X86
Commit message (Collapse)AuthorAgeFilesLines
...
* [SLPVectorizer] Add opcode parameter to reorderAltShuffleOperands, ↵Dinar Temirbulatov2017-08-191-0/+42
| | | | | | | | | | | | reorderInputsAccordingToOpcode functions. Reviewers: mkuper, RKSimon, ABataev, mzolotukhin, spatel, filcab Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D36766 llvm-svn: 311221
* [SLPVectorizer] Schedule bundle with different opcodes.Dinar Temirbulatov2017-08-141-0/+53
| | | | | | | | | | | | This change let us schedule a bundle with different opcodes in it, for example : [ load, add, add, add ] Reviewers: mkuper, RKSimon, ABataev, mzolotukhin, spatel, filcab Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D36518 llvm-svn: 310847
* [SLP] General improvements of SLP vectorization process.Alexey Bataev2017-08-072-49/+55
| | | | | | | | | | | | | | | | | | Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does: 1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts. 2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the array. This array is processed only after the vectorization of the first-after-these instructions key node is finished. Vectorization goes in reverse order to try to vectorize as much code as possible. Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits Differential Revision: https://reviews.llvm.org/D29826 llvm-svn: 310260
* Revert "[SLP] General improvements of SLP vectorization process."Alexey Bataev2017-08-072-55/+49
| | | | | | This reverts commit r310255. llvm-svn: 310257
* [SLP] General improvements of SLP vectorization process.Alexey Bataev2017-08-072-49/+55
| | | | | | | | | | | | | | | Summary: Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does: 1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts. 2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the array. This array is processed only after the vectorization of the first-after-these instructions key node is finished. Vectorization goes in reverse order to try to vectorize as much code as possible. Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits Differential Revision: https://reviews.llvm.org/D29826 llvm-svn: 310255
* [SLPVectorizer][X86] Cleanup test case. NFCISimon Pilgrim2017-08-061-256/+249
| | | | | | Remove excess attributes/metadata llvm-svn: 310227
* [SLPVectorizer] Add extra parameter to setInsertPointAfterBundle to handle ↵Dinar Temirbulatov2017-08-051-0/+708
| | | | | | | | different opcodes, NFCI. Differential Revision: https://reviews.llvm.org/D35769 llvm-svn: 310183
* [InstCombine] Support sext in foldLogicCastConstantCraig Topper2017-08-021-4/+4
| | | | | | | | This adds support for sext in foldLogicCastConstant. This is a prerequisite for D36214. Differential Revision: https://reviews.llvm.org/D36234 llvm-svn: 309880
* [SLP] Fix for PR31880: shuffle and vectorize repeated scalar ops on ↵Alexey Bataev2017-08-021-65/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | extracted elements Summary: Currently most of the time vectors of extractelement instructions are treated as scalars that must be gathered into vectors. But in some cases, like when we have extractelement instructions from single vector with different constant indeces or from 2 vectors of the same size, we can treat this operations as shuffle of a single vector or blending of 2 vectors. ``` define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) { %x0 = extractelement <2 x i8> %x, i32 0 %y1 = extractelement <2 x i8> %y, i32 1 %x0x0 = mul i8 %x0, %x0 %y1y1 = mul i8 %y1, %y1 %ins1 = insertelement <2 x i8> undef, i8 %x0x0, i32 0 %ins2 = insertelement <2 x i8> %ins1, i8 %y1y1, i32 1 ret <2 x i8> %ins2 } ``` can be converted to something like ``` define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) { %1 = shufflevector <2 x i8> %x, <2 x i8> %y, <2 x i32> <i32 0, i32 3> %2 = mul <2 x i8> %1, %1 ret <2 x i8> %2 } ``` Currently this type of conversion is considered as high cost transformation. Reviewers: mzolotukhin, delena, mkuper, hfinkel, RKSimon Subscribers: ashahid, RKSimon, spatel, llvm-commits Differential Revision: https://reviews.llvm.org/D30200 llvm-svn: 309812
* [SLP]: Add test to resurrect the jumbled load patch. This test has multiple usesMohammad Shahid2017-07-311-0/+48
| | | | | | | of memory loads by different user Change-Id: I40b5ba8b810265440f3e55efca77c4b41ca98fa4 llvm-svn: 309544
* [SLP] A test for limiting vectorization of instructions, NFC.Alexey Bataev2017-06-301-0/+70
| | | | llvm-svn: 306828
* [X86][SLM] Add SLM arithmetic vectorization testsSimon Pilgrim2017-06-104-37/+333
| | | | | | As discussed on D33983, as SLM has so many custom costs its worth testing as well. llvm-svn: 305151
* Regenerate testSimon Pilgrim2017-06-081-24/+24
| | | | llvm-svn: 304973
* [SLP] Change extension of the test, NFC.Alexey Bataev2017-06-061-0/+0
| | | | llvm-svn: 304829
* [SLP] Add a test for fix of PR32164, NFC.Alexey Bataev2017-06-061-0/+138
| | | | llvm-svn: 304826
* [SLPVectorizer][X86] Add vectorization tests for vXi64/vXi32/vXi16/VXi8 ↵Simon Pilgrim2017-05-153-0/+1998
| | | | | | add/sub/mul llvm-svn: 303074
* [SLPVectorizer][X86] Add vectorization tests for vXi64/vXi32/vXi16/VXi8 shiftsSimon Pilgrim2017-05-153-0/+2589
| | | | llvm-svn: 303069
* Replace hardcoded intrinsic list with speculatable attribute.Matt Arsenault2017-05-031-1/+1
| | | | | | No change in which intrinsics should be speculated. llvm-svn: 301995
* [SLP vectorizer] Allow phi node reordering in tryToVectorizeList.Easwaran Raman2017-04-181-0/+54
| | | | | | | | | | | | | | | | | In tryToVectorizeList, under a very limited circumstance (when entered from tryToVectorizePair), the values may be reordered (swapped) and the SLP tree is built with the new order. This extends that to the case when starting from phis in vectorizeChainsInBlock when there are exactly two phis. The textual order of phi nodes shouldn't really matter. Without this change, the loop body in the accompnaying test case is fully vectorized when we swap the orde of the phis but not with this order. While this doesn't solve the phi-ordering problem in a general way (for more than 2 phis), this is simple fix that piggybacks on an existing mechanism and is useful in cases like multiplying two complex numbers. Differential revision: https://reviews.llvm.org/D32065 llvm-svn: 300574
* [X86] Add missing BITREVERSE costs for SSE2 vectors and i8/i16/i32/i64 scalarsSimon Pilgrim2017-03-151-369/+51
| | | | | | Prep work for PR31810 llvm-svn: 297876
* [SLP] Revert everything that has to do with memory access sorting.Michael Kuperstein2017-03-104-196/+55
| | | | | | | | | | | | This reverts r293386, r294027, r294029 and r296411. Turns out the SLP tree isn't actually a "tree" and we don't handle accessing the same packet of loads in several different orders well, causing miscompiles. Revert until we can fix this properly. llvm-svn: 297493
* [SLP] Revert r296863 due to miscompiles.Michael Kuperstein2017-03-062-44/+1
| | | | | | Details and reproducer are on the email thread for r296863. llvm-svn: 297103
* [SLP] A test for vectorization of users of extractelement instructions,Alexey Bataev2017-03-061-3/+38
| | | | | | NFC. llvm-svn: 297024
* [SLP] Fixes the bug due to absence of in order uses of scalars which needs ↵Mohammad Shahid2017-03-032-1/+44
| | | | | | | | | | | | | | | | to be available for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR. The fix is to compute the mask for out of order memory accesses while building the vectorizable tree instead of actual vectorization of vectorizable tree.It also needs to recompute the proper Lane for external use of vectorizable scalars based on shuffle mask. Reviewers: mkuper Differential Revision: https://reviews.llvm.org/D30159 Change-Id: Ide8773ce0ad3562f3cf4d1a0ad0f487e2f60ce5d llvm-svn: 296863
* Revert r296575 "[SLP] Fixes the bug due to absence of in order uses of ↵Hans Wennborg2017-03-012-45/+1
| | | | | | | | scalars which needs to be available" It caused miscompiles, e.g. in Chromium (PR32109). llvm-svn: 296654
* [SLP] Preserve IR flags when vectorizing horizontal reductions.Alexey Bataev2017-03-012-16/+16
| | | | | | | | | | | | | | | | | | Summary: The SLP vectorizer should propagate IR-level optimization hints/flags (nsw, nuw, exact, fast-math) when converting scalar horizontal reductions instructions into vectors, just like for other vectorized instructions. It doe not include IR propagation for extra arguments, we need to handle original scalar operations for extra args to propagate correct flags. Reviewers: mkuper, mzolotukhin, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30418 llvm-svn: 296614
* [SLP] Preserve IR flags for extra args.Alexey Bataev2017-03-012-5/+5
| | | | | | | | | | | | | | | Summary: We should preserve IR flags for extra args. These IR flags should be taken from original scalar operations, not from the reduction operations. Reviewers: mkuper, mzolotukhin, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30447 llvm-svn: 296613
* [SLP] Fix for PR32038: extra add of PHI node when it is not required.Alexey Bataev2017-03-012-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: If horizontal reduction tree starts from the binary operation that is used in PHI node, but this PHI is not used in horizontal reduction, we may end up with extra addition of this PHI node after vectorization. Here is an example: ``` %phi = phi i32 [ %tmp, %end], ... ... %tmp = add i32 %tmp1, %tmp2 end: ``` after vectorization we always have something like: ``` %phi = phi i32 [ %tmp, %end], ... ... %red = extractelement <8 x 32> %vec.red, 0 %tmp = add i32 %red, %phi end: ``` even if `%phi` is not used in reduction tree. Patch considers these PHI nodes as extra arguments and considers them in the final result iff they really used in reduction. Reviewers: mkuper, hfinkel, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30409 llvm-svn: 296606
* [SLP] Fixes the bug due to absence of in order uses of scalars which needs ↵Mohammad Shahid2017-03-012-1/+45
| | | | | | | | | | | | | | | to be available for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR. The fix is to compute the mask for out of order memory accesses while building the vectorizable tree instead of actual vectorization of vectorizable tree. Reviewers: mkuper Differential Revision: https://reviews.llvm.org/D30159 Change-Id: Id1e287f073fa4959713ba545fa4254db5da8b40d llvm-svn: 296575
* [SLP] Load sorting should not try to sort things that aren't loads.Michael Kuperstein2017-02-271-0/+43
| | | | | | | We may get a VL where the first element is a load, but the others aren't. Trying to sort such VLs can only lead to sorrow. llvm-svn: 296411
* [SLP] Use different flags in tests for reduction ops and extra args.Alexey Bataev2017-02-271-3/+3
| | | | llvm-svn: 296376
* [SLP] Modify test to check IR flags propagation for extra args.Alexey Bataev2017-02-271-15/+15
| | | | llvm-svn: 296369
* [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrongAlexey Bataev2017-02-231-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295972
* Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong"Alexey Bataev2017-02-231-6/+4
| | | | | | This reverts commit 7c5141e577d9efd1c8e3087566a38ce6b3a41a84. llvm-svn: 295957
* [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrongAlexey Bataev2017-02-231-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295956
* Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong"Alexey Bataev2017-02-231-6/+4
| | | | | | This reverts commit d83c81ee6a8dea662808ac22b396d1bb0595c89d. llvm-svn: 295951
* [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrongAlexey Bataev2017-02-231-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295949
* Revert r295868 because it breaks a different SLP lit test.Michael Kuperstein2017-02-221-6/+4
| | | | llvm-svn: 295906
* [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong resultAlexey Bataev2017-02-221-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295868
* [SLP] Test with extra argument used several times.Alexey Bataev2017-02-221-0/+108
| | | | llvm-svn: 295853
* [SLP] Tests for shuffle/blending operations.Alexey Bataev2017-02-211-0/+167
| | | | llvm-svn: 295717
* [SLP] Additional test for vectorization of cal/invoke args vectorizationAlexey Bataev2017-02-201-0/+95
| | | | llvm-svn: 295657
* [SLP] Fix for PR31879: vectorize repeated scalar ops that don't get putAlexey Bataev2017-02-141-16/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | back into a vector Previously the cost of the existing ExtractElement/ExtractValue instructions was considered as a dead cost only if it was detected that they have only one use. But these instructions may be considered dead also if users of the instructions are also going to be vectorized, like: ``` %x0 = extractelement <2 x float> %x, i32 0 %x1 = extractelement <2 x float> %x, i32 1 %x0x0 = fmul float %x0, %x0 %x1x1 = fmul float %x1, %x1 %add = fadd float %x0x0, %x1x1 ``` This can be transformed to ``` %1 = fmul <2 x float> %x, %x %2 = extractelement <2 x float> %1, i32 0 %3 = extractelement <2 x float> %1, i32 1 %add = fadd float %2, %3 ``` because though `%x0` and `%x1` have 2 users each other, these users are part of the vectorized tree and we can consider these `extractelement` instructions as dead. Differential Revision: https://reviews.llvm.org/D29900 llvm-svn: 295056
* [SLP] Additional tests for extractelement cost fix.Alexey Bataev2017-02-141-0/+39
| | | | llvm-svn: 295050
* [SLP] Test for extractelement cost fix.Alexey Bataev2017-02-131-0/+20
| | | | llvm-svn: 294980
* [SLP] Fix for PR31690: Allow using of extra values in horizontalAlexey Bataev2017-02-131-304/+278
| | | | | | | | | | | | | | | | | | | | | | | reductions. Currently, LLVM supports vectorization of horizontal reduction instructions with initial value set to 0. Patch supports vectorization of reduction with non-zero initial values. Also, it supports a vectorization of instructions with some extra arguments, like: ``` float f(float x[], int a, int b) { float p = a % b; p += x[0] + 3; for (int i = 1; i < 32; i++) p += x[i]; return p; } ``` Patch allows vectorization of this kind of horizontal reductions. Differential Revision: https://reviews.llvm.org/D29727 llvm-svn: 294934
* [SLP] Additional test to check correct work of horizontal reductions,Alexey Bataev2017-02-081-0/+576
| | | | | | NFC. llvm-svn: 294505
* [SLP] Revert "Allow using of extra values in horizontal reductions."Michael Kuperstein2017-02-061-114/+134
| | | | | | | | This breaks when one of the extra values is also a scalar that participates in the same vectorization tree which we'll end up reducing. llvm-svn: 294245
* [SLP] Use SCEV to sort memory accesses.Michael Kuperstein2017-02-031-7/+120
| | | | | | | | | | This generalizes memory access sorting to use differences between SCEVs, instead of relying on constant offsets. That allows us to properly do SLP vectorization of non-sequentially ordered loads within loops bodies. Differential Revision: https://reviews.llvm.org/D29425 llvm-svn: 294027
* [SLP] Fix for PR31690: Allow using of extra values in horizontal reductions.Alexey Bataev2017-02-031-134/+114
| | | | | | | | | | | | | | | | | | | | | Currently LLVM supports vectorization of horizontal reduction instructions with initial value set to 0. Patch supports vectorization of reduction with non-zero initial values. Also it supports a vectorization of instructions with some extra arguments, like: float f(float x[], int a, int b) { float p = a % b; p += x[0] + 3; for (int i = 1; i < 32; i++) p += x[i]; return p; } Patch allows vectorization of this kind of horizontal reductions. Differential Revision: https://reviews.llvm.org/D28961 llvm-svn: 293994
OpenPOWER on IntegriCloud