path: root/llvm/lib/Transforms/InstCombine
Commit log (newest first); each entry shows: commit message (author, date; files changed, lines -/+)
...
* [InstCombine] Limit a vector demanded elts rule which was producing invalid IR. (Philip Reames, 2019-04-30; 1 file, -0/+12)
  The demanded elts rules introduced for GEPs in https://reviews.llvm.org/rL356293 replaced vector constants with undefs (by design). It turns out that the LangRef disallows such cases when indexing structs. The right fix is probably to relax the LangRef requirement and update other passes to expect the result, but for the moment, limit the transform to avoid compiler crashes.
  This should fix https://bugs.llvm.org/show_bug.cgi?id=41624.
  llvm-svn: 359633
* [InstCombine] reduce code duplication; NFC (Sanjay Patel, 2019-04-29; 1 file, -5/+7)
  Follow-up to rL359482: avoid this potential problem throughout by giving the type a name and verifying the assumption that both operands are the same type.
  llvm-svn: 359485
* [InstCombine] visitFCmpInst - appease copy+paste pattern warning. NFCI. (Simon Pilgrim, 2019-04-29; 1 file, -1/+1)
  PVS-Studio's copy+paste recognizer was seeing this as a typo. Technically, Op0/Op1 in an fcmp should always be the same type, but we might as well avoid the issue.
  Reported in https://www.viva64.com/en/b/0629/
  llvm-svn: 359482
* [InstCombine][X86] Tweak generic expansion of PACKSS/PACKUS to shuffle then truncate. NFCI. (Simon Pilgrim, 2019-04-25; 1 file, -7/+4)
  This has no effect on constant folding but will be useful when we expand non-saturating PACKSS/PACKUS intrinsics.
  llvm-svn: 359191
* Fix include order. NFCI. (Simon Pilgrim, 2019-04-25; 1 file, -1/+1)
  llvm-svn: 359177
* Consolidate existing utilities for interpreting vector predicate masks [NFC] (Philip Reames, 2019-04-25; 1 file, -31/+1)
  llvm-svn: 359163
* [InstCombine] Be consistent w/handling of masked intrinsics style wise [NFC] (Philip Reames, 2019-04-25; 2 files, -5/+6)
  llvm-svn: 359160
* [InstCombine][X86] Use generic expansion of PACKSS/PACKUS for constant folding. NFCI. (Simon Pilgrim, 2019-04-24; 1 file, -51/+45)
  This patch rewrites the existing PACKSS/PACKUS constant folding code to use the generic expansion. This is a first NFCI step toward expanding PACKSS/PACKUS intrinsics which are acting as non-saturating truncations (although technically the expansion could be used in all cases, we'll probably want to be conservative).
  llvm-svn: 359111
* [InstCombine] Convert a masked.load of a dereferenceable address to an unconditional load (Philip Reames, 2019-04-23; 1 file, -4/+14)
  If we have a masked.load from a location we know to be dereferenceable, we can simply issue a speculative unconditional load against that address. The key advantage is that it produces IR which is well understood by the optimizer. The select(cnd, load, passthrough) form produced should be pattern matchable back to hardware predication if profitable.
  Differential Revision: https://reviews.llvm.org/D59703
  llvm-svn: 359000
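  Note (editor's illustration, not from the patch; names and types invented): the rewrite in IR looks roughly like this, assuming %p is known dereferenceable and suitably aligned:
      %v = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* %p, i32 4, <4 x i1> %mask, <4 x float> %pass)
    -->
      %ld = load <4 x float>, <4 x float>* %p, align 4   ; now an unconditional, speculative load
      %v = select <4 x i1> %mask, <4 x float> %ld, <4 x float> %pass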
* [InstCombine] Eliminate stores to constant memory (Philip Reames, 2019-04-22; 2 files, -0/+24)
  If we have a store to a piece of memory which is known constant, then we know the store must be storing back the same value. As a result, the store (or memset, or memmove) must either be down a dead path or a noop. In either case, it is valid to simply remove the store. The motivating case for this involves a memmove to a buffer which is constant down a path which is dynamically dead.
  Note that I'm choosing to implement the less aggressive of two possible semantics here. We could simply say that the store *is undefined* and prune the path. Consensus in the review was that the more aggressive form might be a good follow-on change at a later date.
  Differential Revision: https://reviews.llvm.org/D60659
  llvm-svn: 358919
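  Note (editor's illustration, hypothetical IR): a store whose destination is known-constant memory can only execute on a dead path or write back the value already present, so it is removable:
      @lut = internal constant [4 x i32] [i32 0, i32 1, i32 2, i32 3]
      ...
      store i32 %x, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @lut, i64 0, i64 1)   ; deleted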
* [InstSimplify] Move masked.gather w/no active lanes handling to InstSimplify from InstCombine (Philip Reames, 2019-04-22; 1 file, -5/+0)
  In the process, use the existing masked.load combine, which is slightly stronger and handles a mix of zero and undef elements in the mask.
  llvm-svn: 358913
* [InstCombine] Factor out unreachable inst idiom creation [NFC] (Philip Reames, 2019-04-17; 3 files, -13/+15)
  In InstCombine, we use an idiom of "store i1 true, i1* undef" to indicate we've found a path which we've proven unreachable. We can't actually insert the unreachable instruction, since that would require changing the CFG; we leave that to SimplifyCFG later. This just factors out that idiom creation so we don't duplicate the same mostly undocumented idiom creation in multiple places.
  llvm-svn: 358600
* [InstCombine] Prune fshl/fshr with masked operands (Nikita Popov, 2019-04-16; 1 file, -0/+4)
  If a constant shift amount is used, then only some of the LHS/RHS operand bits are demanded, and we may be able to simplify based on that. InstCombineSimplifyDemanded already had the necessary support; we just weren't calling it with fshl/fshr as the root. In particular, this allows us to relax some masked funnel shifts into simple shifts, as shown in the tests.
  Patch by Shawn Landden.
  Differential Revision: https://reviews.llvm.org/D60660
  llvm-svn: 358515
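  Note (editor's sketch of why demanded bits help; constants invented): for fshl(A, B, 5) on i32, the result is (A << 5) | (B >> 27), so only B's top 5 bits are demanded. If a mask already clears them, the funnel shift relaxes to a plain shift:
      %b = and i32 %y, 134217727                           ; 0x07FFFFFF, top 5 bits zero
      %r = call i32 @llvm.fshl.i32(i32 %x, i32 %b, i32 5)
    -->
      %r = shl i32 %x, 5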
* [IR] Add WithOverflowInst class (Nikita Popov, 2019-04-16; 1 file, -45/+20)
  This adds a WithOverflowInst class with a few helper methods to get the underlying binop, signedness and nowrap type, and makes use of it where sensible. There will be two more uses in D60650/D60656.
  The refactorings are all NFC, though I left some TODOs where things could be improved. In particular we have two places where add/sub are handled but mul isn't.
  Differential Revision: https://reviews.llvm.org/D60668
  llvm-svn: 358512
* [PGO] Profile guided code size optimization. (Hiroshi Yamauchi, 2019-04-15; 3 files, -8/+33)
  Summary: Enable some of the existing size optimizations for cold code under PGO. A ~5% code size saving in a big internal app under PGO. The way it gets BFI/PSI is discussed in the RFC thread: http://lists.llvm.org/pipermail/llvm-dev/2019-March/130894.html Note it doesn't currently touch loop passes.
  Reviewers: davidxl, eraman
  Reviewed By: eraman
  Subscribers: mgorny, javed.absar, smeenai, mehdi_amini, eraman, zzheng, steven_wu, dexonsmith, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59514
  llvm-svn: 358422
* [InstCombine] canonicalize fdiv after fmul if reassociation is allowed (Sanjay Patel, 2019-04-15; 1 file, -0/+8)
  (X / Y) * Z --> (X * Z) / Y
  This can allow other optimizations/reassociations, as shown in the test diffs.
  llvm-svn: 358404
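  Note (editor's sketch; flags and types illustrative): in IR the fold looks like:
      %d = fdiv reassoc float %x, %y
      %m = fmul reassoc float %d, %z
    -->
      %t = fmul reassoc float %x, %z
      %m = fdiv reassoc float %t, %y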
* [InstCombine] Remove redundant/bogus mul_with_overflow combines (Nikita Popov, 2019-04-13; 1 file, -8/+0)
  As pointed out in D60518, folding mulo(%x, undef) to {undef, undef} isn't correct. As a correct version of this already exists in InstructionSimplify (https://github.com/llvm-mirror/llvm/blob/bd8056ef326e075cc500f3f0cfcd1193bc200594/lib/Analysis/InstructionSimplify.cpp#L4750-L4757), this is just dead code. Drop it together with the mul(%x, 0) -> {0, false} fold that is also already handled by InstSimplify.
  Differential Revision: https://reviews.llvm.org/D60649
  llvm-svn: 358339
* [InstCombine] Canonicalize (-X srem Y) to -(X srem Y). (Chen Zheng, 2019-04-13; 1 file, -0/+5)
  Differential Revision: https://reviews.llvm.org/D60647
  llvm-svn: 358328
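  Note (editor's sketch): srem takes the sign of its dividend, so negating the dividend negates the remainder:
      %negx = sub i32 0, %x
      %r = srem i32 %negx, %y
    -->
      %rem = srem i32 %x, %y
      %r = sub i32 0, %rem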
* [InstCombine] Fix a nasty miscompile introduced w/masked.gather demanded elts (Philip Reames, 2019-04-12; 1 file, -1/+5)
  This fixes a miscompile which was introduced in r356510 (https://reviews.llvm.org/D57372). The problem is that the original patch removed pointer operands where the load results weren't demanded, but without considering the legality of the load itself. If the masked.gather had active, but undemanded, lanes, then we could end up creating a load which loaded from an undef address. The result could be a segfault or, in theory, an arbitrary read from a random memory location into an unused register.
  llvm-svn: 358299
* [InstCombine] Handle ssubo always overflow (Nikita Popov, 2019-04-10; 1 file, -3/+3)
  Following D60483 and D60497, this adds support for AlwaysOverflows handling for ssubo. This is the last case we can handle right now.
  Differential Revision: https://reviews.llvm.org/D60518
  llvm-svn: 358100
* [InstCombine] ssubo X, C -> saddo X, -C (Nikita Popov, 2019-04-10; 1 file, -0/+21)
  ssubo X, C is equivalent to saddo X, -C. Make the transformation in InstCombine and allow the logic implemented for saddo to fold prior usages of add nsw or sub nsw with constants.
  Patch by Dan Robertson.
  Differential Revision: https://reviews.llvm.org/D60061
  llvm-svn: 358099
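  Note (editor's sketch with an invented constant; the equivalence requires C != INT_MIN, since -INT_MIN overflows):
      %r = call { i32, i1 } @llvm.ssub.with.overflow.i32(i32 %x, i32 7)
    -->
      %r = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %x, i32 -7)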
* [InstCombine] Handle saddo always overflow (Nikita Popov, 2019-04-10; 1 file, -3/+3)
  Followup to D60483: handle AlwaysOverflow conditions for saddo as well.
  Differential Revision: https://reviews.llvm.org/D60497
  llvm-svn: 358095
* [InstCombine] Handle usubo always overflow (Nikita Popov, 2019-04-10; 1 file, -0/+3)
  Check AlwaysOverflow condition for usubo. The implementation is the same as the existing handling for uaddo and umulo. Handling for saddo and ssubo will follow (smulo doesn't have the necessary ValueTracking support).
  Differential Revision: https://reviews.llvm.org/D60483
  llvm-svn: 358052
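  Note (editor's sketch, values invented): AlwaysOverflow applies when ValueTracking proves the subtraction must wrap, so the overflow bit becomes a constant:
      %x = and i32 %a, 7                                                  ; %x is in [0, 7]
      %r = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 %x, i32 8)   ; always overflows
    -->
      %d = sub i32 %x, 8
      %t = insertvalue { i32, i1 } undef, i32 %d, 0
      %r = insertvalue { i32, i1 } %t, i1 true, 1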
* [InstCombine] Directly call computeOverflow methods in OptimizeOverflowCheck; NFC (Nikita Popov, 2019-04-10; 1 file, -6/+13)
  Instead of using the willOverflow helpers. This makes it easier to extend handling of AlwaysOverflows.
  llvm-svn: 358051
* [InstCombine] Canonicalize (-X s/ Y) to -(X s/ Y). (Chen Zheng, 2019-04-10; 1 file, -0/+6)
  Differential Revision: https://reviews.llvm.org/D60395
  llvm-svn: 358050
* Revert "[InstCombine] [InstCombine] Canonicalize (-X s/ Y) to -(X s/ Y)."Nikita Popov2019-04-091-6/+0
| | | | | | | | | | | This reverts commit 1383a9168948aabfd827220c9445ce0ce5765800. sdiv-canonicalize.ll fails after this revision. The fold needs to be moved outside the branch handling constant operands. However when this is done there are further test changes, so I'm reverting this in the meantime. llvm-svn: 358026
* [InstCombine] Restructure OptimizeOverflowCheck; NFC (Nikita Popov, 2019-04-09; 1 file, -31/+28)
  Change the code to always handle the unsigned+signed cases together with the same basic structure for add/sub/mul. The simple folds are always handled first, and then the ValueTracking overflow checks are used.
  llvm-svn: 358025
* [InstCombine] [InstCombine] Canonicalize (-X s/ Y) to -(X s/ Y). (Chen Zheng, 2019-04-09; 1 file, -0/+6)
  Differential Revision: https://reviews.llvm.org/D60395
  llvm-svn: 358017
* [InstCombine] prevent possible miscompile with sdiv+negate of vector op (Sanjay Patel, 2019-04-09; 1 file, -10/+11)
  Similar to rL358005: forego folding arbitrary vector constants to fix a possible miscompile bug. We can enhance the transform if we do want to handle the more complicated vector case.
  llvm-svn: 358013
* [InstCombine] prevent possible miscompile with negate+sdiv of vector op (Sanjay Patel, 2019-04-09; 1 file, -3/+6)
  // 0 - (X sdiv C) -> (X sdiv -C) provided the negation doesn't overflow.
  This fold has been around for many years and nobody noticed the potential vector miscompile from overflow until recently... So it seems unlikely that there's much demand for a vector sdiv optimization on arbitrary vector constants, so just limit the matching to splat constants to avoid the possible bug.
  Differential Revision: https://reviews.llvm.org/D60426
  llvm-svn: 358005
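  Note (editor's sketch of the splat form that remains; C = 7 invented, and C must not be 1 or INT_MIN, where negation would overflow):
      %d = sdiv <2 x i32> %x, <i32 7, i32 7>
      %r = sub <2 x i32> zeroinitializer, %d
    -->
      %r = sdiv <2 x i32> %x, <i32 -7, i32 -7>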
* [InstCombine] peek through fdiv to find a squared sqrt (Sanjay Patel, 2019-04-08; 1 file, -0/+19)
  A more general canonicalization between fdiv and fmul would not handle this case because that would have to be limited by uses to prevent 2 values from becoming 3 values:
  (x/y) * (x/y) --> (x*x) / (y*y)
  (But we probably should still have that limited -- but more general -- canonicalization independently of this change.)
  llvm-svn: 357943
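  Note (editor's reading of the title; a sketch under fast-math assumptions, not necessarily the patch's exact tests): when the divisor is a square root, squaring the quotient makes the sqrt drop out entirely, so two values stay two values:
      %s = call fast float @llvm.sqrt.f32(float %y)
      %d = fdiv fast float %x, %s
      %r = fmul fast float %d, %d
    -->
      %xx = fmul fast float %x, %x
      %r = fdiv fast float %xx, %y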
* [InstCombine] remove overzealous assert for shuffles (PR41419) (Sanjay Patel, 2019-04-08; 1 file, -2/+2)
  As the TODO indicates, instsimplify could be improved.
  Should fix https://bugs.llvm.org/show_bug.cgi?id=41419.
  llvm-svn: 357910
* [InstCombine][X86] Expand MOVMSK to generic IR (PR39927) (Simon Pilgrim, 2019-04-08; 1 file, -40/+14)
  First step towards removing the MOVMSK intrinsics completely. This patch expands MOVMSK to a generic pattern, e.g. for PMOVMSKB(v16i8 x):
    %cmp = icmp slt <16 x i8> %x, zeroinitializer
    %int = bitcast <16 x i1> %cmp to i16
    %res = zext i16 %int to i32
  Which is correctly handled by ISel and FastISel (give or take an annoying movzx move...): https://godbolt.org/z/rkrSFW
  Differential Revision: https://reviews.llvm.org/D60256
  llvm-svn: 357909
* [InstCombine] sdiv exact flag fixup. (Chen Zheng, 2019-04-08; 1 file, -2/+5)
  Differential Revision: https://reviews.llvm.org/D60396
  llvm-svn: 357904
* [IR] Refactor attribute methods in Function class (NFC) (Evandro Menezes, 2019-04-04; 1 file, -1/+1)
  Rename the functions that query the optimization kind attributes.
  Differential Revision: https://reviews.llvm.org/D60287
  llvm-svn: 357731
* [InstCombine] Combine no-wrap sub and icmp w/ constant. (Luqman Aden, 2019-04-04; 1 file, -1/+10)
  Teach InstCombine the transformation `(icmp P (sub nuw|nsw C2, Y), C) -> (icmp swap(P) Y, C2-C)`.
  Reviewers: majnemer, apilipenko, sanjoy, spatel, lebedev.ri
  Reviewed By: lebedev.ri
  Subscribers: dmgreen, lebedev.ri, nikic, hiraditya, JDevlieghere, jfb, jdoerfert, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59916
  llvm-svn: 357674
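  Note (editor's example with invented constants): with nuw we know %y <= 8, and 8 - %y < 3 holds exactly when %y > 5:
      %s = sub nuw i32 8, %y
      %c = icmp ult i32 %s, 3
    -->
      %c = icmp ugt i32 %y, 5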
* [InstCombine] Simplify ctpop with bitreverse/bswap (David Bolvansky, 2019-04-03; 1 file, -0/+8)
  Summary: Fixes PR41337
  Reviewers: spatel
  Reviewed By: spatel
  Subscribers: llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60148
  llvm-svn: 357564
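  Note (editor's sketch): neither bswap nor bitreverse changes the number of set bits, so the inner call can be dropped:
      %rev = call i32 @llvm.bitreverse.i32(i32 %x)
      %n = call i32 @llvm.ctpop.i32(i32 %rev)
    -->
      %n = call i32 @llvm.ctpop.i32(i32 %x)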
* [InstCombine] Simplify ctlz/cttz with bitreverse (David Bolvansky, 2019-04-02; 1 file, -1/+9)
  Summary: Fixes PR41273
  Reviewers: spatel
  Reviewed By: spatel
  Subscribers: llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60096
  llvm-svn: 357521
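  Note (editor's sketch): bit reversal swaps leading and trailing zeros, so cttz of a bitreverse is ctlz of the original (and vice versa):
      %rev = call i32 @llvm.bitreverse.i32(i32 %x)
      %n = call i32 @llvm.cttz.i32(i32 %rev, i1 false)
    -->
      %n = call i32 @llvm.ctlz.i32(i32 %x, i1 false)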
* [InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder (Mikael Holmen, 2019-04-01; 1 file, -1/+8)
  Summary: This fixes PR41270. The recursive function evaluateInDifferentElementOrder expects to be called on a vector Value, so when we call it on a vector GEP's arguments, we must first check that the argument is indeed a vector.
  Reviewers: reames, spatel
  Reviewed By: spatel
  Subscribers: llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60058
  llvm-svn: 357389
* Revert "[InstCombine] Handle vector gep with scalar argument in ↵Mikael Holmen2019-04-011-8/+1
| | | | | | | | | | | evaluateInDifferentElementOrder" This reverts commit 75216a6dbcfe5fb55039ef06a07e419fa875f4a5. I'll recommit with a better commit message with reference to the phabricator review. llvm-svn: 357387
* [InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder (Mikael Holmen, 2019-04-01; 1 file, -1/+8)
  This fixes PR41270. The recursive function evaluateInDifferentElementOrder expects to be called on a vector Value, so when we call it on a vector GEP's arguments, we must first check that the argument is indeed a vector.
  llvm-svn: 357385
* [InstCombine] eliminate commuted select-shuffles + binop (PR41304) (Sanjay Patel, 2019-04-01; 1 file, -0/+24)
  If we have a commutable vector binop with inverted select-shuffles, we don't care about the order of the operands in each vector lane:
    LHS = shuffle V1, V2, <0, 5, 6, 3>
    RHS = shuffle V2, V1, <0, 5, 6, 3>
    LHS + RHS --> <V1[0]+V2[0], V2[1]+V1[1], V2[2]+V1[2], V1[3]+V2[3]> --> V1 + V2
  PR41304 (https://bugs.llvm.org/show_bug.cgi?id=41304) is currently titled as an SLP enhancement, but at least for the given example, we can reduce that in instcombine because we are just eliminating shuffles.
  As noted in the TODO, this could be generalized, but I haven't thought through those patterns completely, so this is limited to what appears to be always safe.
  Differential Revision: https://reviews.llvm.org/D60048
  llvm-svn: 357382
* [InstCombine] canonicalize select shuffles by commuting (Sanjay Patel, 2019-03-31; 1 file, -0/+9)
  In PR41304 (https://bugs.llvm.org/show_bug.cgi?id=41304), we have a case where we want to fold a binop of select-shuffle (blended) values. Rather than try to match commuted variants of the pattern, we can canonicalize the shuffles and check for mask equality with commuted operands.
  We don't produce arbitrary shuffle masks in instcombine, but select-shuffles are a special case that the backend is required to handle because we already canonicalize vector select to this shuffle form. So there should be no codegen difference from this change. It's possible that this improves CSE in IR though.
  Differential Revision: https://reviews.llvm.org/D60016
  llvm-svn: 357366
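  Note (editor's illustration): commuting a select shuffle's operands flips each mask index across the 0..3 / 4..7 boundary, so these two forms pick the same lanes:
      %s = shufflevector <4 x i32> %v2, <4 x i32> %v1, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
    ==
      %s = shufflevector <4 x i32> %v1, <4 x i32> %v2, <4 x i32> <i32 4, i32 1, i32 2, i32 7>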
* [InstCombine] move shuffle canonicalizations before other transforms (Sanjay Patel, 2019-03-29; 1 file, -30/+27)
  This may not be NFC, but I'm not sure how to expose any diffs in tests. In theory, it should be slightly more efficient and possibly more profitable to do the canonicalizations (which can increase the undef elements in the mask) ahead of SimplifyDemandedVectorElts().
  llvm-svn: 357272
* [InstCombine] Use uadd.sat and usub.sat for canonicalization (Nikita Popov, 2019-03-27; 1 file, -28/+21)
  Start using the uadd.sat and usub.sat intrinsics for the existing canonicalizations. These intrinsics should optimize better than expanded IR, have better handling in the X86 backend, and should be no worse than expanded IR in other backends, as far as we know.
  rL357012 already introduced use of uadd.sat for the add+umin pattern.
  Differential Revision: https://reviews.llvm.org/D58872
  llvm-svn: 357103
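  Note (editor's sketch of one recognized saturation pattern; the overflow-check icmp can appear in several equivalent forms):
      %a = add i32 %x, %y
      %ov = icmp ult i32 %a, %x            ; unsigned add overflowed iff result < operand
      %r = select i1 %ov, i32 -1, i32 %a
    -->
      %r = call i32 @llvm.uadd.sat.i32(i32 %x, i32 %y)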
* [InstCombine] form uaddsat from add+umin (PR14613) (Sanjay Patel, 2019-03-26; 1 file, -0/+25)
  This is the last step towards solving the examples shown in https://bugs.llvm.org/show_bug.cgi?id=14613. With this change, x86 should end up with psubus instructions when those are available.
  All known codegen issues with expanding the saturating intrinsics were resolved with D59006 / rL356855. We also have some early evidence in D58872 that using the intrinsics will lead to better perf. If some target regresses from this, custom lowering of the intrinsics (as in the above for x86) may be needed.
  llvm-svn: 357012
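  Note (editor's sketch of the add+umin idiom, assuming this matched shape): clamping one operand to ~Y first means the add cannot wrap, so umin(X, ~Y) + Y == uadd.sat(X, Y). In IR of this era, the umin is an icmp+select:
      %noty = xor i32 %y, -1
      %c = icmp ult i32 %x, %noty
      %min = select i1 %c, i32 %x, i32 %noty
      %r = add i32 %min, %y
    -->
      %r = call i32 @llvm.uadd.sat.i32(i32 %x, i32 %y)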
* InstCombineSimplifyDemanded: Allow v3 results for AMDGCN buffer and image intrinsics (Tim Renouf, 2019-03-22; 1 file, -2/+1)
  This helps to avoid the situation where RA spots that only 3 of the v4f32 result of a load are used, and immediately reallocates the 4th register for something else, requiring a stall waiting for the load.
  Differential Revision: https://reviews.llvm.org/D58906
  Change-Id: I947661edfd5715f62361a02b100f14aeeada29aa
  llvm-svn: 356768
* [InstCombine] Don't transform ((C1 OP zext(X)) & C2) -> zext((C1 OP X) & C2) if either zext or OP has another use. (Craig Topper, 2019-03-21; 1 file, -1/+5)
  If they have other users, we'll just end up increasing the instruction count. We might be able to weaken this to only one of them having a single use if we can prove that the and will be removed.
  Fixes PR41164.
  Differential Revision: https://reviews.llvm.org/D59630
  llvm-svn: 356690
* [instcombine] Add some todos, and arrange code for readability (Philip Reames, 2019-03-21; 2 files, -32/+38)
  llvm-svn: 356642
* Simplify operands of masked stores and scatters based on demanded elements (Philip Reames, 2019-03-20; 2 files, -8/+52)
  If we know we're not storing a lane, we don't need to compute the lane. This could be improved by using the undef element result to further prune the mask, but I want to separate that into its own change since it's relatively likely to expose other problems.
  Differential Revision: https://reviews.llvm.org/D57247
  llvm-svn: 356590
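  Note (editor's sketch with an invented constant mask): lane 1 is never stored, so the value feeding it need not be computed:
      %v1 = insertelement <2 x double> %v0, double %expensive, i32 1
      call void @llvm.masked.store.v2f64.p0v2f64(<2 x double> %v1, <2 x double>* %p, i32 8, <2 x i1> <i1 true, i1 false>)
    -->
      call void @llvm.masked.store.v2f64.p0v2f64(<2 x double> %v0, <2 x double>* %p, i32 8, <2 x i1> <i1 true, i1 false>)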