path: root/llvm/test/Transforms/InstCombine
Commit message | Author | Age | Files | Lines
...
* [InstCombine] add tests for masked bit set/clear; NFC | Sanjay Patel | 2019-12-31 | 1 | -20/+188
* [InstCombine] Fix infinite loop due to bitcast <-> phi transforms | Nikita Popov | 2019-12-31 | 1 | -0/+142
  Fix for https://bugs.llvm.org/show_bug.cgi?id=44245. optimizeBitCastFromPhi() and FoldPHIArgOpIntoPHI() end up fighting against each other, because optimizeBitCastFromPhi() assumes that bitcasts of loads will get folded. This doesn't happen here, because a dangling phi node prevents the one-use fold in https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L620-L628 from triggering. This patch fixes the issue by manually removing the old phis.
  Differential Revision: https://reviews.llvm.org/D71164
* [InstCombine] Don't rewrite phi-of-bitcast when the phi has other users | Connor Abbott | 2019-12-31 | 1 | -27/+25
  Judging by the existing comments, this was the intention, but the transform never actually checked whether the existing phis would be removed. See https://bugs.llvm.org/show_bug.cgi?id=44242 for an example where this causes much worse code generation on AMDGPU.
  Differential Revision: https://reviews.llvm.org/D71209
* [InstCombine] Add tests for PR44242 | Connor Abbott | 2019-12-31 | 1 | -0/+192
  Differential Revision: https://reviews.llvm.org/D71260
* [InstCombine] remove stale comment on test; NFC | Sanjay Patel | 2019-12-30 | 1 | -1/+1
* [InstCombine] propagate sign argument through nested copysigns | Sanjay Patel | 2019-12-30 | 1 | -2/+1
  This is another optimization suggested in PR44153: https://bugs.llvm.org/show_bug.cgi?id=44153
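  A minimal IR sketch of the nested-copysign fold described above (hypothetical function and value names, not the actual test):
```llvm
declare double @llvm.copysign.f64(double, double)

define double @nested_copysign(double %x, double %y, double %z) {
  ; The inner copysign contributes only its sign operand, so the outer call
  ; can take %z directly: copysign(%x, copysign(%y, %z)) -> copysign(%x, %z).
  %inner = call double @llvm.copysign.f64(double %y, double %z)
  %outer = call double @llvm.copysign.f64(double %x, double %inner)
  ret double %outer
}
```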
* [NFC] Add test for load-insert-store pattern | Qiu Chaofan | 2019-12-30 | 1 | -0/+98
  This patch adds the necessary test cases for the load-update-store pattern, which updates only a single element of a vector.
  Differential Revision: https://reviews.llvm.org/D71886
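  A rough sketch of the pattern being tested (hypothetical types and names; the actual tests cover more cases):
```llvm
define void @update_single_lane(<4 x float>* %p, float %v) {
  ; Load the whole vector, update one lane, and store the vector back.
  %vec = load <4 x float>, <4 x float>* %p
  %upd = insertelement <4 x float> %vec, float %v, i32 1
  store <4 x float> %upd, <4 x float>* %p
  ret void
}
```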
* Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 | Fangrui Song | 2019-12-24 | 4 | -4/+4
* [InstCombine] add test for copysign; NFC | Sanjay Patel | 2019-12-23 | 1 | -0/+14
* [InstCombine] add tests for not(select ...); NFC | Sanjay Patel | 2019-12-23 | 1 | -0/+142
* [InstCombine] enhance fold for copysign with known sign arg | Sanjay Patel | 2019-12-22 | 1 | -6/+4
  This is another optimization suggested in PR44153: https://bugs.llvm.org/show_bug.cgi?id=44153
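  A minimal sketch of the kind of input this fold targets (hypothetical constant; the exact folded form depends on the known sign):
```llvm
declare double @llvm.copysign.f64(double, double)

define double @copysign_known_negative(double %x) {
  ; The sign operand is a negative constant, so the sign of the result is
  ; known and the call can be simplified (e.g. to a fabs-based form).
  %r = call double @llvm.copysign.f64(double %x, double -2.000000e+00)
  ret double %r
}
```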
* [InstCombine] check alloc size in bitcast of geps fold (PR44321) | Sanjay Patel | 2019-12-21 | 1 | -8/+28
  We missed a constraint in D44833 when folding a bitcast into a GEP with vector/array types. If the alloc sizes specified by the datalayout don't match, this could miscompile as shown in: https://bugs.llvm.org/show_bug.cgi?id=44321
  Differential Revision: https://reviews.llvm.org/D71771
* [SimplifyLibCalls] require fast-math-flags for pow(X, -0.5) transforms | Sanjay Patel | 2019-12-21 | 1 | -14/+22
  As discussed in PR44330: https://bugs.llvm.org/show_bug.cgi?id=44330
  ...the transform from pow(X, -0.5) libcall/intrinsic to reciprocal square root can result in small deviations from the expected result due to differences in the pow() implementation and/or the extra rounding step from the division.
  This patch proposes to allow that difference with either the 'approximate functions' or 'reassociate' FMF: http://llvm.org/docs/LangRef.html#fast-math-flags
  In practice, this likely means that the code is compiled with all of 'fast' (-ffast-math), but I have preserved the existing specializations for -0.0/-INF that enable generating safe code if those special values are allowed simultaneously with allowing approximation/reassociation.
  The question about whether a similar restriction is needed for the non-reciprocal case -- pow(X, 0.5) -- is deferred. That transform is allowed without FMF currently, and this patch does not change that behavior.
  Differential Revision: https://reviews.llvm.org/D71706
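  A hedged IR sketch of the affected pattern (assuming the 'afn' flag; any of the newly required FMF would do):
```llvm
declare double @llvm.pow.f64(double, double)

define double @pow_reciprocal_sqrt(double %x) {
  ; With an approximation-permitting FMF such as 'afn', this may be rewritten
  ; as a reciprocal square root (1.0 / sqrt(%x)); without FMF it is left alone.
  %r = call afn double @llvm.pow.f64(double %x, double -5.000000e-01)
  ret double %r
}
```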
* [InstCombine] Improve infinite loop detection | Jakub Kuderski | 2019-12-20 | 1 | -0/+3
  Summary: This patch limits the default number of iterations performed by InstCombine. It also exposes a new option that allows specifying how many iterations should be treated as getting stuck in an infinite loop.
  Based on experiments performed on real-world C++ programs, InstCombine seems to perform at most ~8-20 iterations, so treating 1000 iterations as an infinite loop seems like a safe choice. See D71145 for details.
  The two limits can be specified via command line options.
  Reviewers: spatel, lebedev.ri, nikic, xbolva00, grosser
  Reviewed By: spatel
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71673
* [InstCombine] add tests for cast+gep; NFC | Sanjay Patel | 2019-12-20 | 1 | -0/+44
  PR44321: https://bugs.llvm.org/show_bug.cgi?id=44321
* [ValueTracking] isKnownNonZero() should take non-null-ness assumptions into consideration (PR43267) | Roman Lebedev | 2019-12-20 | 1 | -11/+10
  Summary: It is pretty common to assume that something is not zero. Even the optimizer itself sometimes emits such assumptions (e.g. `addAssumeNonNull()` in `PromoteMemoryToRegister.cpp`). But we currently don't deal with such assumptions :)
  The only way `isKnownNonZero()` handles assumptions is by calling `computeKnownBits()`, which calls `computeKnownBitsFromAssume()`. But `x != 0` does not tell us anything about set bits, it only says that there are *some* set bits. So naturally, `KnownBits` does not get populated, and we fail to make use of this assumption.
  I propose to deal with this special case by adding an `isKnownNonZeroFromAssume()` that returns a boolean when there is an applicable assumption. While there, we also deal with other predicates, mainly when the comparison is with a constant.
  Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=43267 | PR43267 ]].
  Differential Revision: https://reviews.llvm.org/D71660
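  A small IR sketch of the situation being handled (hypothetical names, not the actual reproducer):
```llvm
declare void @llvm.assume(i1)

define i1 @nonzero_via_assume(i32 %x) {
  ; The assume records that %x != 0; isKnownNonZero() can now use it, so the
  ; later equality check against zero can fold to false.
  %assume.cond = icmp ne i32 %x, 0
  call void @llvm.assume(i1 %assume.cond)
  %is.zero = icmp eq i32 %x, 0
  ret i1 %is.zero
}
```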
* [ValueTracking] isValidAssumeForContext(): CxtI itself also must transfer execution to successor | Roman Lebedev | 2019-12-20 | 1 | -1/+1
  This is a pretty rare case: CxtI and the assume are in the same basic block, with the assume located later. We were already checking that the assumption was guaranteed to be executed, but we omitted CxtI itself from consideration, and as the test (miscompile) shows, that is incorrect.
  As noted in D71660 review by @nikic.
* [NFC][InstCombine] Add a test for assume-induced miscompile | Roman Lebedev | 2019-12-20 | 1 | -0/+17
  @escape() may throw here, so we don't know that the assumption, which is located afterwards in the same block, is executed; therefore the %load arg of the call to @escape() can not be marked as non-null.
  As noted in D71660 review by @nikic.
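  A rough reconstruction of the scenario (hypothetical names; see the actual test added by this commit):
```llvm
declare void @escape(i8*)
declare void @llvm.assume(i1)

define void @no_nonnull_from_later_assume(i8** %p) {
  %load = load i8*, i8** %p
  ; @escape may throw, so execution is not guaranteed to reach the assume
  ; below; therefore %load must not be marked nonnull at this call site.
  call void @escape(i8* %load)
  %cond = icmp ne i8* %load, null
  call void @llvm.assume(i1 %cond)
  ret void
}
```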
* [InstCombine] add/adjust tests for pow->sqrt; NFC | Sanjay Patel | 2019-12-19 | 1 | -35/+77
  There's at least 1 bug here as discussed in PR44330.
* [InstCombine] Canonicalize select immediates | David Green | 2019-12-19 | 1 | -11/+11
  In certain situations after inlining and simplification we end up with code that is _almost_ a min/max pattern, but contains constants that have been demand-bit optimised to the wrong values, ending up with code like:
    %1 = icmp slt i32 %shr, -128
    %2 = select i1 %1, i32 128, i32 %shr
    %.inv = icmp sgt i32 %shr, 127
    %spec.select.i = select i1 %.inv, i32 127, i32 %2
    %conv7 = trunc i32 %spec.select.i to i8
  This should be turned into a min/max pattern, but the -128 in the first select was instead transformed into 128, as only the bottom byte was ever demanded.
  To fix this, I've put in further canonicalisation for the immediates of selects, preferring to use the same value as the icmp if available.
  Differential Revision: https://reviews.llvm.org/D71516
* [Instcombine] Add select canonicalization tests. NFC | David Green | 2019-12-19 | 1 | -0/+70
* Revert "[InstCombine][AMDGPU] Trim more components of *buffer_load"Piotr Sobczak2019-12-181-150/+150
| | | | | | Revert D70315, as it breaks gfx8 for some reason. This reverts commit 65f94b33808d7d69539961a6f5a2168f0a1eef41.
* [InstCombine] Insert instructions before adding them to worklist | Jakub Kuderski | 2019-12-18 | 1 | -1/+1
  Summary: This patch adds instructions to the InstCombine worklist after they are properly inserted. This way we don't get `<badref>`s printed when logging added instructions.
  It also adds a check in `Worklist::Add` that ensures that all added instructions have parents.
  Simple test case that illustrates the difference when run with `--debug-only=instcombine`:
  ```
  define i32 @test35(i32 %a, i32 %b) {
    %1 = or i32 %a, 1135
    %2 = or i32 %1, %b
    ret i32 %2
  }
  ```
  Before this patch:
  ```
  INSTCOMBINE ITERATION #1 on test35
  IC: ADDING: 3 instrs to worklist
  IC: Visiting: %1 = or i32 %a, 1135
  IC: Visiting: %2 = or i32 %1, %b
  IC: ADD: %2 = or i32 %a, %b
  IC: Old = %3 = or i32 %1, %b
      New = <badref> = or i32 %2, 1135
  IC: ADD: <badref> = or i32 %2, 1135
  ...
  ```
  With this patch:
  ```
  INSTCOMBINE ITERATION #1 on test35
  IC: ADDING: 3 instrs to worklist
  IC: Visiting: %1 = or i32 %a, 1135
  IC: Visiting: %2 = or i32 %1, %b
  IC: ADD: %2 = or i32 %a, %b
  IC: Old = %3 = or i32 %1, %b
      New = <badref> = or i32 %2, 1135
  IC: ADD: %3 = or i32 %2, 1135
  ...
  ```
  Reviewers: fhahn, davide, spatel, foad, grosser, nikic
  Reviewed By: nikic
  Subscribers: nikic, lebedev.ri, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71093
* [InstCombine] Allow to limit the max number of iterations | Jakub Kuderski | 2019-12-18 | 1 | -0/+41
  Summary: This patch teaches InstCombine to accept a new parameter: maximum number of iterations over functions.
  InstCombine tries to simplify instructions by iterating over the whole function until the function stops changing. As a consequence, the last iteration before reaching a fixpoint visits all instructions in the worklist and never performs any rewrites.
  Bounding the number of iterations can have 2 benefits:
  * In case the users of the pass can make a good guess about the number of required iterations, we can save the time normally spent on the last iteration that doesn't change anything.
  * When the user wants to use InstCombine as a cleanup pass, it may be enough to run just a few iterations and stop even before reaching a fixpoint. This can also be useful for implementing a lightweight pass pipeline (think `-O1`).
  This patch does not change the behavior of opt or Clang -- limiting the number of iterations is entirely opt-in.
  Reviewers: fhahn, davide, spatel, foad, nlopes, grosser, lebedev.ri, nikic, xbolva00
  Reviewed By: spatel
  Subscribers: craig.topper, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71145
* Reapply: [DebugInfo] Correctly handle salvaged casts and split fragments at ISel | stozer | 2019-12-18 | 3 | -5/+6
  This reverts commit 1f3dd83cc1f2b8f72b9d59e2b4221b12fb7f9a95, reapplying commit bb1b0bc4e57428ce364d3d6c075ff03cb8973462.
  The original commit failed on some builds seemingly due to the use of a bracketed constructor with an std::array, i.e. `std::array<> arr({...})`.
* [NFC][InstCombine] Autogenerate assume.ll test | Roman Lebedev | 2019-12-18 | 1 | -40/+77
* [InstCombine] add tests for copysign; NFC | Sanjay Patel | 2019-12-18 | 1 | -0/+23
* Revert "[DebugInfo] Correctly handle salvaged casts and split fragments at ISel"stozer2019-12-183-6/+5
| | | | | | Reverted due to build failure on windows bots. This reverts commit bb1b0bc4e57428ce364d3d6c075ff03cb8973462.
* [DebugInfo] Correctly handle salvaged casts and split fragments at ISel | stozer | 2019-12-18 | 3 | -5/+6
  Previously, LLVM had no functional way of performing casts inside of a DIExpression(), which made salvaging cast instructions other than Noop casts impossible. This patch enables the salvaging of casts by using the DW_OP_LLVM_convert operator for SExt and Trunc instructions.
  There is another issue which is exposed by this fix, in which fragment DIExpressions (which are preserved more readily by this patch) for values that must be split across registers in ISel trigger an assertion, as the 'split' fragments extend beyond the bounds of the fragment DIExpression, causing an error. This patch also fixes this issue by checking the fragment status of DIExpressions which are to be split, and dropping fragments that are invalid.
* [InstCombine][AMDGPU] Trim more components of *buffer_load | Piotr Sobczak | 2019-12-17 | 1 | -150/+150
  Summary: Add trimming of unused components of s_buffer_load.
  Extend trimming of *buffer_load to also include unused components at the beginning of vectors and update offset.
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70315
* [InstCombine] Teach removeBitcastsFromLoadStoreOnMinMax not to change the size of a store | Craig Topper | 2019-12-16 | 1 | -2/+5
  We can change the type as long as we don't change the size.
  Fixes PR44306
  Differential Revision: https://reviews.llvm.org/D71532
* [InstSimplify] fold splat of inserted constant to vector constant | Sanjay Patel | 2019-12-15 | 1 | -2/+1
  shuf (inselt ?, C, IndexC), undef, <IndexC, IndexC...> --> <C, C...>
  This is another missing shuffle fold pattern uncovered by the shuffle correctness fix from D70246.
  The problem was visible in the post-commit thread example, but we managed to overcome the limitation for that particular case with D71220.
  This is something like the inverse of the previous fix - there we didn't demand the inserted scalar, and here we are only demanding an inserted scalar.
  Differential Revision: https://reviews.llvm.org/D71488
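  A minimal sketch of the pattern (hypothetical constant and lane index):
```llvm
define <4 x i32> @splat_of_inserted_constant(<4 x i32> %v) {
  ; Only lane 2 of %ins is demanded by the splat mask, and that lane is the
  ; constant 7, so the sequence simplifies to <i32 7, i32 7, i32 7, i32 7>.
  %ins = insertelement <4 x i32> %v, i32 7, i32 2
  %splat = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
  ret <4 x i32> %splat
}
```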
* Reland [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same. | Nicola Zaghen | 2019-12-13 | 4 | -2/+75
  GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues with Instcombine's visitPtrToInt; but the unit test was incorrect, so this remained undiscovered.
  This fixes the buildbot failures.
  Differential Revision: https://reviews.llvm.org/D68328
  Patch by Joseph Faulls!
* Temporarily Revert "[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same." | Nicola Zaghen | 2019-12-12 | 4 | -75/+2
  This reverts commit 5f6208778ff92567c57d7c1e2e740c284d7e69a5.
  This caused failures in Transforms/PhaseOrdering/scev-custom-dl.ll const: Assertion `getBitWidth() == CR.getBitWidth() && "ConstantRange types don't agree!"' failed.
* [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same. | Nicola Zaghen | 2019-12-12 | 4 | -2/+75
  GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues with Instcombine's visitPtrToInt; but the unit test was incorrect, so this remained undiscovered.
  Differential Revision: https://reviews.llvm.org/D68328
  Patch by Joseph Faulls!
* [InstCombine] Optimize overflow check based on uadd.with.overflow result | Nikita Popov | 2019-12-11 | 1 | -14/+7
  Fix for https://bugs.llvm.org/show_bug.cgi?id=40846.
  This adds a combine for cases where a (a + b) < a style overflow check is performed, but with a + b being the result of uadd.with.overflow, so the overflow result is also already available and we can just use it. Subsequently GVN/CSE will deduplicate the extracts.
  We can run into this situation if you have both a uadd.with.overflow and a manual add + overflow check in the same function (on the same operands), in which case GVN will rewrite the add to the with.overflow result and leave you with this pattern.
  The implementation is a bit ugly because I'm handling the various canonicalization edge cases.
  This does not yet handle the negated version of this pattern.
  Differential Revision: https://reviews.llvm.org/D58644
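  A sketch of the redundant manual check described above (hypothetical names):
```llvm
declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32)

define i1 @manual_overflow_check(i32 %a, i32 %b) {
  %res = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
  %sum = extractvalue { i32, i1 } %res, 0
  ; Manual "(a + b) < a" style check; it can be replaced by the intrinsic's
  ; own overflow bit: extractvalue { i32, i1 } %res, 1.
  %overflow = icmp ult i32 %sum, %a
  ret i1 %overflow
}
```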
* [ValueTracking] Pointer is known nonnull after load/store | Danila Kutenin | 2019-12-11 | 3 | -36/+36
  If the pointer was loaded/stored before the null check, the check is redundant and can be removed. For now the optimizers do not remove the nullptr check, see https://gcc.godbolt.org/z/H2r5GG. The patch allows using more nonnull constraints. Also, it found one more optimization in some PowerPC test.
  This is my first llvm review, I am open to any comments.
  Differential Revision: https://reviews.llvm.org/D71177
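  A minimal IR sketch of a redundant check this enables removing (hypothetical names):
```llvm
define i32 @null_check_after_load(i32* %p) {
entry:
  ; The load establishes that %p is non-null on this path (a null dereference
  ; would be UB), so the explicit null check below is redundant.
  %v = load i32, i32* %p
  %is.null = icmp eq i32* %p, null
  br i1 %is.null, label %on.null, label %ok

on.null:
  ret i32 0

ok:
  ret i32 %v
}
```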
* [InstSimplify] add tests for insert constant + splat; NFC | Sanjay Patel | 2019-12-10 | 1 | -0/+13
* [InstCombine] replace shuffle's insertelement operand if inserted scalar is not demanded | Sanjay Patel | 2019-12-10 | 1 | -2/+6
  This pattern is noted as a regression from: D70246 ...where we removed an over-aggressive shuffle simplification.
  SimplifyDemandedVectorElts fails to catch this case when the insert has multiple uses, so I'm proposing to pattern match the minimal sequence directly. This fold does not conflict with any of our current shuffle undef/poison semantics.
  Differential Revision: https://reviews.llvm.org/D71220
* [ValueTracking] Allow context-sensitive nullness check for non-pointers | Johannes Doerfert | 2019-12-09 | 1 | -2/+2
  Summary: Same as D60846 and D69571 but with a fix for the problem encountered after them. Both times it was a missing context adjustment in the handling of PHI nodes.
  The reproducers created from the bugs that caused the old commits to be reverted are included.
  Reviewers: nikic, nlopes, mkazantsev, spatel, dlrobertson, uabelho, hakzsam, hans
  Subscribers: hiraditya, bollu, asbirlea, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71181
* [InstCombine] add tests for shuffle with insertelement operand; NFC | Sanjay Patel | 2019-12-09 | 1 | -0/+52
* Revert "[InstCombine] keep assumption before sinking calls"Bob Haarman2019-12-052-192/+45
| | | | | | | | | | | | | | | | | | | | | Summary: This reverts commit c3b06d0c393e533eab712922911d14e5a079fa5d. Reason for revert: Caused miscompiles when inserting assume for undef. Also adds a test to prevent similar breakage in future. Fixes PR44154. Reviewers: rnk, jdoerfert, efriedma, xbolva00 Reviewed By: rnk Subscribers: thakis, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70933
* [InstCombine] Invert `add A, sext(B) --> sub A, zext(B)` canonicalization (to `sub A, zext B -> add A, sext B`) | Roman Lebedev | 2019-12-05 | 5 | -19/+39
  Summary: D68408 proposes to greatly improve our negation sinking abilities. But in current canonicalization, we produce `sub A, zext(B)`, which we will consider non-canonical and try to sink that negation, undoing the existing canonicalization. So unless we explicitly stop producing previous canonicalization, we will have two conflicting folds, and will end up endlessly looping.
  This inverts canonicalization, and adds back the obvious fold that we'd miss:
  * `sub [nsw] Op0, sext/zext (bool Y) -> add [nsw] Op0, zext/sext (bool Y)` https://rise4fun.com/Alive/xx4
  * `sext(bool) + C -> bool ? C - 1 : C` https://rise4fun.com/Alive/fBl
  It is obvious that `@ossfuzz_9880()` / `@lshr_out_of_range()`/`@ashr_out_of_range()` (oss-fuzz 4871) are no longer folded as much, though those aren't really worrying.
  Reviewers: spatel, efriedma, t.p.northover, hfinkel
  Reviewed By: spatel
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71064
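  A small sketch of the re-canonicalized pattern (hypothetical names):
```llvm
define i32 @sub_zext_bool(i32 %a, i1 %b) {
  ; Previously this was the canonical form; with this change it becomes
  ; %s = sext i1 %b to i32 ; %r = add i32 %a, %s.
  %z = zext i1 %b to i32
  %r = sub i32 %a, %z
  ret i32 %r
}
```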
* [InstCombine] narrow select with FP casts | Sanjay Patel | 2019-12-05 | 1 | -11/+15
  Select doesn't change values, so truncate of extended operand cancels out.
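  A minimal IR sketch of the fold (hypothetical names):
```llvm
define float @narrow_select_fp(i1 %c, float %x, float %y) {
  ; The fpext/fptrunc pair around the select cancels out, leaving
  ; select i1 %c, float %x, float %y.
  %xe = fpext float %x to double
  %ye = fpext float %y to double
  %sel = select i1 %c, double %xe, double %ye
  %r = fptrunc double %sel to float
  ret float %r
}
```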
* [InstCombine] add tests for fpext+select+fptrunc; NFC | Sanjay Patel | 2019-12-05 | 1 | -0/+84
* [InstCombine] Extend `0 - (X sdiv C) -> (X sdiv -C)` fold to non-splat vectors | Roman Lebedev | 2019-12-05 | 1 | -6/+3
  Split off from https://reviews.llvm.org/D68408
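  A sketch with a non-splat divisor (hypothetical constants):
```llvm
define <2 x i32> @negate_sdiv_nonsplat(<2 x i32> %x) {
  ; Negation of a division by a non-splat constant; expected to become
  ; sdiv <2 x i32> %x, <i32 -3, i32 -7>.
  %div = sdiv <2 x i32> %x, <i32 3, i32 7>
  %neg = sub <2 x i32> zeroinitializer, %div
  ret <2 x i32> %neg
}
```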
* [NFC][InstCombine] Autogenerate check lines in a few tests | Roman Lebedev | 2019-12-05 | 4 | -130/+130
  These files are potentially affected by the Negator (D68408) patch.
* [NFC][InstCombine] Update sub-of-negatible.ll test | Roman Lebedev | 2019-12-04 | 1 | -37/+122
* [InstCombine] Revert aafde063aaf09285c701c80cd4b543c2beb523e8 and 6749dc3446671df05235d0a218c426a314ac33cd related to bitcast handling of x86_mmx | Craig Topper | 2019-12-03 | 1 | -3/+5
  This reverts these two commits:
  [InstCombine] Turn (extractelement <1 x i64/double> (bitcast (x86_mmx))) into a single bitcast from x86_mmx to i64/double.
  [InstCombine] Don't transform bitcasts between x86_mmx and v1i64 into insertelement/extractelement
  We're seeing at least one internal test failure related to a bitcast that was previously placed before an inline assembly block containing emms now being placed after it. This leads to the mmx state ending up not empty after the emms. IR has no way to make any specific guarantees about this.
  Reverting these patches to get back to previous behavior which at least worked for this test.
* [InstCombine] fix undef propagation for vector urem transform (PR44186) | Sanjay Patel | 2019-12-02 | 1 | -1/+1
  As described here: https://bugs.llvm.org/show_bug.cgi?id=44186
  The match() code safely allows undef values, but we can't safely propagate a vector constant that contains an undef to the new compare instruction.