path: root/llvm/test/Transforms
Commit message | Author | Age | Files | Lines
...
* [Attributor] AAValueConstantRange: Value range analysis using constant range (Hideto Ueno, 2020-01-01; 7 files, -32/+796)
  This patch introduces `AAValueConstantRange`, which computes a possible range for an integer value at a specific program point. One of the motivations is propagating existing `range` metadata. (I think we need to change the situation that `range` metadata cannot be put on an Argument.) The state is a tuple of `ConstantRange` and it is initialized to (known, assumed) = ([-∞, +∞], empty). Currently, AAValueConstantRange is created when AAValueSimplify cannot simplify the value.
  Supported:
    - BinaryOperator (add, sub, ...)
    - CmpInst (icmp eq, ...)
    - !range metadata
  `AAValueConstantRange` is not intended to be extended to polyhedral range value analysis.
  Reviewed By: jdoerfert. Differential Revision: https://reviews.llvm.org/D71620
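  For illustration, a hypothetical test in the spirit of the ones added here (the function name, ranges, and constants are made up):

    define i1 @range_example(i32* %p) {
      %x = load i32, i32* %p, !range !0   ; known: %x is in [0, 10)
      %a = add i32 %x, 1                  ; derived: %a is in [1, 11)
      %c = icmp sgt i32 %a, 100           ; provably false given the range
      ret i1 %c
    }
    !0 = !{i32 0, i32 10}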
* [X86][InstCombine] Add constant folding and simplification support for pdep and pext (Craig Topper, 2019-12-31; 1 file, -0/+132)
  The instructions use a mask to either pack disjoint bits together (pext) or spread bits to disjoint locations (pdep). If the mask is all 0s then no bits are extracted or deposited. If the mask is all ones, then the source value is written to the result since no compression or expansion happens. Otherwise, if both the source and mask are constant, we can walk the bits in the source/mask and calculate the result. There are other, crazier things we could do, like computeKnownBits or turning pext into shift/and if only a single contiguous range of bits is extracted. Fixes PR44389. Differential Revision: https://reviews.llvm.org/D71952
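  A sketch of the all-ones case (a made-up test, assuming the BMI intrinsic spelling below): pext with a full mask extracts every bit in place, so the call simplifies to its source operand.

    declare i32 @llvm.x86.bmi.pext.32(i32, i32)

    define i32 @pext_allones(i32 %x) {
      ; mask is all ones, so no compression happens and the result is %x
      %r = call i32 @llvm.x86.bmi.pext.32(i32 %x, i32 -1)
      ret i32 %r
    }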
* [InstCombine] fold zext of masked bit set/clear (Sanjay Patel, 2019-12-31; 1 file, -28/+37)
  This does not solve PR17101, but it is one of the underlying diffs noted here: https://bugs.llvm.org/show_bug.cgi?id=17101#c8
  We could ease the one-use checks for the 'clear' (no 'not' op) half of the transform, but I do not know if that asymmetry would make things better or worse.
  Proofs: https://rise4fun.com/Alive/uVB

    Name: masked bit set
    %sh1 = shl i32 1, %y
    %and = and i32 %sh1, %x
    %cmp = icmp ne i32 %and, 0
    %r = zext i1 %cmp to i32
    =>
    %s = lshr i32 %x, %y
    %r = and i32 %s, 1

    Name: masked bit clear
    %sh1 = shl i32 1, %y
    %and = and i32 %sh1, %x
    %cmp = icmp eq i32 %and, 0
    %r = zext i1 %cmp to i32
    =>
    %xn = xor i32 %x, -1
    %s = lshr i32 %xn, %y
    %r = and i32 %s, 1
* [InstCombine] add/adjust tests for masked bit; NFC (Sanjay Patel, 2019-12-31; 1 file, -6/+66)
* Revert "[InstCombine] Fix infinite loop due to bitcast <-> phi transforms"Nikita Popov2019-12-311-142/+0
| | | | | | This reverts commit 27a0795943fee0f30b995fe5165428afc2dfd402. Seems to break test-suite.
* [InstCombine] add tests for masked bit set/clear; NFC (Sanjay Patel, 2019-12-31; 1 file, -20/+188)
* [InstCombine] Fix infinite loop due to bitcast <-> phi transforms (Nikita Popov, 2019-12-31; 1 file, -0/+142)
  Fix for https://bugs.llvm.org/show_bug.cgi?id=44245. optimizeBitCastFromPhi() and FoldPHIArgOpIntoPHI() end up fighting against each other, because optimizeBitCastFromPhi() assumes that bitcasts of loads will get folded. This doesn't happen here, because a dangling phi node prevents the one-use fold in https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L620-L628 from triggering. This patch fixes the issue by manually removing the old phis. Differential Revision: https://reviews.llvm.org/D71164
* [InstCombine] Don't rewrite phi-of-bitcast when the phi has other users (Connor Abbott, 2019-12-31; 1 file, -27/+25)
  Judging by the existing comments, this was the intention, but the transform never actually checked whether the existing phis would be removed. See https://bugs.llvm.org/show_bug.cgi?id=44242 for an example where this causes much worse code generation on AMDGPU. Differential Revision: https://reviews.llvm.org/D71209
* [InstCombine] Add tests for PR44242 (Connor Abbott, 2019-12-31; 1 file, -0/+192)
  Differential Revision: https://reviews.llvm.org/D71260
* [Attributor] Function signature rewrite infrastructure (Johannes Doerfert, 2019-12-31; 8 files, -30/+49)
  As part of the Attributor manifest we want to change the signature of functions. This patch introduces a fairly generic interface to do so. As a first, very simple use case, we remove unused arguments. A second use case, pointer privatization, will be committed with this patch as well. A lot of the code and ideas are taken from argument promotion, and we run all argument promotion tests through this framework as well. Reviewed By: uenoku. Differential Revision: https://reviews.llvm.org/D68765
* [Attributor] Propagate known align from arguments to call sites arguments (Johannes Doerfert, 2019-12-31; 4 files, -9/+8)
  Since the information is known, we can simply use it at the call site. This is especially useful for callbacks but also helps regular calls. The test changes are mechanical.
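  A minimal sketch of the propagation (hypothetical functions, not one of the committed tests):

    define internal void @callee(i32* %p) {
      ret void
    }

    define void @caller(i32* align 8 %q) {
      ; %q is known to be 8-byte aligned from the caller's signature,
      ; so the call site argument can be annotated as well:
      ;   call void @callee(i32* align 8 %q)
      call void @callee(i32* %q)
      ret void
    }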
* [Attributor] Use abstract call sites to determine associated arguments (Johannes Doerfert, 2019-12-31; 5 files, -13/+15)
  This is the second step after D67871 to make use of abstract call sites. In this patch the argument we associate with an abstract call site argument can be the one in the callback callee instead of the one in the callback broker.
  Caveat: We cannot allow no-alias arguments for problematic callbacks. As described in [1], adding no-alias (or restrict) to arguments could break synchronization, as the synchronization effect, e.g., a barrier, does not "alias" with the pointer anymore. This disables no-alias annotation for potentially problematic arguments until we implement the fix described in [1].
  Reviewed By: uenoku. Differential Revision: https://reviews.llvm.org/D68008
  [1] Compiler Optimizations for OpenMP, J. Doerfert and H. Finkel, International Workshop on OpenMP 2018, http://compilers.cs.uni-saarland.de/people/doerfert/par_opt18.pdf
* [Attributor] Annotate the memory behavior of call site arguments (Johannes Doerfert, 2019-12-31; 27 files, -77/+79)
  Especially for callbacks, annotating the call site arguments is important. Doing so exposed an overly strong dependence of AAMemoryBehavior on AANoCapture, since we handle the case of potentially captured pointers explicitly. The changes to the tests are all mechanical.
* [InstCombine] remove stale comment on test; NFC (Sanjay Patel, 2019-12-30; 1 file, -1/+1)
* [InstCombine] propagate sign argument through nested copysigns (Sanjay Patel, 2019-12-30; 1 file, -2/+1)
  This is another optimization suggested in PR44153: https://bugs.llvm.org/show_bug.cgi?id=44153
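  The fold in question, sketched with made-up names: the sign of copysign(y, z) is exactly the sign of z, so the outer call can take z directly.

    declare float @llvm.copysign.f32(float, float)

    define float @nested_copysign(float %x, float %y, float %z) {
      %inner = call float @llvm.copysign.f32(float %y, float %z)
      ; the sign of %inner is the sign of %z, so this simplifies to
      ;   call float @llvm.copysign.f32(float %x, float %z)
      %outer = call float @llvm.copysign.f32(float %x, float %inner)
      ret float %outer
    }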
* [NFC] Add test for load-insert-store pattern (Qiu Chaofan, 2019-12-30; 1 file, -0/+98)
  This patch adds necessary test cases for the load-update-store pattern which updates only a single element of a vector. Differential Revision: https://reviews.llvm.org/D71886
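  The idiom being tested looks roughly like this (an illustrative example, not one of the committed tests):

    define void @update_one_lane(<4 x float>* %p, float %v) {
      %vec = load <4 x float>, <4 x float>* %p
      ; only lane 1 changes; the other lanes are loaded and stored back unmodified
      %upd = insertelement <4 x float> %vec, float %v, i32 1
      store <4 x float> %upd, <4 x float>* %p
      ret void
    }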
* [Attributor] Use `changeUseAfterManifest` in AAValueSimplify manifest (Hideto Ueno, 2019-12-30; 5 files, -6/+6)
  Summary: This patch makes `AAValueSimplify` use `changeUsesAfterManifest` in `manifest`. This will invoke simple folding after the manifest.
  Reviewers: jdoerfert, sstefan1. Reviewed By: jdoerfert. Subscribers: hiraditya, arphaman, llvm-commits. Tags: #llvm. Differential Revision: https://reviews.llvm.org/D71972
* [Attributor] AAUndefinedBehavior: Check for branches on undef value. (Hideto Ueno, 2019-12-29; 4 files, -12/+159)
  A branch is considered UB if it depends on an undefined / uninitialized value. At this point this handles simple UB branches in the form `br i1 undef, ...`. We query `AAValueSimplify` to get a value for the branch condition, so the branch can be more complicated than just `br i1 undef, ...`.
  Patch By: Stefanos Baziotis (@baziotis). Reviewers: jdoerfert, sstefan1, uenoku. Reviewed By: uenoku. Differential Revision: https://reviews.llvm.org/D71799
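  The simplest shape of the pattern (illustrative only):

    define i32 @branch_on_undef() {
      ; the condition is undef, so executing this branch is UB and the
      ; Attributor may treat the dependent code accordingly
      br i1 undef, label %t, label %f
    t:
      ret i32 1
    f:
      ret i32 0
    }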
* [PowerPC][LoopVectorize] Add floating point reg usage test (Jinsong Ji, 2019-12-27; 1 file, -0/+91)
  Copied two tests from x86 to test floating point reg usage.
* [Matrix] Propagate and use shape info for binary operators. (Florian Hahn, 2019-12-27; 2 files, -6/+97)
  This patch extends the current shape propagation and shape aware lowering to also support binary operators. Those operators are uniform with respect to their shape (the shape of the input operands is the same as the shape of their result).
  Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor. Reviewed By: anemet. Differential Revision: https://reviews.llvm.org/D70898
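  For instance (a sketch using the intrinsic spelling from the shape-propagation entry further down; treat the exact syntax as an assumption):

    ; %t is known to be 2x2 from the transpose, and fadd is shape-uniform,
    ; so %s is 2x2 as well and can be lowered column-wise
    %t = call <4 x double> @llvm.matrix.transpose(<4 x double> %a, i32 2, i32 2)
    %s = fadd <4 x double> %t, %t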
* [Attributor] UB Attribute now handles all instructions that access memory through a pointer (Johannes Doerfert, 2019-12-24; 1 file, -14/+122)
  Summary: Follow-up on https://reviews.llvm.org/D71435. We basically use `checkForAllInstructions` to loop through all the instructions in a function that access memory through a pointer: load, store, atomicrmw, atomiccmpxchg. Note that we can now use `getPointerOperand()`, which gets us the pointer operand for an instruction that belongs to the aforementioned set.
  Question: This function returns `nullptr` if the instruction is `volatile`. Why? Guess: Because if it is volatile, we don't want to do any transformation to it.
  Another subtle point is that I had to add AtomicRMW and AtomicCmpXchg to `initializeInformationCache()`. Following the `checkForAllInstructions()` path, that seemed the most reasonable place to add them and correct the fact that these instructions were ignored (they were not in `OpcodeInstMap`, etc.). Is that ok?
  Reviewers: jdoerfert, sstefan1. Reviewed By: jdoerfert, sstefan1. Subscribers: hiraditya, jfb, llvm-commits. Tags: #llvm. Differential Revision: https://reviews.llvm.org/D71787
* [Attributor] Function level undefined behavior attribute (Johannes Doerfert, 2019-12-24; 1 file, -0/+38)
  _Eventually_, this attribute will be assigned to a function if it contains undefined behavior. As a first small step, I tried to make it loop through the load instructions in a function (eventually, the plan is to check whether a load instruction causes undefined behavior because it, e.g., dereferences a null pointer; also eventually, this won't happen in initialize() but in updateImpl()).
  Patch By: Stefanos Baziotis (@baziotis). Reviewed By: jdoerfert. Differential Revision: https://reviews.llvm.org/D71435
* Migrate function attribute "no-frame-pointer-elim"="false" to ↵Fangrui Song2019-12-2435-52/+52
| | | | "frame-pointer"="none" as cleanups after D56351
* Migrate function attribute "no-frame-pointer-elim-non-leaf" to ↵Fangrui Song2019-12-244-5/+5
| | | | "frame-pointer"="non-leaf" as cleanups after D56351
* Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" ↵Fangrui Song2019-12-2473-106/+106
| | | | as cleanups after D56351
* [InstCombine] add test for copysign; NFC (Sanjay Patel, 2019-12-23; 1 file, -0/+14)
* [InstCombine] add tests for not(select ...); NFC (Sanjay Patel, 2019-12-23; 1 file, -0/+142)
* [Matrix] Use fmuladd for matrix.multiply if allowed. (Florian Hahn, 2019-12-23; 4 files, -0/+276)
  If the matrix.multiply calls have the contract fast math flag, we can use fmuladd. This also adds a command line option to force fmuladd generation. We can retire this option once there is a clang-level option.
  Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor. Reviewed By: anemet. Differential Revision: https://reviews.llvm.org/D70951
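  A sketch of a qualifying call (the intrinsic spelling follows the matrix entries below; the exact signature is an assumption here):

    ; 2x2 * 2x2 multiply; the 'contract' flag licenses fusing the inner
    ; multiply-add chain into @llvm.fmuladd during lowering
    %c = call contract <4 x double> @llvm.matrix.multiply(<4 x double> %a, <4 x double> %b, i32 2, i32 2, i32 2)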
* [Matrix] Add forward shape propagation and first shape aware lowerings. (Florian Hahn, 2019-12-23; 3 files, -264/+392)
  This patch adds infrastructure for forward shape propagation to LowerMatrixIntrinsics. It also updates the pass to make use of the shape information to break up larger vector operations and to eliminate unnecessary conversion operations between columnwise matrices and flattened vectors: if shape information is available for an instruction, lower the operation to a set of instructions operating on columns. For example, a store of a matrix is broken down into separate stores for each column. For users that do not have shape information (e.g. because they do not yet support shape information aware lowering), we pack the result columns into a flat vector and update those users.
  It also adds shape aware lowering for the first non-intrinsic instruction: vector stores.
  Example: For
    %c = call <4 x double> @llvm.matrix.transpose(<4 x double> %a, i32 2, i32 2)
    store <4 x double> %c, <4 x double>* %Ptr
  We generate the code below without shape propagation. Note %9 which combines the columns of the transposed matrix into a flat vector.
    %split = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 0, i32 1>
    %split1 = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 2, i32 3>
    %1 = extractelement <2 x double> %split, i64 0
    %2 = insertelement <2 x double> undef, double %1, i64 0
    %3 = extractelement <2 x double> %split1, i64 0
    %4 = insertelement <2 x double> %2, double %3, i64 1
    %5 = extractelement <2 x double> %split, i64 1
    %6 = insertelement <2 x double> undef, double %5, i64 0
    %7 = extractelement <2 x double> %split1, i64 1
    %8 = insertelement <2 x double> %6, double %7, i64 1
    %9 = shufflevector <2 x double> %4, <2 x double> %8, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
    store <4 x double> %9, <4 x double>* %Ptr
  With this patch, we propagate the 2x2 shape information from the transpose to the store and we generate the code below. Note that we store the columns directly and do not need an extra shuffle.
    %9 = bitcast <4 x double>* %Ptr to double*
    %10 = bitcast double* %9 to <2 x double>*
    store <2 x double> %4, <2 x double>* %10, align 8
    %11 = getelementptr double, double* %9, i32 2
    %12 = bitcast double* %11 to <2 x double>*
    store <2 x double> %8, <2 x double>* %12, align 8
  Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor. Reviewed By: anemet. Differential Revision: https://reviews.llvm.org/D70897
* Revert "[ARM][TypePromotion] Enable by default"Reid Kleckner2019-12-2210-10/+10
| | | | | | | | This reverts commit ee7579409b7d940c4e1314d126e900db30c4edff. It causes crashes during ThinLTO. I suspect the issue is related to races on the global TypeSize variable, which is 80 at the time of the crash.
* [InstCombine] enhance fold for copysign with known sign arg (Sanjay Patel, 2019-12-22; 1 file, -6/+4)
  This is another optimization suggested in PR44153: https://bugs.llvm.org/show_bug.cgi?id=44153
* [InstCombine] check alloc size in bitcast of geps fold (PR44321) (Sanjay Patel, 2019-12-21; 1 file, -8/+28)
  We missed a constraint in D44833 when folding a bitcast into a GEP with vector/array types. If the alloc sizes specified by the datalayout don't match, this could miscompile as shown in: https://bugs.llvm.org/show_bug.cgi?id=44321
  Differential Revision: https://reviews.llvm.org/D71771
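  The hazard, sketched with hypothetical types (the real case is in the PR): with a typical datalayout, <3 x float> has an alloc size of 16 bytes while [3 x float] has 12, so rewriting a GEP across such a bitcast changes the byte offsets it computes.

    define float* @mismatch([4 x <3 x float>]* %p) {
      ; the element stride is 16 bytes on the vector side but 12 on the
      ; array side; folding the bitcast into the gep would change the address
      %bc = bitcast [4 x <3 x float>]* %p to [4 x [3 x float]]*
      %g = getelementptr [4 x [3 x float]], [4 x [3 x float]]* %bc, i64 0, i64 1, i64 0
      ret float* %g
    }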
* [SimplifyLibCalls] require fast-math-flags for pow(X, -0.5) transforms (Sanjay Patel, 2019-12-21; 1 file, -14/+22)
  As discussed in PR44330: https://bugs.llvm.org/show_bug.cgi?id=44330 ...the transform from a pow(X, -0.5) libcall/intrinsic to a reciprocal square root can result in small deviations from the expected result due to differences in the pow() implementation and/or the extra rounding step from the division. This patch proposes to allow that difference with either the 'approximate functions' or 'reassociate' FMF: http://llvm.org/docs/LangRef.html#fast-math-flags
  In practice, this likely means that the code is compiled with all of 'fast' (-ffast-math), but I have preserved the existing specializations for -0.0/-INF that enable generating safe code if those special values are allowed simultaneously with allowing approximation/reassociation.
  The question about whether a similar restriction is needed for the non-reciprocal case -- pow(X, 0.5) -- is deferred. That transform is allowed without FMF currently, and this patch does not change that behavior.
  Differential Revision: https://reviews.llvm.org/D71706
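  Roughly, the gated transform (a sketch; the precise flag combinations required are as described above):

    declare double @llvm.pow.f64(double, double)

    define double @recip_sqrt(double %x) {
      ; with 'afn' (or 'reassoc') this may become 1.0 / sqrt(%x);
      ; without such flags the pow call is now left alone
      %r = call afn double @llvm.pow.f64(double %x, double -5.000000e-01)
      ret double %r
    }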
* [InstCombine] Improve infinite loop detection (Jakub Kuderski, 2019-12-20; 1 file, -0/+3)
  Summary: This patch limits the default number of iterations performed by InstCombine. It also exposes a new option that allows specifying after how many iterations the pass should be considered stuck in an infinite loop. Based on experiments performed on real-world C++ programs, InstCombine seems to perform at most ~8-20 iterations, so treating 1000 iterations as an infinite loop seems like a safe choice. See D71145 for details. The two limits can be specified via command line options.
  Reviewers: spatel, lebedev.ri, nikic, xbolva00, grosser. Reviewed By: spatel. Subscribers: hiraditya, llvm-commits. Tags: #llvm. Differential Revision: https://reviews.llvm.org/D71673
* Temporarily revert "Reapply [LVI] Normalize pointer behavior" and "[LVI] ↵Jordan Rupprecht2019-12-201-2/+1
| | | | | | Restructure caching" This reverts commits 7e18aeba5062cd4324a9efb7bc25c9dbc4a34c2c (D70376) 21fbd5587cdfa11dabb3aeb0ead2d3d5fd0b490d (D69914) due to increased memory usage.
* [InstCombine] add tests for cast+gep; NFC (Sanjay Patel, 2019-12-20; 1 file, -0/+44)
  PR44321: https://bugs.llvm.org/show_bug.cgi?id=44321
* [LV] Strip wrap flags from vectorized reductions (Ayal Zaks, 2019-12-20; 8 files, -21/+79)
  A sequence of additions or multiplications that is known not to wrap may wrap if its order is changed (i.e., reassociated). Therefore, when vectorizing integer sum or product reductions, their no-wrap flags need to be removed. Fixes PR43828. Patch by Denis Antrushin. Differential Revision: https://reviews.llvm.org/D69563
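  To make the hazard concrete (a hand-made i8 example, not from the patch): (100 + (-100)) + 100 never wraps, but the reassociated (100 + 100) + (-100) overflows at the first add. Vectorized reductions reorder the chain in just this way, so their partial-sum operations must not keep nsw/nuw.

    define i8 @reassoc_may_wrap() {
      ; fine in this order: (100 - 100) + 100 = 100 with no overflow,
      ; but the reassociated (100 + 100) - 100 wraps at the first add,
      ; so reassociating this chain must drop the nsw flags
      %t = add nsw i8 100, -100
      %r = add nsw i8 %t, 100
      ret i8 %r
    }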
* [ValueTracking] isKnownNonZero() should take non-null-ness assumptions into consideration (PR43267) (Roman Lebedev, 2019-12-20; 2 files, -25/+28)
  Summary: It is pretty common to assume that something is not zero. Even the optimizer itself sometimes emits such assumptions (e.g. `addAssumeNonNull()` in `PromoteMemoryToRegister.cpp`). But we currently don't deal with such assumptions :)
  The only way `isKnownNonZero()` handles assumptions is by calling `computeKnownBits()`, which calls `computeKnownBitsFromAssume()`. But `x != 0` does not tell us anything about set bits, it only says that there are *some* set bits. So naturally, `KnownBits` does not get populated, and we fail to make use of this assumption.
  I propose to deal with this special case by adding an `isKnownNonZeroFromAssume()` that returns a boolean when there is an applicable assumption. While there, we also deal with other predicates, mainly when the comparison is with a constant.
  Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=43267 | PR43267 ]]. Differential Revision: https://reviews.llvm.org/D71660
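  The kind of pattern that can now fold (an illustrative test, not one from the patch):

    declare void @llvm.assume(i1)

    define i1 @known_nonzero(i32 %x) {
      %c = icmp ne i32 %x, 0
      call void @llvm.assume(i1 %c)
      ; isKnownNonZero(%x) now sees the assumption, so this icmp can
      ; fold to true
      %r = icmp ne i32 %x, 0
      ret i1 %r
    }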
* [ValueTracking] isValidAssumeForContext(): CxtI itself also must transfer execution to successor (Roman Lebedev, 2019-12-20; 1 file, -1/+1)
  This is a pretty rare case: CxtI and the assume are in the same basic block, with the assume located later. We were already checking that the assumption was guaranteed to be executed, but we omitted CxtI itself from consideration, and as the test (miscompile) shows, that is incorrect. As noted in the D71660 review by @nikic.
* [NFC][InstCombine] Add a test for assume-induced miscompile (Roman Lebedev, 2019-12-20; 1 file, -0/+17)
  @escape() may throw here; we don't know that the assumption, which is located afterwards in the same block, is executed, so the %load argument of the call to @escape() can not be marked as non-null. As noted in the D71660 review by @nikic.
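  A sketch of the shape of that test (hypothetical IR mirroring the description above):

    declare void @escape(i8*)
    declare void @llvm.assume(i1)

    define void @caller(i8** %p) {
      %load = load i8*, i8** %p
      ; @escape may throw, so execution is not guaranteed to reach the
      ; assume below; %load must not be annotated nonnull at this call
      call void @escape(i8* %load)
      %c = icmp ne i8* %load, null
      call void @llvm.assume(i1 %c)
      ret void
    }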
* HotColdSplitting: Do not outline within noreturn functions (Vedant Kumar, 2019-12-19; 1 file, -0/+20)
  A function marked `noreturn` may contain unreachable terminators: these should not be considered cold, as the function may be a trampoline. rdar://58068594
* [SLP] Fix test arguments, NFC. (Alexey Bataev, 2019-12-19; 1 file, -9/+5)
* [NFC][InstCombine] Add some more non-zero assumption variants (D71660) (Roman Lebedev, 2019-12-19; 1 file, -0/+229)
  https://rise4fun.com/Alive/6yR
* [SLP] Added test for gathering reused extracts from narrow vector, NFC. (Alexey Bataev, 2019-12-19; 1 file, -0/+71)
* [InstCombine] add/adjust tests for pow->sqrt; NFC (Sanjay Patel, 2019-12-19; 1 file, -35/+77)
  There's at least 1 bug here, as discussed in PR44330.
* [ConstantHoisting] Ignore unreachable bb:s when collecting candidates (Bjorn Pettersson, 2019-12-19; 2 files, -0/+115)
  Summary: Ignore blocks that are unreachable from the entry when collecting candidates for hoisting. Normally the consthoist pass is executed in the llc pipeline, just after unreachableblockelim, so it is abnormal to have code that is unreachable from the entry block. But when running the pass as part of opt, for example as part of fuzzy testing, we might trigger various kinds of asserts when collecting candidates if we include unreachable blocks in that analysis. It seems like a waste of time to hoist constants in unreachable blocks, so the solution is to simply ignore such blocks when collecting the hoisting candidates. The two added test cases used to end up in two different asserts, and the intention with the checks is just to verify that we no longer fail.
  Fixes: PR43903. Reviewers: spatel. Reviewed By: spatel. Subscribers: hiraditya, uabelho, llvm-commits. Tags: #llvm. Differential Revision: https://reviews.llvm.org/D71678
* [InstCombine] Canonicalize select immediates (David Green, 2019-12-19; 1 file, -11/+11)
  In certain situations after inlining and simplification we end up with code that is _almost_ a min/max pattern, but contains constants that have been demand-bit optimised to the wrong values, ending up with code like:
    %1 = icmp slt i32 %shr, -128
    %2 = select i1 %1, i32 128, i32 %shr
    %.inv = icmp sgt i32 %shr, 127
    %spec.select.i = select i1 %.inv, i32 127, i32 %2
    %conv7 = trunc i32 %spec.select.i to i8
  This should be turned into a min/max pattern, but the -128 in the first select was instead transformed into 128, as only the bottom byte was ever demanded. To fix this, I've put in further canonicalisation for the immediates of selects, preferring to use the same value as the icmp if available.
  Differential Revision: https://reviews.llvm.org/D71516
* [InstCombine] Add select canonicalization tests. NFC (David Green, 2019-12-19; 1 file, -0/+70)
* Revert "[InstCombine][AMDGPU] Trim more components of *buffer_load"Piotr Sobczak2019-12-181-150/+150
| | | | | | Revert D70315, as it breaks gfx8 for some reason. This reverts commit 65f94b33808d7d69539961a6f5a2168f0a1eef41.
* [InstCombine] Insert instructions before adding them to worklist (Jakub Kuderski, 2019-12-18; 1 file, -1/+1)
  Summary: This patch adds instructions to the InstCombine worklist after they are properly inserted. This way we don't get `<badref>`s printed when logging added instructions. It also adds a check in `Worklist::Add` that ensures that all added instructions have parents.
  Simple test case that illustrates the difference when run with `--debug-only=instcombine`:
  ```
  define i32 @test35(i32 %a, i32 %b) {
    %1 = or i32 %a, 1135
    %2 = or i32 %1, %b
    ret i32 %2
  }
  ```
  Before this patch:
  ```
  INSTCOMBINE ITERATION #1 on test35
  IC: ADDING: 3 instrs to worklist
  IC: Visiting: %1 = or i32 %a, 1135
  IC: Visiting: %2 = or i32 %1, %b
  IC: ADD: %2 = or i32 %a, %b
  IC: Old = %3 = or i32 %1, %b
      New = <badref> = or i32 %2, 1135
  IC: ADD: <badref> = or i32 %2, 1135
  ...
  ```
  With this patch:
  ```
  INSTCOMBINE ITERATION #1 on test35
  IC: ADDING: 3 instrs to worklist
  IC: Visiting: %1 = or i32 %a, 1135
  IC: Visiting: %2 = or i32 %1, %b
  IC: ADD: %2 = or i32 %a, %b
  IC: Old = %3 = or i32 %1, %b
      New = <badref> = or i32 %2, 1135
  IC: ADD: %3 = or i32 %2, 1135
  ...
  ```
  Reviewers: fhahn, davide, spatel, foad, grosser, nikic. Reviewed By: nikic. Subscribers: nikic, lebedev.ri, hiraditya, llvm-commits. Tags: #llvm. Differential Revision: https://reviews.llvm.org/D71093