summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* [Local] replaceAllDbgUsesWith: Update debug values before RAUWVedant Kumar2018-07-064-2/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The replaceAllDbgUsesWith utility helps passes preserve debug info when replacing one value with another. This improves upon the existing insertReplacementDbgValues API by: - Updating debug intrinsics in-place, while preventing use-before-def of the replacement value. - Falling back to salvageDebugInfo when a replacement can't be made. - Moving the responsibiliy for rewriting llvm.dbg.* DIExpressions into common utility code. Along with the API change, this teaches replaceAllDbgUsesWith how to create DIExpressions for three basic integer and pointer conversions: - The no-op conversion. Applies when the values have the same width, or have bit-for-bit compatible pointer representations. - Truncation. Applies when the new value is wider than the old one. - Zero/sign extension. Applies when the new value is narrower than the old one. Testing: - check-llvm, check-clang, a stage2 `-g -O3` build of clang, regression/unit testing. - This resolves a number of mis-sized dbg.value diagnostics from Debugify. Differential Revision: https://reviews.llvm.org/D48676 llvm-svn: 336451
* [InstCombine] add more tests with poison and undef; NFCSanjay Patel2018-07-061-5/+540
| | | | | | | | As discussed in D48987 and D48893, there are many different ways to go wrong depending on the binop (and as shown here we already do go wrong in some cases). llvm-svn: 336450
* CallGraphSCCPass: iterate over all functions.Tim Northover2018-07-061-0/+7
| | | | | | | | | | | | | | | Previously we only iterated over functions reachable from the set of external functions in the module. But since some of the passes under this (notably the always-inliner and coroutine lowerer) are required for correctness, they need to run over everything. This just adds an extra layer of iteration over the CallGraph to keep track of which functions we've already visited and get the next batch of SCCs. Should fix PR38029. llvm-svn: 336419
* Revert "[InstCombine] Delay foldICmpUsingKnownBits until simple transforms ↵Max Kazantsev2018-07-065-24/+34
| | | | | | are done" llvm-svn: 336410
* Fix asserts in AMDGCN fmed3 folding by handling more cases of NaNMatt Arsenault2018-07-051-3/+45
| | | | | | | | | | | | | | | Better NaN handling for AMDGCN fmed3. All operands are checked for NaN now. The checks were moved before the canonicalization to provide a better mapping from fclamp. Changed the behaviour of fmed3(x,y,NaN) to return max(x,y) instead of min(x,y) in light of this. Updated tests as a result and added some new cases to cover the fix. Patch by Alan Baker llvm-svn: 336375
* [X86] Remove X86 specific scalar FMA intrinsics and upgrade to tart ↵Craig Topper2018-07-051-30/+64
| | | | | | independent FMA and extractelement/insertelement. llvm-svn: 336315
* [InstCombine] allow narrowing of min/max/absSanjay Patel2018-07-041-6/+3
| | | | | | | | | | | | | | | We have bailout hacks based on min/max in various places in instcombine that shouldn't be necessary. The affected test was added for: D48930 ...which is a consequence of the improvement in: D48584 (https://reviews.llvm.org/rL336172) I'm assuming the visitTrunc bailout in this patch was added specifically to avoid a change from SimplifyDemandedBits, so I'm just moving that below the EvaluateInDifferentType optimization. A narrow min/max is still a min/max. llvm-svn: 336293
* [InstCombine] add value names to test; NFCSanjay Patel2018-07-041-17/+17
| | | | | | That makes it easier to mix and match lines into other tests. llvm-svn: 336289
* NFC - Various typo fixes in testsGabor Buella2018-07-046-7/+7
| | | | llvm-svn: 336268
* [NFC] Add test that shows that InstCombine can do betterMax Kazantsev2018-07-041-0/+24
| | | | llvm-svn: 336258
* [DebugInfo][LoopVectorize] Preserve DL in generated phi instructionAnastasis Grammenos2018-07-041-0/+9
| | | | | | | | | When creating `phi` instructions to resume at the scalar part of the loop, copy the DebugLoc from the original phi over to the new one. Differential Revision: https://reviews.llvm.org/D48769 llvm-svn: 336256
* [DebugInfo][InstCombine] Preserve DI after combining zextAnastasis Grammenos2018-07-041-2/+11
| | | | | | | | | | | When zext is EvaluatedInDifferentType, InstCombine drops the dbg.value intrinsic. This patch tries to preserve said DI, by inserting the zext's old DI in the resulting instruction. (Only for integer type for now) Differential Revision: https://reviews.llvm.org/D48331 llvm-svn: 336254
* [InstCombine] add tests for shuffle+binop with constant op1; NFCSanjay Patel2018-07-031-4/+25
| | | | | | This adds coverage for a planned enhancement for ConstantExpr::getBinOpIdentity() noted in D48830. llvm-svn: 336220
* [Constants] add identity constants for fadd/fmulSanjay Patel2018-07-033-14/+9
| | | | | | | | | | | | | | | | As the test diffs show, the current users of getBinOpIdentity() are InstCombine and Reassociate. SLP vectorizer is a candidate for using this functionality too (D28907). The InstCombine shuffle improvements are part of the planned enhancements noted in D48830. InstCombine actually has several other uses of getBinOpIdentity() via SimplifyUsingDistributiveLaws(), but we don't call that for any FP ops. Fixing that might be another part of removing the custom reassociation in InstCombine that is only done for fadd+fmul. llvm-svn: 336215
* [Reassociate] add tests for binop with identity constant; NFCSanjay Patel2018-07-031-0/+74
| | | | llvm-svn: 336214
* [Reassociate] regenerate checks; NFCSanjay Patel2018-07-031-77/+115
| | | | llvm-svn: 336211
* [Reassociate] add test for missing FP constant analysis; NFCSanjay Patel2018-07-031-3/+19
| | | | llvm-svn: 336208
* [InstCombine] fold shuffle-with-binop and common valueSanjay Patel2018-07-031-9/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | This is the last significant change suggested in PR37806: https://bugs.llvm.org/show_bug.cgi?id=37806#c5 ...though there are several follow-ups noted in the code comments in this patch to complete this transform. It's possible that a binop feeding a select-shuffle has been eliminated by earlier transforms (or the code was just written like this in the 1st place), so we'll fail to match the patterns that have 2 binops from: D48401, D48678, D48662, D48485. In that case, we can try to materialize identity constants for the remaining binop to fill in the "ghost" lanes of the vector (where we just want to pass through the original values of the source operand). I added comments to ConstantExpr::getBinOpIdentity() to show planned follow-ups. For now, we only handle the 5 commutative integer binops (add/mul/and/or/xor). Differential Revision: https://reviews.llvm.org/D48830 llvm-svn: 336196
* [DebugInfo] Corrections for salvageDebugInfoBjorn Pettersson2018-07-032-2/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When salvaging a dbg.declare/dbg.addr we should not add DW_OP_stack_value to the DIExpression (see test/Transforms/InstCombine/salvage-dbg-declare.ll). Consider this example %vla = alloca i32, i64 2 call void @llvm.dbg.declare(metadata i32* %vla, metadata !1, metadata !DIExpression()) Instcombine will turn it into %vla1 = alloca [2 x i32] %vla1.sub = getelementptr inbounds [2 x i32], [2 x i32]* %vla, i64 0, i64 0 call void @llvm.dbg.declare(metadata [2 x i32]* %vla1.sub, metadata !19, metadata !DIExpression()) If the GEP can be eliminated, then the dbg.declare will be salvaged and we should get %vla1 = alloca [2 x i32] call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression()) The problem was that salvageDebugInfo did not recognize dbg.declare as being indirect (%vla1 points to the value, it does not hold the value), so we incorrectly got call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression(DW_OP_stack_value)) I also made sure that llvm::salvageDebugInfo and DIExpression::prependOpcodes do not add DW_OP_stack_value to the DIExpression in case no new operands are added to the DIExpression. That way we avoid to, unneccessarily, turn a register location expression into an implicit location expression in some situations (see test11 in test/Transforms/LICM/sinking.ll). Reviewers: aprantl, vsk Reviewed By: aprantl, vsk Subscribers: JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D48837 llvm-svn: 336191
* [PM/LoopUnswitch] Fix PR37651 by correctly invalidating SCEV whenChandler Carruth2018-07-031-0/+188
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | unswitching loops. Original patch trying to address this was sent in D47624, but that didn't quite handle things correctly. There are two key principles used to select whether and how to invalidate SCEV-cached information about loops: 1) We must invalidate any info SCEV has cached before unswitching as we may change (or destroy) the loop structure by the act of unswitching, and make it hard to recover everything we want to invalidate within SCEV. 2) We need to invalidate all of the loops whose CFGs are mutated by the unswitching. Notably, this isn't the *entire* loop nest, this is every loop contained by the outermost loop reached by an exit block relevant to the unswitch. And we need to do this even when doing trivial unswitching. I've added more focused tests that directly check that SCEV starts off with imprecise information and after unswitching (and simplifying instructions) re-querying SCEV will produce precise information. These tests also specifically work to check that an *outer* loop's information becomes precise. However, the testing here is still a bit imperfect. Crafting test cases that reliably fail to be analyzed by SCEV before unswitching and succeed afterward proved ... very, very hard. It took me several hours and careful work to build these, and I'm not optimistic about necessarily coming up with more to cover more elaborate possibilities. Fortunately, the code pattern we are testing here in the pass is really straightforward and reliable. Thanks to Max Kazantsev for the initial work on this as well as the review, and to Hal Finkel for helping me talk through approaches to test this stuff even if it didn't come to much. Differential Revision: https://reviews.llvm.org/D47624 llvm-svn: 336183
* [InstCombine] Delay foldICmpUsingKnownBits until simple transforms are doneMax Kazantsev2018-07-034-33/+21
| | | | | | | | | | | | This patch changes order of transform in InstCombineCompares to avoid performing transforms based on ranges which produce complex bit arithmetics before more simple things (like folding with constants) are done. See PR37636 for the motivating example. Differential Revision: https://reviews.llvm.org/D48584 Reviewed By: spatel, lebedev.ri llvm-svn: 336172
* [SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428).Tim Shen2018-07-021-10/+8
| | | | | | | | | | | | | | | | Summary: Comment on Transforms/LoopVersioning/incorrect-phi.ll: With the change SCEV is able to prove that the loop doesn't wrap-self (due to zext i16 to i64), disabling the entire loop versioning pass. Removed the zext and just use i64. Reviewers: sanjoy Subscribers: jlebar, hiraditya, javed.absar, bixia, llvm-commits Differential Revision: https://reviews.llvm.org/D48409 llvm-svn: 336140
* [SLP] Recognize min/max pattern using instructions producing same values.Farhana Aleen2018-07-021-65/+55
| | | | | | | | | | | | | | | | | | | Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization. %1 = extractelement <2 x i32> %a, i32 0 %2 = extractelement <2 x i32> %a, i32 1 %cond = icmp sgt i32 %1, %2 %3 = extractelement <2 x i32> %a, i32 0 %4 = extractelement <2 x i32> %a, i32 1 %select = select i1 %cond, i32 %3, i32 %4 Author: FarhanaAleen Reviewed By: ABataev, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D47608 llvm-svn: 336130
* [InstCombine] reverse canonicalization of add --> or to allow more shuffle ↵Sanjay Patel2018-07-021-12/+8
| | | | | | | | | | | | | | | | folding This extends D48485 to allow another pair of binops (add/or) to be combined either with or without a leading shuffle: or X, C --> add X, C (when X and C have no common bits set) Here, we need value tracking to determine that the 'or' can be reversed into an 'add', and we've added general infrastructure to allow extending to other opcodes or moving to where other passes could use that functionality. Differential Revision: https://reviews.llvm.org/D48662 llvm-svn: 336128
* [SLPVectorizer][X86] Begin adding alternate tests for call operatorsSimon Pilgrim2018-07-021-0/+65
| | | | | | Alternate opcode handling only supports binary operators, these tests demonstrate a missed opportunity to vectorize ceil/floor calls llvm-svn: 336125
* [ValueTracking] allow undef elements when matching vector absSanjay Patel2018-07-021-4/+4
| | | | llvm-svn: 336111
* [InstCombine] adjust shuffle tests with IR flags; NFCSanjay Patel2018-07-021-4/+3
| | | | | | | | | Due to current limitations in constant analysis, we need flags on add or mul to show propagation for the potential transform suggested in these tests (no other binops currently report identity constants). llvm-svn: 336101
* Recommit r328307: [IPSCCP] Use constant range information for comparisons of ↵Florian Hahn2018-07-021-13/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | parameters. This version contains a fix to add values for which the state in ParamState change to the worklist if the state in ValueState did not change. To avoid adding the same value multiple times, mergeInValue returns true, if it added the value to the worklist. The value is added to the worklist depending on its state in ValueState. Original message: For comparisons with parameters, we can use the ParamState lattice elements which also provide constant range information. This improves the code for PR33253 further and gets us closer to use ValueLatticeElement for all values. Also, as we are using the range information in the solver directly, we do not need tryToReplaceWithConstantRange afterwards anymore. Reviewers: dberlin, mssimpso, davide, efriedma Reviewed By: mssimpso Differential Revision: https://reviews.llvm.org/D43762 llvm-svn: 336098
* [InstCombine] add tests for shuffle-binop; NFCSanjay Patel2018-07-021-37/+256
| | | | | | This is another pattern mentioned in PR37806. llvm-svn: 336096
* [SLPVectorizer] Fix alternate opcode + shuffle cost function to correct ↵Simon Pilgrim2018-07-021-3/+24
| | | | | | | | | | handle SK_Select patterns. We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case. This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now... llvm-svn: 336095
* [NFC] Test that shows unprofitability of instcombine with bit rangesMax Kazantsev2018-07-021-0/+32
| | | | llvm-svn: 336078
* Implement strip.invariant.groupPiotr Padlewski2018-07-027-18/+145
| | | | | | | | | | | | | | | | Summary: This patch introduce new intrinsic - strip.invariant.group that was described in the RFC: Devirtualization v2 Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits Differential Revision: https://reviews.llvm.org/D47103 Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com> llvm-svn: 336073
* [InstCombine] add abs tests with undef elts; NFCSanjay Patel2018-07-011-0/+30
| | | | llvm-svn: 336065
* [PatternMatch] allow undef elements in vectors with m_NegSanjay Patel2018-07-011-5/+3
| | | | | | This is similar to the m_Not change from D44076. llvm-svn: 336064
* [UnrollAndJam] New Unroll and Jam passDavid Green2018-07-015-0/+2482
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a simple implementation of the unroll-and-jam classical loop optimisation. The basic idea is that we take an outer loop of the form: for i.. ForeBlocks(i) for j.. SubLoopBlocks(i, j) AftBlocks(i) Instead of doing normal inner or outer unrolling, we unroll as follows: for i... i+=2 ForeBlocks(i) ForeBlocks(i+1) for j.. SubLoopBlocks(i, j) SubLoopBlocks(i+1, j) AftBlocks(i) AftBlocks(i+1) Remainder Loop So we have unrolled the outer loop, then jammed the two inner loops into one. This can lead to a simpler inner loop if memory accesses can be shared between the now jammed loops. To do this we have to prove that this is all safe, both for the memory accesses (using dependence analysis) and that ForeBlocks(i+1) can move before AftBlocks(i) and SubLoopBlocks(i, j). Differential Revision: https://reviews.llvm.org/D41953 llvm-svn: 336062
* [SLPVectorizer][X86] Add some alternate tests for cast operatorsSimon Pilgrim2018-07-011-0/+169
| | | | | | Alternate opcode handling only supports binary operators, these tests demonstrate missed opportunities to vectorize some sitofp/uitofp and fptosi/fptoui style casts as well as some (successful) float bits manipulations llvm-svn: 336060
* [Evaluator] Improve evaluation of call instructionEugene Leviant2018-07-013-0/+239
| | | | | | Recommit of r335324 after buildbot failure fix llvm-svn: 336059
* [InstCombine] add tests for negate vector with undef elts; NFC Sanjay Patel2018-06-301-3/+27
| | | | llvm-svn: 336050
* [InstCombine] add more tests for shuffle-binop folds; NFCSanjay Patel2018-06-291-1/+73
| | | | | | | The mul+shl tests add coverage for the fold enabled with D48678. The and+or tests are not handled yet; that's D48662. llvm-svn: 335984
* [InstCombine] enhance shuffle-of-binops to allow different variable ops ↵Sanjay Patel2018-06-291-35/+28
| | | | | | | | | | | | | | | | | | | | | | | (PR37806) This was discussed in D48401 as another improvement for: https://bugs.llvm.org/show_bug.cgi?id=37806 If we have 2 different variable values, then we shuffle (select) those lanes, shuffle (select) the constants, and then perform the binop. This eliminates a binop. The new shuffle uses the same shuffle mask as the existing shuffle, so there's no danger of creating a difficult shuffle. All of the earlier constraints still apply, but we also check for extra uses to avoid creating more instructions than we'll remove. Additionally, we're disallowing the fold for div/rem because that could expose a UB hole. Differential Revision: https://reviews.llvm.org/D48678 llvm-svn: 335974
* SCEVExpander::expandAddRecExprLiterally(): check before casting as InstructionRoman Lebedev2018-06-291-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: An alternative to D48597. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=37936 | PR37936 ]]. The problem is as follows: 1. `indvars` marks `%dec` as `NUW`. 2. `loop-instsimplify` runs `instsimplify`, which constant-folds `%dec` to -1 (D47908) 3. `loop-reduce` tries to do some further modification, but crashes with an type assertion in cast, because `%dec` is no longer an `Instruction`, If the runline is split into two, i.e. you first run `-indvars -loop-instsimplify`, store that into a file, and then run `-loop-reduce`, there is no crash. So it looks like the problem is due to `-loop-instsimplify` not discarding SCEV. But in this case we can just not crash if it's not an `Instruction`. This is just a local fix, unlike D48597, so there may very well be other problems. Reviewers: mkazantsev, uabelho, sanjoy, silviu.baranga, wmi Reviewed By: mkazantsev Subscribers: evstupac, javed.absar, spatel, llvm-commits Differential Revision: https://reviews.llvm.org/D48599 llvm-svn: 335950
* [InstCombine] adjust shuffle tests; NFCSanjay Patel2018-06-281-5/+5
| | | | | | Use xor for the extra uses test because div/rem have other problems. llvm-svn: 335924
* [ThinLTO] Port InlinerFunctionImportStats handling to new PMTeresa Johnson2018-06-281-0/+5
| | | | | | | | | | | | | | Summary: The InlinerFunctionImportStats will collect and dump stats regarding how many function inlined into the module were imported by ThinLTO. Reviewers: wmi, dexonsmith Subscribers: mehdi_amini, inglorion, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D48729 llvm-svn: 335914
* [SROA] Preserve DebugLoc when rewriting alloca partitionsAnastasis Grammenos2018-06-281-0/+9
| | | | | | | | | When rewriting an alloca partition copy the DL from the old alloca over the the new one. Differential Revision: https://reviews.llvm.org/D48640 llvm-svn: 335904
* [InstCombine] allow shl+mul combos with shuffle (select) fold (PR37806)Sanjay Patel2018-06-281-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | This is an enhancement to D48401 that was discussed in: https://bugs.llvm.org/show_bug.cgi?id=37806 We can convert a shift-left-by-constant into a multiply (we canonicalize IR in the other direction because that's generally better of course). This allows us to remove the shuffle as we do in the regular opcodes-are-the-same cases. This requires a small hack to make sure we don't introduce any extra poison: https://rise4fun.com/Alive/ZGv Other examples of opcodes where this would work are add+sub and fadd+fsub, but we already canonicalize those subs into adds, so there's nothing to do for those cases AFAICT. There are planned enhancements for opcode transforms such or -> add. Note that there's a different fold needed if we've already managed to simplify away a binop as seen in the test based on PR37806, but we manage to get that one case here because this fold is positioned above the demanded elements fold currently. Differential Revision: https://reviews.llvm.org/D48485 llvm-svn: 335888
* [SCCP] Mark CFG as preserved.Florian Hahn2018-06-281-0/+36
| | | | | | | | | | | | SCCP does not change the CFG, so we can mark it as preserved. Reviewers: dberlin, efriedma, davide Reviewed By: davide Differential Revision: https://reviews.llvm.org/D47149 llvm-svn: 335820
* [IndVarSimplify] Ignore unreachable users of truncsMax Kazantsev2018-06-281-0/+47
| | | | | | | If a trunc has a user in a block which is not reachable from entry, we can safely perform trunc elimination as if this user didn't exist. llvm-svn: 335816
* [InstCombine] add tests for vector-select-of-binops with 2 variables; NFCSanjay Patel2018-06-271-0/+263
| | | | llvm-svn: 335778
* [InstCombine] add more tests for shuffle with different binops; NFCSanjay Patel2018-06-271-0/+36
| | | | llvm-svn: 335756
* [X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics ↵Craig Topper2018-06-271-36/+36
| | | | | | | | that don't take a mask as input to exclude '.mask.' from their name. I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask". llvm-svn: 335744
OpenPOWER on IntegriCloud