summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
...
* [DebugInfo][LoopVectorize] Preserve DL in generated phi instructionAnastasis Grammenos2018-07-041-0/+2
| | | | | | | | | When creating `phi` instructions to resume at the scalar part of the loop, copy the DebugLoc from the original phi over to the new one. Differential Revision: https://reviews.llvm.org/D48769 llvm-svn: 336256
* [DebugInfo][InstCombine] Preserve DI after combining zextAnastasis Grammenos2018-07-041-0/+11
| | | | | | | | | | | When zext is EvaluatedInDifferentType, InstCombine drops the dbg.value intrinsic. This patch tries to preserve said DI, by inserting the zext's old DI in the resulting instruction. (Only for integer type for now) Differential Revision: https://reviews.llvm.org/D48331 llvm-svn: 336254
* [InstCombine] fold shuffle-with-binop and common valueSanjay Patel2018-07-031-0/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | This is the last significant change suggested in PR37806: https://bugs.llvm.org/show_bug.cgi?id=37806#c5 ...though there are several follow-ups noted in the code comments in this patch to complete this transform. It's possible that a binop feeding a select-shuffle has been eliminated by earlier transforms (or the code was just written like this in the 1st place), so we'll fail to match the patterns that have 2 binops from: D48401, D48678, D48662, D48485. In that case, we can try to materialize identity constants for the remaining binop to fill in the "ghost" lanes of the vector (where we just want to pass through the original values of the source operand). I added comments to ConstantExpr::getBinOpIdentity() to show planned follow-ups. For now, we only handle the 5 commutative integer binops (add/mul/and/or/xor). Differential Revision: https://reviews.llvm.org/D48830 llvm-svn: 336196
* [DebugInfo] Corrections for salvageDebugInfoBjorn Pettersson2018-07-031-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When salvaging a dbg.declare/dbg.addr we should not add DW_OP_stack_value to the DIExpression (see test/Transforms/InstCombine/salvage-dbg-declare.ll). Consider this example %vla = alloca i32, i64 2 call void @llvm.dbg.declare(metadata i32* %vla, metadata !1, metadata !DIExpression()) Instcombine will turn it into %vla1 = alloca [2 x i32] %vla1.sub = getelementptr inbounds [2 x i32], [2 x i32]* %vla, i64 0, i64 0 call void @llvm.dbg.declare(metadata [2 x i32]* %vla1.sub, metadata !19, metadata !DIExpression()) If the GEP can be eliminated, then the dbg.declare will be salvaged and we should get %vla1 = alloca [2 x i32] call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression()) The problem was that salvageDebugInfo did not recognize dbg.declare as being indirect (%vla1 points to the value, it does not hold the value), so we incorrectly got call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression(DW_OP_stack_value)) I also made sure that llvm::salvageDebugInfo and DIExpression::prependOpcodes do not add DW_OP_stack_value to the DIExpression in case no new operands are added to the DIExpression. That way we avoid to, unneccessarily, turn a register location expression into an implicit location expression in some situations (see test11 in test/Transforms/LICM/sinking.ll). Reviewers: aprantl, vsk Reviewed By: aprantl, vsk Subscribers: JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D48837 llvm-svn: 336191
* [PM/LoopUnswitch] Fix PR37651 by correctly invalidating SCEV whenChandler Carruth2018-07-031-21/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | unswitching loops. Original patch trying to address this was sent in D47624, but that didn't quite handle things correctly. There are two key principles used to select whether and how to invalidate SCEV-cached information about loops: 1) We must invalidate any info SCEV has cached before unswitching as we may change (or destroy) the loop structure by the act of unswitching, and make it hard to recover everything we want to invalidate within SCEV. 2) We need to invalidate all of the loops whose CFGs are mutated by the unswitching. Notably, this isn't the *entire* loop nest, this is every loop contained by the outermost loop reached by an exit block relevant to the unswitch. And we need to do this even when doing trivial unswitching. I've added more focused tests that directly check that SCEV starts off with imprecise information and after unswitching (and simplifying instructions) re-querying SCEV will produce precise information. These tests also specifically work to check that an *outer* loop's information becomes precise. However, the testing here is still a bit imperfect. Crafting test cases that reliably fail to be analyzed by SCEV before unswitching and succeed afterward proved ... very, very hard. It took me several hours and careful work to build these, and I'm not optimistic about necessarily coming up with more to cover more elaborate possibilities. Fortunately, the code pattern we are testing here in the pass is really straightforward and reliable. Thanks to Max Kazantsev for the initial work on this as well as the review, and to Hal Finkel for helping me talk through approaches to test this stuff even if it didn't come to much. Differential Revision: https://reviews.llvm.org/D47624 llvm-svn: 336183
* [InstCombine] Delay foldICmpUsingKnownBits until simple transforms are doneMax Kazantsev2018-07-031-3/+7
| | | | | | | | | | | | This patch changes order of transform in InstCombineCompares to avoid performing transforms based on ranges which produce complex bit arithmetics before more simple things (like folding with constants) are done. See PR37636 for the motivating example. Differential Revision: https://reviews.llvm.org/D48584 Reviewed By: spatel, lebedev.ri llvm-svn: 336172
* Replace "Replacable" with "Replaceable". [NFC]Alina Sbirlea2018-07-021-13/+13
| | | | llvm-svn: 336133
* [SLP] Recognize min/max pattern using instructions producing same values.Farhana Aleen2018-07-021-0/+71
| | | | | | | | | | | | | | | | | | | Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization. %1 = extractelement <2 x i32> %a, i32 0 %2 = extractelement <2 x i32> %a, i32 1 %cond = icmp sgt i32 %1, %2 %3 = extractelement <2 x i32> %a, i32 0 %4 = extractelement <2 x i32> %a, i32 1 %select = select i1 %cond, i32 %3, i32 %4 Author: FarhanaAleen Reviewed By: ABataev, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D47608 llvm-svn: 336130
* [InstCombine] reverse canonicalization of add --> or to allow more shuffle ↵Sanjay Patel2018-07-021-12/+55
| | | | | | | | | | | | | | | | folding This extends D48485 to allow another pair of binops (add/or) to be combined either with or without a leading shuffle: or X, C --> add X, C (when X and C have no common bits set) Here, we need value tracking to determine that the 'or' can be reversed into an 'add', and we've added general infrastructure to allow extending to other opcodes or moving to where other passes could use that functionality. Differential Revision: https://reviews.llvm.org/D48662 llvm-svn: 336128
* [SLPVectorizer] Remove nullptr early-outs from Instruction::ShuffleVector ↵Simon Pilgrim2018-07-021-6/+0
| | | | | | | | getEntryCost This code is only used by alternate opcodes so the InstructionsState has already confirmed that every Value is an Instruction, plus we use cast<Instruction> which will assert on failure. llvm-svn: 336102
* Recommit r328307: [IPSCCP] Use constant range information for comparisons of ↵Florian Hahn2018-07-021-111/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | parameters. This version contains a fix to add values for which the state in ParamState change to the worklist if the state in ValueState did not change. To avoid adding the same value multiple times, mergeInValue returns true, if it added the value to the worklist. The value is added to the worklist depending on its state in ValueState. Original message: For comparisons with parameters, we can use the ParamState lattice elements which also provide constant range information. This improves the code for PR33253 further and gets us closer to use ValueLatticeElement for all values. Also, as we are using the range information in the solver directly, we do not need tryToReplaceWithConstantRange afterwards anymore. Reviewers: dberlin, mssimpso, davide, efriedma Reviewed By: mssimpso Differential Revision: https://reviews.llvm.org/D43762 llvm-svn: 336098
* [SLPVectorizer] Fix alternate opcode + shuffle cost function to correct ↵Simon Pilgrim2018-07-021-4/+3
| | | | | | | | | | handle SK_Select patterns. We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case. This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now... llvm-svn: 336095
* [SLPVectorizer] Only Alternate opcodes use ShuffleVector cases for ↵Simon Pilgrim2018-07-021-1/+5
| | | | | | | | getEntryCost/vectorizeTree. NFCI. Add assertions - we're already assuming this in how we use the AltOpcode and treat everything as BinaryOperators. llvm-svn: 336092
* [SLPVectorizer] Call InstructionsState.isOpcodeOrAlt with Instruction ↵Simon Pilgrim2018-07-011-11/+9
| | | | | | instead of an opcode. NFCI. llvm-svn: 336069
* [SLPVectorizer] Replace sameOpcodeOrAlt with InstructionsState.isOpcodeOrAlt ↵Simon Pilgrim2018-07-011-12/+10
| | | | | | | | helper. NFCI. This is a basic step towards matching more general instructions types than just opcodes. llvm-svn: 336068
* [SLPVectorizer] Use InstructionsState Op/Alt opcodes directly. NFCI.Simon Pilgrim2018-07-011-4/+2
| | | | llvm-svn: 336063
* [UnrollAndJam] New Unroll and Jam passDavid Green2018-07-018-20/+1260
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a simple implementation of the unroll-and-jam classical loop optimisation. The basic idea is that we take an outer loop of the form: for i.. ForeBlocks(i) for j.. SubLoopBlocks(i, j) AftBlocks(i) Instead of doing normal inner or outer unrolling, we unroll as follows: for i... i+=2 ForeBlocks(i) ForeBlocks(i+1) for j.. SubLoopBlocks(i, j) SubLoopBlocks(i+1, j) AftBlocks(i) AftBlocks(i+1) Remainder Loop So we have unrolled the outer loop, then jammed the two inner loops into one. This can lead to a simpler inner loop if memory accesses can be shared between the now jammed loops. To do this we have to prove that this is all safe, both for the memory accesses (using dependence analysis) and that ForeBlocks(i+1) can move before AftBlocks(i) and SubLoopBlocks(i, j). Differential Revision: https://reviews.llvm.org/D41953 llvm-svn: 336062
* [Evaluator] Improve evaluation of call instructionEugene Leviant2018-07-011-7/+62
| | | | | | Recommit of r335324 after buildbot failure fix llvm-svn: 336059
* [instsimplify] Move the instsimplify pass to use more obvious file namesChandler Carruth2018-06-296-55/+48
| | | | | | | | | | | | | | | | and diretory. Also cleans up all the associated naming to be consistent and removes the public access to the pass ID which was unused in LLVM. Also runs clang-format over parts that changed, which generally cleans up a bunch of formatting. This is in preparation for doing some internal cleanups to the pass. Differential Revision: https://reviews.llvm.org/D47352 llvm-svn: 336028
* [HWASan] Do not retag allocas before return from the function.Alex Shlyapnikov2018-06-291-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Retagging allocas before returning from the function might help detecting use after return bugs, but it does not work at all in real life, when instrumented and non-instrumented code is intermixed. Consider the following code: F_non_instrumented() { T x; F1_instrumented(&x); ... } { F_instrumented(); F_non_instrumented(); } - F_instrumented call leaves the stack below the current sp tagged randomly for UAR detection - F_non_instrumented allocates its own vars on that tagged stack, not generating any tags, that is the address of x has tag 0, but the shadow memory still contains tags left behind by F_instrumented on the previous step - F1_instrumented verifies &x before using it and traps on tag mismatch, 0 vs whatever tag was set by F_instrumented Reviewers: eugenis Subscribers: srhines, llvm-commits Differential Revision: https://reviews.llvm.org/D48664 llvm-svn: 336011
* Revert "Extend CFGPrinter and CallPrinter with Heat Colors"Sean Fertile2018-06-291-1/+1
| | | | | | This reverts r335996 which broke graph printing in Polly. llvm-svn: 336000
* Extend CFGPrinter and CallPrinter with Heat ColorsSean Fertile2018-06-291-1/+1
| | | | | | | | | | | | | | | Extends the CFGPrinter and CallPrinter with heat colors based on heuristics or profiling information. The colors are enabled by default and can be toggled on/off for CFGPrinter by using the option -cfg-heat-colors for both -dot-cfg[-only] and -view-cfg[-only]. Similarly, the colors can be toggled on/off for CallPrinter by using the option -callgraph-heat-colors for both -dot-callgraph and -view-callgraph. Patch by Rodrigo Caetano Rocha! Differential Revision: https://reviews.llvm.org/D40425 llvm-svn: 335996
* [InstCombine] enhance shuffle-of-binops to allow different variable ops ↵Sanjay Patel2018-06-291-12/+38
| | | | | | | | | | | | | | | | | | | | | | | (PR37806) This was discussed in D48401 as another improvement for: https://bugs.llvm.org/show_bug.cgi?id=37806 If we have 2 different variable values, then we shuffle (select) those lanes, shuffle (select) the constants, and then perform the binop. This eliminates a binop. The new shuffle uses the same shuffle mask as the existing shuffle, so there's no danger of creating a difficult shuffle. All of the earlier constraints still apply, but we also check for extra uses to avoid creating more instructions than we'll remove. Additionally, we're disallowing the fold for div/rem because that could expose a UB hole. Differential Revision: https://reviews.llvm.org/D48678 llvm-svn: 335974
* [InstCombine] fix opcode check in shuffle foldSanjay Patel2018-06-281-1/+1
| | | | | | | | | There's no way to expose this difference currently, but we should use the updated variable because the original opcodes can go stale if we transform into something new. llvm-svn: 335920
* [ThinLTO] Port InlinerFunctionImportStats handling to new PMTeresa Johnson2018-06-281-0/+18
| | | | | | | | | | | | | | Summary: The InlinerFunctionImportStats will collect and dump stats regarding how many function inlined into the module were imported by ThinLTO. Reviewers: wmi, dexonsmith Subscribers: mehdi_amini, inglorion, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D48729 llvm-svn: 335914
* [SROA] Preserve DebugLoc when rewriting alloca partitionsAnastasis Grammenos2018-06-281-0/+2
| | | | | | | | | When rewriting an alloca partition copy the DL from the old alloca over the the new one. Differential Revision: https://reviews.llvm.org/D48640 llvm-svn: 335904
* [InstCombine] allow shl+mul combos with shuffle (select) fold (PR37806)Sanjay Patel2018-06-281-5/+29
| | | | | | | | | | | | | | | | | | | | | | | | This is an enhancement to D48401 that was discussed in: https://bugs.llvm.org/show_bug.cgi?id=37806 We can convert a shift-left-by-constant into a multiply (we canonicalize IR in the other direction because that's generally better of course). This allows us to remove the shuffle as we do in the regular opcodes-are-the-same cases. This requires a small hack to make sure we don't introduce any extra poison: https://rise4fun.com/Alive/ZGv Other examples of opcodes where this would work are add+sub and fadd+fsub, but we already canonicalize those subs into adds, so there's nothing to do for those cases AFAICT. There are planned enhancements for opcode transforms such or -> add. Note that there's a different fold needed if we've already managed to simplify away a binop as seen in the test based on PR37806, but we manage to get that one case here because this fold is positioned above the demanded elements fold currently. Differential Revision: https://reviews.llvm.org/D48485 llvm-svn: 335888
* Revert "Add support for generating a call graph profile from Branch ↵Benjamin Kramer2018-06-282-101/+0
| | | | | | | | Frequency Info." This reverts commits r335794 and r335797. Breaks ThinLTO+FDO selfhost. llvm-svn: 335851
* Comment change to verify commit rights. NFC.Jesper Antonsson2018-06-281-1/+1
| | | | | | | | | | Summary: Just a silly one-character correction. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48709 llvm-svn: 335832
* [SCCP] Mark CFG as preserved.Florian Hahn2018-06-281-0/+2
| | | | | | | | | | | | SCCP does not change the CFG, so we can mark it as preserved. Reviewers: dberlin, efriedma, davide Reviewed By: davide Differential Revision: https://reviews.llvm.org/D47149 llvm-svn: 335820
* [IndVarSimplify] Ignore unreachable users of truncsMax Kazantsev2018-06-281-0/+4
| | | | | | | If a trunc has a user in a block which is not reachable from entry, we can safely perform trunc elimination as if this user didn't exist. llvm-svn: 335816
* [CGProfile] Fix unused variable warning.Michael J. Spencer2018-06-281-1/+1
| | | | llvm-svn: 335797
* Add support for generating a call graph profile from Branch Frequency Info.Michael J. Spencer2018-06-272-0/+101
| | | | | | | | | | | | | | | | | | | | | === Generating the CG Profile === The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions. For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight. After scanning all the functions, it generates an appending module flag containing the data. The format looks like: ``` !llvm.module.flags = !{!0} !0 = !{i32 5, !"CG Profile", !1} !1 = !{!2, !3, !4} ; List of edges !2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32 !3 = !{void (i1)* @freq, void ()* @a, i64 11} !4 = !{void (i1)* @freq, void ()* @b, i64 20} ``` Differential Revision: https://reviews.llvm.org/D48105 llvm-svn: 335794
* [ThinLTO] Print names in function import debug messages when availableTeresa Johnson2018-06-271-8/+15
| | | | | | | | | | | | | | | | Summary: Rather than just print the GUID, when it is available in the index, print the global name as well in the function import thin link debug messages. Names will be available when the combined index is being built by the same process, e.g. a linker or "llvm-lto2 run". Reviewers: davidxl Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits Differential Revision: https://reviews.llvm.org/D48612 llvm-svn: 335760
* [X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics ↵Craig Topper2018-06-271-6/+6
| | | | | | | | that don't take a mask as input to exclude '.mask.' from their name. I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask". llvm-svn: 335744
* [InstCombine] Avoid creating mis-sized dbg.values in commonCastTransforms()Vedant Kumar2018-06-271-2/+5
| | | | | | | | | | | | | | This prevents InstCombine from creating mis-sized dbg.values when replacing a sequence of casts with a simpler cast. For example, in: (fptrunc (floor (fpext X))) -> (floorf X) We no longer emit dbg.value(X) (with a 32-bit float operand) to describe (fpext X) (which is a 64-bit float). This was diagnosed by the debugify check added in r335682. llvm-svn: 335696
* Revert "[asan] Instrument comdat globals on COFF targets"Evgeniy Stepanov2018-06-261-33/+8
| | | | | | Causes false positive ODR violation reports on __llvm_profile_raw_version. llvm-svn: 335681
* [JumpThreading] Don't try to rewrite a use if it's already valid.Michael Zolotukhin2018-06-261-10/+12
| | | | | | | | | | | | | | | | | | | Summary: When recording uses we need to rewrite after cloning a loop we need to check if the use is not dominated by the original def. The initial assumption was that the cloned basic block will introduce a new path and thus the original def will only dominate the use if they are in the same BB, but as the reproducer from PR37745 shows it's not always the case. This fixes PR37745. Reviewers: haicheng, Ka-Ka Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D48111 llvm-svn: 335675
* Use a variable to appease a no-asserts bot, NFCVedant Kumar2018-06-261-0/+1
| | | | | | | Failure URL: http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/22836 llvm-svn: 335648
* LoopUnroll: Allow analyzing intrinsic call costsMatt Arsenault2018-06-261-2/+7
| | | | | | | | | | | I'm not sure why the code here is skipping calls since TTI does try to do something for general calls, but it at least should allow intrinsics. Skip intrinsics that should not be omitted as calls, which is by far the most common case on AMDGPU. llvm-svn: 335645
* [Local] Add a convenient insertReplacementDbgValues overload, NFCVedant Kumar2018-06-262-8/+14
| | | | | | | Add an overload for the common case where the replacement dbg.values have the same DIExpressions as the originals. llvm-svn: 335643
* [Local] Sink salvageDI's early exit into helper functions, NFCVedant Kumar2018-06-261-5/+12
| | | | | | | | salvageDebugInfo() performs a check that allows it to exit early without doing a DenseMap lookup. It's a bit neater and marginally more useful to sink this early exit into the findDbg{Addr,Users,Values} helpers. llvm-svn: 335642
* [InstCombine] simplify code for urem fold; NFCISanjay Patel2018-06-261-5/+2
| | | | llvm-svn: 335623
* [InstCombine] fold urem with sext bool divisorSanjay Patel2018-06-261-2/+13
| | | | | | | | | | | | | | | | | | | | | | Similar to other patches in this series: https://reviews.llvm.org/rL335512 https://reviews.llvm.org/rL335527 https://reviews.llvm.org/rL335597 https://reviews.llvm.org/rL335616 ...this is filling a gap in analysis that is exposed by an unrelated select-of-constants transform. I didn't see a way to unify the sext cases because each div/rem opcode results in a different fold. Note that in this case, the backend might want to convert the select into math: Name: sext urem %e = sext i1 %x to i32 %r = urem i32 %y, %e => %c = icmp eq i32 %y, -1 %z = zext i1 %c to i32 %r = add i32 %z, %y llvm-svn: 335622
* [SLPVectorizer] Recognise non uniform power of 2 constantsSimon Pilgrim2018-06-261-12/+11
| | | | | | | | | | Since D46637 we are better at handling uniform/non-uniform constant Pow2 detection; this patch tweaks the SLP argument handling to support them. As SLP works with arrays of values I don't think we can easily use the pattern match helpers here. Differential Revision: https://reviews.llvm.org/D48214 llvm-svn: 335621
* [InstCombine] fold udiv with sext bool divisorSanjay Patel2018-06-261-1/+7
| | | | | | | | | | Note: I didn't add a hasOneUse() check because the existing, related fold doesn't have that check. I suspect that the improved analysis and codegen make these some of the rare canonicalization cases where we allow an increase in instructions. llvm-svn: 335597
* [IPSCCP] Change dead blocks to unreachable after visiting all executable blocks.Florian Hahn2018-06-261-3/+11
| | | | | | | | | | | | | | | | | changeToUnreachable may remove PHI nodes from executable blocks we found values for and we would fail to replace them. By changing dead blocks to unreachable after we replaced constants in all executable blocks, we ensure such PHI nodes are replaced by their known value before. Fixes PR37780. Reviewers: efriedma, davide Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D48421 llvm-svn: 335588
* Improve ConvertDebugDeclareToDebugValueBjorn Pettersson2018-06-261-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is a follow-up to r334830 and r335031. In the valueCoversEntireFragment check we now also handle the situation when there is a variable length array (VLA) involved, and the length of the array has been reduced to a constant. The ConvertDebugDeclareToDebugValue functions that are related to PHI nodes and load instructions now avoid inserting dbg.value intrinsics when the value does not, for certain, cover the variable/fragment that should be described. In r334830 we assumed that the value always covered the entire var/fragment and we had assertions in the code to show that assumption. However, those asserts failed when compiling code with VLAs, so we removed the asserts in r335031. Now when we know that the valueCoversEntireFragment check can fail also for PHI/Load instructions we avoid to insert the faulty dbg.value intrinsic in such situations. Compared to the Store instruction scenario we simply drop the dbg.value here (as the variable does not change its value due to PHI/Load, so an earlier dbg.value describing the variable should still be valid). Reviewers: aprantl, vsk, efriedma Reviewed By: aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48547 llvm-svn: 335580
* [InstCombine] (A + 1) + (B ^ -1) --> A - BGil Rapaport2018-06-261-0/+5
| | | | | | | | | Turn canonicalized subtraction back into (-1 - B) and combine it with (A + 1) into (A - B). This is similar to the folding already done for (B ^ -1) + Const into (-1 + Const) - B. Differential Revision: https://reviews.llvm.org/D48535 llvm-svn: 335579
* [PM/LoopUnswitch] Teach the new unswitch to handle nontrivialChandler Carruth2018-06-251-156/+201
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | unswitching of switches. This works much like trivial unswitching of switches in that it reliably moves the switch out of the loop. Here we potentially clone the entire loop into each successor of the switch and re-point the cases at these clones. Due to the complexity of actually doing nontrivial unswitching, this patch doesn't create a dedicated routine for handling switches -- it would duplicate far too much code. Instead, it generalizes the existing routine to handle both branches and switches as it largely reduces to looping in a few places instead of doing something once. This actually improves the results in some cases with branches due to being much more careful about how dead regions of code are managed. With branches, because exactly one clone is created and there are exactly two edges considered, somewhat sloppy handling of the dead regions of code was sufficient in most cases. But with switches, there are much more complicated patterns of dead code and so I've had to move to a more robust model generally. We still do as much pruning of the dead code early as possible because that allows us to avoid even cloning the code. This also surfaced another problem with nontrivial unswitching before which is that we weren't as precise in reconstructing loops as we could have been. This seems to have been mostly harmless, but resulted in pointless LCSSA PHI nodes and other unnecessary cruft. With switches, we have to get this *right*, and everything benefits from it. While the testing may seem a bit light here because we only have two real cases with actual switches, they do a surprisingly good job of exercising numerous edge cases. Also, because we share the logic with branches, most of the changes in this patch are reasonably well covered by existing tests. The new unswitch now has all of the same fundamental power as the old one with the exception of the single unsound case of *partial* switch unswitching -- that really is just loop specialization and not unswitching at all. It doesn't fit into the canonicalization model in any way. We can add a loop specialization pass that runs late based on profile data if important test cases ever come up here. Differential Revision: https://reviews.llvm.org/D47683 llvm-svn: 335553
OpenPOWER on IntegriCloud