summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Analysis
Commit message (Collapse)AuthorAgeFilesLines
* [MemorySSA] Make the use of moveAllAfterMergeBlocks consistent.Alina Sbirlea2019-10-091-15/+22
| | | | | | | | | | | | | | | | | | | | | | | | Summary: The rule for the moveAllAfterMergeBlocks API si for all instructions from `From` to have been moved to `To`, while keeping the CFG edges (and block terminators) unchanged. Update all the callsites for moveAllAfterMergeBlocks to follow this. Pending follow-up: since the same behavior is needed everytime, merge all callsites into one. The common denominator may be the call to `MergeBlockIntoPredecessor`. Resolves PR43569. Reviewers: george.burgess.iv Subscribers: Prazek, sanjoy.google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68659 llvm-svn: 374177
* Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure ↵Jinsong Ji2019-10-081-10/+2
| | | | | | | | | | | | | | separately in loop-vectorize" Also Revert "[LoopVectorize] Fix non-debug builds after rL374017" This reverts commit 9f41deccc0e648a006c9f38e11919f181b6c7e0a. This reverts commit 18b6fe07bcf44294f200bd2b526cb737ed275c04. The patch is breaking PowerPC internal build, checked with author, reverting on behalf of him for now due to timezone. llvm-svn: 374091
* [SVE][IR] Scalable Vector size queries and IR instruction supportGraham Hunter2019-10-081-2/+4
| | | | | | | | | | | | | | | | | | | | | | * Adds a TypeSize struct to represent the known minimum size of a type along with a flag to indicate that the runtime size is a integer multiple of that size * Converts existing size query functions from Type.h and DataLayout.h to return a TypeSize result * Adds convenience methods (including a transparent conversion operator to uint64_t) so that most existing code 'just works' as if the return values were still scalars. * Uses the new size queries along with ElementCount to ensure that all supported instructions used with scalable vectors can be constructed in IR. Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen Reviewed By: rovka, sdesmalen Differential Revision: https://reviews.llvm.org/D53137 llvm-svn: 374042
* [LoopVectorize][PowerPC] Estimate int and float register pressure separately ↵Zi Xuan Wu2019-10-081-2/+10
| | | | | | | | | | | | | | | | | | | | | | | in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate different register pressure for different register class separately(especially for scalar type, float type should not be on the same position with int type), so it's not accurate. Specifically, it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance. So we need classify the register classes in IR level, and importantly these are abstract register classes, and are not the target register class of backend provided in td file. It's used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR), float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled, and 3 kinds of register class when VSX is NOT enabled. It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374017
* Second attempt to add iterator_range::empty()Jordan Rose2019-10-071-1/+1
| | | | | | | | | | | | Doing this makes MSVC complain that `empty(someRange)` could refer to either C++17's std::empty or LLVM's llvm::empty, which previously we avoided via SFINAE because std::empty is defined in terms of an empty member rather than begin and end. So, switch callers over to the new method as it is added. https://reviews.llvm.org/D68439 llvm-svn: 373935
* Revert "[SLP] avoid reduction transform on patterns that the backend can ↵Martin Storsjo2019-10-071-53/+0
| | | | | | | | | | load-combine" This reverts SVN r373833, as it caused a failed assert "Non-zero loop cost expected" on building numerous projects, see PR43582 for details and reproduction samples. llvm-svn: 373882
* [LOOPGUARD] Remove asserts in getLoopGuardBranchWhitney Tsang2019-10-061-3/+9
| | | | | | | | | | | | | Summary: The assertion in getLoopGuardBranch can be a 'return nullptr' under if condition. Authored By: DTharun Reviewer: Whitney, fhahn Reviewed By: Whitney, fhahn Subscribers: fhahn, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D66084 llvm-svn: 373857
* [SLP] avoid reduction transform on patterns that the backend can load-combineSanjay Patel2019-10-051-0/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I don't see an ideal solution to these 2 related, potentially large, perf regressions: https://bugs.llvm.org/show_bug.cgi?id=42708 https://bugs.llvm.org/show_bug.cgi?id=43146 We decided that load combining was unsuitable for IR because it could obscure other optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend. Therefore, preventing SLP from destroying load combine opportunities requires that it recognizes patterns that could be combined later, but not do the optimization itself ( it's not a vector combine anyway, so it's probably out-of-scope for SLP). Here, we add a scalar cost model adjustment with a conservative pattern match and cost summation for a multi-instruction sequence that can probably be reduced later. This should prevent SLP from creating a vector reduction unless that sequence is extremely cheap. In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining will produce a single instruction on these tests like: movbe rax, qword ptr [rdi] or: mov rax, qword ptr [rdi] Not some (half) vector monstrosity as we currently do using SLP: vpmovzxbq ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,.. vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0] movzx eax, byte ptr [rdi] movzx ecx, byte ptr [rdi + 5] shl rcx, 40 movzx edx, byte ptr [rdi + 6] shl rdx, 48 or rdx, rcx movzx ecx, byte ptr [rdi + 7] shl rcx, 56 or rcx, rdx or rcx, rax vextracti128 xmm1, ymm0, 1 vpor xmm0, xmm0, xmm1 vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1] vpor xmm0, xmm0, xmm1 vmovq rax, xmm0 or rax, rcx vzeroupper ret Differential Revision: https://reviews.llvm.org/D67841 llvm-svn: 373833
* [MemorySSA] Update Phi creation when inserting a Def.Alina Sbirlea2019-10-021-37/+40
| | | | | | | | | MemoryPhis should be added in the IDF of the blocks newly gaining Defs. This includes the blocks that gained a Phi and the block gaining a Def, if the block did not have one before. Resolves PR43427. llvm-svn: 373505
* Fix: Actually erase remove the elements from AssumeHandlesAditya Kumar2019-10-021-1/+4
| | | | | | | | | | | | | | Reviewers: sdmitriev, tejohnson Reviewed by: tejohnson Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68318 llvm-svn: 373494
* MemorySSAUpdater::applyInsertUpdates - silence static analyzer ↵Simon Pilgrim2019-10-021-1/+1
| | | | | | | | dyn_cast<MemoryAccess> null dereference warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<MemoryAccess> directly and if not assert will fire for us. llvm-svn: 373467
* MemorySSA tryOptimizePhi - assert that we've found a DefChainEnd. NFCI.Simon Pilgrim2019-10-021-0/+1
| | | | | | Silences static analyzer null dereference warning. llvm-svn: 373466
* LoopAccessAnalysis isConsecutiveAccess() - silence static analyzer ↵Simon Pilgrim2019-10-021-2/+2
| | | | | | | | dyn_cast<SCEVConstant> null dereference warning. NFCI. The static analyzer is warning about potential null dereferences, but in these cases we should be able to use cast<SCEVConstant> directly and if not assert will fire for us. llvm-svn: 373465
* [InstCombine] Simplify fma multiplication to nan for undef or nan operands.Florian Hahn2019-10-021-3/+3
| | | | | | | | | | | | | | | | | In similar fashion to D67721, we can simplify FMA multiplications if any of the operands is NaN or undef. In instcombine, we will simplify the FMA to an fadd with a NaN operand, which in turn gets folded to NaN. Note that this just changes SimplifyFMAFMul, so we still not catch the case where only the Add part of the FMA is Nan/Undef. Reviewers: cameron.mcinally, mcberg2017, spatel, arsenm Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D68265 llvm-svn: 373459
* [InstSimplify] fold fma/fmuladd with a NaN or undef operandSanjay Patel2019-10-021-0/+9
| | | | | | | | | | | This is intended to be similar to the constant folding results from D67446 and earlier, but not all operands are constant in these tests, so the responsibility for folding is left to InstSimplify. Differential Revision: https://reviews.llvm.org/D67721 llvm-svn: 373455
* Remove an unnecessary cast. NFC.Jay Foad2019-10-021-3/+2
| | | | llvm-svn: 373434
* [DDG] Data Dependence Graph - Root NodeBardia Mahjour2019-10-012-1/+51
| | | | | | | | | | | | | | | | | | | Summary: This patch adds Root Node to the DDG. The purpose of the root node is to create a single entry node that allows graph walk iterators to iterate through all nodes of the graph, making sure that no node is left unvisited during a graph walk (eg. SCC or DFS). Once the DDG is fully constructed it will have exactly one root node. Every node in the graph is reachable from the root. The algorithm for connecting the root node is based on depth-first-search that keeps track of visited nodes to try to avoid creating unnecessary edges. Authored By: bmahjour Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert Reviewed By: Meinersbur Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack Tag: #llvm Differential Revision: https://reviews.llvm.org/D67970 llvm-svn: 373386
* [MemorySSA] Check for unreachable blocks when getting last definition.Alina Sbirlea2019-10-011-0/+3
| | | | | | | | If a single predecessor is found, still check if the block is unreachable. The test that found this had a self loop unreachable block. Resolves PR43493. llvm-svn: 373383
* [MemorySSA] Update last_access_in_block check.Alina Sbirlea2019-10-011-2/+7
| | | | | | | | | | The check for "was there an access in this block" should be: is the last access in this block and is it not a newly inserted phi. Resolves new test in PR43438. Also fix a typo when simplifying trivial Phis to match the comment. llvm-svn: 373380
* [NFC][HardwareLoops] Update some iteratorsSam Parker2019-10-011-11/+6
| | | | llvm-svn: 373309
* [ConstantFolding] Fold constant calls to log2()Evandro Menezes2019-09-301-0/+9
| | | | | | | | Somehow, folding calls to `log2()` with a constant was missing. Differential revision: https://reviews.llvm.org/D67300 llvm-svn: 373262
* Revert "[SCEV] add no wrap flag for SCEVAddExpr."Tim Northover2019-09-301-1/+1
| | | | | | | | This reverts r366419 because the analysis performed is within the context of the loop and it's only valid to add wrapping flags to "global" expressions if they're always correct. llvm-svn: 373184
* [InstSimplify] generalize FP folds with undef/NaN; NFCSanjay Patel2019-09-271-12/+14
| | | | | | We can reuse this logic for things like fma. llvm-svn: 373119
* [Alignment][NFC] Remove unneeded llvm:: scoping on Align typesGuillaume Chatelet2019-09-272-6/+5
| | | | llvm-svn: 373081
* Revert "[LoopInfo] Limit the iterations to check whether a loop has dedicatedWei Mi2019-09-271-7/+0
| | | | | | | | | | | exits" Get a better approach in https://reviews.llvm.org/D68107 to solve the problem. Revert the initial patch and will commit the new one soon. This reverts commit rL372990. llvm-svn: 373044
* [LOOPGUARD] Disable loop with multiple loop exiting blocks.Whitney Tsang2019-09-261-8/+6
| | | | | | | | | | | | | | | | | | | | Summary: As discussed in the loop group meeting. With the current definition of loop guard, we should not allow multiple loop exiting blocks. For loops that has multiple loop exiting blocks, we can simply unable to find the loop guard. When getUniqueExitBlock() obtains a vector size not equals to one, that means there is either no exit blocks or there exists more than one unique block the loop exit to. If we don't disallow loop with multiple loop exit blocks, then with our current implementation, there can exist exit blocks don't post dominated by the non pre-header successor of the guard block. Reviewer: reames, Meinersbur, kbarton, etiotto, bmahjour Reviewed By: Meinersbur, kbarton Subscribers: fhahn, hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D66529 llvm-svn: 373011
* ConstantFold - silence static analyzer dyn_cast<ExtractValueInst> null ↵Simon Pilgrim2019-09-261-1/+1
| | | | | | | | dereference warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<ExtractValueInst> directly and if not assert will fire for us. llvm-svn: 372993
* [LoopInfo] Limit the iterations to check whether a loop has dedicated exitsWei Mi2019-09-261-0/+7
| | | | | | | | | | | | | | for extreme large case. We had a case that a single loop which has 4000 exits and the average number of predecessors of each exit is > 1000, and we found compiling the case spent a significant amount of time on checking whether a loop has dedicated exits. This patch adds a limit for the iterations to the check. With the patch, the time to compile our testcase reduced from 1000s to 200s (clang release build). Differential Revision: https://reviews.llvm.org/D67359 llvm-svn: 372990
* Remove local shadow constant. NFCI.Simon Pilgrim2019-09-261-2/+0
| | | | | | ValueTracking.cpp already has a local static MaxDepth = 6 constant - this one seems to have been missed when rL124183 landed. llvm-svn: 372964
* [ValueTracking] Silence static analyzer dyn_cast<Operator> null dereference ↵Simon Pilgrim2019-09-261-225/+228
| | | | | | | | warnings. NFCI. The static analyzer is warning about a potential null dereferences, but since the pointer is only used in a switch statement for Operator::getOpcode() (with an empty default) then its easiest just to wrap this in a null test as the dyn_cast might return null here. llvm-svn: 372962
* [ConstantFolding] Use FoldBitCast correctlyKeno Fischer2019-09-261-2/+20
| | | | | | | | | | | | | Previously we might attempt to use a BitCast to turn bits into vectors of pointers, but that requires an inttoptr cast to be legal. Add an assertion to detect the formation of illegal bitcast attempts early (in the tests, we often constant-fold away the result before getting to this assertion check), while being careful to still handle the early-return conditions without adding extra complexity in the result. Patch by Jameson Nash <jameson@juliacomputing.com>. Differential Revision: https://reviews.llvm.org/D65057 llvm-svn: 372940
* [MemorySSA] Avoid adding Phis in the presence of unreachable blocks.Alina Sbirlea2019-09-251-45/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: If a block has all incoming values with the same MemoryAccess (ignoring incoming values from unreachable blocks), then use that incoming MemoryAccess and do not create a Phi in the first place. Revert IDF work-around added in rL372673; it should not be required unless the Def inserted is the first in its block. The patch also cleans up a series of tests, added during the many iterations on insertDef. The patch also fixes PR43438. The same issue that occurs in insertDef with "adding phis, hence the IDF of Phis is needed", can also occur in fixupDefs: the `getPreviousRecursive` call only adds Phis walking on the predecessor edges, which means there may be the case of a Phi added walking the CFG "backwards" which triggers the needs for an additional Phi in successor blocks. Such Phis are added during fixupDefs only in the presence of unreachable blocks. Hence this highlights the need to avoid adding Phis in blocks with unreachable predecessors in the first place. Reviewers: george.burgess.iv Subscribers: Prazek, sanjoy.google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67995 llvm-svn: 372932
* [InstSimplify] Handle more 'A </>/>=/<= B &&/|| (A - B) !=/== 0' patterns ↵Roman Lebedev2019-09-251-0/+12
| | | | | | | | | | | (PR43251) https://rise4fun.com/Alive/sl9s https://rise4fun.com/Alive/2plN https://bugs.llvm.org/show_bug.cgi?id=43251 llvm-svn: 372928
* [InstSimplify] Match 1.0 and 0.0 for both operands in SimplifyFMAMulFlorian Hahn2019-09-251-0/+8
| | | | | | | | | | | | | | | | | | Because we do not constant fold multiplications in SimplifyFMAMul, we match 1.0 and 0.0 for both operands, as multiplying by them is guaranteed to produce an exact result (if it is allowed to do so). Note that it is not enough to just swap the operands to ensure a constant is on the RHS, as we want to also cover the case with 2 constants. Reviewers: lebedev.ri, spatel, reames, scanon Reviewed By: lebedev.ri, reames Differential Revision: https://reviews.llvm.org/D67553 llvm-svn: 372915
* [InstCombine] Limit FMul constant folding for fma simplifications.Florian Hahn2019-09-251-9/+20
| | | | | | | | | | | | | | | | | As @reames pointed out post-commit, rL371518 adds additional rounding in some cases, when doing constant folding of the multiplication. This breaks a guarantee llvm.fma makes and must be avoided. This patch reapplies rL371518, but splits off the simplifications not requiring rounding from SimplifFMulInst as SimplifyFMAFMul. Reviewers: spatel, lebedev.ri, reames, scanon Reviewed By: reames Differential Revision: https://reviews.llvm.org/D67434 llvm-svn: 372899
* [PatternMatch] Make m_Br more flexible, add matchers for BB values.Florian Hahn2019-09-251-2/+1
| | | | | | | | | | | | | | | | | | | | | Currently m_Br only takes references to BasicBlock*, which limits its flexibility. For example, you have to declare a variable, even if you ignore the result or you have to have additional checks to make sure the matched BB matches an expected one. This patch adds m_BasicBlock and m_SpecificBB matchers, which can be used like the existing matchers for constants or values. I also had a look at the existing uses and updated a few. IMO it makes the code a bit more explicit. Reviewers: spatel, craig.topper, RKSimon, majnemer, lebedev.ri Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D68013 llvm-svn: 372885
* [IR] allow fast-math-flags on phi of FP values (2nd try)Sanjay Patel2019-09-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The changes here are based on the corresponding diffs for allowing FMF on 'select': D61917 <https://reviews.llvm.org/D61917> As discussed there, we want to have fast-math-flags be a property of an FP value because the alternative (having them on things like fcmp) leads to logical inconsistency such as: https://bugs.llvm.org/show_bug.cgi?id=38086 The earlier patch for select made almost no practical difference because most unoptimized conditional code begins life as a phi (based on what I see in clang). Similarly, I don't expect this patch to do much on its own either because SimplifyCFG promptly drops the flags when converting to select on a minimal example like: https://bugs.llvm.org/show_bug.cgi?id=39535 But once we have this plumbing in place, we should be able to wire up the FMF propagation and start solving cases like that. The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a regression in a LoopVectorize test. We are intersecting the FMF of any FPMathOperator there, so if a phi is not properly annotated, new math instructions may not be either. Once we fix the propagation in SimplifyCFG, it may be safe to remove that hack. Differential Revision: https://reviews.llvm.org/D67564 llvm-svn: 372878
* Revert [IR] allow fast-math-flags on phi of FP valuesSanjay Patel2019-09-251-2/+1
| | | | | | This reverts r372866 (git commit dec03223a97af0e4dfcb23da55c0f7f8c9b62d00) llvm-svn: 372868
* [IR] allow fast-math-flags on phi of FP valuesSanjay Patel2019-09-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The changes here are based on the corresponding diffs for allowing FMF on 'select': D61917 As discussed there, we want to have fast-math-flags be a property of an FP value because the alternative (having them on things like fcmp) leads to logical inconsistency such as: https://bugs.llvm.org/show_bug.cgi?id=38086 The earlier patch for select made almost no practical difference because most unoptimized conditional code begins life as a phi (based on what I see in clang). Similarly, I don't expect this patch to do much on its own either because SimplifyCFG promptly drops the flags when converting to select on a minimal example like: https://bugs.llvm.org/show_bug.cgi?id=39535 But once we have this plumbing in place, we should be able to wire up the FMF propagation and start solving cases like that. The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a regression in a LoopVectorize test. We are intersecting the FMF of any FPMathOperator there, so if a phi is not properly annotated, new math instructions may not be either. Once we fix the propagation in SimplifyCFG, it may be safe to remove that hack. Differential Revision: https://reviews.llvm.org/D67564 llvm-svn: 372866
* [SCEV] Disable canonical expansion for non-affine addrecs.Artur Pilipenko2019-09-241-1/+12
| | | | | | | | | | Reviewed By: apilipenko Differential Revision: https://reviews.llvm.org/D65276 Patch by Evgeniy Brevnov (ybrevnov@azul.com) llvm-svn: 372789
* [PGO][PGSO] ProfileSummary changes.Hiroshi Yamauchi2019-09-241-0/+67
| | | | | | | | | | (Split of off D67120) ProfileSummary changes for profile guided size optimization. Differential Revision: https://reviews.llvm.org/D67377 llvm-svn: 372783
* [MemorySSA] Update Phi insertion.Alina Sbirlea2019-09-231-43/+39
| | | | | | | | | | | | | | | | | | | | | | Summary: MemoryPhis may be needed following a Def insertion inthe IDF of all the new accesses added (phis + potentially a def). Ensure this also occurs when only the new MemoryPhis are the defining accesses. Note: The need for computing IDF here is because of new Phis added with edges incoming from unreachable code, Phis that had previously been simplified. The preferred solution is to not reintroduce such Phis. This patch is the needed fix while working on the preferred solution. Reviewers: george.burgess.iv Subscribers: Prazek, sanjoy.google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67927 llvm-svn: 372673
* [ValueTracking] Fix uninitialized variable warnings in matchSelectPattern ↵Simon Pilgrim2019-09-231-5/+6
| | | | | | | | | | const wrapper. NFCI. Static analyzer complains about const_cast uninitialized variables, we should explicitly set these to null. Ideally that const wrapper would go away though....... llvm-svn: 372603
* [InstSimplify] simplifyUnsignedRangeCheck(): X >= Y && Y == 0 --> Y == 0Roman Lebedev2019-09-211-4/+3
| | | | | | https://rise4fun.com/Alive/v9Y4 llvm-svn: 372491
* [InstSimplify][NFC] Reorganize simplifyUnsignedRangeCheck() to emphasize ↵Roman Lebedev2019-09-211-8/+11
| | | | | | | | and/or symmetry Only a single `X >= Y && Y == 0 --> Y == 0` fold appears to be missing. llvm-svn: 372490
* [Inliner] Remove incorrect early exit during switch cost computationTeresa Johnson2019-09-201-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The CallAnalyzer::visitSwitchInst has an early exit when the estimated lower bound of the switch cost will put the overall cost of the inline above the threshold. However, this code is not correctly estimating the lower bound for switches that can be transformed into bit tests, leading to unnecessary lost inlines, and also differing behavior with optimization remarks enabled. First, the early exit is controlled by whether ComputeFullInlineCost is enabled or not, and that in turn is disabled by default but enabled when enabling -pass-remarks=missed. This by itself wouldn't lead to a problem, except that as described below, the lower bound can be above the real lower bound, so we can sometimes get different inline decisions with inline remarks enabled, which is problematic. The early exit was added in along with a new switch cost model in D31085. The reason why this early exit was added is due to a concern one reviewer raised about compile time for large switches: https://reviews.llvm.org/D31085?id=94559#inline-276200 However, the code just below there calls getEstimatedNumberOfCaseClusters, which in turn immediately calls BasicTTIImpl getEstimatedNumberOfCaseClusters, which in the worst case does a linear scan of the cases to get the high and low values. The bit test handling in particular is guarded by whether the number of cases fits into the max bit width. There is no suggestion that anyone measured a compile time issue, it appears to be theoretical. The problem is that the reviewer's comment about the lower bound calculation is incorrect, specifically in the case of a switch that can be lowered to a bit test. This isn't followed up on the comment thread, but the author does add a FIXME to that effect above the early exit added when they subsequently revised the patch. As a result, we were incorrectly early exiting and not inlining functions with switch statements that would be lowered to bit tests in cases where we were nearing the threshold. Combined with the fact that this early exit was skipped with opt remarks enabled, this caused different inlining decisions to be made when -pass-remarks=missed is enabled to debug the missing inline. Remove the early exit for the above reasons. I also copied over an existing AArch64 inlining test to X86, and adjusted the threshold so that the bit test inline only occurs with the fix in this patch. Reviewers: davidxl Subscribers: eraman, kristof.beyls, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67716 llvm-svn: 372440
* [Analysis] Allow -scalar-evolution-max-iterations more than onceShoaib Meenai2019-09-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | At present, `-scalar-evolution-max-iterations` is a `cl::Optional` option, which means it demands to be passed exactly zero or one times. Our build system makes it pretty tricky to guarantee this. We often accidentally pass the flag more than once (but always with the same value) which results in an error, after which compilation fails: ``` clang (LLVM option parsing): for the -scalar-evolution-max-iterations option: may only occur zero or one times! ``` It seems reasonable to allow -scalar-evolution-max-iterations to be passed more than once. Quoting the [[ http://llvm.org/docs/CommandLine.html#controlling-the-number-of-occurrences-required-and-allowed | documentation ]]: > The cl::ZeroOrMore modifier ... indicates that your program will allow the option to be specified zero or more times. > ... > If an option is specified multiple times for an option of the cl::opt class, only the last value will be retained. Original patch by: Enrico Bern Hardy Tanuwidjaja <etanuwid@fb.com> Differential Revision: https://reviews.llvm.org/D67512 llvm-svn: 372346
* [SVFS] Vector Function ABI demangling.Francesco Petrogalli2019-09-192-0/+419
| | | | | | | | | | | | | | | | This patch implements the demangling functionality as described in the Vector Function ABI. This patch will be used to implement the SearchVectorFunctionSystem (SVFS) as described in the RFC: http://lists.llvm.org/pipermail/llvm-dev/2019-June/133484.html A fuzzer is added to test the demangling utility. Patch by Sumedh Arani <sumedh.arani@arm.com> Differential revision: https://reviews.llvm.org/D66024 llvm-svn: 372343
* Data Dependence Graph BasicsBardia Mahjour2019-09-183-0/+383
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is the first patch in a series of patches that will implement data dependence graph in LLVM. Many of the ideas used in this implementation are based on the following paper: D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe (1981). DEPENDENCE GRAPHS AND COMPILER OPTIMIZATIONS. This patch contains support for a basic DDGs containing only atomic nodes (one node for each instruction). The edges are two fold: def-use edges and memory-dependence edges. The implementation takes a list of basic-blocks and only considers dependencies among instructions in those basic blocks. Any dependencies coming into or going out of instructions that do not belong to those basic blocks are ignored. The algorithm for building the graph involves the following steps in order: 1. For each instruction in the range of basic blocks to consider, create an atomic node in the resulting graph. 2. For each node in the graph establish def-use edges to/from other nodes in the graph. 3. For each pair of nodes containing memory instruction(s) create memory edges between them. This part of the algorithm goes through the instructions in lexicographical order and creates edges in reverse order if the sink of the dependence occurs before the source of it. Authored By: bmahjour Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert Reviewed By: Meinersbur, fhahn, myhsu Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto Tag: #llvm Differential Revision: https://reviews.llvm.org/D65350 llvm-svn: 372238
* [SDA] Don't stop divergence propagation at the IPD.Jay Foad2019-09-181-35/+26
| | | | | | | | | | | | | | | | | | | Summary: This fixes B42473 and B42706. This patch makes the SDA propagate branch divergence until the end of the RPO traversal. Before, the SyncDependenceAnalysis propagated divergence only until the IPD in rpo order. RPO is incompatible with post dominance in the presence of loops. This made the SDA crash because blocks were missed in the propagation. Reviewers: foad, nhaehnle Reviewed By: foad Subscribers: jvesely, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65274 llvm-svn: 372223
OpenPOWER on IntegriCloud