path: root/llvm/lib/Transforms/Utils/LoopUtils.cpp
Commit message | Author | Age | Files | Lines
* Remove stale comment. NFC. | Michael Kuperstein | 2016-12-03 | 1 | -3/+0
    llvm-svn: 288572
* [LoopUnroll] Implement profile-based loop peeling | Michael Kuperstein | 2016-11-30 | 1 | -7/+7
    This implements PGO-driven loop peeling.

    The basic idea is that when the average dynamic trip-count of a loop is known, based on PGO, to be low, we can expect a performance win by peeling off the first several iterations of that loop. Unlike unrolling based on a known trip count, or a trip count multiple, this doesn't save us the conditional check and branch on each iteration. However, it does allow us to simplify the straight-line code we get (constant-folding, etc.). This is important given that we know that we will usually only hit this code, and not the actual loop.

    This is currently disabled by default.

    Differential Revision: https://reviews.llvm.org/D25963
    llvm-svn: 288274
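As a rough source-level illustration of what peeling means, the sketch below peels two iterations off a summation loop by hand. The function names and the peel count of 2 are assumptions chosen for illustration; the actual transformation operates on LLVM IR and picks the peel count from profile data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original loop: every element is handled inside the loop body.
int sumLoop(const std::vector<int> &v) {
  int sum = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
    sum += v[i];
  return sum;
}

// Hand-peeled version (hypothetical peel count of 2). The trip-count checks
// are still executed, but the first two iterations are now straight-line
// code that later passes can simplify independently of the loop.
int sumPeeled(const std::vector<int> &v) {
  int sum = 0;
  const std::size_t n = v.size();
  if (n == 0)
    return sum;
  sum += v[0]; // peeled iteration 0
  if (n == 1)
    return sum;
  sum += v[1]; // peeled iteration 1
  for (std::size_t i = 2; i < n; ++i) // remaining iterations, rarely reached
    sum += v[i];
  return sum;
}
```

When the profile says the trip count is usually 1 or 2, execution typically returns before reaching the residual loop.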
* Use profile info to adjust loop unroll threshold. | Dehao Chen | 2016-11-17 | 1 | -0/+36
    Summary: For a flat loop, even if it is hot, runtime unrolling is not a good idea, so we set a lower partial unroll threshold. For a hot loop, we set a higher unroll threshold and allow expensive trip-count computation to enable more aggressive unrolling.

    Reviewers: davidxl, mzolotukhin
    Subscribers: sanjoy, mehdi_amini, llvm-commits
    Differential Revision: https://reviews.llvm.org/D26527
    llvm-svn: 287186
* [LCSSA] Perform LCSSA verification only for the current loop nest. | Igor Laevsky | 2016-10-28 | 1 | -0/+5
    Now LPPassManager will run LCSSA verification only for the top-level loop which was processed on the current iteration.

    Differential Revision: https://reviews.llvm.org/D25873
    llvm-svn: 285394
* [LoopUnroll] Use OptimizationRemarkEmitter directly not via the analysis pass | Adam Nemet | 2016-08-26 | 1 | -4/+0
    We can't mark ORE (a function pass) preserved as required by the loop passes because that is how we ensure that the required passes like LazyBFI are all available any time ORE is used. See the new comments in the patch.

    Instead we use it directly just like the inliner does in D22694.

    As expected there is some additional overhead after removing the caching provided by analysis passes. The worst case I measured was LNT/CINT2006_ref/401.bzip2, which regresses by 12%. As before, this only affects -Rpass-with-hotness and not default compilation.

    llvm-svn: 279829
* Use the range variant of find/find_if instead of unpacking begin/end | David Majnemer | 2016-08-12 | 1 | -1/+1
    If the result of the find is only used to compare against end(), just use is_contained instead.

    No functionality change is intended.

    llvm-svn: 278469
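The before/after shape of this cleanup can be sketched without LLVM headers. `isContained` below is a local stand-in written for this example (an assumption, not the LLVM implementation); LLVM provides `llvm::is_contained` in `llvm/ADT/STLExtras.h` for the same purpose.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Local stand-in for a range-based membership test (hypothetical helper).
template <typename Range, typename T>
bool isContained(const Range &range, const T &value) {
  return std::find(std::begin(range), std::end(range), value) !=
         std::end(range);
}

// "Before" style: begin/end unpacked by hand, result only compared to end().
bool containsUnpacked(const std::vector<int> &v, int x) {
  return std::find(v.begin(), v.end(), x) != v.end();
}

// "After" style: the intent is stated directly.
bool containsRange(const std::vector<int> &v, int x) {
  return isContained(v, x);
}
```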
* Use range algorithms instead of unpacking begin/end | David Majnemer | 2016-08-11 | 1 | -1/+1
    No functionality change is intended.

    llvm-svn: 278417
* [LoopUnroll] Include hotness of region in opt remark | Adam Nemet | 2016-07-29 | 1 | -0/+4
    LoopUnroll is a loop pass, so the analysis of OptimizationRemarkEmitter is added to the common function analysis passes that loop passes depend on.

    The BFI and indirectly BPI used in this pass are computed lazily so no overhead should be observed unless -pass-remarks-with-hotness is used.

    This is how the patch affects the O3 pipeline:

      Dominator Tree Construction
      Natural Loop Information
      Canonicalize natural loops
      Loop-Closed SSA Form Pass
      Basic Alias Analysis (stateless AA impl)
      Function Alias Analysis Results
      Scalar Evolution Analysis
    + Lazy Branch Probability Analysis
    + Lazy Block Frequency Analysis
    + Optimization Remark Emitter
      Loop Pass Manager
      Rotate Loops
      Loop Invariant Code Motion
      Unswitch loops
      Simplify the CFG
      Dominator Tree Construction
      Basic Alias Analysis (stateless AA impl)
      Function Alias Analysis Results
      Combine redundant instructions
      Natural Loop Information
      Canonicalize natural loops
      Loop-Closed SSA Form Pass
      Scalar Evolution Analysis
    + Lazy Branch Probability Analysis
    + Lazy Block Frequency Analysis
    + Optimization Remark Emitter
      Loop Pass Manager
      Induction Variable Simplification
      Recognize loop idioms
      Delete dead loops
      Unroll loops
      ...

    llvm-svn: 277203
* [LoopUtils] Sort headers | Adam Nemet | 2016-07-26 | 1 | -3/+4
    llvm-svn: 276776
* [Loop Vectorizer] Handling loops FP induction variables. | Elena Demikhovsky | 2016-07-24 | 1 | -12/+106
    Allowed loop vectorization with secondary FP IVs. Like this:

        float *A;
        float x = init;
        for (int i=0; i < N; ++i) {
          A[i] = x;
          x -= fp_inc;
        }

    The auto-vectorization is possible when the induction binary operator is "fast" or the function has "unsafe" attribute.

    Differential Revision: https://reviews.llvm.org/D21330
    llvm-svn: 276554
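One way to see why the fast-math requirement matters: a vectorized loop needs the value of `x` for each lane without walking through the earlier iterations, which amounts to a closed form such as `init - i * fp_inc`. In general that reassociates the floating-point operations and can round differently from the sequential subtraction. The sketch below compares the two forms; the function names are assumptions made for this example, and the closed form is a model of the idea rather than the vectorizer's actual output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential form, as in the commit message: x is updated once per iteration.
std::vector<float> fillSequential(std::size_t n, float init, float fpInc) {
  std::vector<float> a(n);
  float x = init;
  for (std::size_t i = 0; i < n; ++i) {
    a[i] = x;
    x -= fpInc;
  }
  return a;
}

// Closed form: each element is computed independently of the previous one.
// This matches the sequential form bit-for-bit only when no rounding occurs;
// otherwise it is a reassociation that strict FP semantics do not permit.
std::vector<float> fillClosedForm(std::size_t n, float init, float fpInc) {
  std::vector<float> a(n);
  for (std::size_t i = 0; i < n; ++i)
    a[i] = init - static_cast<float>(i) * fpInc;
  return a;
}
```

With exactly representable inputs such as `init = 10.0f` and `fp_inc = 0.5f` the two functions agree; with a step like `0.1f` they may drift apart after enough iterations.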
* [LICM] Make isGuaranteedToExecute more accurate. | Eli Friedman | 2016-06-11 | 1 | -0/+3
    Summary: Make isGuaranteedToExecute use the isGuaranteedToTransferExecutionToSuccessor helper, and make that helper a bit more accurate.

    There's a potential performance impact here from assuming that arbitrary calls might not return. This probably has little impact on loads and stores to a pointer because most things alias analysis can reason about are dereferenceable anyway. The other impacts, like less aggressive hoisting of sdiv by a variable and less aggressive hoisting around volatile memory operations, are unlikely to matter for real code.

    This also impacts SCEV, which uses the same helper. It's a minor improvement there because we can tell that, for example, memcpy always returns normally. Strictly speaking, it's also introducing a bug, but it's not any worse than everywhere else we assume readonly functions terminate.

    Fixes http://llvm.org/PR27857.

    Reviewers: hfinkel, reames, chandlerc, sanjoy
    Subscribers: broune, llvm-commits
    Differential Revision: http://reviews.llvm.org/D21167
    llvm-svn: 272489
* Move isGuaranteedToExecute out of LICM. | Evgeniy Stepanov | 2016-06-10 | 1 | -0/+39
    Also rename LICMSafetyInfo to LoopSafetyInfo. Both will be used in LoopUnswitch in a separate change.

    llvm-svn: 272420
* [PM] Port LCSSA to the new PM. | Easwaran Raman | 2016-06-09 | 1 | -1/+1
    Differential Revision: http://reviews.llvm.org/D21090
    llvm-svn: 272294
* Vectorizer: track non-fast FP instructions through phis when finding reductions. | Tim Northover | 2016-05-27 | 1 | -1/+1
    When we traced through a phi node looking for floating-point reductions, we forgot whether we'd ever seen an instruction without fast-math flags (that would block vectorization). This propagates it through to the end.

    llvm-svn: 271015
* [LoopVectorize] Handling induction variable with non-constant step. | Elena Demikhovsky | 2016-05-10 | 1 | -27/+65
    Allow vectorization when the step is a loop-invariant variable. This is the loop example that is getting vectorized after the patch:

        int int_inc;

        int bar(int init, int *restrict A, int N) {
          int x = init;
          for (int i=0;i<N;i++){
            A[i] = x;
            x += int_inc;
          }
          return x;
        }

    "x" is an induction variable with *loop-invariant* step. But it is not a primary induction. Primary induction variable with non-constant step is not handled yet.

    Differential Revision: http://reviews.llvm.org/D19258
    llvm-svn: 269023
* [LV] Identify more induction PHIs by coercing expressions to AddRecExprs | Silviu Baranga | 2016-05-05 | 1 | -3/+30
    Summary: Some PHIs can have expressions that are not AddRecExprs due to the presence of sext/zext instructions. In order to prevent the Loop Vectorizer from bailing out when encountering these PHIs, we now coerce the SCEV expressions to AddRecExprs using SCEV predicates (when possible).

    We only do this when the alternative would be to not vectorize.

    Reviewers: mzolotukhin, anemet
    Subscribers: mssimpso, sanjoy, mzolotukhin, llvm-commits
    Differential Revision: http://reviews.llvm.org/D17153
    llvm-svn: 268633
* [LoopUtils] Extend findStringMetadataForLoop to return the value for metadata | Adam Nemet | 2016-04-22 | 1 | -7/+18
    E.g. for:

        !1 = {"llvm.distribute", i32 1}

    it now returns the MDOperand for 1.

    I will use this in LoopDistribution to check the value of the metadata.

    Note that the change is backward-compatible with its current use in LoopVersioningLICM. An Optional implicitly converts to a bool depending on whether it contains a value or not.

    llvm-svn: 267190
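The backward-compatibility argument can be sketched with `std::optional` standing in for LLVM's `Optional`. Everything below (the `LoopMetadata` alias, `findStringMetadata`, and modeling the operand as an `int`) is a hypothetical simplification for illustration, not the real `findStringMetadataForLoop` signature.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of loop metadata: (name, integer operand) pairs.
using LoopMetadata = std::vector<std::pair<std::string, int>>;

// Returns the operand attached to `name`, or nullopt if the name is absent.
std::optional<int> findStringMetadata(const LoopMetadata &md,
                                      const std::string &name) {
  for (const auto &entry : md)
    if (entry.first == name)
      return entry.second;
  return std::nullopt;
}

// Old-style caller: only cares whether the metadata exists at all.
bool hasMetadata(const LoopMetadata &md, const std::string &name) {
  // In a boolean context the optional tests for presence, not for the value.
  if (findStringMetadata(md, name))
    return true;
  return false;
}
```

A presence check stays correct even when the stored value is 0, while new callers can read the value itself.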
* [LoopUtils] Fix typo in comment | Adam Nemet | 2016-04-21 | 1 | -1/+1
    llvm-svn: 267016
* [LoopUtils] Add asserts to findStringMetadataForLoop. NFC | Adam Nemet | 2016-04-21 | 1 | -0/+5
    These ensure that operand array has at least one element and it is the self-reference.

    llvm-svn: 267015
* [LoopUtils] Move def of findStringMetadataForLoop to LoopUtils.cpp. NFC | Adam Nemet | 2016-04-21 | 1 | -0/+22
    The decl is in LoopUtils.h. I think that this was added to LoopVersioningLICM.cpp by mistake.

    llvm-svn: 267014
* [LoopUtils, LV] Fix PR27246 (first-order recurrences) | Matthew Simpson | 2016-04-11 | 1 | -1/+1
    This patch ensures that when we detect first-order recurrences, we reject a phi node if its previous value is also a phi node. During vectorization the initial and previous values of the recurrence are shuffled together to create the value for the current iteration. However, phi nodes are not widened like other instructions. This fixes PR27246.

    Differential Revision: http://reviews.llvm.org/D18971
    llvm-svn: 265983
* Remove HasFnAttribute guards to getFnAttribute calls | Nirav Dave | 2016-03-30 | 1 | -4/+2
    These checks are redundant and can be removed.

    Reviewers: hans
    Subscribers: llvm-commits, mzolotukhin
    Differential Revision: http://reviews.llvm.org/D18564
    llvm-svn: 264872
* [LoopUtils, LV] Fix PR26734 | Matthew Simpson | 2016-03-03 | 1 | -1/+1
    The vectorization of first-order recurrences (r261346) caused PR26734. When detecting these recurrences, we need to ensure that the previous value is actually defined inside the loop. This patch includes the fix and test case.

    llvm-svn: 262624
* [LV] Vectorize first-order recurrences | Matthew Simpson | 2016-02-19 | 1 | -0/+37
    This patch enables the vectorization of first-order recurrences. A first-order recurrence is a non-reduction recurrence relation in which the value of the recurrence in the current loop iteration equals a value defined in the previous iteration. The load PRE of the GVN pass often creates these recurrences by hoisting loads from within loops.

    In this patch, we add a new recurrence kind for first-order phi nodes and attempt to vectorize them if possible. Vectorization is performed by shuffling the values for the current and previous iterations. The vectorization cost estimate is updated to account for the added shuffle instruction.

    Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org>
    Differential Revision: http://reviews.llvm.org/D16197
    llvm-svn: 261346
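A small scalar model may help picture both the recurrence and the shuffle. The first function carries the previous element in a scalar, which is the first-order recurrence. The second processes chunks of four and builds the "previous" vector from the last lane of the prior chunk plus the first three lanes of the current one. The function names, the width of 4, and the use of plain arrays in place of vector registers are all assumptions made for this sketch, not the vectorizer's generated code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar loop with a first-order recurrence: `prev` holds the value that was
// defined in the previous iteration.
std::vector<int> diffScalar(const std::vector<int> &a, int init) {
  std::vector<int> b(a.size());
  int prev = init;
  for (std::size_t i = 0; i < a.size(); ++i) {
    int cur = a[i];
    b[i] = cur - prev;
    prev = cur;
  }
  return b;
}

// Chunked model of the vectorized loop (hypothetical width VF = 4).
std::vector<int> diffChunked(const std::vector<int> &a, int init) {
  constexpr std::size_t VF = 4;
  std::vector<int> b(a.size());
  int carried = init; // last lane of the previous chunk
  std::size_t i = 0;
  for (; i + VF <= a.size(); i += VF) {
    std::array<int, VF> cur{}, prev{};
    for (std::size_t l = 0; l < VF; ++l)
      cur[l] = a[i + l];
    // "Shuffle": lane 0 comes from the previous chunk, the rest from cur.
    prev[0] = carried;
    for (std::size_t l = 1; l < VF; ++l)
      prev[l] = cur[l - 1];
    for (std::size_t l = 0; l < VF; ++l)
      b[i + l] = cur[l] - prev[l];
    carried = cur[VF - 1];
  }
  for (; i < a.size(); ++i) { // scalar remainder loop
    b[i] = a[i] - carried;
    carried = a[i];
  }
  return b;
}
```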
* [LPM] Factor all of the loop analysis usage updates into a common helper routine. | Chandler Carruth | 2016-02-19 | 1 | -0/+60
    We were getting this wrong in small ways and generally being very inconsistent about it across loop passes. Instead, let's have a common place where we do this. One minor downside is that this will require some analyses like SCEV in more places than they are strictly needed. However, this seems benign as these analyses are complete no-ops, and without this consistency we can in many cases end up with the legacy pass manager scheduling deciding to split up a loop pass pipeline in order to run the function analysis half-way through. It is very, very annoying to fix these without just being very pedantic across the board.

    The only loop passes I've not updated here are ones that use AU.setPreservesAll() such as IVUsers (an analysis) and the pass printer. They seemed less relevant.

    With this patch, almost all of the problems in PR24804 around loop pass pipelines are fixed. The one remaining issue is that we run simplify-cfg and instcombine in the middle of the loop pass pipeline. We've recently added some loop variants of these passes that would seem substantially cleaner to use, but this at least gets us much closer to the previous state. Notably, the seven loop pass managers are down to three.

    I've not updated the loop passes using LoopAccessAnalysis because that analysis hasn't been fully wired into LoopSimplify/LCSSA, and it isn't clear that those transforms want to support those forms anyways. They all run late anyways, so this is harmless. Similarly, LSR is left alone because it already carefully manages its forms and doesn't need to get fused into a single loop pass manager with a bunch of other loop passes.

    LoopReroll didn't use loop simplified form previously, and I've updated the test case to match the trivially different output.

    Finally, I've also factored all the pass initialization for the passes that use this technique as well, so that should be done regularly and reliably.

    Thanks to James for the help reviewing and thinking about this stuff, and Ben for help thinking about it as well!

    Differential Revision: http://reviews.llvm.org/D17435
    llvm-svn: 261316
* function names start with a lower case letter ; NFC | Sanjay Patel | 2016-01-12 | 1 | -1/+1
    llvm-svn: 257496
* Revert r255115 until we figure out how to fix the bot failures. | Silviu Baranga | 2015-12-09 | 1 | -43/+0
    llvm-svn: 255117
* [LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions | Silviu Baranga | 2015-12-09 | 1 | -0/+43
    Summary: This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is that both LAA and LV should use this interface everywhere.

    This also solves a problem involving the result of SCEV expression rewriting when the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates

        P1: {a,+,b} has nsw
        P2: b = 1.

    Applying P1 and then P2 gives us {a,+,1}, while applying P2 and then P1 gives us sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies). The SCEVPredicatedLayer maintains the order of transformations by feeding back the results of previous transformations into new transformations, and therefore avoiding this issue.

    The SCEVPredicatedLayer maintains a cache to remember the results of previous SCEV rewriting results. This also has the benefit of reducing the overall number of expression rewrites.

    Reviewers: mzolotukhin, anemet
    Subscribers: jmolloy, sanjoy, llvm-commits
    Differential Revision: http://reviews.llvm.org/D14296
    llvm-svn: 255115
* [Utils] Put includes in correct order. NFC. | Weiming Zhao | 2015-11-24 | 1 | -3/+3
    Summary: Followed the guidelines in: http://llvm.org/docs/CodingStandards.html#include-style

    However, I noticed that uppercase named headers come before lowercase ones throughout the codebase. So kept them as is.

    Patch by Mandeep Singh Grang <mgrang@codeaurora.org>

    Reviewers: majnemer, davide, jmolloy, atrick
    Subscribers: sanjoy
    Differential Revision: http://reviews.llvm.org/D14939
    llvm-svn: 254005
* [LoopUtils,LV] Propagate fast-math flags on generated FCmp instructions | James Molloy | 2015-09-21 | 1 | -0/+7
    We're currently losing any fast-math flags when synthesizing fcmps for min/max reductions. In LV, make sure we copy over the scalar inst's flags. In LoopUtils, we know we only ever match patterns with hasUnsafeAlgebra, so apply that to any synthesized ops.

    llvm-svn: 248201
* [LV] Relax Small Size Reduction Type Requirement | Matthew Simpson | 2015-09-10 | 1 | -6/+14
    This patch enables small size reductions in which the source types are smaller than the reduction type (e.g., computing an i16 sum from the values in an i8 array). The previous behavior was to only allow small size reductions if the source types and reduction type were the same. The change accounts for the fact that the existing sign- and zero-extend instructions in these cases should still be included in the cost model.

    Differential Revision: http://reviews.llvm.org/D12770
    llvm-svn: 247337
* [LoopVectorize] Add Support for Small Size Reductions. | Chad Rosier | 2015-08-27 | 1 | -14/+154
    Unlike scalar operations, we can perform vector operations on element types that are smaller than the native integer types. We type-promote scalar operations if they are smaller than a native type (e.g., i8 arithmetic is promoted to i32 arithmetic on Arm targets). This patch detects and removes type-promotions within the reduction detection framework, enabling the vectorization of small size reductions.

    In the legality phase, we look through the ANDs and extensions that InstCombine creates during promotion, keeping track of the smaller type. In the profitability phase, we use the smaller type and ignore the ANDs and extensions in the cost model. Finally, in the code generation phase, we truncate the result of the reduction to allow InstCombine to rewrite the entire expression in the smaller type.

    This fixes PR21369.

    http://reviews.llvm.org/D12202

    Patch by Matt Simpson <mssimpso@codeaurora.org>!

    llvm-svn: 246149
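The reason the promotion can be removed is that addition modulo 2^8 gives the same low 8 bits whether it is carried out in 8 or 32 bits. The sketch below shows a promoted reduction (32-bit accumulator with an AND mask, loosely modeled on what the message attributes to InstCombine) next to the narrow one. The function names and the exact shape of the promoted code are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Promoted form: the accumulator lives in 32 bits and is masked back to
// 8 bits after every add.
std::uint8_t sumPromoted(const std::vector<std::uint8_t> &v) {
  std::uint32_t acc = 0;
  for (std::uint8_t x : v)
    acc = (acc + x) & 0xFFu; // AND keeps only the low 8 bits
  return static_cast<std::uint8_t>(acc); // final truncate
}

// Narrow form: the same reduction expressed directly on 8-bit values, which
// is what a vector unit can execute with more lanes per register.
std::uint8_t sumNarrow(const std::vector<std::uint8_t> &v) {
  std::uint8_t acc = 0;
  for (std::uint8_t x : v)
    acc = static_cast<std::uint8_t>(acc + x); // wraps modulo 256
  return acc;
}
```

For the inputs 200, 100, 50 both functions return 94, that is, 350 modulo 256.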
* [LoopVectorize] Extract InductionInfo into a helper class... | James Molloy | 2015-08-27 | 1 | -4/+56
    ... and move it into LoopUtils where it can be used by other passes, just like ReductionDescriptor. The API is very similar to ReductionDescriptor - that is, not very nice at all. Sorting these both out will come in a followup.

    NFC

    llvm-svn: 246145
* Exposed findDefsUsedOutsideOfLoop as a loop utility function | Ashutosh Nema | 2015-08-19 | 1 | -0/+19
    Exposed findDefsUsedOutsideOfLoop as a loop utility function by moving it from LoopDistribute to LoopUtils.

    Reviewed By: anemet

    llvm-svn: 245416
* Late evaluation of the fast-math vectorization requirement. | Tyler Nowicki | 2015-08-10 | 1 | -4/+8
    This patch moves the verification of fast-math to just before vectorization is done. This way we can tell clang to append the command line options that would allow floating-point commutativity. Specifically, those are enabling fast-math or specifying a loop hint.

    llvm-svn: 244489
* Refactor RecurrenceInstDesc | Tyler Nowicki | 2015-06-16 | 1 | -44/+41
    Moved RecurrenceInstDesc into RecurrenceDescriptor to simplify the namespaces.

    llvm-svn: 239862
* Rename Reduction variables/structures to Recurrence. | Tyler Nowicki | 2015-06-16 | 1 | -66/+66
    A reduction is a special kind of recurrence. In the loop vectorizer we currently identify basic reductions. Future patches will extend this to identifying basic recurrences.

    llvm-svn: 239835
* [LoopVectorize] Don't crash on zero-sized types in isInductionPHI | David Majnemer | 2015-06-05 | 1 | -0/+3
    isInductionPHI wants to calculate the stride based on the pointee size. However, this is not possible when the pointee is zero sized.

    This fixes PR23763.

    llvm-svn: 239143
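The failure mode can be pictured as a division by the element size. The helper below is a hypothetical simplification, not the code in isInductionPHI: it converts a byte step into an element stride and declines to answer when the element size is zero. The divisibility check is an extra assumption added for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: express a pointer step given in bytes as a stride in
// elements. Returns nullopt when no meaningful stride exists.
std::optional<std::int64_t> strideInElements(std::int64_t stepBytes,
                                             std::int64_t elemSizeBytes) {
  if (elemSizeBytes <= 0)
    return std::nullopt; // zero-sized pointee: dividing would be undefined
  if (stepBytes % elemSizeBytes != 0)
    return std::nullopt; // assumption: partial-element steps are rejected
  return stepBytes / elemSizeBytes;
}
```

With a 4-byte element, a step of 8 bytes gives a stride of 2; with a zero-sized element the helper reports that no stride is available.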
* Move common loop utility function isInductionPHI into LoopUtils.cpp | Karthik Bhat | 2015-04-23 | 1 | -0/+46
    This patch refactors the definition of common utility function "isInductionPHI" to LoopUtils.cpp. This fixes compilation error when configured with -DBUILD_SHARED_LIBS=ON

    llvm-svn: 235577
* [NFC] Refactor identification of reductions as common utility function. | Karthik Bhat | 2015-04-20 | 1 | -0/+453
    This patch refactors reduction identification code out of LoopVectorizer and exposes them as common utilities. No functional change.

    Review: http://reviews.llvm.org/D9046
    llvm-svn: 235284