|  | Commit message (Collapse) | Author | Age | Files | Lines | 
|---|
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | We are seeing r276077 drastically increasing compiler time for our larger
benchmarks in PGO profile generation build (both clang based and IR based
mode) -- it can be 20x slower than without the patch (like from 30 secs to
780 secs)
The increased time are all in pass LCSSA. The problematic code is about
PostProcessPHIs after use-rewrite. Note that the InsertedPhis from ssa_updater
is accumulating (never been cleared). Since the inserted PHIs are added to the
candidate for each rewrite, The earlier ones will be repeatedly added. Later
when adding the new PHIs to the work-list, we don't check the duplication
either. This can result in extremely long work-list that containing tons of
duplicated PHIs.
This patch fixes the issue by hoisting the code out of the loop.
Differential Revision: http://reviews.llvm.org/D23344
llvm-svn: 278250 | 
| | 
| 
| 
| 
| 
| 
| 
| | Hal pointed out that the semantic of our intrinsic and the libc
call are slightly different. Add a comment while I'm here to
explain why we can't emit an intrinsic. Thanks Hal!
llvm-svn: 278200 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
This hopefully fixes PR28825. The problem now was that a value from the
original loop was used in a subloop, which became a sibling after separation.
While a subloop doesn't need an lcssa phi node, a sibling does, and that's
where we broke LCSSA. The most natural way to fix this now is to simply call
formLCSSA on the original loop: it'll do what we've been doing before plus
it'll cover situations described above.
I think we don't need to run formLCSSARecursively here, and we have an assert
to verify this (I've tried testing it on LLVM testsuite + SPECs). I'd be happy
to be corrected here though.
I also changed a run line in the test from '-lcssa -loop-unroll' to
'-lcssa -loop-simplify -indvars', because it exercises LCSSA
preservation to the same extent, but also makes less unrelated
transformation on the CFG, which makes it easier to verify.
Reviewers: chandlerc, sanjoy, silvas
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23288
llvm-svn: 278173 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Besides a general consistently benefit, the extra layer of indirection
allows the mechanical part of https://reviews.llvm.org/D23256 that
requires touching every transformation and analysis to be factored out
cleanly.
Thanks to David for the suggestion.
llvm-svn: 278077 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Currently loop-unrolling doesn't preserve loop-simplified form. This patch
fixes it by resimplifying affected loops.
Reviewers: chandlerc, sanjoy, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23148
llvm-svn: 278038 | 
| | 
| 
| 
| 
| | r278028: [MemorySSA] Ensure address stability of MemorySSA object.
llvm-svn: 278035 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Ensure that the MemorySSA object never changes address when using the
new pass manager since the walkers contained by MemorySSA cache pointers
to it at construction time.  This is achieved by wrapping the
MemorySSAAnalysis result in a unique_ptr.  Also add some asserts that
check for this bug.
Reviewers: george.burgess.iv, dberlin
Subscribers: mcrosier, hfinkel, chandlerc, silvas, llvm-commits
Differential Revision: https://reviews.llvm.org/D23171
llvm-svn: 278028 | 
| | 
| 
| 
| 
| 
| | Thanks to Mehdi for the suggestion!
llvm-svn: 277984 | 
| | 
| 
| 
| | llvm-svn: 277982 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
In the use optimizer, we need to keep of whether the lower bound still
dominates us or else we may decide a lower bound is still valid when it
is not due to intervening pushes/pops.  Fixes PR28880 (and probably a
bunch of other things).
Reviewers: george.burgess.iv
Subscribers: MatzeB, llvm-commits, sebpop
Differential Revision: https://reviews.llvm.org/D23237
llvm-svn: 277978 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
The correctness fix here is that when we CSE a load with another load,
we need to combine the metadata on the two loads. This matches the
behavior of other passes, like instcombine and GVN.
There's also a minor optimization improvement here: for load PRE, the
aliasing metadata on the inserted load should be the same as the
metadata on the original load. Not sure why the old code was throwing
it away.
Issue found by inspection.
Differential Revision: http://reviews.llvm.org/D21460
llvm-svn: 277977 | 
| | 
| 
| 
| | llvm-svn: 277972 | 
| | 
| 
| 
| 
| 
| | Differential Revision:  https://reviews.llvm.org/D22104
llvm-svn: 277963 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | loops.""
This reverts commit r277901. Reaaply the commit as it looks like it has
nothing to do with the bots failures.
llvm-svn: 277946 | 
| | 
| 
| 
| | llvm-svn: 277916 | 
| | 
| 
| 
| 
| 
| 
| | This reverts commit r277877.
Try to appease clang-x64-ninja-win7 buildbot.
llvm-svn: 277901 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Originally the plan was to use the custom worklist to do some block popping,
and because we don't actually need a visited set. The custom one we have
here is slightly broken, and it's not worth fixing vs using depth_first_iterator since we aren't going to go the route we originally
were.
Fixes PR28874
Reviewers: george.burgess.iv
Subscribers: llvm-commits, gberry
Differential Revision: https://reviews.llvm.org/D23187
llvm-svn: 277880 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | This fixes PR28825. The problem was that we only checked if a value from
a created inner loop is used in the outer loop, and fixed LCSSA for
them. But we missed to fixup LCSSA for values used in exits of the outer
loop.
llvm-svn: 277877 | 
| | 
| 
| 
| | llvm-svn: 277873 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Rewrite domination verifier to handle local domination as well.
This catches a bug Geoff Berry noticed.
Reviewers: george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23184
llvm-svn: 277872 | 
| | 
| 
| 
| | llvm-svn: 277864 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary: We do not care about intrinsic calls when assigning discriminators.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23212
llvm-svn: 277843 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This generated IR based on the order of evaluation, which is different
between GCC and Clang. With that in mind you get bootstrap miscompares
if you compare a Clang built with GCC-built Clang vs. Clang built with
Clang-built Clang. Diagnosing that made my head hurt.
This also reverts commit r277337, which "fixed" the test case.
llvm-svn: 277820 | 
| | 
| 
| 
| | llvm-svn: 277693 | 
| | 
| 
| 
| 
| 
| 
| | This reinstates r277611 + r277614 and reverts r277642.  A cast_or_null
should have been a dyn_cast_or_null.
llvm-svn: 277691 | 
| | 
| 
| 
| 
| 
| 
| | Not a correctness issue, but it would be nice if we didn't have to
recompute our block numbering (worst-case) every time we move MSSA.
llvm-svn: 277652 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | This reverts commit r277611 and the followup r277614.
Bootstrap builds and chromium builds are crashing during inlining after
this change.
llvm-svn: 277642 | 
| | 
| 
| 
| 
| 
| 
| | Didn't want to fold this in with r277640, since it touches bits that
aren't entirely related to r277640.
llvm-svn: 277641 | 
| | 
| 
| 
| 
| 
| 
| | This is a follow-up to r277637. It teaches MemorySSA that invariant
loads (and loads of provably constant memory) are always liveOnEntry.
llvm-svn: 277640 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This patch makes MemorySSA recognize atomic/volatile loads, and makes
MSSA treat said loads specially. This allows us to be a bit more
aggressive in some cases.
Administrative note: Revision was LGTM'ed by reames in person.
Additionally, this doesn't include the `invariant.load` recognition in
the differential revision, because I feel it's better to commit that
separately. Will commit soon.
Differential Revision: https://reviews.llvm.org/D16875
llvm-svn: 277637 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | It is possible for the value map to not have an entry for some value
that has already been removed.
I don't have a testcase, this is fall-out from a buildbot.
llvm-svn: 277614 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | We were able to figure out that the result of a call is some constant.
While propagating that fact, we added the constant to the value map.
This is problematic because it results in us losing the call site when
processing the value map.
This fixes PR28802.
llvm-svn: 277611 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This fixes a bug where we'd sometimes cache overly-conservative results
with our walker. This bug was made more obvious by r277480, which makes
our cache far more spotty than it was. Test case is llvm-unit, because
we're likely going to use CachingWalker only for def optimization in the
future.
The bug stems from that there was a place where the walker assumed that
`DefNode.Last` was a valid target to cache to when failing to optimize
phis. This is sometimes incorrect if we have a cache hit. The fix is to
use the thing we *can* assume is a valid target to cache to. :)
llvm-svn: 277559 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary: Depends on D23072
Reviewers: george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23076
llvm-svn: 277553 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | Reviewers: tejohnson, eraman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22980
llvm-svn: 277534 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary: We really want to move towards MemoryLocOrCall (or fix AA) everywhere, but for now, this lets us have a single instructionClobbersQuery.
Reviewers: george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23072
llvm-svn: 277530 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | original value.
As agreed in post-commit review of r265388, I'm switching the flag to
its original value until the 90% runtime performance regression on
SingleSource/Benchmarks/Stanford/Bubblesort is addressed.
llvm-svn: 277524 | 
| | 
| 
| 
| | llvm-svn: 277510 | 
| | 
| 
| 
| | llvm-svn: 277502 | 
| | 
| 
| 
| 
| 
| 
| 
| | union for now,
and leave a FIXME.
llvm-svn: 277485 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Fixes PR28670
Summary:
Rewrite the use optimizer to be less memory intensive and 50% faster.
Fixes PR28670
The new use optimizer works like a standard SSA renaming pass, storing
all possible versions a MemorySSA use could get in a stack, and just
tracking indexes into the stack.
This uses much less memory than caching N^2 alias query results.
It's also a lot faster.
The current version defers phi node walking to the normal walker.
Reviewers: george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23032
llvm-svn: 277480 | 
| | 
| 
| 
| | llvm-svn: 277415 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Added ability to estimate the entry count of the extracted function and
the branch probabilities of the exit branches.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22744
llvm-svn: 277411 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Using RAUW was wrong here; if we have a switch transform such as:
  18 -> 6 then
  6 -> 0
If we use RAUW, while performing the second transform the  *transformed* 6
from the first will be also replaced, so we end up with:
  18 -> 0
  6 -> 0
Found by clang stage2 bootstrap; testcase added.
llvm-svn: 277332 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | If a switch is sparse and all the cases (once sorted) are in arithmetic progression, we can extract the common factor out of the switch and create a dense switch. For example:
    switch (i) {
    case 5: ...
    case 9: ...
    case 13: ...
    case 17: ...
    }
can become:
    if ( (i - 5) % 4 ) goto default;
    switch ((i - 5) / 4) {
    case 0: ...
    case 1: ...
    case 2: ...
    case 3: ...
    }
or even better:
   switch ( ROTR(i - 5, 2) {
   case 0: ...
   case 1: ...
   case 2: ...
   case 3: ...
   }
The division and remainder operations could be costly so we only do this if the factor is a power of two, and emit a right-rotate instead of a divide/remainder sequence. Dense switches can be lowered significantly better than sparse switches and can even be transformed into lookup tables.
llvm-svn: 277325 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | They seem to trigger an LSan failure:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/15140/steps/check-llvm%20asan/logs/stdio
Revert "Add the tests for r277313"
This reverts commit r277314.
Revert "CodeExtractor : Add ability to preserve profile data."
This reverts commit r277313.
llvm-svn: 277317 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | function.
When extracting a set of blocks make sure to inherit all of the target
dependent attributes to make sure that the function will be valid for
lowering. One example is the "target-features" attribute for x86, if the
extracted region has functionality that relies on a specific feature it
will fail to be lowered.
This also allows for extracted functions to be valid for inlining, at
least back into the parent function, as the target attributes are tested
when inlining for compatibility.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22713
llvm-svn: 277315 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Added ability to estimate the entry count of the extracted function and
the branch probabilities of the exit branches.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22744
llvm-svn: 277313 | 
| | 
| 
| 
| 
| 
| | before removing old ones
llvm-svn: 277309 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | LoopUnroll is a loop pass, so the analysis of OptimizationRemarkEmitter
is added to the common function analysis passes that loop passes
depend on.
The BFI and indirectly BPI used in this pass is computed lazily so no
overhead should be observed unless -pass-remarks-with-hotness is used.
This is how the patch affects the O3 pipeline:
         Dominator Tree Construction
         Natural Loop Information
         Canonicalize natural loops
         Loop-Closed SSA Form Pass
         Basic Alias Analysis (stateless AA impl)
         Function Alias Analysis Results
         Scalar Evolution Analysis
+        Lazy Branch Probability Analysis
+        Lazy Block Frequency Analysis
+        Optimization Remark Emitter
         Loop Pass Manager
           Rotate Loops
           Loop Invariant Code Motion
           Unswitch loops
         Simplify the CFG
         Dominator Tree Construction
         Basic Alias Analysis (stateless AA impl)
         Function Alias Analysis Results
         Combine redundant instructions
         Natural Loop Information
         Canonicalize natural loops
         Loop-Closed SSA Form Pass
         Scalar Evolution Analysis
+        Lazy Branch Probability Analysis
+        Lazy Block Frequency Analysis
+        Optimization Remark Emitter
         Loop Pass Manager
           Induction Variable Simplification
           Recognize loop idioms
           Delete dead loops
           Unroll loops
...
llvm-svn: 277203 |