|  | Commit message (Collapse) | Author | Age | Files | Lines | 
|---|
| ... |  | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | This reverts commit ada6595a526d71df04988eb0a4b4fe84df398ded.
This needs a simple probability check because there are some cases where it is
not profitable.
llvm-svn: 291695 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability the other
direction. For small blocks that can be duplicated we now skip that requirement
as well.
Differential revision: https://reviews.llvm.org/D27742
llvm-svn: 291609 | 
| | 
| 
| 
| | llvm-svn: 289766 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
This fixes an issue with MachineBlockPlacement due to a badly timed call
to `analyzeBranch` with `AllowModify` set to true.  The timeline is as
follows:
 1. `MachineBlockPlacement::maybeTailDuplicateBlock` calls
    `TailDup.shouldTailDuplicate` on its argument, which in turn calls
    `analyzeBranch` with `AllowModify` set to true.
 2. This `analyzeBranch` call edits the terminator sequence of the block
    based on the physical layout of the machine function, turning an
    unanalyzable non-fallthrough block to a unanalyzable fallthrough
    block.  Normally MBP bails out of rearranging such blocks, but this
    block was unanalyzable non-fallthrough (and thus rearrangeable) the
    first time MBP looked at it, and so it goes ahead and decides where
    it should be placed in the function.
 3. When placing this block MBP fails to analyze and thus update the
    block in keeping with the new physical layout.
Concretely, before (1) we have something like:
```
LBL0:
  < unknown terminator op that may branch to LBL1 >
  jmp LBL1
LBL1:
  ... A
LBL2:
  ... B
```
In (2), analyze branch simplifies this to
```
LBL0:
  < unknown terminator op that may branch to LBL2 >
  ;; jmp LBL1 <- redundant jump removed
LBL1:
  ... A
LBL2:
  ... B
```
In (3), MachineBlockPlacement goes ahead with its plan of putting LBL2
after the first block since that is profitable.
```
LBL0:
  < unknown terminator op that may branch to LBL2 >
  ;; jmp LBL1 <- redundant jump
LBL2:
  ... B
LBL1:
  ... A
```
and the program now has incorrect behavior (we no longer fall-through
from `LBL0` to `LBL1`) because MBP can no longer edit LBL0.
There are several possible solutions, but I went with removing the teeth
off of the `analyzeBranch` calls in TailDuplicator.  That makes thinking
about the result of these calls easier, and breaks nothing in the lit
test suite.
I've also added some bookkeeping to the MachineBlockPlacement pass and
used that to write an assert that would have caught this.
Reviewers: chandlerc, gberry, MatzeB, iteratee
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D27783
llvm-svn: 289764 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | We fail to produce bit-to-bit matching stage2 and stage3 compiler in PGO
bootstrap build. The reason is because LoopBlockSet is of SmallPtrSet type
whose iterating order depends on the pointer value.
This patch fixes this issue by changing to use SmallSetVector.
Differential Revision: http://reviews.llvm.org/D26634
llvm-svn: 287148 | 
| | 
| 
| 
| 
| 
| | near the other function specific initializations.
llvm-svn: 285758 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Currently PreferredLoopExit is set only in buildLoopChains, which is
never called if there are no MachineLoops.
MSan is currently broken by this:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/145/steps/check-llvm%20msan/logs/stdio
This is a naive fix to get things green again. iteratee: you may have a better fix.
This change will also mean PreferredLoopExit will not carry over if
buildCFGChains() is called a second time in runOnMachineFunction, this
appears to be the right thing.
Reviewers: bkramer, iteratee, echristo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26069
llvm-svn: 285757 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | There is a use after free bug in the existing code. Loop layout selects
a preferred exit block, and then lays out the loop. If this block is
removed during layout, it needs to be invalidated to prevent a use after
free.
llvm-svn: 285348 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.
Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283934 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | This reverts commit r283842.
test/CodeGen/X86/tail-dup-repeat.ll causes and llc crash with our
internal testing. I'll share a link with you.
llvm-svn: 283857 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.
Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283842 | 
| | 
| 
| 
| 
| 
| | This reverts commit 71c312652c10f1855b28d06697c08d47e7a243e4.
llvm-svn: 283647 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283619 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | This reverts commit 062ace9764953e9769142c1099281a345f9b6bdc.
Issue with loop info and block removal revealed by polly.
I have a fix for this issue already in another patch, I'll re-roll this
together with that fix, and a test case.
llvm-svn: 283292 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well.
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283274 | 
| | 
| 
| 
| 
| 
| 
| 
| | This reverts commit ff234efbe23528e4f4c80c78057b920a51f434b2.
Causing crashes on aarch64 build.
llvm-svn: 283172 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
llvm-svn: 283164 | 
| | 
| 
| 
| | llvm-svn: 281535 | 
| | 
| 
| 
| 
| 
| 
| | analyzeBranch was renamed to use lowercase first, rename
the related set to match.
llvm-svn: 281506 | 
| | 
| 
| 
| 
| 
| 
| 
| | This is prep work for allowing the threshold to be different during layout,
and to enforce a single threshold between merging and duplicating during
layout. No observable change intended.
llvm-svn: 279117 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | Do not reorder and move up a loop latch block before a loop header
when optimising for size because this will generate an extra 
unconditional branch.
Differential Revision: https://reviews.llvm.org/D22521
llvm-svn: 278840 | 
| | 
| 
| 
| 
| 
| | No functionality change is intended.
llvm-svn: 278475 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | If the result of the find is only used to compare against end(), just
use is_contained instead.
No functionality change is intended.
llvm-svn: 278433 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The following pattern was being layed out poorly:
              A
             / \
            B   C
           / \ / \
          D   E   ? (Doesn't matter)
Where A->B is far more likely than A->C, and prob(B->D) = prob(B->E)
The current algorithm gives:
A,B,C,E (D goes on worklist)
It does this even if C has a frequency count of 0. This patch
adjusts the layout calculation so that if freq(B->E) >> freq(C->E)
then we go ahead and layout E rather than C. Fallthrough half the time
is better than fallthrough never, or fallthrough very rarely. The
resulting layout is:
A,B,E, (C and D are in a worklist)
llvm-svn: 277187 | 
| | 
| 
| 
| 
| 
| | Differential Revision: https://reviews.llvm.org/D22669
llvm-svn: 276849 | 
| | 
| 
| 
| 
| 
| 
| 
| | of the algorithm.
Differential Revision: https://reviews.llvm.org/D22364
llvm-svn: 275595 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect.
Reviewers: tstellarAMD, mcrosier
Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai
Differential Revision: https://reviews.llvm.org/D22409
llvm-svn: 275564 | 
| | 
| 
| 
| 
| 
| 
| | Make worklist and ehworklist member of the
class so that they don't need to be passed around.
llvm-svn: 274333 | 
| | 
| 
| 
| | llvm-svn: 274075 | 
| | 
| 
| 
| | llvm-svn: 273744 | 
| | 
| 
| 
| | llvm-svn: 273067 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | Document the new parameter and threshod computation
model.  Also fix a bug when the threshold parameter
is set to be different from the default.
 
llvm-svn: 272749 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | profile.
Summary: With runtime profile, we have more confidence in branch probability, thus during basic block layout, we set a lower hot prob threshold so that blocks can be layouted optimally.
Reviewers: djasper, davidxl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20991
llvm-svn: 272729 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Save machine function pointer so that
the reference does not need to be passed around.
This also gives other methods access to machine
function for information such as entry count etc.
 
llvm-svn: 272594 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This is third patch to clean up the code.
Included in this patch:
1. Further unclutter trace/chain formation main routine;
2. Isolate the logic to compute global cost/conflict detection
   into its own method;
3. Heavily document the selection algorithm;
4. Added helper hook to allow PGO specific logic to be
   added in the future.
 
llvm-svn: 272582 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | This is second patch to clean up the code.
In this patch, the logic to determine block outlinining
is refactored and more comments are added.
 
llvm-svn: 272514 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This is one of the patches to clean up the code so that
it is in a better form to make future enhancements easier.
In htis patch, the logic to collect viable successors are
extrated as a helper to unclutter the caller which gets very
large recenty. Also cleaned up BP adjustment code.
 
llvm-svn: 272482 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This reapplies commit r271930, r271915, r271923.  They hit a bug in
Thumb which is fixed in r272258 now.
The original message:
The code layout that TailMerging (inside BranchFolding) works on is not the
final layout optimized based on the branch probability. Generally, after
BlockPlacement, many new merging opportunities emerge.
This patch calls Tail Merging after MBP and calls MBP again if Tail Merging
merges anything.
llvm-svn: 272267 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | backward-hot-prob consistently.
Summary:
Consider the following diamond CFG:
 A
/ \
B C
 \/
 D
Suppose A->B and A->C have probabilities 81% and 19%. In block-placement, A->B is called a hot edge and the final placement should be ABDC. However, the current implementation outputs ABCD. This is because when choosing the next block of B, it checks if Freq(C->D) > Freq(B->D) * 20%, which is true (if Freq(A) = 100, then Freq(B->D) = 81, Freq(C->D) = 19, and 19 > 81*20%=16.2). Actually, we should use 25% instead of 20% as the probability here, so that we have 19 < 81*25%=20.25, and the desired ABDC layout will be generated.
Reviewers: djasper, davidxl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20989
llvm-svn: 272203 | 
| | 
| 
| 
| 
| 
| 
| | This reverts commit r271930, r271915, r271923.  They break a thumb selfhosting
bot.
llvm-svn: 272017 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The code layout that TailMerging (inside BranchFolding) works on is not the
final layout optimized based on the branch probability. Generally, after
BlockPlacement, many new merging opportunities emerge.
This patch calls Tail Merging after MBP and calls MBP again if Tail Merging
merges anything.
Differential Revision: http://reviews.llvm.org/D20276
llvm-svn: 271925 | 
| | 
| 
| 
| | llvm-svn: 271751 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | branches. NFCI.
The benefits of this patch are
-- We call AnalyzeBranch() to optimize unanalyzable branches, but the result of
   AnalyzeBranch() is not used. Now the result is useful.
-- Before the layout of all the MBBs is set, the result of AnalyzeBranch() is
   not correct and needs to be fixed before using it to optimize the branch
   conditions. Now this optimization is called after the layout, the code used
   to fix the result of AnalyzeBranch() is not needed.
-- The branch condition of the last block is not optimized before. Now it is
   optimized.
Differential Revision: http://reviews.llvm.org/D20177
llvm-svn: 270623 | 
| | 
| 
| 
| 
| 
| 
| 
| | skipFunction() is called twice.
Differential Revision: http://reviews.llvm.org/D20377
llvm-svn: 269994 | 
| | 
| 
| 
| | llvm-svn: 269307 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | Currently cost based loop rotation algo can only be turned on with
two conditions: the function has real profile data, and -precise-rotation-cost
flag is turned on. This is not convenient for developers to experiment
when profile is not available. Add a new option to force the new
rotation algorithm -force-precise-rotation-cost
llvm-svn: 269266 | 
| | 
| 
| 
| 
| 
| | Differential Revision: http://reviews.llvm.org/D19882
llvm-svn: 268457 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | After the layout of the basic blocks is set, the target may be able to get rid
of unconditional branches to fallthrough blocks that the generic code does not
catch. This happens any time TargetInstrInfo::AnalyzeBranch is not able to
analyze all the branches involved in the terminators sequence, while still
understanding a few of them.
In such situation, AnalyzeBranch can directly modify the branches if it has been
instructed to do so.
This patch takes advantage of that.
llvm-svn: 268328 | 
| | 
| 
| 
| 
| 
| | Fix a FIXME.  Disable loop alignment if compiled with -Oz now.
llvm-svn: 268121 | 
| | 
| 
| 
| 
| 
| | Cut and Paste.
llvm-svn: 268067 |