| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
llvm-svn: 11179
|
| |
|
|
|
|
|
|
|
| |
removeDeadNodes is called, only call it at the end of the pass being run.
This saves 1.3 seconds running DSA on 177.mesa (5.3->4.0s), which is
pretty big. This is only possible because of the automatic garbage
collection done on forwarding nodes.
llvm-svn: 11178
|
| |
|
|
| |
llvm-svn: 11177
|
| |
|
|
| |
llvm-svn: 11176
|
| |
|
|
|
|
|
|
|
| |
DSGraphs while they are forwarding. When the last reference to the forwarding
node is dropped, the forwarding node is autodeleted. This should simplify
removeTriviallyDead nodes, and is only (efficiently) possible because we are
using an ilist of dsnodes now.
llvm-svn: 11175
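The auto-deletion described above can be illustrated with ordinary reference counting. This is a minimal sketch, not the DSA implementation: it swaps in `std::shared_ptr` for whatever reference counting the real nodes use, and `Node`, `Handle`, and `resolve` are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a graph node. Once a node has been merged into
// another node, forwardTo points at the node that replaced it.
struct Node {
    int id = 0;
    std::shared_ptr<Node> forwardTo; // null unless this node is forwarding
};

using Handle = std::shared_ptr<Node>;

// Follows the forwarding chain and retargets the handle at the final node.
// Retargeting drops the handle's reference to each forwarding node it passes,
// so a forwarding node is destroyed as soon as its last handle is resolved.
Node* resolve(Handle& h) {
    assert(h && "resolve() expects a non-null handle");
    while (h->forwardTo) {
        Handle next = h->forwardTo; // keep the target alive before releasing h
        h = std::move(next);
    }
    return h.get();
}
```

With two handles on the same forwarding node, the node survives the first `resolve` call and is freed by the second; no separate sweep over the graph is needed to collect it.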
|
| |
|
|
|
|
| |
G == 0
llvm-svn: 11174
|
| |
|
|
| |
llvm-svn: 11173
|
| |
|
|
| |
llvm-svn: 11171
|
| |
|
|
|
|
|
| |
Rename stats from dsnode -> dsa
Add a new stat
llvm-svn: 11167
|
| |
|
|
| |
llvm-svn: 11166
|
| |
|
|
| |
llvm-svn: 11157
|
| |
|
|
| |
llvm-svn: 11151
|
| |
|
|
|
|
| |
of the virtual register to certain functions.
llvm-svn: 11143
|
| |
|
|
|
|
| |
keeps finding more code motion opportunities now that the dominators are correct!
llvm-svn: 11142
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
slots each. As a consequence they get numbered as 0, 2, 4 and so
on. The first slot is used for operand uses and the second for
defs. Here's an example:
0: A = ...
2: B = ...
4: C = A + B ;; last use of A
The live intervals should look like:
A = [1, 5)
B = [3, x)
C = [5, y)
llvm-svn: 11141
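The slot numbering above can be sketched as follows. This is an illustrative model only, assuming straight-line code in which each value is defined once; `Instr`, `Interval`, `computeIntervals`, and `overlaps` are hypothetical names, and the choice to give a never-used def a one-slot interval is an assumption of the sketch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified instruction: at most one def and a list of uses.
struct Instr {
    std::string def;
    std::vector<std::string> uses;
};

using Interval = std::pair<unsigned, unsigned>; // half-open [start, end)

// Instruction i owns slots 2*i (operand uses) and 2*i + 1 (defs).
std::map<std::string, Interval> computeIntervals(const std::vector<Instr>& code) {
    std::map<std::string, Interval> intervals;
    for (unsigned i = 0; i < code.size(); ++i) {
        const unsigned useSlot = 2 * i;
        const unsigned defSlot = 2 * i + 1;
        // A use extends the interval to just past this instruction's use slot.
        for (const std::string& u : code[i].uses) {
            auto it = intervals.find(u);
            if (it != intervals.end())
                it->second.second = useSlot + 1;
        }
        // A def opens an interval at the def slot (assumed one slot long
        // until a later use extends it).
        if (!code[i].def.empty())
            intervals.emplace(code[i].def, Interval(defSlot, defSlot + 1));
    }
    return intervals;
}

// Two half-open intervals overlap if each starts before the other ends.
bool overlaps(const Interval& a, const Interval& b) {
    assert(a.first < a.second && b.first < b.second);
    return a.first < b.second && b.first < a.second;
}
```

For the three-instruction example, A comes out as [1, 5) and C starts at 5, so the two intervals do not overlap even though C is defined by the instruction that last uses A.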
|
| |
|
|
| |
llvm-svn: 11140
|
| |
|
|
| |
llvm-svn: 11139
|
| |
|
|
|
|
|
|
| |
The problem is that the dominator update code didn't "realize" that it's
possible for the newly inserted basic block to dominate anything. Because
it IS possible, stuff was getting updated wrong.
llvm-svn: 11137
|
| |
|
|
|
|
| |
access. Rather we only have to do it on the creation of the interval.
llvm-svn: 11135
|
| |
|
|
|
|
|
|
|
|
| |
complete rewrite of load-vn will make it a bit faster. This change speeds up
the gcse pass (which uses load-vn) from 25.45s to 0.42s on the testcase in
PR209.
I've also verified that this gives the exact same results as the old one.
llvm-svn: 11132
|
| |
|
|
|
|
|
| |
which causes big reindentation. While I'm at it, I fix the fixme by removing
some dead code.
llvm-svn: 11131
|
| |
|
|
| |
llvm-svn: 11130
|
| |
|
|
| |
llvm-svn: 11129
|
| |
|
|
| |
llvm-svn: 11128
|
| |
|
|
| |
llvm-svn: 11126
|
| |
|
|
|
|
| |
operand of the instruction and thus simplify the register allocation.
llvm-svn: 11124
|
| |
|
|
| |
llvm-svn: 11123
|
| |
|
|
|
|
| |
at Chris's request.
llvm-svn: 11120
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
1. Don't scan to the end of alloca instructions in the caller function to
insert inlined allocas, just insert at the top. This saves a lot of
time inlining into functions with a lot of allocas.
2. Use splice to move the alloca instructions over, instead of remove/insert.
This allows us to transfer a block at a time, and eliminates a bunch of
silly symbol table manipulations.
This speeds up the inliner on the testcase in PR209 from 1.73s -> 1.04s (67%)
llvm-svn: 11118
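The two points above can be sketched with `std::list::splice`, used here as a stand-in for the instruction list in LLVM. This is a minimal illustration under that assumption; `Inst` and `hoistLeadingAllocas` are hypothetical names, not inliner code.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-in for an instruction.
struct Inst {
    std::string text;
    bool isAlloca;
};

// Moves the leading run of allocas in `callee` to the very top of `caller`.
// splice relinks the existing list nodes as one block: nothing is copied,
// and there is no per-instruction remove followed by insert. Inserting at
// begin() also avoids scanning past the caller's existing allocas.
void hoistLeadingAllocas(std::list<Inst>& caller, std::list<Inst>& callee) {
    assert(&caller != &callee && "expects two distinct instruction lists");
    auto firstNonAlloca =
        std::find_if(callee.begin(), callee.end(),
                     [](const Inst& inst) { return !inst.isAlloca; });
    caller.splice(caller.begin(), callee, callee.begin(), firstNonAlloca);
}
```

The moved allocas keep their relative order and end up ahead of everything already in the caller's list.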
|
| |
|
|
|
|
| |
to be the same (IOW they are not two address instructions).
llvm-svn: 11117
|
| |
|
|
|
|
|
|
|
|
|
| |
basic block, and that basic block ends with a return instruction. In this case, we can just splice
the cloned "body" of the function directly into the source basic block, avoiding a lot
of rearrangement and splitBasicBlock's linear scan over the split block. This speeds up
the inliner on the testcase in PR209 from 2.3s to 1.7s, a 35% speedup.
llvm-svn: 11116
|
| |
|
|
| |
llvm-svn: 11114
|
| |
|
|
| |
llvm-svn: 11113
|
| |
|
|
| |
llvm-svn: 11111
|
| |
|
|
|
|
| |
half.
llvm-svn: 11110
|
| |
|
|
|
|
|
| |
before we delete the original call site, allowing slight simplifications of
code, but nothing exciting.
llvm-svn: 11109
|
| |
|
|
|
|
|
|
|
|
| |
process. The only optimization we did so far is to avoid creating a
PHI node, then immediately destroying it in the common case where the
callee has one return statement. Instead, we just don't create the return
value. This has no noticeable performance impact, but paves the way for
future improvements.
llvm-svn: 11108
|
| |
|
|
|
|
|
|
| |
to add the cloned block to. This allows the block to be added to the function
immediately, and all of the instructions to be immediately added to the function
symbol table, which speeds up the inliner from 3.7s -> 3.38s on the PR209 testcase.
llvm-svn: 11107
|
| |
|
|
| |
llvm-svn: 11106
|
| |
|
|
|
|
|
|
|
|
|
| |
instructions,
instead of a loop that is really inefficient with large basic blocks.
This speeds up the inliner pass on the testcase in PR209 from 13.8s to 2.24s
which still isn't exactly speedy, but is a lot better. :)
llvm-svn: 11105
|
| |
|
|
|
|
|
| |
process them all as a group. This speeds up SRoA/mem2reg from 28.46s to
0.62s on the testcase from PR209.
llvm-svn: 11100
|
| |
|
|
|
|
| |
SRoA/mem2reg from 41.2s to 27.5s on the testcase in PR209.
llvm-svn: 11099
|
| |
|
|
| |
llvm-svn: 11098
|
| |
|
|
| |
llvm-svn: 11095
|
| |
|
|
| |
llvm-svn: 11094
|
| |
|
|
|
|
| |
spilled, A was loaded from its stack location twice. This fixes the bug.
llvm-svn: 11093
|
| |
|
|
| |
llvm-svn: 11091
|
| |
|
|
| |
llvm-svn: 11088
|
| |
|
|
| |
llvm-svn: 11087
|
| |
|
|
|
|
| |
outside of loops = 0.
llvm-svn: 11085
|