Commit messages

1) have it fold "br undef", which does occur with
surprising frequency as jump threading iterates.
2) teach jump threading to delete dead blocks. This removes the successor
edges, reducing the in-edges of other blocks, allowing
recursive simplification.
3) Fold things like:
br COND, BBX, BBY
BBX:
br COND, BBZ, BBW
which also happens because jump threading iterates.
llvm-svn: 60470
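
As a source-level illustration of fold 3 (added here for context, not part of the commit; the function names are invented), the inner test below repeats a condition the outer branch has already decided, so the duplicated branch can only go one way:

  #include <cstdio>

  // Toy example of fold 3: the inner test of 'cond' is dominated by an
  // identical outer test, so once the blocks are laid out this way the
  // inner branch (br COND, BBZ, BBW) always takes the same edge and
  // folds to an unconditional branch.
  int example(bool cond, int x) {
    if (cond) {          // br COND, BBX, BBY
      if (cond)          // BBX: br COND, BBZ, BBW -- always true here
        return x + 1;    // BBZ
      return x - 1;      // BBW: unreachable along this path
    }
    return x;            // BBY
  }

  int main() {
    std::printf("%d\n", example(true, 41));  // prints 42
    return 0;
  }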

llvm-svn: 60469

llvm-svn: 60468

unconditionally delete the block. All likely clients will
do the checking anyway.
llvm-svn: 60464
DeleteBlockIfDead method.
llvm-svn: 60463

llvm-svn: 60442

llvm-svn: 60431

straightforward implementation. This does not require any extra
alias analysis queries beyond what we already do for non-local loads.
Some programs really, really like load PRE: it triggers ~1000 times
on SPASS, ~300 times on 255.vortex, and ~1500 times on 403.gcc.
The biggest limitation of the implementation is that it does not split
critical edges. This is a huge killer on many programs and should be
addressed after the initial patch is enabled by default.
This implementation should incidentally speed up rejection of
non-local loads, because it avoids creating the repl densemap in cases
where it won't be used for fully redundant loads.
This is currently disabled by default.
Before I turn this on, I need to fix a couple of miscompilations in
the testsuite, look at compile-time performance numbers, and look at
the performance impact. This is pretty close to ready though.
llvm-svn: 60408
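
For context, here is a small source-level sketch (not from the commit; sum_if and its arguments are invented) of the kind of partial redundancy load PRE targets:

  #include <cstdio>

  // The load of *p in the return statement is partially redundant: the
  // value is already loaded when 'cond' is true, but not on the other
  // path. Load PRE inserts a load of *p into the predecessor where it is
  // missing, which makes the final load fully redundant so it can be
  // deleted.
  int sum_if(int *p, bool cond) {
    int v = 0;
    if (cond)
      v = *p;        // *p is loaded on this path only
    return v + *p;   // partially redundant load of *p
  }

  int main() {
    int x = 21;
    std::printf("%d\n", sum_if(&x, true));  // prints 42
    return 0;
  }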

llvm-svn: 60403

llvm-svn: 60402

llvm-svn: 60401

constant. If X is a constant, then this is folded elsewhere.
- Added a note to Target/README.txt to indicate that we'd like to implement
this when we're able.
llvm-svn: 60399

llvm-svn: 60398

- No need to do a swap on a canonicalized pattern.
No functionality change.
llvm-svn: 60397

llvm-svn: 60395

a new value numbering set after splitting a critical edge. This increases
the number of instances of PRE on 403.gcc from ~60 to ~570.
llvm-svn: 60393
figuring out the base of the IV. This produces better
code in the example. (Addresses use (IV) instead of
(BASE,IV) - a significant improvement on low-register
machines like x86).
llvm-svn: 60374

llvm-svn: 60370

llvm-svn: 60369

integer is "minint".
llvm-svn: 60366
don't have overlapping bits.
llvm-svn: 60344

llvm-svn: 60343

llvm-svn: 60341

fiddling with constants unless we have to.
llvm-svn: 60340
instead of throughout it.
llvm-svn: 60339
that it isn't reallocated all the time. This is a tiny speedup for
GVN: 3.90->3.88s
llvm-svn: 60338

llvm-svn: 60337

instead of std::sort. This shrinks the release-asserts LSR.o file
by 1100 bytes of code on my system.
We should start using array_pod_sort where possible.
llvm-svn: 60335
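
For reference, a minimal use of array_pod_sort from llvm/ADT/STLExtras.h; this is a hedged sketch that assumes a current LLVM tree to build against, not code from this commit:

  #include "llvm/ADT/STLExtras.h"
  #include <cstdio>

  int main() {
    // array_pod_sort is a thin wrapper around qsort for POD element
    // types, so call sites share one out-of-line sort routine instead of
    // each instantiating std::sort; that is where the code-size savings
    // come from.
    int Values[] = {4, 1, 3, 2};
    llvm::array_pod_sort(Values, Values + 4);
    for (int V : Values)
      std::printf("%d ", V);   // prints: 1 2 3 4
    std::printf("\n");
    return 0;
  }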
This is a lot cheaper and conceptually simpler.
llvm-svn: 60332
DeadInsts ivar, just use it directly.
llvm-svn: 60330
buggy rewrite, this notifies ScalarEvolution of a pending instruction
about to be removed and then erases it, instead of erasing it then
notifying.
llvm-svn: 60329
xor in testcase (or is a substring).
llvm-svn: 60328
new instructions it simplifies. Because we're threading jumps on edges
with constants coming in from PHIs, we are inherently exposing a lot more
constants to the new block. Folding them and deleting dead conditions
allows the cost model in jump threading to be more accurate as it iterates.
llvm-svn: 60327
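
A source-level sketch of why threading over PHI edges exposes constants (illustrative only; pick and compute are invented names):

  #include <cstdio>

  int compute() { return 7; }   // stands in for an opaque, non-constant value

  // When 'a' is true, x is the constant 1 coming into the PHI at the join.
  // If jump threading clones the tail for that edge, the cloned compare
  // (x == 1) sees a constant operand -- the kind of newly exposed constant
  // the commit above folds eagerly so the cost model stays accurate.
  int pick(bool a) {
    int x;
    if (a)
      x = 1;          // constant incoming value
    else
      x = compute();
    if (x == 1)
      return 10;
    return 20;
  }

  int main() {
    std::printf("%d %d\n", pick(true), pick(false));  // prints 10 20
    return 0;
  }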
instead of using FoldPHIArgBinOpIntoPHI. In addition to being more
obvious, this also fixes a problem where instcombine wouldn't merge two
phis that had different variable indices. This prevented instcombine
from factoring big chunks of code in 403.gcc. For example:
insn_cuid.exit:
- %tmp336 = load i32** @uid_cuid, align 4
- %tmp337 = getelementptr %struct.rtx_def* %insn_addr.0.ph.i, i32 0, i32 3
- %tmp338 = bitcast [1 x %struct.rtunion]* %tmp337 to i32*
- %tmp339 = load i32* %tmp338, align 4
- %tmp340 = getelementptr i32* %tmp336, i32 %tmp339
br label %bb62
bb61:
- %tmp341 = load i32** @uid_cuid, align 4
- %tmp342 = getelementptr %struct.rtx_def* %insn, i32 0, i32 3
- %tmp343 = bitcast [1 x %struct.rtunion]* %tmp342 to i32*
- %tmp344 = load i32* %tmp343, align 4
- %tmp345 = getelementptr i32* %tmp341, i32 %tmp344
br label %bb62
bb62:
- %iftmp.62.0.in = phi i32* [ %tmp345, %bb61 ], [ %tmp340, %insn_cuid.exit ]
+ %insn.pn2 = phi %struct.rtx_def* [ %insn, %bb61 ], [ %insn_addr.0.ph.i, %insn_cuid.exit ]
+ %tmp344.pn.in.in = getelementptr %struct.rtx_def* %insn.pn2, i32 0, i32 3
+ %tmp344.pn.in = bitcast [1 x %struct.rtunion]* %tmp344.pn.in.in to i32*
+ %tmp341.pn = load i32** @uid_cuid
+ %tmp344.pn = load i32* %tmp344.pn.in
+ %iftmp.62.0.in = getelementptr i32* %tmp341.pn, i32 %tmp344.pn
%iftmp.62.0 = load i32* %iftmp.62.0.in
llvm-svn: 60325
important because it is sinking the loads using the GEPs, but
not the GEPs themselves. This triggers 647 times on 403.gcc
and makes the .s file much much nicer. For example before:
je LBB1_87 ## bb78
LBB1_62: ## bb77
leal 84(%esi), %eax
LBB1_63: ## bb79
movl (%eax), %eax
...
LBB1_87: ## bb78
movl $0, 4(%esp)
movl %esi, (%esp)
call L_make_decl_rtl$stub
jmp LBB1_62 ## bb77
after:
jne LBB1_63 ## bb79
LBB1_62: ## bb78
movl $0, 4(%esp)
movl %esi, (%esp)
call L_make_decl_rtl$stub
LBB1_63: ## bb79
movl 84(%esi), %eax
The input code was as follows (the GEPs are now merged, and the PHI
is then eliminated by instcombine):
br i1 %tmp233, label %bb78, label %bb77
bb77:
%tmp234 = getelementptr %struct.tree_node* %t_addr.3, i32 0, i32 0, i32 22
br label %bb79
bb78:
call void @make_decl_rtl(%struct.tree_node* %t_addr.3, i8* null) nounwind
%tmp235 = getelementptr %struct.tree_node* %t_addr.3, i32 0, i32 0, i32 22
br label %bb79
bb79:
%iftmp.12.0.in = phi %struct.rtx_def** [ %tmp235, %bb78 ], [ %tmp234, %bb77 ]
%iftmp.12.0 = load %struct.rtx_def** %iftmp.12.0.in
llvm-svn: 60322
elimination: when finding dependent loads/stores, treat them as
referring to the same location when alias analysis reports MustAlias,
instead of relying on the pointers being exactly equal. This makes load
elimination more aggressive. For example, on 403.gcc, we had:
< 68 gvn - Number of instructions PRE'd
< 152718 gvn - Number of instructions deleted
< 49699 gvn - Number of loads deleted
< 6153 memdep - Number of dirty cached non-local responses
< 169336 memdep - Number of fully cached non-local responses
< 162428 memdep - Number of uncached non-local responses
now we have:
> 64 gvn - Number of instructions PRE'd
> 153623 gvn - Number of instructions deleted
> 49856 gvn - Number of loads deleted
> 5022 memdep - Number of dirty cached non-local responses
> 159030 memdep - Number of fully cached non-local responses
> 162443 memdep - Number of uncached non-local responses
That's an extra 157 loads deleted and extra 905 other instructions nuked.
This slows down GVN very slightly, from 3.91 to 3.96s.
llvm-svn: 60314
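
A small, invented source-level example of a redundancy that exact pointer equality misses but a MustAlias answer catches:

  #include <cstdio>

  struct Node { int id; int data; };

  // The two loads below read the same field, but through pointer values
  // produced by separate address computations, so their pointer operands
  // are not the identical Value. Exact pointer equality misses the
  // redundancy; a MustAlias answer from alias analysis proves the loads
  // read the same location, letting the second one be eliminated.
  int twice_data(Node *n) {
    int a = n->data;     // first load
    int *p = &n->data;   // a second, separate address computation
    int b = *p;          // redundant once MustAlias is used
    return a + b;
  }

  int main() {
    Node n = {1, 21};
    std::printf("%d\n", twice_data(&n));  // prints 42
    return 0;
  }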
vector instead of a DenseMap. This substantially shrinks the memory usage
(the high-water mark) and makes operations like scanning it faster. This
speeds up memdep slightly: GVN goes from 3.9376s to 3.9118s on 403.gcc.
This also splits out the statistics for the cached non-local case to
differentiate between the dirty and clean cached cases. Here are the stats
for 403.gcc:
6153 memdep - Number of dirty cached non-local responses
169336 memdep - Number of fully cached non-local responses
162428 memdep - Number of uncached non-local responses
yay for caching :)
llvm-svn: 60313
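
A self-contained sketch of the underlying data-structure trade-off; this is not the MemoryDependenceAnalysis code, and Block/DepResult are simplified stand-ins for the real types:

  #include <algorithm>
  #include <cstdio>
  #include <utility>
  #include <vector>

  // A cached non-local query result is a list of (block, dependency)
  // entries. Keeping them in a vector sorted by block costs far less
  // memory than a hash map (no buckets, no per-entry overhead), scanning
  // the whole list is a linear sweep over contiguous memory, and lookups
  // stay O(log n) via binary search.
  using Block = int;               // stand-in for BasicBlock*
  using DepResult = const char *;  // stand-in for MemDepResult
  using Entry = std::pair<Block, DepResult>;

  DepResult lookup(const std::vector<Entry> &Cache, Block BB) {
    auto It = std::lower_bound(Cache.begin(), Cache.end(),
                               Entry(BB, nullptr),
                               [](const Entry &A, const Entry &B) {
                                 return A.first < B.first;
                               });
    return (It != Cache.end() && It->first == BB) ? It->second : "unknown";
  }

  int main() {
    std::vector<Entry> Cache = {{1, "clobber"}, {3, "def"}, {7, "non-local"}};
    std::printf("%s %s\n", lookup(Cache, 3), lookup(Cache, 5));  // def unknown
    return 0;
  }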
permutations of this pattern.
llvm-svn: 60312
This speeds up GVN from 4.0386s to 3.9376s.
llvm-svn: 60310
remove some FIXMEs. This speeds up GVN very slightly on 403.gcc
(4.06s -> 4.03s).
llvm-svn: 60309
functional change.
llvm-svn: 60307
Note that the FoldOpIntoPhi call is dead because it's impossible for the
first operand of a subtraction to be both a ConstantInt and a PHINode.
llvm-svn: 60306

llvm-svn: 60291

takes care of all permutations of this pattern.
llvm-svn: 60290

llvm-svn: 60289

APInt calls instead.
This fixes PR3144.
llvm-svn: 60288
only show up in code from front-ends besides llvm-gcc, like clang.
llvm-svn: 60287

llvm-svn: 60279

"For signed integers, the determination of overflow of x*y is not so simple. If
x and y have the same sign, then overflow occurs iff xy > 2**31 - 1. If they
have opposite signs, then overflow occurs iff xy < -2**31."
In this case, x == -1.
llvm-svn: 60278
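
The quoted rule can be checked by widening to 64 bits; a minimal illustration for the 32-bit case (not the code touched by this commit), including the x == -1 corner:

  #include <cstdint>
  #include <cstdio>
  #include <limits>

  // Performs the multiplication in 64 bits and compares against the
  // 32-bit range, which covers both cases of the quoted rule. In
  // particular, x == -1 overflows exactly when y == INT32_MIN, because
  // -INT32_MIN is not representable in 32 bits.
  bool smul_overflows(int32_t x, int32_t y) {
    int64_t wide = static_cast<int64_t>(x) * static_cast<int64_t>(y);
    return wide > std::numeric_limits<int32_t>::max() ||
           wide < std::numeric_limits<int32_t>::min();
  }

  int main() {
    int32_t minint = std::numeric_limits<int32_t>::min();
    std::printf("%d %d\n", smul_overflows(-1, minint),    // 1: overflows
                smul_overflows(-1, minint + 1));          // 0: fits
    return 0;
  }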
overflowed on negation. This commit checks to make sure that neither C nor X
overflows. This requires that the RHS of X (a subtract instruction) be a
constant integer.
llvm-svn: 60275
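
On the negation side, the only 32-bit value whose negation overflows is INT32_MIN; a tiny illustrative helper (not the actual transform code):

  #include <cstdint>
  #include <cstdio>
  #include <limits>

  // -v is representable for every 32-bit value except INT32_MIN, whose
  // negation would be 2^31 and does not fit in int32_t.
  bool negation_overflows(int32_t v) {
    return v == std::numeric_limits<int32_t>::min();
  }

  int main() {
    std::printf("%d %d\n", negation_overflows(INT32_MIN),  // 1
                negation_overflows(-1));                   // 0
    return 0;
  }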