Add a testcase.
llvm-svn: 122715

so that Dominators.h is *just* domtree. Also prune #includes a bit.
llvm-svn: 122714

llvm-svn: 122713

sure that the loop we're promoting into a memcpy doesn't mutate the input
of the memcpy. Before we were just checking that the dest of the memcpy
wasn't mod/ref'd by the loop.
llvm-svn: 122712
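
For illustration, a hypothetical loop of the kind the new check rejects
(the function and names below are invented for the example, not from the
commit): the loop looks like a memcpy idiom, but it also writes into the
source, so collapsing it into one memcpy would read bytes the original
loop had already overwritten.

  /* Hypothetical example: promoting this loop to memcpy(dst, src, n)
   * would be wrong, because iteration i overwrites src[i+1] before
   * iteration i+1 reads it. */
  void overlap_hazard(char *dst, char *src, unsigned n) {
    for (unsigned i = 0; i < n; ++i) {
      dst[i] = src[i];      /* looks like the memcpy idiom...      */
      if (i + 1 < n)
        src[i + 1] = 0;     /* ...but the loop mutates the input.  */
    }
  }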

mess with it. We'd rather peel/unroll it than convert all of its
stores into memsets.
llvm-svn: 122711

This allows us to compile:

  void test(char *s, int a) {
    __builtin_memset(s, a, 15);
  }

into 1 mul + 3 stores instead of 3 muls + 3 stores.
llvm-svn: 122710

series of shifts and ors.

We could implement a DAGCombine to turn x * 0x0101 back into logic
operations on targets that don't support the multiply, or where it is slow
(e.g. the P4), if someone cares enough.

Example code:

  void test(char *s, int a) {
    __builtin_memset(s, a, 4);
  }

before:

_test:                          ## @test
        movzbl  8(%esp), %eax
        movl    %eax, %ecx
        shll    $8, %ecx
        orl     %eax, %ecx
        movl    %ecx, %eax
        shll    $16, %eax
        orl     %ecx, %eax
        movl    4(%esp), %ecx
        movl    %eax, 4(%ecx)
        movl    %eax, (%ecx)
        ret

after:

_test:                          ## @test
        movzbl  8(%esp), %eax
        imull   $16843009, %eax, %eax   ## imm = 0x1010101
        movl    4(%esp), %ecx
        movl    %eax, 4(%ecx)
        movl    %eax, (%ecx)
        ret

llvm-svn: 122707
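
The trick, sketched below in plain C (the function name is made up for the
example), is that multiplying a zero-extended byte by 0x01010101 replicates
it into all four byte lanes, so one imull can replace the whole shift/or
chain:

  /* Illustrative sketch: since b fits in one byte, no lane carries
   * into the next, so b * 0x01010101 == b | b<<8 | b<<16 | b<<24. */
  unsigned splat4(unsigned char b) {
    return (unsigned)b * 0x01010101u;
  }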

llvm-svn: 122706

another function.
llvm-svn: 122705

blocks in a loop, instead of just the header block. This makes it more
aggressive, able to handle Duncan's Ada examples.
llvm-svn: 122704

llvm-svn: 122703

isExitBlockDominatedByBlockInLoop is a relic of the days when domtree was
*just* a tree and didn't have DFS numbers. Checking DFS numbers is faster
and easier than "limiting the search of the tree".
llvm-svn: 122702
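
The underlying technique (sketched generically here; this is not LLVM's
actual interface) is the standard one: stamp every domtree node with DFS
entry/exit numbers once, after which "A dominates B" is two integer
comparisons instead of a walk over part of the tree.

  /* Generic sketch, assuming each dominator-tree node was stamped
   * with pre-order (in) and post-order (out) numbers by one DFS. */
  struct DomNode { unsigned in, out; };

  /* A dominates B iff B's DFS interval nests inside A's. */
  int dominates(const struct DomNode *a, const struct DomNode *b) {
    return a->in <= b->in && b->out <= a->out;
  }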

llvm-svn: 122701

llvm-svn: 122700

described in the PR, the pass could break LCSSA form when inserting
preheaders. It would probably be easy enough to fix this, but since we
currently always go into LCSSA form after running this pass, doing so is
not urgent.
llvm-svn: 122695

header for now for memset/memcpy opportunities. It turns out that loop-rotate
is successfully rotating loops, but *DOESN'T MERGE THE BLOCKS*, turning "for
loops" into 2-basic-block loops that loop-idiom was ignoring.

With this fix, we form many, *many* more memcpys and memsets than before,
including on the "history" loops in the viterbi benchmark, which look like
this:

  for (j=0; j<MAX_history; ++j) {
    history_new[i][j+1] = history[2*i][j];
  }

Transforming these loops into memcpys speeds up the viterbi benchmark from
11.98s to 3.55s on my machine. Woo.
llvm-svn: 122685
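
Assuming char-sized elements and non-overlapping rows (neither is stated in
the commit), each such loop is effectively the following hand-written
memcpy, copying row 2*i of history into row i of history_new shifted by one
element:

  #include <string.h>

  /* Hand-written equivalent of one transformed history loop:
   * history_new[i][j+1] = history[2*i][j] for j = 0..max_history-1. */
  void copy_history_row(char *dst_row, const char *src_row,
                        unsigned max_history) {
    memcpy(dst_row + 1, src_row, max_history);
  }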

llvm-svn: 122683

llvm-svn: 122682

size of a loop header instead of its own code size estimator.
This allows it to handle bitcasts etc. more precisely.
llvm-svn: 122681

llvm-svn: 122678

maintains the guarantee that the DenseSet expects: two elements it contains
must not go from unequal to equal under its nose.

As a side effect, this also lets us switch from iterating to a fixed point
to actually maintaining a work queue of functions to look at again, and we
don't add thunks to our work queue, so we don't need to detect and ignore
them.
llvm-svn: 122677

llvm-svn: 122676

llvm-svn: 122675

loop idiom pass exposed.
llvm-svn: 122674

llvm-svn: 122667

earlyclobber stuff. This should fix PRs 2313 and 8157.
Unfortunately, no testcase, since it'd be dependent on register
assignments.
llvm-svn: 122663

new testcase.
llvm-svn: 122662

is the wrong hammer for this nail, and is probably right.
llvm-svn: 122661

aggressively. In practice, this doesn't help anything, though; see the todo.
llvm-svn: 122660

should be correct now.
llvm-svn: 122659

llvm-svn: 122658

numbering, in which it considers (for example) "%a = add i32 %x, %y" and
"%b = add i32 %x, %y" to be equal because the operands are equal and the
result of the instructions only depends on the values of the operands.
This has almost no effect (it removes 4 instructions from gcc-as-one-file),
and perhaps slows down compilation: I measured a 0.4% slowdown on the large
gcc-as-one-file testcase, but it wasn't statistically significant.
llvm-svn: 122654
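
The notion of equality being added is classic structural value numbering,
sketched below in a generic form (this is not GVN's real data structure):
two side-effect-free instructions receive the same number when their opcode
and operand numbers match, so %b above simply gets %a's number.

  /* Generic sketch: a table of (opcode, operand-number) triples.
   * Linear search keeps the sketch short; a real pass would hash. */
  enum { MAX_VN = 1024 };
  static struct { int op, lhs, rhs; } table[MAX_VN];
  static int num_vn;

  int value_number(int op, int lhs_vn, int rhs_vn) {
    for (int i = 0; i < num_vn; ++i)
      if (table[i].op == op && table[i].lhs == lhs_vn &&
          table[i].rhs == rhs_vn)
        return i;                  /* structurally equal: reuse number */
    if (num_vn == MAX_VN)
      return -1;                   /* table full; sketch only */
    table[num_vn].op = op;
    table[num_vn].lhs = lhs_vn;
    table[num_vn].rhs = rhs_vn;
    return num_vn++;
  }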

llvm-svn: 122653

llvm-svn: 122652

is necessary for executing the custom command that runs the
assembler. Fixes PR8877.
llvm-svn: 122649

operands are visited before the instructions themselves.
llvm-svn: 122647

llvm-svn: 122645

llvm-svn: 122642

Fixes PR8861.
llvm-svn: 122641

llvm-svn: 122638

llvm-svn: 122637

llvm-svn: 122636

with 2-address instructions, for about a 3.5% speedup of StrongPHIElimination on
403.gcc.
llvm-svn: 122635

llvm-svn: 122632

llvm-svn: 122631

llvm-svn: 122630

llvm-svn: 122628

process those instructions that define phi sources. This is a 47% speedup of
StrongPHIElimination compile time on 403.gcc.
llvm-svn: 122627

llvm-svn: 122626

llvm-svn: 122625