| Commit message (Collapse) | Author | Age | Files | Lines |
|
properties.
Added the self-wrap flag for SCEV::AddRecExpr.
A slew of temporary FIXMEs indicate the intention of the no-self-wrap flag
without changing behavior in this revision.
llvm-svn: 127590
|
llvm-svn: 127589
|
Radar 9097659
llvm-svn: 127182
|
llvm-svn: 126125
|
llvm-svn: 126102
|
constant, including globals. This also makes the generated pattern globals much
more "pretty", because the value is no longer always broken down into an array
of bytes.
This enables us to handle stores of relocatable globals. This kicks in about
48 times in 254.gap, giving us stuff like this:
@.memset_pattern40 = internal constant [2 x %struct.TypHeader* (%struct.TypHeader*, %struct.TypHeader*)*] [%struct.TypHeader* (%struct.TypHeader*, %struct.TypHeader*)* @IsFalse, %struct.TypHeader* (%struct.TypHeader*, %struct.TypHeader*)* @IsFalse], align 16
...
call void @memset_pattern16(i8* %scevgep5859, i8* bitcast ([2 x %struct.TypHeader* (%struct.TypHeader*, %struct.TypHeader*)*]* @.memset_pattern40 to i8*), i64 %tmp75) nounwind
llvm-svn: 126044
|
unsplatable values into memset_pattern16 when it is available
(recent darwins). This transforms lots of strided loop stores
of ints for example, like 5 in vpr:
Formed memset: call void @memset_pattern16(i8* %4, i8* getelementptr inbounds ([16 x i8]* @.memset_pattern9, i32 0, i32 0), i64 %tmp25)
from store to: {%3,+,4}<%11> at: store i32 3, i32* %scevgep, align 4, !tbaa !4
llvm-svn: 126040
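As a rough illustration of the transformation described above, the following C sketch compares a strided loop of 32-bit stores with a single pattern fill. `fill_pattern16` is a hypothetical portable stand-in for Darwin's `memset_pattern16`, and the helper names and the buffer size are assumptions made for this example, not part of the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical portable stand-in for Darwin's memset_pattern16: fills
 * len bytes at dst by repeating a 16-byte pattern, truncating the last
 * copy when len is not a multiple of 16. */
static void fill_pattern16(void *dst, const void *pattern16, size_t len) {
    assert(dst != NULL && pattern16 != NULL);
    unsigned char *out = dst;
    while (len >= 16) {
        memcpy(out, pattern16, 16);
        out += 16;
        len -= 16;
    }
    memcpy(out, pattern16, len);
}

/* The strided store loop: one 32-bit store of `value` per iteration. */
static void store_loop(int32_t *dst, size_t count, int32_t value) {
    for (size_t i = 0; i != count; ++i)
        dst[i] = value;
}

/* The same effect expressed as one pattern fill: the 4-byte value is
 * replicated into a 16-byte pattern, then stamped over the buffer. */
static void store_as_pattern(int32_t *dst, size_t count, int32_t value) {
    int32_t pattern[4] = {value, value, value, value};
    fill_pattern16(dst, pattern, count * sizeof(int32_t));
}

/* Returns 1 when both versions leave identical memory behind. */
static int pattern_fill_matches_loop(void) {
    int32_t a[10], b[10];
    store_loop(a, 10, 3);       /* 3 is not a byte splat */
    store_as_pattern(b, 10, 3);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling `pattern_fill_matches_loop()` from a test driver exercises a length (40 bytes) that is not a multiple of 16, so the truncated final copy is covered too.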
|
to hack on memset, memcpy etc.
llvm-svn: 125974
|
llvm-svn: 125563
|
when safe.
The testcase is basically this nested loop:
  void foo(char *X) {
    for (int i = 0; i != 100; ++i)
      for (int j = 0; j != 100; ++j)
        X[j+i*100] = 0;
  }
which gets turned into a single memset now. clang -O3 doesn't optimize
this yet though due to a phase ordering issue I haven't analyzed yet.
llvm-svn: 122806
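A minimal C sketch of why the nested loop above can collapse into one call: the inner loop covers 100 contiguous bytes and the outer loop advances by exactly 100, so the two loops together touch one contiguous 10000-byte range. The function names other than the loop body are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The nested loop from the commit message. */
static void foo_loops(char *X) {
    for (int i = 0; i != 100; ++i)
        for (int j = 0; j != 100; ++j)
            X[j + i * 100] = 0;
}

/* Hand-written equivalent: a single memset over the whole range. */
static void foo_memset(char *X) {
    assert(X != NULL);
    memset(X, 0, 100 * 100);
}

/* Returns 1 when both versions leave identical memory behind. */
static int nested_loop_matches_memset(void) {
    char a[10000], b[10000];
    memset(a, 0x5A, sizeof a);  /* start from non-zero contents */
    memset(b, 0x5A, sizeof b);
    foo_loops(a);
    foo_memset(b);
    return memcmp(a, b, sizeof a) == 0;
}
```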
|
instruction *after* the store. The store will always be deleted
if the transformation kicks in, so we'd do an N^2 scan of every
loop block. Whoops.
llvm-svn: 122805
|
stop setting NSW: signed overflow is possible. Thanks to Dan
for pointing these out.
llvm-svn: 122790
|
llvm-svn: 122788
|
llvm-svn: 122720
|
Teach it to CSE the rest of the non-side-effecting instructions.
llvm-svn: 122716
|
sure that the loop we're promoting into a memcpy doesn't mutate the input
of the memcpy. Before we were just checking that the dest of the memcpy
wasn't mod/ref'd by the loop.
llvm-svn: 122712
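To see why the source of the copy must not be written by the loop, consider this C sketch, which is an illustration under assumed names rather than the pass's actual check. The loop's stores feed its own later loads, so rewriting it as a block copy would change the result. `memmove` is used for the block copy only so that the overlapping case is well defined; a real `memcpy` with overlapping ranges is undefined behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A copy loop that mutates its own input: a[i + 1] = a[i]. Each store
 * is read back by the next iteration, so a[0] propagates everywhere. */
static void shift_loop(char *a, size_t n) {
    for (size_t i = 0; i + 1 < n; ++i)
        a[i + 1] = a[i];
}

/* What a naive "turn the loop into a block copy" rewrite would compute:
 * every source byte is read as it was before the copy started. */
static void shift_block_copy(char *a, size_t n) {
    assert(n > 0);
    memmove(a + 1, a, n - 1);
}

/* Returns 1 when the two versions produce the expected, different results. */
static int loop_differs_from_block_copy(void) {
    char x[5] = {'a', 'b', 'c', 'd', 'e'};
    char y[5] = {'a', 'b', 'c', 'd', 'e'};
    shift_loop(x, 5);        /* x is now a a a a a */
    shift_block_copy(y, 5);  /* y is now a a b c d */
    return memcmp(x, "aaaaa", 5) == 0 && memcmp(y, "aabcd", 5) == 0;
}
```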
|
mess with it. We'd rather peel/unroll it than convert all of its
stores into memsets.
llvm-svn: 122711
|
blocks in a loop, instead of just the header block. This makes it more
aggressive, able to handle Duncan's Ada examples.
llvm-svn: 122704
|
llvm-svn: 122701
|
header for now for memset/memcpy opportunities. It turns out that loop-rotate
is successfully rotating loops, but *DOESN'T MERGE THE BLOCKS*, turning "for
loops" into 2 basic block loops that loop-idiom was ignoring.
With this fix, we form many *many* more memcpy and memsets than before, including
on the "history" loops in the viterbi benchmark, which look like this:
  for (j=0; j<MAX_history; ++j) {
    history_new[i][j+1] = history[2*i][j];
  }
Transforming these loops into memcpy's speeds up the viterbi benchmark from
11.98s to 3.55s on my machine. Woo.
llvm-svn: 122685
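The "history" loop above copies one contiguous row into another at an offset of one element, which is what makes it expressible as a `memcpy`. The C sketch below shows that equivalence. The element type (`int`), the array dimensions, and the helper names are assumptions made for this example; the benchmark's real declarations are not shown in the commit message.

```c
#include <assert.h>
#include <string.h>

enum { MAX_history = 8, NUM_states = 4 };  /* hypothetical sizes */

static int history[2 * NUM_states][MAX_history];
static int history_new[NUM_states][MAX_history + 1];

/* The loop from the commit message, for one value of i. */
static void copy_row_loop(int i) {
    assert(i >= 0 && i < NUM_states);
    for (int j = 0; j < MAX_history; ++j)
        history_new[i][j + 1] = history[2 * i][j];
}

/* Hand-written equivalent: source and destination are distinct arrays,
 * so a plain memcpy of MAX_history elements is safe. */
static void copy_row_memcpy(int i) {
    assert(i >= 0 && i < NUM_states);
    memcpy(&history_new[i][1], &history[2 * i][0],
           MAX_history * sizeof(int));
}

/* Returns 1 when both versions leave identical memory behind. */
static int history_loop_matches_memcpy(void) {
    int expected[NUM_states][MAX_history + 1];
    for (int r = 0; r < 2 * NUM_states; ++r)
        for (int c = 0; c < MAX_history; ++c)
            history[r][c] = r * 100 + c;

    memset(history_new, 0, sizeof history_new);
    for (int i = 0; i < NUM_states; ++i)
        copy_row_loop(i);
    memcpy(expected, history_new, sizeof expected);

    memset(history_new, 0, sizeof history_new);
    for (int i = 0; i < NUM_states; ++i)
        copy_row_memcpy(i);
    return memcmp(expected, history_new, sizeof expected) == 0;
}
```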
|
llvm-svn: 122683
|
llvm-svn: 122682
|
llvm-svn: 122678
|
new testcase.
llvm-svn: 122662
|
aggressively. In practice, this doesn't help anything though,
see the todo.
llvm-svn: 122660
|
should be correct now.
llvm-svn: 122659
|
check for "multiple of a byte" in size to make it clear that the
>> 3 below is safe.
llvm-svn: 122604
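The point of the "multiple of a byte" check can be sketched in C as follows. This is only an illustration with hypothetical names, not the pass's code: shifting a bit count right by 3 silently drops any remainder, so the remainder has to be ruled out first.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the store size in bytes, or 0 when the bit width is not a
 * whole number of bytes and the store should be left alone. */
static uint64_t store_size_in_bytes(uint64_t size_in_bits) {
    if ((size_in_bits & 7) != 0)
        return 0;              /* e.g. i1 or i12: >> 3 would truncate */
    return size_in_bits >> 3;  /* safe: no remainder is discarded */
}

/* A few spot checks on the helper above. */
static void check_store_sizes(void) {
    assert(store_size_in_bytes(32) == 4);
    assert(store_size_in_bytes(12) == 0);
}
```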
|
llvm-svn: 122593
|
llvm-svn: 122585
|
llvm-svn: 122574
|
memsets. This is still missing one important validity check, but this is enough
to compile stuff like this:
  void test0(std::vector<char> &X) {
    for (std::vector<char>::iterator I = X.begin(), E = X.end(); I != E; ++I)
      *I = 0;
  }
  void test1(std::vector<int> &X) {
    for (long i = 0, e = X.size(); i != e; ++i)
      X[i] = 0x01010101;
  }
With:
$ clang t.cpp -S -o - -O2 -emit-llvm | opt -loop-idiom | opt -O3 | llc
to:
__Z5test0RSt6vectorIcSaIcEE: ## @_Z5test0RSt6vectorIcSaIcEE
## BB#0: ## %entry
        subq $8, %rsp
        movq (%rdi), %rax
        movq 8(%rdi), %rsi
        cmpq %rsi, %rax
        je LBB0_2
## BB#1: ## %bb.nph
        subq %rax, %rsi
        movq %rax, %rdi
        callq ___bzero
LBB0_2: ## %for.end
        addq $8, %rsp
        ret
...
__Z5test1RSt6vectorIiSaIiEE: ## @_Z5test1RSt6vectorIiSaIiEE
## BB#0: ## %entry
        subq $8, %rsp
        movq (%rdi), %rax
        movq 8(%rdi), %rdx
        subq %rax, %rdx
        cmpq $4, %rdx
        jb LBB1_2
## BB#1: ## %for.body.preheader
        andq $-4, %rdx
        movl $1, %esi
        movq %rax, %rdi
        callq _memset
LBB1_2: ## %for.end
        addq $8, %rsp
        ret
llvm-svn: 122573
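The `test1` case works because every byte of 0x01010101 is the same, so the 32-bit stores can be replaced by a byte-wise `memset` with value 1 (the `movl $1, %esi` in the listing above). The C sketch below illustrates that "byte splat" condition and the resulting equivalence. It uses a plain array instead of `std::vector`, and all names are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 and stores the repeated byte in *byte_out when every byte
 * of `value` is identical; returns 0 otherwise. */
static int is_byte_splat(uint32_t value, unsigned char *byte_out) {
    assert(byte_out != NULL);
    unsigned char b = (unsigned char)(value & 0xFF);
    for (int shift = 8; shift < 32; shift += 8)
        if (((value >> shift) & 0xFF) != b)
            return 0;
    *byte_out = b;
    return 1;
}

/* The store loop, written against a raw array. */
static void fill_loop(uint32_t *x, size_t n) {
    for (size_t i = 0; i != n; ++i)
        x[i] = 0x01010101u;
}

/* Hand-written equivalent: one memset over n * 4 bytes. */
static void fill_memset(uint32_t *x, size_t n) {
    memset(x, 1, n * sizeof(uint32_t));
}

/* Returns 1 when the splat test and both fill versions agree. */
static int splat_store_matches_memset(void) {
    uint32_t a[7], b[7];
    unsigned char byte = 0;
    if (!is_byte_splat(0x01010101u, &byte) || byte != 1)
        return 0;
    if (is_byte_splat(3u, &byte))  /* bytes 03 00 00 00: not a splat */
        return 0;
    fill_loop(a, 7);
    fill_memset(b, 7);
    return memcmp(a, b, sizeof a) == 0;
}
```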
|
llvm-svn: 122567
|
llvm-svn: 122563
|