llvm-svn: 28289
llvm-svn: 28286
llvm-svn: 28284
bitfield now gives this code:
_plus:
lwz r2, 0(r3)
rlwimi r2, r2, 0, 1, 31
xoris r2, r2, 32768
stw r2, 0(r3)
blr
instead of this:
_plus:
lwz r2, 0(r3)
srwi r4, r2, 31
slwi r4, r4, 31
addis r4, r4, -32768
rlwimi r2, r4, 0, 0, 0
stw r2, 0(r3)
blr
This can obviously still be improved.
llvm-svn: 28275
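(A plausible C-level reconstruction of the kind of source behind the _plus routine above; the commit's actual test case isn't shown here, so the names are hypothetical. With this layout the field lands in the word's top bit, matching the xoris above:)
struct B { unsigned bit : 1; };                  /* single-bit bitfield */
void plus(struct B *b) { b->bit = b->bit + 1; }  /* adding 1 to a 1-bit field just flips it */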
currently very limited, but can be extended in the future. For example,
we now compile:
uint %test30(uint %c1) {
%c2 = cast uint %c1 to ubyte
%c3 = xor ubyte %c2, 1
%c4 = cast ubyte %c3 to uint
ret uint %c4
}
to:
_xor:
movzbl 4(%esp), %eax
xorl $1, %eax
ret
instead of:
_xor:
movb $1, %al
xorb 4(%esp), %al
movzbl %al, %eax
ret
More impressively, we now compile:
struct B { unsigned bit : 1; };
void xor(struct B *b) { b->bit = b->bit ^ 1; }
To (X86/PPC):
_xor:
movl 4(%esp), %eax
xorl $-2147483648, (%eax)
ret
_xor:
lwz r2, 0(r3)
xoris r2, r2, 32768
stw r2, 0(r3)
blr
instead of (X86/PPC):
_xor:
movl 4(%esp), %eax
movl (%eax), %ecx
movl %ecx, %edx
shrl $31, %edx
# TRUNCATE movb %dl, %dl
xorb $1, %dl
movzbl %dl, %edx
andl $2147483647, %ecx
shll $31, %edx
orl %ecx, %edx
movl %edx, (%eax)
ret
_xor:
lwz r2, 0(r3)
srwi r4, r2, 31
xori r4, r4, 1
rlwimi r2, r4, 31, 0, 0
stw r2, 0(r3)
blr
This implements InstCombine/cast.ll:test30.
llvm-svn: 28273
Fix a nasty bug in the memcmp optimizer where we used the wrong variable!
llvm-svn: 28269
llvm-svn: 28268
When doing the initial pass of constant folding, if we get a constantexpr,
simplify the constant expression the same way we would if it were folded in the
normal loop.
This fixes the missed-optimization regression in
Transforms/InstCombine/getelementptr.ll last night.
llvm-svn: 28224
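As a loose, source-level illustration of the kind of constant expression this now catches (a hypothetical example, not the getelementptr.ll test itself): pointer arithmetic over a global folds to a plain constant, and the initial pass should simplify it rather than leave it for the main loop:
static int table[10];
long distance(void) {
  return &table[7] - &table[2];   /* constant expression; folds to 5 */
}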
1. Implement InstCombine/deadcode.ll by not adding instructions in unreachable
blocks (due to constants in conditional branches/switches) to the worklist.
This causes them to be deleted before instcombine starts up, leading to
better optimization.
2. In the prepass over instructions, do trivial constprop/dce as we go. This
has the effect of improving the effectiveness of #1. In addition, it
*significantly* speeds up instcombine on test cases with large amounts of
constant folding code (for example, that produced by code specialization
or partial evaluation). In one example, it speeds up instcombine from
0.0589s to 0.0224s with a release build (a 2.6x speedup).
llvm-svn: 28215
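A small C example of the situation point 1 above describes (hypothetical, for illustration only): once the constant branch condition is folded, the else block is unreachable, so its instructions never need to go onto the instcombine worklist:
int select_path(int x) {
  if (1)               /* constant condition: only the then-block is reachable */
    return x + 1;
  else
    return x * 37;     /* unreachable code, deleted before instcombine runs */
}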
Domagoj Babic!
llvm-svn: 28181
Make the "fold (and (cast A), (cast B)) -> (cast (and A, B))" transformation
only apply when both casts really will cause code to be generated. If one or
both don't, then this xform doesn't remove a cast.
This fixes Transforms/InstCombine/2006-05-06-Infloop.ll
llvm-svn: 28141
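For illustration (a hypothetical example, not the commit's test case), here is C source where both casts really do generate code, so the transformation still applies: the and can be done in the narrow type and widened once instead of widening each operand separately:
unsigned and_of_bytes(unsigned char a, unsigned char b) {
  /* both operands are zero-extended; and'ing first needs only one extension */
  return (unsigned)a & (unsigned)b;
}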
llvm-svn: 28128
llvm-svn: 28126
llvm-svn: 28101
Domagoj Babic!
llvm-svn: 28048
llvm-svn: 28019
llvm-svn: 28007
Transforms/InstCombine/vec_insert_to_shuffle.ll
llvm-svn: 27997
nondeterminism being bad) could cause some trivial missed optimizations (dead
phi nodes being left around for later passes to clean up).
With this, llvm-gcc4 now bootstraps and correctly compares. I don't know
why I never tried to do it before... :)
llvm-svn: 27984
llvm-svn: 27912
llvm-svn: 27881
can be converted to losslessly, we can continue the conversion to a direct call.
llvm-svn: 27880
if the pointer is known aligned.
llvm-svn: 27781
Make the insert/extract elt -> shuffle code more aggressive.
This fixes CodeGen/PowerPC/vec_shuffle.ll
llvm-svn: 27728
llvm-svn: 27727
maximal shuffles out of them where possible.
llvm-svn: 27717
insert/extractelement operations. This implements
Transforms/ScalarRepl/vector_promote.ll
llvm-svn: 27710
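A C-level sketch of the access pattern this targets (hypothetical example using GCC's vector_size extension; the actual vector_promote.ll test isn't reproduced here): a vector parked in a stack temporary and then read one scalar element at a time, which scalarrepl can now keep as a vector value using extract/insertelement:
typedef float v4f __attribute__((vector_size(16)));

float second_element(v4f v) {
  union { v4f vec; float elt[4]; } u;   /* stack temporary holding the vector */
  u.vec = v;                            /* whole-vector store into the alloca */
  return u.elt[1];                      /* scalar read of one element */
}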
llvm-svn: 27652
llvm-svn: 27625
aggressive in some cases where LLVMGCC 4 is inserting casts for no reason.
This implements InstCombine/cast.ll:test27/28.
llvm-svn: 27620
llvm-svn: 27573
llvm-svn: 27571
are visible to analysis as intrinsics. That is, make sure someone doesn't pass
free around by address in some struct (as happens in, say, 176.gcc).
This doesn't get rid of any indirect calls; it just ensures that calls to free
and malloc are always direct.
llvm-svn: 27560
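A C illustration of the pattern the commit message warns about (hypothetical code; 176.gcc's real obstack machinery is more involved): free stored by address in a struct and called indirectly, which is exactly the situation that must not obscure the real, direct calls to free and malloc:
#include <stdlib.h>

struct allocator {
  void (*freefun)(void *);   /* free passed around by address, as in 176.gcc */
};

void release(struct allocator *a, void *p) {
  a->freefun(p);             /* indirect call through the struct; per the commit,
                                genuine calls to free/malloc stay direct and visible */
}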
llvm-svn: 27513
llvm-svn: 27478
us to compile oh-so-realistic stuff like this:
vec_vperm(A, B, (vector unsigned char){14});
to:
vspltb v0, v0, 14
instead of:
vspltisb v0, 14
vperm v0, v2, v1, v0
llvm-svn: 27452
%tmp = cast <4 x uint> %tmp to <4 x int> ; <<4 x int>> [#uses=1]
%tmp = cast <4 x int> %tmp to <4 x float> ; <<4 x float>> [#uses=1]
into:
%tmp = cast <4 x uint> %tmp to <4 x float> ; <<4 x float>> [#uses=1]
llvm-svn: 27355
%tmp = cast <4 x uint>* %testData to <4 x int>* ; <<4 x int>*> [#uses=1]
%tmp = load <4 x int>* %tmp ; <<4 x int>> [#uses=1]
to this:
%tmp = load <4 x uint>* %testData ; <<4 x uint>> [#uses=1]
%tmp = cast <4 x uint> %tmp to <4 x int> ; <<4 x int>> [#uses=1]
llvm-svn: 27353
elimination of one load from this:
int AreSecondAndThirdElementsBothNegative( vector float *in ) {
#define QNaN 0x7FC00000
const vector unsigned int testData = (vector unsigned int)( QNaN, 0, 0, QNaN );
vector float test = vec_ld( 0, (float*) &testData );
return ! vec_any_ge( test, *in );
}
Now generating:
_AreSecondAndThirdElementsBothNegative:
mfspr r2, 256
oris r4, r2, 49152
mtspr 256, r4
li r4, lo16(LCPI1_0)
lis r5, ha16(LCPI1_0)
addi r6, r1, -16
lvx v0, r5, r4
stvx v0, 0, r6
lvx v1, 0, r3
vcmpgefp. v0, v0, v1
mfcr r3, 2
rlwinm r3, r3, 27, 31, 31
xori r3, r3, 1
cntlzw r3, r3
srwi r3, r3, 5
mtspr 256, r2
blr
llvm-svn: 27352
llvm-svn: 27344
llvm-svn: 27332
llvm-svn: 27330
Fold (B&A)^A == ~B & A
This implements InstCombine/xor.ll:test2[56]
llvm-svn: 27328
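A quick C check of the identity behind the fold (hypothetical function names): in each bit position, if A is 0 both sides are 0, and if A is 1 both sides equal ~B:
unsigned before_fold(unsigned A, unsigned B) { return (B & A) ^ A; }
unsigned after_fold(unsigned A, unsigned B)  { return ~B & A; }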
extract_element'd value, do so.
llvm-svn: 27323
llvm-svn: 27300
llvm-svn: 27261
llvm-svn: 27161
llvm-svn: 27125
llvm-svn: 27052
llvm-svn: 27051