------------------------------------------------------------------------
Remove the Function pointer cast in these calls, converting it to
a cast of the argument:

    %tmp60 = tail call int cast (int (ulong)* %str to int (int)*)( int 10 )
    %tmp60 = tail call int cast (int (ulong)* %str to int (int)*)( uint %tmp51 )
llvm-svn: 28953
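One plausible source of such calls (purely illustrative; the names here are hypothetical and not taken from the commit) is C code where the type seen at the call site disagrees with the function's actual definition:

    /* Hypothetical illustration: the caller believes str takes an int, while
       the definition elsewhere takes an unsigned long, so at the LLVM level
       the call ends up going through a cast of the function pointer.  The
       fold above casts the argument instead and calls str directly. */
    int str(int);                        /* what this call site sees          */
    int caller(void) { return str(10); } /* becomes a call through a cast     */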
------------------------------------------------------------------------
idioms into bswap intrinsics.
llvm-svn: 28803
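For reference, a typical 32-bit shift/and/or byte-swap idiom of the kind such a pass looks for (an illustrative example assuming a 32-bit unsigned int, not the test case from the commit):

    /* Classic open-coded byte swap; the whole body can be recognized as a
       single bswap intrinsic. */
    unsigned int swap32(unsigned int x) {
        return ((x & 0x000000FFu) << 24) |
               ((x & 0x0000FF00u) <<  8) |
               ((x & 0x00FF0000u) >>  8) |
               ((x & 0xFF000000u) >> 24);
    }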
------------------------------------------------------------------------
PPC/altivec
llvm-svn: 28698
------------------------------------------------------------------------
but for sub, it really does! This fixes a miscompilation of fibheap_cut in
llvmgcc4.
llvm-svn: 28600
------------------------------------------------------------------------
llvm-svn: 28503
------------------------------------------------------------------------
llvm-svn: 28490
------------------------------------------------------------------------
No functionality change.
llvm-svn: 28489
------------------------------------------------------------------------
the program. This exposes more opportunities for the instcombiner, and implements
vec_shuffle.ll:test6
llvm-svn: 28487
------------------------------------------------------------------------
extractelement from the SV's source. This implements vec_shuffle.ll:test[45]
llvm-svn: 28485
------------------------------------------------------------------------
llvm-svn: 28422
------------------------------------------------------------------------
llvm-svn: 28284
------------------------------------------------------------------------
bitfield now gives this code:

    _plus:
        lwz r2, 0(r3)
        rlwimi r2, r2, 0, 1, 31
        xoris r2, r2, 32768
        stw r2, 0(r3)
        blr

instead of this:

    _plus:
        lwz r2, 0(r3)
        srwi r4, r2, 31
        slwi r4, r4, 31
        addis r4, r4, -32768
        rlwimi r2, r4, 0, 0, 0
        stw r2, 0(r3)
        blr

This can obviously still be improved.
llvm-svn: 28275
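Judging from the generated code (the xoris flips the high bit), the _plus routine above plausibly comes from incrementing a one-bit bitfield, much like the xor example in the 28273 commit further down; the following is a hypothetical reconstruction, not the actual source:

    /* Hypothetical source: a 1-bit field wraps modulo 2, so the increment is
       just a flip of that bit (hence the single xoris on PPC). */
    struct B { unsigned bit : 1; };
    void plus(struct B *b) { b->bit = b->bit + 1; }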
------------------------------------------------------------------------
currently very limited, but can be extended in the future. For example,
we now compile:

    uint %test30(uint %c1) {
        %c2 = cast uint %c1 to ubyte
        %c3 = xor ubyte %c2, 1
        %c4 = cast ubyte %c3 to uint
        ret uint %c4
    }

to:

    _xor:
        movzbl 4(%esp), %eax
        xorl $1, %eax
        ret

instead of:

    _xor:
        movb $1, %al
        xorb 4(%esp), %al
        movzbl %al, %eax
        ret

More impressively, we now compile:

    struct B { unsigned bit : 1; };
    void xor(struct B *b) { b->bit = b->bit ^ 1; }

To (X86/PPC):

    _xor:
        movl 4(%esp), %eax
        xorl $-2147483648, (%eax)
        ret

    _xor:
        lwz r2, 0(r3)
        xoris r2, r2, 32768
        stw r2, 0(r3)
        blr

instead of (X86/PPC):

    _xor:
        movl 4(%esp), %eax
        movl (%eax), %ecx
        movl %ecx, %edx
        shrl $31, %edx
        # TRUNCATE movb %dl, %dl
        xorb $1, %dl
        movzbl %dl, %edx
        andl $2147483647, %ecx
        shll $31, %edx
        orl %ecx, %edx
        movl %edx, (%eax)
        ret

    _xor:
        lwz r2, 0(r3)
        srwi r4, r2, 31
        xori r4, r4, 1
        rlwimi r2, r4, 31, 0, 0
        stw r2, 0(r3)
        blr

This implements InstCombine/cast.ll:test30.
llvm-svn: 28273
------------------------------------------------------------------------
When doing the initial pass of constant folding, if we get a constantexpr,
simplify the constant expression the same way we would if the constant were
folded in the normal loop.
This fixes the missed-optimization regression in
Transforms/InstCombine/getelementptr.ll last night.
llvm-svn: 28224
------------------------------------------------------------------------
1. Implement InstCombine/deadcode.ll by not adding instructions in unreachable
blocks (due to constants in conditional branches/switches) to the worklist.
This causes them to be deleted before instcombine starts up, leading to
better optimization.
2. In the prepass over instructions, do trivial constprop/dce as we go. This
has the effect of improving the effectiveness of #1. In addition, it
*significantly* speeds up instcombine on test cases with large amounts of
constant folding code (for example, that produced by code specialization
or partial evaluation). In one example, it speeds up instcombine from
0.0589s to 0.0224s with a release build (a 2.6x speedup).
llvm-svn: 28215
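A schematic of the prepass described in point 2, using hypothetical types and helpers (this is an illustration of the idea, not the LLVM implementation): visit only reachable blocks, constant-fold and delete trivially dead instructions on the spot, and put only the survivors on the worklist.

    /* All types and helpers below are hypothetical, for illustration only. */
    typedef struct Instruction Instruction;
    typedef struct BasicBlock  BasicBlock;

    extern Instruction *first_inst(BasicBlock *bb);
    extern Instruction *next_inst(Instruction *inst);
    extern int  fold_to_constant(Instruction *inst);  /* replaces all uses with a
                                                         constant, nonzero on success */
    extern int  is_trivially_dead(Instruction *inst);
    extern void erase_inst(Instruction *inst);
    extern void worklist_add(Instruction *inst);

    /* Prepass over one reachable block: trivial constprop/DCE as we go, so
       folded or dead instructions never reach the worklist at all. */
    void prepass_block(BasicBlock *bb) {
        for (Instruction *inst = first_inst(bb); inst != 0; ) {
            Instruction *next = next_inst(inst);
            if (fold_to_constant(inst) || is_trivially_dead(inst))
                erase_inst(inst);
            else
                worklist_add(inst);
            inst = next;
        }
    }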
------------------------------------------------------------------------
Make the "fold (and (cast A), (cast B)) -> (cast (and A, B))" transformation
only apply when both casts really will cause code to be generated. If one or
both don't, then this xform doesn't remove a cast.
This fixes Transforms/InstCombine/2006-05-06-Infloop.ll
llvm-svn: 28141
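The identity behind the fold itself (as opposed to the profitability question the commit addresses) is easy to sanity-check in C for the zero-extension case; a small exhaustive check over 8-bit operands, purely illustrative:

    #include <assert.h>

    /* "and of two zero-extended bytes" equals "zero-extend of the and". */
    int main(void) {
        unsigned a, b;
        for (a = 0; a < 256; ++a)
            for (b = 0; b < 256; ++b) {
                unsigned char ca = (unsigned char)a, cb = (unsigned char)b;
                assert(((unsigned)ca & (unsigned)cb)
                       == (unsigned)(unsigned char)(ca & cb));
            }
        return 0;
    }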
------------------------------------------------------------------------
llvm-svn: 28128
------------------------------------------------------------------------
llvm-svn: 28126
------------------------------------------------------------------------
llvm-svn: 28101
------------------------------------------------------------------------
llvm-svn: 28019
------------------------------------------------------------------------
Transforms/InstCombine/vec_insert_to_shuffle.ll
llvm-svn: 27997
------------------------------------------------------------------------
llvm-svn: 27881
------------------------------------------------------------------------
can be converted to losslessly, we can continue the conversion to a direct call.
llvm-svn: 27880
------------------------------------------------------------------------
if the pointer is known aligned.
llvm-svn: 27781
------------------------------------------------------------------------
Make the insert/extract elt -> shuffle code more aggressive.
This fixes CodeGen/PowerPC/vec_shuffle.ll
llvm-svn: 27728
------------------------------------------------------------------------
llvm-svn: 27727
------------------------------------------------------------------------
maximal shuffles out of them where possible.
llvm-svn: 27717
------------------------------------------------------------------------
aggressive in some cases where LLVMGCC 4 is inserting casts for no reason.
This implements InstCombine/cast.ll:test27/28.
llvm-svn: 27620
------------------------------------------------------------------------
llvm-svn: 27573
------------------------------------------------------------------------
llvm-svn: 27571
------------------------------------------------------------------------
us to compile oh-so-realistic stuff like this:

    vec_vperm(A, B, (vector unsigned char){14});

to:

    vspltb v0, v0, 14

instead of:

    vspltisb v0, 14
    vperm v0, v2, v1, v0
llvm-svn: 27452
------------------------------------------------------------------------
    %tmp = cast <4 x uint> %tmp to <4 x int> ; <<4 x int>> [#uses=1]
    %tmp = cast <4 x int> %tmp to <4 x float> ; <<4 x float>> [#uses=1]

into:

    %tmp = cast <4 x uint> %tmp to <4 x float> ; <<4 x float>> [#uses=1]
llvm-svn: 27355
------------------------------------------------------------------------
    %tmp = cast <4 x uint>* %testData to <4 x int>* ; <<4 x int>*> [#uses=1]
    %tmp = load <4 x int>* %tmp ; <<4 x int>> [#uses=1]

to this:

    %tmp = load <4 x uint>* %testData ; <<4 x uint>> [#uses=1]
    %tmp = cast <4 x uint> %tmp to <4 x int> ; <<4 x int>> [#uses=1]
llvm-svn: 27353
------------------------------------------------------------------------
elimination of one load from this:

    int AreSecondAndThirdElementsBothNegative( vector float *in ) {
    #define QNaN 0x7FC00000
        const vector unsigned int testData = (vector unsigned int)( QNaN, 0, 0, QNaN );
        vector float test = vec_ld( 0, (float*) &testData );
        return ! vec_any_ge( test, *in );
    }

Now generating:

    _AreSecondAndThirdElementsBothNegative:
        mfspr r2, 256
        oris r4, r2, 49152
        mtspr 256, r4
        li r4, lo16(LCPI1_0)
        lis r5, ha16(LCPI1_0)
        addi r6, r1, -16
        lvx v0, r5, r4
        stvx v0, 0, r6
        lvx v1, 0, r3
        vcmpgefp. v0, v0, v1
        mfcr r3, 2
        rlwinm r3, r3, 27, 31, 31
        xori r3, r3, 1
        cntlzw r3, r3
        srwi r3, r3, 5
        mtspr 256, r2
        blr
llvm-svn: 27352
------------------------------------------------------------------------
llvm-svn: 27330
------------------------------------------------------------------------
Fold (B&A)^A == ~B & A
This implements InstCombine/xor.ll:test2[56]
llvm-svn: 27328
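Written out in C for reference (illustrative only), the two sides of the fold:

    /* For every bit: where A is 0 both sides are 0; where A is 1, the xor
       inverts the corresponding bit of B, matching ~B & A.  Hence
       (B & A) ^ A == ~B & A. */
    unsigned before_fold(unsigned B, unsigned A) { return (B & A) ^ A; }
    unsigned after_fold (unsigned B, unsigned A) { return ~B & A; }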
------------------------------------------------------------------------
extract_element'd value, do so.
llvm-svn: 27323
------------------------------------------------------------------------
llvm-svn: 27300
------------------------------------------------------------------------
llvm-svn: 27261
------------------------------------------------------------------------
llvm-svn: 27125
------------------------------------------------------------------------
llvm-svn: 26992
------------------------------------------------------------------------
llvm-svn: 26580
------------------------------------------------------------------------
the pointer is known to come from either a global variable, alloca or
malloc. This allows us to compile this:

    P = malloc(28);
    memset(P, 0, 28);

into explicit stores on PPC instead of a memset call.
llvm-svn: 26577
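Roughly what the lowered form corresponds to back at the source level (a sketch only; the names are made up, and the store width the code generator actually picks may differ):

    #include <stdlib.h>

    void make_zeroed_block(void) {
        unsigned *P = malloc(28);   /* 28 bytes; malloc guarantees alignment */
        if (!P) return;
        /* memset(P, 0, 28) turned into explicit stores, e.g. seven 32-bit ones: */
        P[0] = 0; P[1] = 0; P[2] = 0; P[3] = 0;
        P[4] = 0; P[5] = 0; P[6] = 0;
        free(P);                    /* cleanup, not part of the original example */
    }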
------------------------------------------------------------------------
Transforms/InstCombine/vec_narrow.ll. This adds support for narrowing
extract_element(insertelement) as well.
llvm-svn: 26538
------------------------------------------------------------------------
This implements Transforms/InstCombine/add.ll:test31
llvm-svn: 26519
------------------------------------------------------------------------
llvm-svn: 26484
------------------------------------------------------------------------
pointed out: realize the AND can provide factors and look through Casts.
llvm-svn: 26469
------------------------------------------------------------------------
Transforms/InstCombine/2006-02-28-Crash.ll
llvm-svn: 26427
------------------------------------------------------------------------
llvm-svn: 26415
------------------------------------------------------------------------
llvm-svn: 26413
------------------------------------------------------------------------