Remove some done items from the todo list.
llvm-svn: 27729

llvm-svn: 27726

llvm-svn: 27714

separate functions, for simplicity and code clarity.
llvm-svn: 27693

functions, which makes the code much cleaner :)
llvm-svn: 27692

tested by CodeGen/Generic/vector.ll
llvm-svn: 27657

different types.
Codegen spltw(0x7FFFFFFF) and spltw(0x80000000) without a constant pool load,
implementing PowerPC/vec_constants.ll:test1. This compiles:

    typedef float vf __attribute__ ((vector_size (16)));
    typedef int vi __attribute__ ((vector_size (16)));
    void test(vi *P1, vi *P2, vf *P3) {
        *P1 &= (vi){0x80000000,0x80000000,0x80000000,0x80000000};
        *P2 &= (vi){0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF};
        *P3 = vec_abs((vector float)*P3);
    }

to:

    _test:
            mfspr r2, 256
            oris r6, r2, 49152
            mtspr 256, r6
            vspltisw v0, -1
            vslw v0, v0, v0
            lvx v1, 0, r3
            vand v1, v1, v0
            stvx v1, 0, r3
            lvx v1, 0, r4
            vandc v1, v1, v0
            stvx v1, 0, r4
            lvx v1, 0, r5
            vandc v0, v1, v0
            stvx v0, 0, r5
            mtspr 256, r2
            blr

instead of (with two constant pool entries):

    _test:
            mfspr r2, 256
            oris r6, r2, 49152
            mtspr 256, r6
            li r6, lo16(LCPI1_0)
            lis r7, ha16(LCPI1_0)
            li r8, lo16(LCPI1_1)
            lis r9, ha16(LCPI1_1)
            lvx v0, r7, r6
            lvx v1, 0, r3
            vand v0, v1, v0
            stvx v0, 0, r3
            lvx v0, r9, r8
            lvx v1, 0, r4
            vand v1, v1, v0
            stvx v1, 0, r4
            lvx v1, 0, r5
            vand v0, v1, v0
            stvx v0, 0, r5
            mtspr 256, r2
            blr

GCC produces (with two constant pool entries):

    _test:
            mfspr r0,256
            stw r0,-4(r1)
            oris r0,r0,0xc00c
            mtspr 256,r0
            lis r2,ha16(LC0)
            lis r9,ha16(LC1)
            la r2,lo16(LC0)(r2)
            lvx v0,0,r3
            lvx v1,0,r5
            la r9,lo16(LC1)(r9)
            lwz r12,-4(r1)
            lvx v12,0,r2
            lvx v13,0,r9
            vand v0,v0,v12
            stvx v0,0,r3
            vspltisw v0,-1
            vslw v12,v0,v0
            vandc v1,v1,v12
            stvx v1,0,r5
            lvx v0,0,r4
            vand v0,v0,v13
            stvx v0,0,r4
            mtspr 256,r12
            blr
llvm-svn: 27624
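
A scalar sketch of the trick the improved codegen uses (illustrative C, not
from the commit): vspltisw v0, -1 splats an all-ones word into each lane, and
vslw takes each lane's shift amount from the low five bits of the
corresponding lane of its second operand, here 31, so shifting the all-ones
word by itself yields the 0x80000000 sign-bit mask; vandc supplies the
complementary 0x7FFFFFFF mask with no extra constant.

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        /* vspltisw v0, -1: every 32-bit lane becomes 0xFFFFFFFF. */
        uint32_t lane = 0xFFFFFFFFu;

        /* vslw shifts each lane left by the low 5 bits of the shift
           lane; 0xFFFFFFFF & 31 == 31, so the result is 0x80000000. */
        uint32_t sign_mask = lane << (lane & 31);

        printf("0x%08X\n", sign_mask);    /* 0x80000000 */
        printf("0x%08X\n", ~sign_mask);   /* 0x7FFFFFFF, what vandc uses */
        return 0;
    }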

Canonicalize BUILD_VECTORs that match VSPLTIs into a single type for each
form, eliminating a bunch of Pat patterns in the .td file and allowing us to
CSE stuff more aggressively. This implements
PowerPC/buildvec_canonicalize.ll:VSPLTI
llvm-svn: 27614
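
A hedged illustration of why one canonical type helps (the constants below
are mine, not from the commit): differently typed BUILD_VECTORs can denote
the same 16-byte splat, and once both are expressed in one canonical form
the constant can be CSE'd instead of materialized twice.

    typedef short vs __attribute__ ((vector_size (16)));
    typedef int   vi __attribute__ ((vector_size (16)));

    void use(vs *A, vi *B) {
        /* Eight halfwords of 1 ... */
        *A = (vs){1,1,1,1,1,1,1,1};
        /* ... and four words of 0x00010001 share one bit pattern, so a
           single vspltish-style splat can feed both stores. */
        *B = (vi){0x00010001,0x00010001,0x00010001,0x00010001};
    }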

each other. This implements CodeGen/PowerPC/vxor-canonicalize.ll
llvm-svn: 27609

doesn't say where they go :-/
llvm-svn: 27579

stuff.
No functionality change.
llvm-svn: 27577

llvm-svn: 27544

llvm-svn: 27543

constant pool load.
llvm-svn: 27538

No functionality changes.
llvm-svn: 27536

llvm-svn: 27469

Convert vsldoi(x,x) to work the same way other (x,x) cases work.
llvm-svn: 27467

llvm-svn: 27463

shuffles.
llvm-svn: 27457

lower it and LLVM to have one fewer intrinsic. This implements
CodeGen/PowerPC/vec_shuffle.ll
llvm-svn: 27450

vperm with a perm mask lvx'd from the constant pool.
llvm-svn: 27448
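
For context, a hedged sketch of the lowering this entry describes (the mask
values are illustrative): an arbitrary shuffle becomes a vperm whose
byte-selection control vector is exactly the constant that gets lvx'd from
the constant pool.

    #include <altivec.h>   /* compile with -maltivec */

    /* A shuffle with no dedicated AltiVec instruction: interleave the
       low words of A and B as A0, B0, A1, B1. */
    vector unsigned int interleave_lo(vector unsigned int A,
                                      vector unsigned int B) {
        /* Bytes 0-15 select from A, bytes 16-31 from B; this constant
           is what codegen loads with lvx and feeds to vperm. */
        const vector unsigned char mask =
            { 0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23 };
        return vec_perm(A, B, mask);
    }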

llvm-svn: 27439

llvm-svn: 27433

{2147483647,2147483647,2147483647,2147483647} as 'vspltisb v0, -1'.
llvm-svn: 27413

handle all 4 PPC vector types. This simplifies the matching code and allows
us to eliminate a bunch of patterns. This also adds cases we were missing,
such as CodeGen/PowerPC/vec_splat.ll:splat_h.
llvm-svn: 27400

llvm-svn: 27386

llvm-svn: 27385

llvm-svn: 27359

"vspltisb v0, 8" instead of a constant pool load.
llvm-svn: 27335
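
A hedged sketch of the class of constants this covers (the function is
mine): any vector whose bytes all equal the same value in vspltisb's signed
5-bit immediate range, -16 to 15, can be materialized in one instruction
with no memory access.

    typedef unsigned char vub __attribute__ ((vector_size (16)));

    /* Sixteen bytes of 8 fit vspltisb's immediate range, so the
       constant needs no constant pool entry. */
    vub add_splat8(vub x) {
        return x + (vub){8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8};
    }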

llvm-svn: 27306

llvm-svn: 27305

llvm-svn: 27291

identical instructions into a single instruction. For example, for:

    void test(vector float *x, vector float *y, int *P) {
        int v = vec_any_out(*x, *y);
        *x = (vector float)vec_cmpb(*x, *y);
        *P = v;
    }

we now generate:

    _test:
            mfspr r2, 256
            oris r6, r2, 49152
            mtspr 256, r6
            lvx v0, 0, r4
            lvx v1, 0, r3
            vcmpbfp. v0, v1, v0
            mfcr r4, 2
            stvx v0, 0, r3
            rlwinm r3, r4, 27, 31, 31
            xori r3, r3, 1
            stw r3, 0(r5)
            mtspr 256, r2
            blr

instead of:

    _test:
            mfspr r2, 256
            oris r6, r2, 57344
            mtspr 256, r6
            lvx v0, 0, r4
            lvx v1, 0, r3
            vcmpbfp. v2, v1, v0
            mfcr r4, 2
        *** vcmpbfp v0, v1, v0
            rlwinm r4, r4, 27, 31, 31
            stvx v0, 0, r3
            xori r3, r4, 1
            stw r3, 0(r5)
            mtspr 256, r2
            blr

Testcase here: CodeGen/PowerPC/vcmp-fold.ll
llvm-svn: 27290

predicates to VCMPo nodes.
llvm-svn: 27285

llvm-svn: 27276

llvm-svn: 27215

same thing and we have a dag node for the former.
llvm-svn: 27205

value. Split them into separate enums.
llvm-svn: 27201

manner that the LowerSwitch LLVM to LLVM pass does: emitting a binary
search tree of basic blocks. The new approach has several advantages:
it is faster, it generates significantly smaller code in many cases, and
it paves the way for implementing dense switch tables as a jump table by
handling switches directly in the instruction selector.
This functionality is currently only enabled on x86, but should be safe for
every target. In anticipation of making it the default, the CFG is now
properly updated in the x86, ppc, and sparc select lowering code.
llvm-svn: 27156
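
A hedged C sketch of the emitted shape (case values and names are
illustrative): instead of a linear chain of compares, the selector builds a
balanced tree of range tests over the sorted case list, so a switch over N
cases costs O(log N) branches.

    /* What the selector conceptually emits for:
       switch (x) { case 10: ... case 20: ... case 30: ... case 40: ... } */
    int lowered_switch(int x) {
        if (x < 30) {               /* root: split the sorted case list */
            if (x == 10) return 1;  /* leaf basic blocks */
            if (x == 20) return 2;
        } else {
            if (x == 30) return 3;
            if (x == 40) return 4;
        }
        return 0;                   /* default block */
    }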

llvm-svn: 27151

llvm-svn: 27149

llvm-svn: 27116

    <int -1, int -1, int -1, int -1>
and
    <int 65537, int 65537, int 65537, int 65537>
Using things like:

    vspltisb v0, -1

and:

    vspltish v0, 1

instead of using constant pool loads.
This implements CodeGen/PowerPC/vec_splat.ll:splat_imm_i{32|16}.
llvm-svn: 27106
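
A note on why these immediates suffice (my arithmetic, not from the commit):
vspltisb replicates a signed 5-bit immediate into every byte and vspltish
into every halfword, so a word splat is expressible whenever all of its
bytes, or all of its halfwords, are equal.

    #include <assert.h>
    #include <stdint.h>

    int main(void) {
        /* vspltisb v0, -1: all bytes 0xFF make each word -1. */
        assert(0xFFFFFFFFu == (uint32_t)-1);

        /* vspltish v0, 1: halfwords 0x0001 make each word 0x00010001,
           i.e. the 65537 splat above. */
        assert(((1u << 16) | 1u) == 65537u);
        return 0;
    }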

comment.
This fixes 177.mesa, and McCat/09-vor with the td scheduler.
llvm-svn: 27060

Regression/CodeGen/PowerPC/vec_zero.ll
llvm-svn: 27059

    _foo2:
            extsw r2, r3
            std r2, -8(r1)
            lfd f0, -8(r1)
            fcfid f0, f0
            frsp f1, f0
            blr

instead of this:

    _foo2:
            lis r2, ha16(LCPI2_0)
            lis r4, 17200
            xoris r3, r3, 32768
            stw r3, -4(r1)
            stw r4, -8(r1)
            lfs f0, lo16(LCPI2_0)(r2)
            lfd f1, -8(r1)
            fsub f0, f1, f0
            frsp f1, f0
            blr

This speeds up Misc/pi from 2.44s->2.09s with LLC and from 3.01s->2.18s
with llcbeta (16.7% and 38.1% respectively).
llvm-svn: 26943
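
The _foo2 above presumably comes from a cast along these lines (my
reconstruction from the asm, not part of the commit): sign-extend the int,
move it through memory into an FPR, convert with fcfid, and round to single
precision with frsp.

    /* int -> float conversion lowered to the fcfid sequence. */
    float foo2(int x) {
        return (float)x;   /* extsw; std/lfd; fcfid; frsp */
    }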

llvm-svn: 26930

llvm-svn: 26907

figuring these out! :)
llvm-svn: 26904

constant pool load. This generates significantly nicer code for splats.
When tblgen gets bugfixed, we can remove the custom selection code.
llvm-svn: 26898