Commit message | Author | Age | Files | Lines
llvm-svn: 29614
getNode that takes a vector.
llvm-svn: 29609
_func:
        vsldoi v2, v3, v2, 12
        vsldoi v2, v2, v2, 4
        blr
instead of:
_func:
        vsldoi v2, v3, v2, 12
        vsldoi v2, v2, v2, 4
        *** vor f1, v2, v2
        blr
llvm-svn: 29607
llvm-svn: 29384
llvm-svn: 29307
into i16/i32 load/stores.
llvm-svn: 29089
llvm-svn: 28935
llvm-svn: 28932
llvm-svn: 28929
llvm-svn: 28927
doesn't work right).
llvm-svn: 28921
llvm-svn: 28919
llvm-svn: 28892
now compile:

static unsigned long X;
void test1() {
  X = 0;
}

into:

_test1:
        lis r2, ha16(_X)
        li r3, 0
        stw r3, lo16(_X)(r2)
        blr

Totally amazing :)
llvm-svn: 28839
can in 32-bit mode we can choose to optionally *use* 64-bit registers.
llvm-svn: 28824
llvm-svn: 28797
as using incoming argument registers, so the local allocator would clobber them
between their set and use. To fix this, we give the call instructions a variable
number of uses in the CALL MachineInstr itself, so live variables understands
the live ranges of these register arguments.
llvm-svn: 28744
space will be available, even if the callee isn't varargs.
llvm-svn: 28571
RET chain, value1, sign1, value2, sign2, ...
llvm-svn: 28510
llvm-svn: 28462
by Anton Korobeynikov! This is a step towards closing PR786.
llvm-svn: 28447
llvm-svn: 28445
change. This fixes PowerPC/fnegsel.ll.
llvm-svn: 28443
enough to be autogenerated.
llvm-svn: 28354
the copyto/fromregs instead of making the PPCISD::CALL selection code create
them. This vastly simplifies the selection code, and moves the ABI handling
parts into one place.
llvm-svn: 28346
1. Rearrange code a bit so that the special case doesn't require indenting lots
of code.
2. Add comments describing PPC calling convention.
3. Only round up to 56-bytes of stack space for an outgoing call if the callee
is varargs. This saves a bit of stack space.
llvm-svn: 28342
llvm-svn: 28341
ISD::CALL node, then custom lower that. This means that we only have to handle
LEGAL call operands/results, not every possible type. This allows us to
simplify the call code, shrinking it by about 1/3.
llvm-svn: 28339
llvm-svn: 28335
llvm-svn: 28334
llvm-svn: 28333
handling. This makes the lower argument code significantly simpler (we
only need to handle legal argument types).
Incidentally, this also implements support for vector argument registers,
so long as they are not on the stack.
llvm-svn: 28331
llvm-svn: 28311
llvm-svn: 28264
llvm-svn: 28017
x86 and ppc for 100% dense switch statements when relocations are non-PIC.
This support will be extended and enhanced in the coming days to support
PIC, and less dense forms of jump tables.
llvm-svn: 27947
void foo2(vector float *A, vector float *B) {
  vector float C = (vector float)vec_cmpeq(*A, *B);
  if (!vec_any_eq(*A, *B))
    *B = (vector float){0,0,0,0};
  *A = C;
}
llvm-svn: 27808
llvm-svn: 27806
If an altivec predicate compare is used immediately by a branch, don't
use a (serializing) MFCR instruction to read the CR6 register, which requires
a compare to get it back to CR's. Instead, just branch on CR6 directly. :)
For example, for:
void foo2(vector float *A, vector float *B) {
  if (!vec_any_eq(*A, *B))
    *B = (vector float){0,0,0,0};
}

We now generate:

_foo2:
        mfspr r2, 256
        oris r5, r2, 12288
        mtspr 256, r5
        lvx v2, 0, r4
        lvx v3, 0, r3
        vcmpeqfp. v2, v3, v2
        bne cr6, LBB1_2 ; UnifiedReturnBlock
LBB1_1: ; cond_true
        vxor v2, v2, v2
        stvx v2, 0, r4
        mtspr 256, r2
        blr
LBB1_2: ; UnifiedReturnBlock
        mtspr 256, r2
        blr

instead of:

_foo2:
        mfspr r2, 256
        oris r5, r2, 12288
        mtspr 256, r5
        lvx v2, 0, r4
        lvx v3, 0, r3
        vcmpeqfp. v2, v3, v2
        mfcr r3, 2
        rlwinm r3, r3, 27, 31, 31
        cmpwi cr0, r3, 0
        beq cr0, LBB1_2 ; UnifiedReturnBlock
LBB1_1: ; cond_true
        vxor v2, v2, v2
        stvx v2, 0, r4
        mtspr 256, r2
        blr
LBB1_2: ; UnifiedReturnBlock
        mtspr 256, r2
        blr
This implements CodeGen/PowerPC/vec_br_cmp.ll.
llvm-svn: 27804
even/odd halves. Thanks to Nate telling me what's what.
llvm-svn: 27793
        vmuloub v5, v3, v2
        vmuleub v2, v3, v2
        vperm v2, v2, v5, v4
This implements CodeGen/PowerPC/vec_mul.ll. With this, v16i8 multiplies are
6.79x faster than before.
Overall, UnitTests/Vector/multiplies.c is now 2.45x faster with LLVM than with
GCC.
Remove the 'integer multiplies' todo from the README file.
llvm-svn: 27792
        li r5, lo16(LCPI1_0)
        lis r6, ha16(LCPI1_0)
        lvx v4, r6, r5
        vmulouh v5, v3, v2
        vmuleuh v2, v3, v2
        vperm v2, v2, v5, v4
where v4 is:
LCPI1_0: ; <16 x ubyte>
        .byte 2
        .byte 3
        .byte 18
        .byte 19
        .byte 6
        .byte 7
        .byte 22
        .byte 23
        .byte 10
        .byte 11
        .byte 26
        .byte 27
        .byte 14
        .byte 15
        .byte 30
        .byte 31
This is 5.07x faster on the G5 (measured) than lowering to scalar code +
loads/stores.
llvm-svn: 27789
scalarize the sequence into 4 mullw's and a bunch of load/store traffic.
This speeds up v4i32 multiplies 4.1x (measured) on a G5. This implements
PowerPC/vec_mul.ll
llvm-svn: 27788
being a bit more clever, add support for odd splats from -31 to -17.
llvm-svn: 27764
This implements vec_constants.ll:test_vsldoi and test_rol
llvm-svn: 27760
new patterns.
llvm-svn: 27754
PowerPC/vec_constants.ll:test_29.
llvm-svn: 27752
Efficiently codegen even splats in the range [-32,30].
This allows us to codegen <30,30,30,30> as:
        vspltisw v0, 15
        vadduwm v2, v0, v0
instead of as a constant-pool load.
llvm-svn: 27750
if it can be implemented in 3 or fewer discrete altivec instructions, codegen
it as such. This implements Regression/CodeGen/PowerPC/vec_perf_shuffle.ll
llvm-svn: 27748
one type (v4i32) so that we don't have to write patterns for each type, and
so that more CSE opportunities are exposed.
llvm-svn: 27731