llvm-svn: 27819

- PINSRWrmi encoding bug.
llvm-svn: 27818

llvm-svn: 27817

llvm-svn: 27816

llvm-svn: 27815

llvm-svn: 27814

llvm-svn: 27813

llvm-svn: 27812

llvm-svn: 27811

llvm-svn: 27810

llvm-svn: 27809

void foo2(vector float *A, vector float *B) {
  vector float C = (vector float)vec_cmpeq(*A, *B);
  if (!vec_any_eq(*A, *B))
    *B = (vector float){0,0,0,0};
  *A = C;
}
llvm-svn: 27808

llvm-svn: 27807

llvm-svn: 27806

If an altivec predicate compare is used immediately by a branch, don't
use a (serializing) MFCR instruction to read the CR6 register, which then
requires a compare to get the value back into a condition register.
Instead, just branch on CR6 directly. :)

For example, for:

void foo2(vector float *A, vector float *B) {
  if (!vec_any_eq(*A, *B))
    *B = (vector float){0,0,0,0};
}

We now generate:

_foo2:
        mfspr r2, 256
        oris r5, r2, 12288
        mtspr 256, r5
        lvx v2, 0, r4
        lvx v3, 0, r3
        vcmpeqfp. v2, v3, v2
        bne cr6, LBB1_2 ; UnifiedReturnBlock
LBB1_1: ; cond_true
        vxor v2, v2, v2
        stvx v2, 0, r4
        mtspr 256, r2
        blr
LBB1_2: ; UnifiedReturnBlock
        mtspr 256, r2
        blr

instead of:

_foo2:
        mfspr r2, 256
        oris r5, r2, 12288
        mtspr 256, r5
        lvx v2, 0, r4
        lvx v3, 0, r3
        vcmpeqfp. v2, v3, v2
        mfcr r3, 2
        rlwinm r3, r3, 27, 31, 31
        cmpwi cr0, r3, 0
        beq cr0, LBB1_2 ; UnifiedReturnBlock
LBB1_1: ; cond_true
        vxor v2, v2, v2
        stvx v2, 0, r4
        mtspr 256, r2
        blr
LBB1_2: ; UnifiedReturnBlock
        mtspr 256, r2
        blr

This implements CodeGen/PowerPC/vec_br_cmp.ll.
llvm-svn: 27804

llvm-svn: 27803

llvm-svn: 27802

to optimize cases where it has to spill a lot
llvm-svn: 27801

llvm-svn: 27800

llvm-svn: 27799

directories if it can't find them. Then, substitute those values into the
configure.ac script and pass them to LLVM_CONFIG_PROJECT so that they
become the defaults for the llvm_src and llvm_obj variables. This way the
user is required to input this exactly once, and the scripts take it from
there.
llvm-svn: 27798

the arguments to the macro. This better supports the AutoRegen.sh script
in projects/sample/autoconf.
llvm-svn: 27797

llvm-svn: 27796

llvm-svn: 27795

llvm-svn: 27794

even/odd halves. Thanks to Nate for telling me what's what.
llvm-svn: 27793

        vmuloub v5, v3, v2
        vmuleub v2, v3, v2
        vperm v2, v2, v5, v4

This implements CodeGen/PowerPC/vec_mul.ll. With this, v16i8 multiplies are
6.79x faster than before.

Overall, UnitTests/Vector/multiplies.c is now 2.45x faster with LLVM than
with GCC.

Remove the 'integer multiplies' todo from the README file.
llvm-svn: 27792
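
For illustration (not from the original commit message), a minimal
C/AltiVec source that should exercise the new v16i8 lowering, assuming a
compiler that accepts arithmetic operators on vector types:

    #include <altivec.h>

    /* Hypothetical test: a plain v16i8 multiply. The lowering computes
       the odd/even byte products with vmuloub/vmuleub and uses vperm to
       keep the low byte of each 16-bit product. */
    vector unsigned char mul_v16i8(vector unsigned char a,
                                   vector unsigned char b) {
      return a * b;
    }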

llvm-svn: 27791

llvm-svn: 27790

        li r5, lo16(LCPI1_0)
        lis r6, ha16(LCPI1_0)
        lvx v4, r6, r5
        vmulouh v5, v3, v2
        vmuleuh v2, v3, v2
        vperm v2, v2, v5, v4

where v4 is:

LCPI1_0: ; <16 x ubyte>
        .byte 2
        .byte 3
        .byte 18
        .byte 19
        .byte 6
        .byte 7
        .byte 22
        .byte 23
        .byte 10
        .byte 11
        .byte 26
        .byte 27
        .byte 14
        .byte 15
        .byte 30
        .byte 31

This is 5.07x faster on the G5 (measured) than lowering to scalar code +
loads/stores.
llvm-svn: 27789
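
To see why that mask is right (a scalar model, not the commit's code):
vmuleuh/vmulouh form full 32-bit products, and on a big-endian G5 bytes
2-3 of each 32-bit product are its low halfword, which is exactly what a
truncating i16 multiply keeps.

    #include <stdint.h>

    /* Scalar model of the v8i16 lowering: each element becomes a full
       32-bit product, and the vperm mask retains only its low 16 bits. */
    void mul_v8i16_model(const uint16_t a[8], const uint16_t b[8],
                         uint16_t out[8]) {
      for (int i = 0; i != 8; ++i) {
        uint32_t full = (uint32_t)a[i] * (uint32_t)b[i];
        out[i] = (uint16_t)full; /* bytes 2-3 of the big-endian product */
      }
    }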

Custom lower v4i32 multiplies instead of having legalize
scalarize the sequence into 4 mullw's and a bunch of load/store traffic.
This speeds up v4i32 multiplies 4.1x (measured) on a G5. This implements
PowerPC/vec_mul.ll.
llvm-svn: 27788
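
For scale, the scalarized form that legalize used to emit is equivalent
to this loop (a sketch, not the commit's code): four 32-bit multiplies,
each with element extract/insert traffic around it.

    #include <stdint.h>

    /* What scalarization costs for a v4i32 multiply: one mullw per
       element, plus moving each element between the vector and integer
       units through memory. */
    void mul_v4i32_scalarized(const uint32_t a[4], const uint32_t b[4],
                              uint32_t out[4]) {
      for (int i = 0; i != 4; ++i)
        out[i] = a[i] * b[i];
    }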

llvm-svn: 27787

llvm-svn: 27786

llvm-svn: 27785

llvm-svn: 27784

llvm-svn: 27782

if the pointer is known aligned.
llvm-svn: 27781

the code in GCC PR26546.
llvm-svn: 27780

llvm-svn: 27779

llvm-svn: 27778

allows us to codegen functions as:

_test_rol:
        vspltisw v2, -12
        vrlw v2, v2, v2
        blr

instead of:

_test_rol:
        mfvrsave r2
        mr r3, r2
        mtvrsave r3
        vspltisw v2, -12
        vrlw v2, v2, v2
        mtvrsave r2
        blr

Testcase here: CodeGen/PowerPC/vec_vrsave.ll
llvm-svn: 27777
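
A plausible source for the _test_rol above (an illustrative
reconstruction; the real test is CodeGen/PowerPC/vec_vrsave.ll): splat -12
and rotate the vector left by itself, which maps to vspltisw + vrlw.

    #include <altivec.h>

    /* Hypothetical reconstruction: vec_splat_s32 lowers to vspltisw and
       vec_rl lowers to vrlw. */
    vector int test_rol(void) {
      vector int v = vec_splat_s32(-12);
      return vec_rl(v, (vector unsigned int)v);
    }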

llvm-svn: 27776

llvm-svn: 27775

llvm-svn: 27774

llvm-svn: 27773

the vrsave register for the caller. This allows us to codegen a function as:

_test_rol:
        mfspr r2, 256
        mr r3, r2
        mtspr 256, r3
        vspltisw v2, -12
        vrlw v2, v2, v2
        mtspr 256, r2
        blr

instead of:

_test_rol:
        mfspr r2, 256
        oris r3, r2, 40960
        mtspr 256, r3
        vspltisw v0, -12
        vrlw v2, v0, v0
        mtspr 256, r2
        blr
llvm-svn: 27772

        vspltisw v2, -12
        vrlw v2, v2, v2

instead of:

        vspltisw v0, -12
        vrlw v2, v0, v0

when a function is returning a value.
llvm-svn: 27771

register info.
llvm-svn: 27770

llvm-svn: 27769

llvm-svn: 27768