llvm-svn: 19714

llvm-svn: 19712

llvm-svn: 19711

llvm-svn: 19707
The second folds operations into selects, e.g. (select C, (X+Y), (Y+Z))
-> (Y + (select C, X, Z)).
This occurs a few times across SPEC, e.g.:
             select  add/sub
  mesa:          83        0
  povray:         5        2
  gcc:            4        2
  parser:         0       22
  perlbmk:       13       30
  twolf:          0        3
llvm-svn: 19706
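A minimal sketch of the fold, written as plain C++ on integers (the function names here are illustrative, not anything in LLVM): both select arms share the operand Y, so the add is hoisted out and the select only has to choose between X and Z.

  int before(bool C, int X, int Y, int Z) { return C ? (X + Y) : (Y + Z); }
  int after (bool C, int X, int Y, int Z) { return Y + (C ? X : Z); }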
llvm-svn: 19704

llvm-svn: 19703

llvm-svn: 19701
independent of each other.
llvm-svn: 19700
llvm-svn: 19699

llvm-svn: 19698
pressure rather than decreasing it. Fix a problem where we accidentally
swapped the operands of SHLD, which caused fourinarow to fail. This fixes
fourinarow.
llvm-svn: 19697
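A hedged sketch of why SHLD's operands cannot simply be swapped (the helper is illustrative, not backend code): the destination operand supplies the bits that get shifted left, while the source operand only donates its high bits, so the two registers play different roles.

  #include <cstdint>
  // n assumed to be in 1..31 here, matching a masked x86 shift count.
  uint32_t shld32(uint32_t dst, uint32_t src, unsigned n) {
    return (dst << n) | (src >> (32 - n));   // shld32(a, b, n) != shld32(b, a, n)
  }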
llvm-svn: 19696
well as all of the other stuff in livevar. This fixes the compiler crash
on fourinarow last night.
llvm-svn: 19695
llvm-svn: 19694

llvm-svn: 19693
typically cost 1 cycle) instead of shld/shrd instructions (which typically
take 6 or more cycles). This also saves code space.
For example, instead of emitting:
rotr:
mov %EAX, DWORD PTR [%ESP + 4]
mov %CL, BYTE PTR [%ESP + 8]
shrd %EAX, %EAX, %CL
ret
rotli:
mov %EAX, DWORD PTR [%ESP + 4]
shrd %EAX, %EAX, 27
ret
Emit:
rotr32:
mov %CL, BYTE PTR [%ESP + 8]
mov %EAX, DWORD PTR [%ESP + 4]
ror %EAX, %CL
ret
rotli32:
mov %EAX, DWORD PTR [%ESP + 4]
ror %EAX, 27
ret
We also emit byte rotate instructions which do not have a sh[lr]d counterpart
at all.
llvm-svn: 19692
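A hedged sketch of the rotate idiom at the source level (illustrative code, not the matcher itself): this is the shape that can now be selected to a single ror/rol instead of a shrd/shld.

  #include <cstdint>
  uint32_t rotr32(uint32_t x, unsigned n) {
    n &= 31;                                   // keep the amount in range
    return (x >> n) | (x << ((32 - n) & 31));  // one ror instead of a 6+ cycle shrd
  }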
llvm-svn: 19690

llvm-svn: 19689

llvm-svn: 19687
This allows us to generate this:
foo:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
shld %EDX, %EAX, 2
shl %EAX, 2
ret
instead of this:
foo:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
mov %EDX, %EAX
shrd %EDX, %ECX, 30
shl %EAX, 2
ret
Note the magically transmogrifying immediate.
llvm-svn: 19686
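A hedged sketch of where the changed immediate comes from (helper names are illustrative): a left funnel shift by n produces the same bits as a right funnel shift by 32 - n with the inputs swapped, which is why a shrd by 30 can be re-expressed as a shld by 2.

  #include <cstdint>
  // n assumed to be in 1..31 for both helpers.
  uint32_t funnel_left (uint32_t hi, uint32_t lo, unsigned n) {
    return (hi << n) | (lo >> (32 - n));       // what shld computes
  }
  uint32_t funnel_right(uint32_t lo, uint32_t hi, unsigned n) {
    return (lo >> n) | (hi << (32 - n));       // what shrd computes
  }
  // funnel_left(hi, lo, n) == funnel_right(lo, hi, 32 - n)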
instead of doing it manually.
llvm-svn: 19685
Add a default implementation of commuteInstruction.
Add notes about ugly V9 code.
llvm-svn: 19684
foo:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
shrd %EAX, %EDX, 2
sar %EDX, 2
ret
instead of this:
test1:
mov %ECX, DWORD PTR [%ESP + 4]
shr %ECX, 2
mov %EDX, DWORD PTR [%ESP + 8]
mov %EAX, %EDX
shl %EAX, 30
or %EAX, %ECX
sar %EDX, 2
ret
and long << 2 to this:
foo:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
*** mov %EDX, %EAX
shrd %EDX, %ECX, 30
shl %EAX, 2
ret
instead of this:
foo:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, %EAX
shr %ECX, 30
mov %EDX, DWORD PTR [%ESP + 8]
shl %EDX, 2
or %EDX, %ECX
shl %EAX, 2
ret
The extra copy (marked ***) can be eliminated when I teach the code generator
that shrd32rri8 is really commutative.
llvm-svn: 19681
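A hedged sketch of the double-word lowering shown above, expressed on 32-bit halves (illustrative helper): the shrd computes the low half of long >> 2 in a single instruction, while sar handles the sign-carrying high half.

  #include <cstdint>
  void ashr64_by2(uint32_t &lo, int32_t &hi) {
    lo = (lo >> 2) | ((uint32_t)hi << 30);  // shrd lo, hi, 2
    hi = hi >> 2;                           // sar hi, 2 preserves the sign bit
  }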
select operations or to shifts by a constant. This automatically
implements (with no special code) all of the special cases for shift by 32,
shift by < 32 and shift by > 32.
llvm-svn: 19679
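A hedged sketch of what that generic expansion looks like at the source level (illustrative helper, not the legalizer code): a single select-based pattern covers amounts below, equal to, and above 32.

  #include <cstdint>
  // Shift a 64-bit value held as two 32-bit halves left by amt (assumed 0..63).
  void shl64(uint32_t &lo, uint32_t &hi, unsigned amt) {
    uint32_t hiPart = (amt < 32)
        ? (hi << amt) | (amt ? lo >> (32 - amt) : 0)  // funnel shift for amt < 32
        : lo << (amt - 32);                           // low half moves up wholesale
    lo = (amt < 32) ? lo << amt : 0;
    hi = hiPart;
  }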
llvm-svn: 19678
range. Either they are undefined (the default), they mask the shift amount
to the size of the register (X86, Alpha, etc.), or they extend the shift (PPC).
This defaults to undefined, which is conservatively correct.
llvm-svn: 19677
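A hedged sketch of the three policies as an enumeration (the names are illustrative, not the actual TargetLowering spelling):

  enum class ShiftAmountHandling {
    Undefined,  // default: shifting by >= the register width is undefined
    Mask,       // the amount is masked to the register width (X86, Alpha, ...)
    Extend      // the shift keeps going, producing zero or sign bits (PPC)
  };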
llvm-svn: 19675
FP_EXTEND from!
llvm-svn: 19674
llvm-svn: 19673
don't need to even think about F32 in the X86 code anymore.
llvm-svn: 19672
of zero and sign extends.
llvm-svn: 19671
llvm-svn: 19670
do it. This results in better code on X86 for floats (because if strict
precision is not required, we can elide some more expensive double -> float
conversions, like the old isel did), and allows other targets to emit
CopyFromRegs that are not legal for arguments.
llvm-svn: 19668
llvm-svn: 19667

llvm-svn: 19661

llvm-svn: 19660

llvm-svn: 19659
* Insert some really pedantic assertions that will notice when we emit the
same loads more than once, exposing bugs. This turns a miscompilation in
bzip2 into a compile-fail. yaay.
llvm-svn: 19658
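A hedged sketch of the kind of check described (the set and the message are illustrative, not the actual assertion):

  #include <cassert>
  #include <set>
  static std::set<const void *> EmittedLoads;  // nodes whose load has already been emitted
  static void noteLoadEmitted(const void *N) {
    bool Inserted = EmittedLoads.insert(N).second;
    assert(Inserted && "Emitted the same load more than once!");
    (void)Inserted;                            // quiet unused-variable warnings in NDEBUG builds
  }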
llvm-svn: 19657

llvm-svn: 19656
match (X+Y)+(Z << 1), because we match X+Y first, consuming the index
register, and then there is no place to put the Z.
llvm-svn: 19652
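A hedged sketch of the constraint behind this (the struct is illustrative, not the selector's real data structure): an x86 address supplies at most one base register and one scaled index, so the Z << 1 term must land in the index slot and has to be matched before the plain X+Y sum consumes it.

  struct AddressParts {     // illustrative shape of base + index*scale + disp
    int Base = -1;          // at most one base register
    int Index = -1;         // at most one index register -- the only home for Z << 1
    unsigned Scale = 1;     // 1, 2, 4, or 8
    int Disp = 0;
  };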
llvm-svn: 19651
emitted too early. In particular, this fixes
Regression/CodeGen/X86/regpressure.ll:regpressure3.
This also improves the 2nd basic block in 164.gzip:flush_block, which went from
.LBBflush_block_1: # loopentry.1.i
movzx %EAX, WORD PTR [dyn_ltree + 20]
movzx %ECX, WORD PTR [dyn_ltree + 16]
mov DWORD PTR [%ESP + 32], %ECX
movzx %ECX, WORD PTR [dyn_ltree + 12]
movzx %EDX, WORD PTR [dyn_ltree + 8]
movzx %EBX, WORD PTR [dyn_ltree + 4]
mov DWORD PTR [%ESP + 36], %EBX
movzx %EBX, WORD PTR [dyn_ltree]
add DWORD PTR [%ESP + 36], %EBX
add %EDX, DWORD PTR [%ESP + 36]
add %ECX, %EDX
add DWORD PTR [%ESP + 32], %ECX
add %EAX, DWORD PTR [%ESP + 32]
movzx %ECX, WORD PTR [dyn_ltree + 24]
add %EAX, %ECX
mov %ECX, 0
mov %EDX, %ECX
to
.LBBflush_block_1: # loopentry.1.i
movzx %EAX, WORD PTR [dyn_ltree]
movzx %ECX, WORD PTR [dyn_ltree + 4]
add %ECX, %EAX
movzx %EAX, WORD PTR [dyn_ltree + 8]
add %EAX, %ECX
movzx %ECX, WORD PTR [dyn_ltree + 12]
add %ECX, %EAX
movzx %EAX, WORD PTR [dyn_ltree + 16]
add %EAX, %ECX
movzx %ECX, WORD PTR [dyn_ltree + 20]
add %ECX, %EAX
movzx %EAX, WORD PTR [dyn_ltree + 24]
add %ECX, %EAX
mov %EAX, 0
mov %EDX, %EAX
... which results in less spilling in the function.
This change alone speeds up 164.gzip from 37.23s to 36.24s on apoc. The
default isel takes 37.31s.
llvm-svn: 19650
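A hedged sketch of the scheduling difference in source form (illustrative helper): accumulating as you go, as in the second listing, keeps a single value live across the loads, whereas loading everything first keeps six or seven values live at once and forces the spills seen in the first listing.

  static unsigned sum_first_entries(const unsigned short *dyn_ltree) {
    unsigned acc = 0;
    for (int byteOff = 0; byteOff <= 24; byteOff += 4)  // the seven WORD loads above
      acc += dyn_ltree[byteOff / 2];                    // one movzx + add each, nothing spilled
    return acc;
  }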
llvm-svn: 19649

llvm-svn: 19647

llvm-svn: 19645
X86/reg-pressure.ll again, and allows us to do nice things in other cases.
For example, we now codegen this sort of thing:
int %loadload(int* %X, int* %Y) {
  %Z = load int* %Y
  %A = load int* %X      ;; load between %Z and store
  %Q = add int %Z, 1
  store int %Q, int* %Y
  ret int %A
}
Into this:
loadload:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EAX, DWORD PTR [%EAX]
mov %ECX, DWORD PTR [%ESP + 8]
inc DWORD PTR [%ECX]
ret
where we weren't able to form the 'inc [mem]' before. This also lets the
instruction selector emit loads in any order it wants to, which can be good
for register pressure as well.
llvm-svn: 19644
1. Fold [mem] += (1|-1) into inc [mem]/dec [mem] to save some icache space.
2. Do not let token factor nodes prevent forming '[mem] op= val' folds.
llvm-svn: 19643
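A hedged sketch of fold (1) at the source level (illustrative function): a read-modify-write of +/-1 on memory becomes a single inc or dec of the memory operand instead of a separate load, add, and store.

  void bump(int *p) { *p += 1; }   // selects to a single inc of the memory operand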
llvm-svn: 19642