Olden/mst, Ptrdist/bc, Obsequi, etc.
llvm-svn: 21274
llvm-svn: 21267
and faster.
llvm-svn: 21181
llvm-svn: 21164
llvm-svn: 21146
llvm-svn: 21107
llvm-svn: 21015
llvm-svn: 21010
llvm-svn: 21005
llvm-svn: 20990
llvm-svn: 20936
llvm-svn: 20842
request.
llvm-svn: 20804
llvm-svn: 20651
using Function::arg_{iterator|begin|end}. Likewise Module::g* -> Module::global_*.
This patch is contributed by Gabor Greif, thanks!
llvm-svn: 20597
llvm-svn: 20284
llvm-svn: 19835
of FP ops.
llvm-svn: 19834
llvm-svn: 19798
llvm-svn: 19733
llvm-svn: 19731
llvm-svn: 19728
fixes most of the remaining llc-beta failures.
llvm-svn: 19716
llvm-svn: 19711
pressure, not decreases register pressure. Fix a problem where we accidentally
swapped the operands of SHLD, which caused fourinarow to fail. This fixes
fourinarow.
llvm-svn: 19697
typically cost 1 cycle) instead of shld/shrd instructions (which typically take
6 or more cycles). This also saves code space.
For example, instead of emitting:
rotr:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %CL, BYTE PTR [%ESP + 8]
        shrd %EAX, %EAX, %CL
        ret
rotli:
        mov %EAX, DWORD PTR [%ESP + 4]
        shrd %EAX, %EAX, 27
        ret

Emit:

rotr32:
        mov %CL, BYTE PTR [%ESP + 8]
        mov %EAX, DWORD PTR [%ESP + 4]
        ror %EAX, %CL
        ret
rotli32:
        mov %EAX, DWORD PTR [%ESP + 4]
        ror %EAX, 27
        ret
We also emit byte rotate instructions which do not have a sh[lr]d counterpart
at all.
llvm-svn: 19692
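
For reference, rotates like these typically come from C source along the following
lines (an illustrative sketch only; the function names echo the labels above and are
not taken from the original test case):

unsigned int rotr32(unsigned int x, unsigned char n) {
        n &= 31;                                    /* keep the variable shift amount in range */
        return n ? (x >> n) | (x << (32 - n)) : x;  /* rotate right by a variable amount */
}

unsigned int rotli32(unsigned int x) {
        return (x << 5) | (x >> 27);                /* rotate left by a constant (= rotate right by 27) */
}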
llvm-svn: 19689
foo:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %EDX, DWORD PTR [%ESP + 8]
        shrd %EAX, %EDX, 2
        sar %EDX, 2
        ret

instead of this:

test1:
        mov %ECX, DWORD PTR [%ESP + 4]
        shr %ECX, 2
        mov %EDX, DWORD PTR [%ESP + 8]
        mov %EAX, %EDX
        shl %EAX, 30
        or %EAX, %ECX
        sar %EDX, 2
        ret

and long << 2 to this:

foo:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
***     mov %EDX, %EAX
        shrd %EDX, %ECX, 30
        shl %EAX, 2
        ret

instead of this:

foo:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, %EAX
        shr %ECX, 30
        mov %EDX, DWORD PTR [%ESP + 8]
        shl %EDX, 2
        or %EDX, %ECX
        shl %EAX, 2
        ret
The extra copy (marked ***) can be eliminated when I teach the code generator
that shrd32rri8 is really commutative.
llvm-svn: 19681
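
The source pattern being compiled here is a 64-bit shift lowered on a 32-bit target;
a minimal sketch in C (using long long so the operand is 64 bits on IA-32; the names
are illustrative, not the original test functions):

long long shr2(long long x) { return x >> 2; }    /* arithmetic shift right of a 64-bit value */
long long shl2(long long x) { return x << 2; }    /* shift left of a 64-bit value */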
llvm-svn: 19678
FP_EXTEND from!
llvm-svn: 19674
llvm-svn: 19673
don't need to even think about F32 in the X86 code anymore.
llvm-svn: 19672
llvm-svn: 19667
llvm-svn: 19661
llvm-svn: 19659
* Insert some really pedantic assertions that will notice when we emit the
same loads more than one time, exposing bugs. This turns a miscompilation in
bzip2 into a compile-fail. yaay.
llvm-svn: 19658
match (X+Y)+(Z << 1), because we match the X+Y first, consuming the index
register, then there is no place to put the Z.
llvm-svn: 19652
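
An address with this shape arises from code like the following sketch (a hypothetical
example, not the test case the message refers to):

char get(char *A, int X, int Y, int Z) {
        return A[X + Y + 2 * Z];   /* index expression has the (X+Y)+(Z << 1) shape */
}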
emitted too early. In particular, this fixes
Regression/CodeGen/X86/regpressure.ll:regpressure3.
This also improves the 2nd basic block in 164.gzip:flush_block, which went from
.LBBflush_block_1:        # loopentry.1.i
        movzx %EAX, WORD PTR [dyn_ltree + 20]
        movzx %ECX, WORD PTR [dyn_ltree + 16]
        mov DWORD PTR [%ESP + 32], %ECX
        movzx %ECX, WORD PTR [dyn_ltree + 12]
        movzx %EDX, WORD PTR [dyn_ltree + 8]
        movzx %EBX, WORD PTR [dyn_ltree + 4]
        mov DWORD PTR [%ESP + 36], %EBX
        movzx %EBX, WORD PTR [dyn_ltree]
        add DWORD PTR [%ESP + 36], %EBX
        add %EDX, DWORD PTR [%ESP + 36]
        add %ECX, %EDX
        add DWORD PTR [%ESP + 32], %ECX
        add %EAX, DWORD PTR [%ESP + 32]
        movzx %ECX, WORD PTR [dyn_ltree + 24]
        add %EAX, %ECX
        mov %ECX, 0
        mov %EDX, %ECX

to

.LBBflush_block_1:        # loopentry.1.i
        movzx %EAX, WORD PTR [dyn_ltree]
        movzx %ECX, WORD PTR [dyn_ltree + 4]
        add %ECX, %EAX
        movzx %EAX, WORD PTR [dyn_ltree + 8]
        add %EAX, %ECX
        movzx %ECX, WORD PTR [dyn_ltree + 12]
        add %ECX, %EAX
        movzx %EAX, WORD PTR [dyn_ltree + 16]
        add %EAX, %ECX
        movzx %ECX, WORD PTR [dyn_ltree + 20]
        add %ECX, %EAX
        movzx %EAX, WORD PTR [dyn_ltree + 24]
        add %ECX, %EAX
        mov %EAX, 0
        mov %EDX, %EAX
... which results in less spilling in the function.
This change alone speeds up 164.gzip from 37.23s to 36.24s on apoc. The
default isel takes 37.31s.
llvm-svn: 19650
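
The code being scheduled here is essentially a sum of several 16-bit loads. A rough C
sketch (illustrative only; the real code is in 164.gzip's flush_block, and the two-field
struct below is an assumption chosen to match the 4-byte stride in the addresses above):

struct ct_data { unsigned short freq; unsigned short code; };   /* assumed layout */
extern struct ct_data dyn_ltree[];

unsigned sum_first_seven(void) {
        return dyn_ltree[0].freq + dyn_ltree[1].freq + dyn_ltree[2].freq +
               dyn_ltree[3].freq + dyn_ltree[4].freq + dyn_ltree[5].freq +
               dyn_ltree[6].freq;   /* seven zero-extended 16-bit loads feeding a chain of adds */
}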
llvm-svn: 19647
llvm-svn: 19645
1. Fold [mem] += (1|-1) into inc [mem]/dec [mem] to save some icache space.
2. Do not let token factor nodes prevent forming '[mem] op= val' folds.
llvm-svn: 19643
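
Point 1 refers to source of this shape (a minimal sketch; the names are made up):

void bump(int *counter) { *counter += 1; }   /* foldable to: inc DWORD PTR [counter] */
void drop(int *counter) { *counter -= 1; }   /* foldable to: dec DWORD PTR [counter] */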
llvm-svn: 19641
operations.
The body of the if is less indented but unmodified in this patch.
llvm-svn: 19638
int %foo(int %X) {
        %T = add int %X, 13
        %S = mul int %T, 3
        ret int %S
}

as this:

        mov %ECX, DWORD PTR [%ESP + 4]
        lea %EAX, DWORD PTR [%ECX + 2*%ECX + 39]
        ret

instead of this:

        mov %ECX, DWORD PTR [%ESP + 4]
        mov %EAX, %ECX
        add %EAX, 13
        imul %EAX, %EAX, 3
        ret
llvm-svn: 19633
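
The LLVM function above corresponds to C along these lines (a sketch, not the original
test case):

int foo(int X) {
        return (X + 13) * 3;   /* 3*X + 39, which fits a single lea: ECX + 2*ECX + 39 */
}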
Do not fold a load into an operation if it will induce a cycle in the DAG.
Repeat after me: dAg.
llvm-svn: 19631
The comparison will probably be folded, so this is not ok to do.
This fixed 197.parser.
llvm-svn: 19624
of the bytereg. This fixes yacr2, 300.twolf and probably others.
llvm-svn: 19622
If we emit a load because we followed a token chain to get to it, try to
fold it into its single user if possible.
llvm-svn: 19620
* Remove custom promotion for bool and byte select ops. Legalize now
promotes them for us.
* Allow folding ConstantPoolIndexes into EXTLOADs, useful for float immediates.
* Declare more precisely which operations are not supported.
* Add some hacky code for TRUNCSTORE to pretend that we have truncstore
for i16 types. This is useful for testing promotion code because I can
just remove 16-bit registers altogether and verify that programs work.
llvm-svn: 19614
llvm-svn: 19566