LLVM commit log (llvm-svn 19612 through 19670, newest first)
------------------------------------------------------------------------
llvm-svn: 19670
------------------------------------------------------------------------
llvm-svn: 19669
------------------------------------------------------------------------
do it. This results in better code on X86 for floats (because when strict
precision is not required, we can elide some expensive double -> float
conversions, as the old isel did), and allows other targets to emit
CopyFromRegs that are not legal for arguments.
llvm-svn: 19668
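(A hedged illustration, not from the commit: on x87, arithmetic happens in
80-bit registers, and rounding a result down to float costs a store/reload
through memory, so skipping an intermediate truncation is a real saving when
strict precision is not required.)

  // Hypothetical C example: a + b produces an extended-precision x87
  // value; rounding it to float before the second add would need a
  // store/reload, which can be elided when strict FP is not required.
  float sum3(float a, float b, float c) {
      return a + b + c;
  }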
------------------------------------------------------------------------
llvm-svn: 19667
------------------------------------------------------------------------
llvm-svn: 19665
------------------------------------------------------------------------
llvm-svn: 19664
------------------------------------------------------------------------
llvm-svn: 19663
------------------------------------------------------------------------
llvm-svn: 19662
------------------------------------------------------------------------
llvm-svn: 19661
------------------------------------------------------------------------
llvm-svn: 19660
------------------------------------------------------------------------
llvm-svn: 19659
------------------------------------------------------------------------
* Insert some really pedantic assertions that notice when we emit the same
load more than once, exposing bugs. This turns a miscompilation of bzip2
into a compile-time failure. Yay.
llvm-svn: 19658
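(A minimal sketch of the idea, with hypothetical names -- the real selector
is more involved: remember every load node already emitted and assert if one
ever comes back.)

  #include <cassert>
  #include <set>

  struct SDNode;  // stand-in for a selection-DAG node

  // Hypothetical emitter state: every load already lowered once.
  static std::set<SDNode*> EmittedLoads;

  void emitLoad(SDNode *Load) {
    // Pedantic check: seeing the same load twice means we would duplicate
    // the memory access -- a CSE/chain bug -- so fail loudly.
    bool Inserted = EmittedLoads.insert(Load).second;
    assert(Inserted && "emitted the same load more than once!");
    (void)Inserted;  // silence unused-variable warning in release builds
    // ... actual instruction emission would follow ...
  }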
------------------------------------------------------------------------
llvm-svn: 19657
------------------------------------------------------------------------
llvm-svn: 19656
------------------------------------------------------------------------
llvm-svn: 19655
------------------------------------------------------------------------
match (X+Y)+(Z << 1), because we match X+Y first, consuming the index
register; after that there is no register slot left for the Z << 1 part.
llvm-svn: 19652
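(A hypothetical sketch, not from the commit, of the matcher state that makes
this concrete: an x86 address has one base slot, one scaled-index slot, and a
displacement, so a greedy match of X+Y uses up both register slots.)

  // Hypothetical model of an x86 address: Base + Scale*Index + Disp.
  struct SDNode;
  struct X86AddressMode {
    SDNode *Base = nullptr;   // one base-register slot
    SDNode *Index = nullptr;  // one index-register slot
    unsigned Scale = 1;       // 1, 2, 4, or 8
    int Disp = 0;
  };
  // Greedily matching (X+Y)+(Z << 1): X fills Base, Y fills Index, and
  // there is no slot left for Z -- even though Z belongs in Index with
  // Scale = 2, with X+Y computed by a separate add.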
------------------------------------------------------------------------
llvm-svn: 19651
------------------------------------------------------------------------
emitted too early. In particular, this fixes
Regression/CodeGen/X86/regpressure.ll:regpressure3.
This also improves the second basic block in 164.gzip:flush_block, which went
from:

  .LBBflush_block_1:  # loopentry.1.i
    movzx %EAX, WORD PTR [dyn_ltree + 20]
    movzx %ECX, WORD PTR [dyn_ltree + 16]
    mov DWORD PTR [%ESP + 32], %ECX
    movzx %ECX, WORD PTR [dyn_ltree + 12]
    movzx %EDX, WORD PTR [dyn_ltree + 8]
    movzx %EBX, WORD PTR [dyn_ltree + 4]
    mov DWORD PTR [%ESP + 36], %EBX
    movzx %EBX, WORD PTR [dyn_ltree]
    add DWORD PTR [%ESP + 36], %EBX
    add %EDX, DWORD PTR [%ESP + 36]
    add %ECX, %EDX
    add DWORD PTR [%ESP + 32], %ECX
    add %EAX, DWORD PTR [%ESP + 32]
    movzx %ECX, WORD PTR [dyn_ltree + 24]
    add %EAX, %ECX
    mov %ECX, 0
    mov %EDX, %ECX

to:

  .LBBflush_block_1:  # loopentry.1.i
    movzx %EAX, WORD PTR [dyn_ltree]
    movzx %ECX, WORD PTR [dyn_ltree + 4]
    add %ECX, %EAX
    movzx %EAX, WORD PTR [dyn_ltree + 8]
    add %EAX, %ECX
    movzx %ECX, WORD PTR [dyn_ltree + 12]
    add %ECX, %EAX
    movzx %EAX, WORD PTR [dyn_ltree + 16]
    add %EAX, %ECX
    movzx %ECX, WORD PTR [dyn_ltree + 20]
    add %ECX, %EAX
    movzx %EAX, WORD PTR [dyn_ltree + 24]
    add %ECX, %EAX
    mov %EAX, 0
    mov %EDX, %EAX

... which results in less spilling in the function.
This change alone speeds up 164.gzip from 37.23s to 36.24s on apoc; the
default isel takes 37.31s.
llvm-svn: 19650
------------------------------------------------------------------------
llvm-svn: 19649
------------------------------------------------------------------------
before other ops, causing it to spill like mad. This occurs in
164.gzip:flush_block.
llvm-svn: 19648
------------------------------------------------------------------------
llvm-svn: 19647
------------------------------------------------------------------------
llvm-svn: 19645
------------------------------------------------------------------------
X86/reg-pressure.ll again, and allows us to do nice things in other cases.
For example, we now codegen this sort of thing:

  int %loadload(int* %X, int* %Y) {
    %Z = load int* %Y
    %A = load int* %X    ;; load between the load of %Z and the store
    %Q = add int %Z, 1
    store int %Q, int* %Y
    ret int %A
  }

into this:

  loadload:
    mov %EAX, DWORD PTR [%ESP + 4]
    mov %EAX, DWORD PTR [%EAX]
    mov %ECX, DWORD PTR [%ESP + 8]
    inc DWORD PTR [%ECX]
    ret

where we weren't able to form the 'inc [mem]' before. This also lets the
instruction selector emit loads in any order it wants, which can be good for
register pressure as well.
llvm-svn: 19644
------------------------------------------------------------------------
1. Fold [mem] += (1|-1) into inc [mem]/dec [mem] to save some icache space.
2. Do not let token factor nodes prevent forming '[mem] op= val' folds.
llvm-svn: 19643
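(A hedged sketch of fold (1), with illustrative opcode names: when the
constant added to memory is exactly 1 or -1, pick the shorter inc/dec
encoding instead of add/sub with an immediate byte.)

  // Hypothetical opcode choice for '[mem] += C' with constant C.
  enum X86MemOp { ADDmi, SUBmi, INCm, DECm };

  X86MemOp selectMemAdd(int C) {
    if (C == 1)  return INCm;  // inc dword ptr [mem]: no immediate byte
    if (C == -1) return DECm;  // dec dword ptr [mem]: no immediate byte
    return C < 0 ? SUBmi : ADDmi;
  }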
------------------------------------------------------------------------
llvm-svn: 19642
------------------------------------------------------------------------
llvm-svn: 19641
------------------------------------------------------------------------
the basic block that uses them if possible. This is a big win on X86, as it
lets us fold the argument loads into instructions and reduce register pressure
(by not loading all of the arguments in the entry block).
For this testcase (contrived to show the optimization):

  int %argtest(int %A, int %B) {
    %X = sub int 12345, %A
    br label %L
  L:
    %Y = add int %X, %B
    ret int %Y
  }

we used to produce:

  argtest:
    mov %ECX, DWORD PTR [%ESP + 4]
    mov %EAX, 12345
    sub %EAX, %ECX
    mov %EDX, DWORD PTR [%ESP + 8]
  .LBBargtest_1:  # L
    add %EAX, %EDX
    ret

now we produce:

  argtest:
    mov %EAX, 12345
    sub %EAX, DWORD PTR [%ESP + 4]
  .LBBargtest_1:  # L
    add %EAX, DWORD PTR [%ESP + 8]
    ret

This also fixes the FIXME in the code.
BTW, this occurs in real code: 164.gzip shrinks from 8623 to 8608 lines of
.s file, the stack frame in huft_build shrinks from 1644 to 1628 bytes,
inflate_codes shrinks from 116 to 108 bytes, and inflate_block from 2620 to
2612, due to fewer spills.
Take that alkis. :-)
llvm-svn: 19639
------------------------------------------------------------------------
operations.
The body of the if is less indented but unmodified in this patch.
llvm-svn: 19638
------------------------------------------------------------------------
llvm-svn: 19635
------------------------------------------------------------------------
llvm-svn: 19634
------------------------------------------------------------------------
  int %foo(int %X) {
    %T = add int %X, 13
    %S = mul int %T, 3
    ret int %S
  }

as this:

  mov %ECX, DWORD PTR [%ESP + 4]
  lea %EAX, DWORD PTR [%ECX + 2*%ECX + 39]
  ret

(the single lea computes %ECX + 2*%ECX + 39 = 3*X + 39 = 3*(X+13))
instead of this:

  mov %ECX, DWORD PTR [%ESP + 4]
  mov %EAX, %ECX
  add %EAX, 13
  imul %EAX, %EAX, 3
  ret
llvm-svn: 19633
------------------------------------------------------------------------
llvm-svn: 19632
------------------------------------------------------------------------
Do not fold a load into an operation if it will induce a cycle in the DAG.
Repeat after me: dAg.
llvm-svn: 19631
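(Why a fold can create a cycle, as a hypothetical sketch: folding a load L
into its user U merges them into one node, so if any other operand of U can
reach L through the graph, the merged node would depend on itself. A
reachability walk before folding catches this; names here are illustrative.)

  #include <set>
  #include <vector>

  struct Node {
    std::vector<Node*> Operands;  // edges toward the things this node uses
  };

  // True if Dest is reachable from Src by following operand edges.
  static bool reaches(Node *Src, Node *Dest, std::set<Node*> &Seen) {
    if (Src == Dest) return true;
    if (!Seen.insert(Src).second) return false;  // already explored
    for (Node *Op : Src->Operands)
      if (reaches(Op, Dest, Seen)) return true;
    return false;
  }

  // Folding Load into User is only safe if no other operand of User
  // reaches the Load -- otherwise the merged node would form a cycle.
  bool canFoldLoad(Node *Load, Node *User) {
    std::set<Node*> Seen;
    for (Node *Op : User->Operands)
      if (Op != Load && reaches(Op, Load, Seen)) return false;
    return true;
  }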
------------------------------------------------------------------------
llvm-svn: 19630
------------------------------------------------------------------------
usefulness.
llvm-svn: 19629
------------------------------------------------------------------------
Disable the xform for < > cases. It turns out that the following is being
miscompiled:

  bool %test(sbyte %S) {
    %T = cast sbyte %S to uint
    %V = setgt uint %T, 255
    ret bool %V
  }

(the cast sign-extends %S, so %T can exceed 255 for negative inputs and the
comparison is not trivially false)
llvm-svn: 19628
------------------------------------------------------------------------
llvm-svn: 19627
------------------------------------------------------------------------
The comparison will probably be folded, so this is not ok to do.
This fixed 197.parser.
llvm-svn: 19624
------------------------------------------------------------------------
llvm-svn: 19623
------------------------------------------------------------------------
of the bytereg. This fixes yacr2, 300.twolf and probably others.
llvm-svn: 19622
------------------------------------------------------------------------
llvm-svn: 19621
------------------------------------------------------------------------
If we emit a load because we followed a token chain to get to it, try to
fold it into its single user if possible.
llvm-svn: 19620
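(A tiny sketch of the guard, with hypothetical names: the fold only fires
when the loaded value has exactly one consumer, and it composes with the
cycle check sketched under 19631 above.)

  #include <vector>
  struct Node { std::vector<Node*> Users; };

  // Hypothetical: fold the load into its user only when nothing else
  // consumes the loaded value.
  bool canFoldIntoSingleUser(const Node *Load) {
    return Load->Users.size() == 1;
  }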
------------------------------------------------------------------------
llvm-svn: 19619
------------------------------------------------------------------------
Add fields to hold the result type of setcc operations and the type used for
shift amounts.
llvm-svn: 19618
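(A guess at the shape, with hypothetical names: the target-description object
grows two fields so the selector knows what type a setcc produces and what
type shift counts use on this target.)

  // Hypothetical target-description fields, in the spirit of the change.
  enum ValueType { i1, i8, i16, i32 };

  class TargetLoweringInfo {
    ValueType SetCCResultTy = i8;   // e.g. x86 setcc produces a byte
    ValueType ShiftAmountTy = i8;   // e.g. x86 shift counts live in CL
  public:
    ValueType getSetCCResultTy() const { return SetCCResultTy; }
    ValueType getShiftAmountTy() const { return ShiftAmountTy; }
  };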
------------------------------------------------------------------------
llvm-svn: 19617
------------------------------------------------------------------------
llvm-svn: 19616
------------------------------------------------------------------------
functions to preserve SSA
llvm-svn: 19615
------------------------------------------------------------------------
* Remove custom promotion for bool and byte select ops; Legalize now
promotes them for us.
* Allow folding ConstantPoolIndexes into EXTLOADs, useful for float
immediates.
* Do a better job of declaring which operations are not supported.
* Add some hacky code for TRUNCSTORE to pretend that we have truncstore
for i16 types. This is useful for testing promotion code, because I can
just remove 16-bit registers altogether and verify that programs work.
llvm-svn: 19614
------------------------------------------------------------------------
track of how to deal with it, and provide the target with a hook that it can
use to legalize arbitrary operations in arbitrary ways.
Implement custom lowering for a couple of ops, and implement promotion for
select operations (which x86 needs).
llvm-svn: 19613
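(A hedged sketch of the mechanism described: for each operation the target
records an action, and the legalizer dispatches on it, calling back into the
target for the custom case. Names are illustrative, patterned on how later
LLVM spells this; the helpers are assumed, not real APIs.)

  struct SDNode;
  SDNode *promoteToWiderType(SDNode *N);   // assumed helper
  SDNode *expandToSimplerOps(SDNode *N);   // assumed helper

  // Hypothetical per-operation legalization policy.
  enum LegalizeAction {
    Legal,    // target supports the node natively
    Promote,  // redo the operation in a wider type (e.g. i8 select -> i16)
    Expand,   // rewrite in terms of simpler nodes
    Custom,   // hand the node to the target's hook
  };

  struct Target {
    LegalizeAction getOperationAction(unsigned Opcode) const;
    SDNode *LowerOperationCustom(SDNode *N) const;  // the target hook
  };

  SDNode *legalize(const Target &T, SDNode *N, unsigned Opcode) {
    switch (T.getOperationAction(Opcode)) {
    case Legal:   return N;
    case Promote: return promoteToWiderType(N);
    case Expand:  return expandToSimplerOps(N);
    case Custom:  return T.LowerOperationCustom(N);
    }
    return N;
  }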
------------------------------------------------------------------------
llvm-svn: 19612