allows us to compile:
store float 10.0, float* %P
into:
mov DWORD PTR [%EAX], 1092616192
instead of:
.CPItest_0: # float 0x4024000000000000
.long 1092616192 # float 10
...
fld DWORD PTR [.CPItest_0]
fstp DWORD PTR [%EAX]
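The immediate is just the IEEE-754 bit pattern of the constant: 1092616192
is 0x41200000, the encoding of 10.0f. A quick C sketch (not from this
commit) that verifies the encoding:

  #include <stdio.h>
  #include <stdint.h>
  #include <string.h>

  /* Reinterpret a float's bits as a 32-bit integer. */
  static uint32_t float_bits(float f) {
      uint32_t u;
      memcpy(&u, &f, sizeof u);   /* well-defined way to type-pun */
      return u;
  }

  int main(void) {
      printf("%u\n", (unsigned)float_bits(10.0f));   /* prints 1092616192 */
      return 0;
  }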
llvm-svn: 13409
against zero. In particular, don't emit:
mov %ESI, 0
cmp %ECX, %ESI
instead, emit:
test %ECX, %ECX
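(test %ECX, %ECX ANDs the register with itself, which sets ZF and SF
exactly as the cmp-against-zero sequence does, without burning a scratch
register for the constant.) Any zero test exercises this path; a minimal
C example (illustrative, not from the test suite):

  /* Branching on x == 0 should now emit test, not mov + cmp. */
  int is_zero(int x) {
      return x == 0;
  }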
llvm-svn: 13407
llvm-svn: 13355
div:
mov %EDX, DWORD PTR [%ESP + 4]
mov %ECX, 64
mov %EAX, %EDX
sar %EDX, 31
idiv %ECX
ret
to this:
div:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, %EAX
sar %ECX, 5
shr %ECX, 26
mov %EDX, %EAX
add %EDX, %ECX
sar %EAX, 6
ret
Note that the Intel compiler currently generates this:
div:
movl 4(%esp), %edx #3.5
movl %edx, %eax #4.14
sarl $5, %eax #4.14
shrl $26, %eax #4.14
addl %edx, %eax #4.14
sarl $6, %eax #4.14
ret #4.14
Which has one fewer register-to-register copy. (hint hint Alkis :)
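The shift sequence is the standard bias trick: signed division rounds
toward zero, so a negative dividend needs 63 added before the arithmetic
shift. A C sketch of what the emitted code computes (assuming 32-bit int
and arithmetic right shift of negative values, as on x86):

  int div64(int x) {
      /* sar 5 then shr 26 leaves six copies of the sign bit: 63 or 0 */
      int bias = (int)((unsigned)(x >> 5) >> 26);
      return (x + bias) >> 6;   /* x / 64, rounded toward zero */
  }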
llvm-svn: 13354
llvm-svn: 13342
llvm-svn: 13304
Look at all of the pretty minuses. :)
llvm-svn: 13303
In InsertFPRegKills(), just check the MachineBasicBlock for successors
instead of its corresponding BasicBlock.
llvm-svn: 13213
LLVM CFG when trying to find the successors of BB.
llvm-svn: 13212
llvm-svn: 13211
llvm-svn: 13120
The iterator points at the next instruction, which must not disappear
when doing the load/store replacement.
llvm-svn: 12954
even when the "optimization" I added before is turned off. It generates this
extremely pointless code:
test:
fld QWORD PTR [%ESP + 4]
mov %AL, 0
test %AL, %AL
fcmove %ST(0), %ST(0)
ret
Good thing the optimizer will have removed this before code generation
anyway. :)
llvm-svn: 12939
On x86, memory operations occur in-order, so these are just lowered into
volatile loads and stores.
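In C terms, the lowering is roughly a volatile load plus a volatile store;
strictly speaking ISO C only orders these against other volatile accesses,
so this is a loose sketch of the idea, not the codegen itself:

  /* Roughly what the barrier turns into. */
  static volatile int barrier_slot;

  void barrier(void) {
      int tmp = barrier_slot;   /* volatile load  */
      barrier_slot = tmp;       /* volatile store */
  }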
llvm-svn: 12936
X86/2004-04-13-FPCMOV-Crash.llx
A more robust fix is to follow.
llvm-svn: 12935
Fix several bugs in the intrinsics:
1. Make sure to copy the input registers before the instructions that use them
2. Make sure to copy the value returned by 'in' out of EAX into the register
it is supposed to be in.
This fixes assertions when using in/out and linear scan.
llvm-svn: 12896
llvm-svn: 12895
llvm-svn: 12894
llvm-svn: 12893
implicitly use ST(0)
llvm-svn: 12855
I have unsaved emacs buffers, geeze...
llvm-svn: 12854
llvm-svn: 12853
of the fucom[p][p] instructions. This allows us to code generate this function
bool %test(double %X, double %Y) {
%C = setlt double %Y, %X
ret bool %C
}
... into:
test:
fld QWORD PTR [%ESP + 4]
fld QWORD PTR [%ESP + 12]
fucomip %ST(1)
fstp %ST(0)
setb %AL
movsx %EAX, %AL
ret
where before we generated:
test:
fld QWORD PTR [%ESP + 4]
fld QWORD PTR [%ESP + 12]
fucompp
** fnstsw
** sahf
setb %AL
movsx %EAX, %AL
ret
The two marked instructions (which are the ones eliminated) are very bad,
because they serialize execution of the processor. These instructions are
available on the PPro and later, but since we already use cmovs we aren't
losing any portability.
I retained the old code for the day when we decide we want to support back
to the 386.
llvm-svn: 12852
llvm-svn: 12851
llvm-svn: 12850
llvm-svn: 12849
llvm-svn: 12848
If the source of the cast is a load, we can just use the source memory location,
without having to create a temporary stack slot entry.
Before we code generated this:
double %int(int* %P) {
%V = load int* %P
%V2 = cast int %V to double
ret double %V2
}
into:
int:
sub %ESP, 4
mov %EAX, DWORD PTR [%ESP + 8]
mov %EAX, DWORD PTR [%EAX]
mov DWORD PTR [%ESP], %EAX
fild DWORD PTR [%ESP]
add %ESP, 4
ret
Now we produce this:
int:
mov %EAX, DWORD PTR [%ESP + 4]
fild DWORD PTR [%EAX]
ret
... which is nicer.
llvm-svn: 12846
test/Regression/CodeGen/X86/fp_load_fold.llx
llvm-svn: 12844
llvm-svn: 12842
for mul and div.
Instead of generating this:
test_divr:
fld QWORD PTR [%ESP + 4]
fld QWORD PTR [.CPItest_divr_0]
fdivrp %ST(1)
ret
We now generate this:
test_divr:
fld QWORD PTR [%ESP + 4]
fdivr QWORD PTR [.CPItest_divr_0]
ret
This code desperately needs refactoring, which will come in the next
patch.
llvm-svn: 12841
instructions use. This doesn't change any functionality except that long
constant expressions of these operations will now magically start working.
llvm-svn: 12840
fld QWORD PTR [%ESP + 4]
fadd QWORD PTR [.CPItest_add_0]
instead of:
fld QWORD PTR [%ESP + 4]
fld QWORD PTR [.CPItest_add_0]
faddp %ST(1)
I also intend to do this for mul & div, but it appears that I have to
refactor a bit of code before I can do so.
This is tested by: test/Regression/CodeGen/X86/fp_constant_op.llx
llvm-svn: 12839
llvm-svn: 12838
llvm-svn: 12836
1. If an incoming argument is dead, don't load it from the stack
2. Do not code gen noop copies at all (i.e., cast int -> uint), not even to
a move (see the sketch below). This should reduce register pressure for
allocators that are unable to coalesce away these copies in some cases.
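A noop copy changes only the type of a value, never its bits, so even a
register-to-register move for it is wasted work. A C analogue
(illustrative):

  /* The cast is free: same bits, different type, zero instructions. */
  unsigned as_unsigned(int x) {
      return (unsigned)x;
  }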
llvm-svn: 12835
llvm-svn: 12815
is listed first and the address is listed second.
llvm-svn: 12795
llvm-svn: 12787
InstSelectSimple.cpp:
Changed the checks for proper I/O port address size to call exit() instead
of asserting. Assertions are compiled out of Release builds, and this
error should be handled gracefully (not that exit() counts as graceful,
but it's more graceful).
Modified the generation of the IN/OUT instructions to have 0 arguments.
X86InstrInfo.td:
Added the OpSize attribute to the 16-bit IN and OUT instructions.
llvm-svn: 12786
I/O port instructions on x86. The specific code sequence is tailored to
the parameters and return value of the intrinsic call.
Added the ability for implicit definitions to be printed in the Instruction
Printer.
Added the ability for RawFrm instructions to print implicit uses and
definitions with correct comma output. This required adjusting some
methods so that a leading comma would or would not be printed.
llvm-svn: 12782
from the really simple X86 instruction selector tablegen backend
llvm-svn: 12715
llvm-svn: 12714
llvm-svn: 12711
llvm-svn: 12710
Enable folding of long seteq/setne comparisons into branches and select instructions
Implement unfolded long relational comparisons against a constant a bit
more efficiently
Folding comparisons changes code that looks like this:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
mov %ECX, %EAX
or %ECX, %EDX
sete %CL
test %CL, %CL
je .LBB2 # PC rel: F
into code that looks like this:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
mov %ECX, %EAX
or %ECX, %EDX
jne .LBB2 # PC rel: F
This speeds up 186.crafty by 6% with llc-ls.
llvm-svn: 12702
comparing a long against zero got us this:
sub %ESP, 8
mov DWORD PTR [%ESP + 4], %ESI
mov DWORD PTR [%ESP], %EDI
mov %EAX, DWORD PTR [%ESP + 12]
mov %EDX, DWORD PTR [%ESP + 16]
mov %ECX, 0
mov %ESI, 0
mov %EDI, %EAX
xor %EDI, %ECX
mov %ECX, %EDX
xor %ECX, %ESI
or %EDI, %ECX
sete %CL
test %CL, %CL
je .LBB2 # PC rel: F
Now it gets us this:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
mov %ECX, %EAX
or %ECX, %EDX
sete %CL
test %CL, %CL
je .LBB2 # PC rel: F
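The win comes from the identity (lo ^ 0) | (hi ^ 0) == lo | hi: a 64-bit
equality test against zero reduces to OR-ing the two halves. In C, with
explicit halves (sketch):

  #include <stdint.h>

  int is_zero64(uint32_t lo, uint32_t hi) {
      return (lo | hi) == 0;   /* one or plus one flag test */
  }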
llvm-svn: 12696
immediate. For
example, multiplying X*(1 + (1LL << 32)) now produces:
test:
mov %ECX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
mov %EAX, %ECX
add %EDX, %ECX
ret
[[[Note to Alkis: why isn't linear scan generating this code?? This might be a
problem with your intervals being too conservative:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
add %EDX, %EAX
ret
end note]]]
Whereas GCC produces this:
T:
sub %esp, 12
mov %edx, DWORD PTR [%esp+16]
mov DWORD PTR [%esp+8], %edi
mov %ecx, DWORD PTR [%esp+20]
xor %edi, %edi
mov DWORD PTR [%esp], %ebx
mov %ebx, %edi
mov %eax, %edx
mov DWORD PTR [%esp+4], %esi
add %ebx, %edx
mov %edi, DWORD PTR [%esp+8]
lea %edx, [%ecx+%ebx]
mov %esi, DWORD PTR [%esp+4]
mov %ebx, DWORD PTR [%esp]
add %esp, 12
ret
I'm not sure what GCC is smoking here, but it looks like it has just
confused itself with a bunch of stack slots or something. The Intel
compiler is better, but still not good:
T:
movl 4(%esp), %edx #2.11
movl 8(%esp), %eax #2.11
lea (%eax,%edx), %ecx #3.12
movl $1, %eax #3.12
mull %edx #3.12
addl %ecx, %edx #3.12
ret #3.12
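For reference, the function being compiled is essentially this (a
hypothetical C rendering of the test):

  /* The product's low word copies X's low word; its high word is X's
     high word plus X's low word -- hence just one mov and one add. */
  long long test(long long X) {
      return X * (1 + (1LL << 32));
  }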
llvm-svn: 12693
long %test(long %X) {
%Y = mul long %X, 123
ret long %Y
}
we used to generate:
test:
sub %ESP, 12
mov DWORD PTR [%ESP + 8], %ESI
mov DWORD PTR [%ESP + 4], %EDI
mov DWORD PTR [%ESP], %EBX
mov %ECX, DWORD PTR [%ESP + 16]
mov %ESI, DWORD PTR [%ESP + 20]
mov %EDI, 123
mov %EBX, 0
mov %EAX, %ECX
mul %EDI
imul %ESI, %EDI
add %ESI, %EDX
imul %ECX, %EBX
add %ESI, %ECX
mov %EDX, %ESI
mov %EBX, DWORD PTR [%ESP]
mov %EDI, DWORD PTR [%ESP + 4]
mov %ESI, DWORD PTR [%ESP + 8]
add %ESP, 12
ret
Now we emit:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
mov %EDX, 123
mul %EDX
imul %ECX, %ECX, 123
add %ECX, %EDX
mov %EDX, %ECX
ret
Which, incidentally, is substantially nicer than what GCC manages:
T:
sub %esp, 8
mov %eax, 123
mov DWORD PTR [%esp], %ebx
mov %ebx, DWORD PTR [%esp+16]
mov DWORD PTR [%esp+4], %esi
mov %esi, DWORD PTR [%esp+12]
imul %ecx, %ebx, 123
mov %ebx, DWORD PTR [%esp]
mul %esi
mov %esi, DWORD PTR [%esp+4]
add %esp, 8
lea %edx, [%ecx+%edx]
ret
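The decomposition behind the new sequence: (hi * 2^32 + lo) * 123,
truncated to 64 bits, needs one 32x32->64 multiply of the low word and
one 32-bit multiply of the high word. A C sketch with explicit halves
(names are invented):

  #include <stdint.h>

  void mul123(uint32_t lo, uint32_t hi, uint32_t *rlo, uint32_t *rhi) {
      uint64_t p = (uint64_t)lo * 123;            /* mul: EDX:EAX */
      *rlo = (uint32_t)p;
      *rhi = hi * 123u + (uint32_t)(p >> 32);     /* imul + add  */
  }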
llvm-svn: 12692
On this testcase:
long %test(long %X) {
%Y = shr long %X, ubyte 32
ret long %Y
}
instead of:
t:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EAX, DWORD PTR [%ESP + 8]
sar %EAX, 0
mov %EDX, 0
ret
we now emit:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EAX, DWORD PTR [%ESP + 8]
mov %EDX, 0
ret
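A shift by exactly the register width needs no shift instruction at all:
the high word moves into the low word and zeros come in from the top. A C
sketch with explicit halves (matching the zeroed %EDX above):

  #include <stdint.h>

  void shr32(uint32_t lo, uint32_t hi, uint32_t *rlo, uint32_t *rhi) {
      (void)lo;      /* the old low word is shifted out entirely */
      *rlo = hi;     /* low result = old high word */
      *rhi = 0;      /* zeros shift in from the top */
  }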
llvm-svn: 12688