llvm-svn: 14841
this LLVM function:
int %foo() {
ret int cast (int** getelementptr (int** null, int 1) to int)
}
which we used to compile into:
foo:
mov %EAX, 0
lea %EAX, DWORD PTR [%EAX + 4]
ret
now we compile it into:
foo:
mov %EAX, 4
ret
This sequence is frequently generated by the MSIL front-end, and soon by the malloc
lowering pass and Java front-ends as well.
-Chris
llvm-svn: 14834
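For context, that constant expression is the portable "sizeof via a GEP off null" idiom; a C-level source that a front end might lower to it looks roughly like this (the function name matches the IR, everything else is illustrative):

/* Illustrative only: a target-independent front end can express
   sizeof(int*) as "getelementptr int** null, 1 cast to int", which is
   exactly the constant expression in the IR above. */
int foo(void) {
    return (int)sizeof(int *);   /* now folds to "mov %EAX, 4" on 32-bit x86 */
}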
float as a truncation by going through memory. This truncation was being
skipped, which caused 175.vpr to fail after aggressive register promotion.
llvm-svn: 14473
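A minimal C sketch of the issue (illustrative, not the 175.vpr source): on x87 the register stack holds 80-bit values, so narrowing a double to float only actually happens when the value is rounded through a 32-bit memory slot.

/* Sketch: the (float) cast must not be skipped just because the value
   stays in an FP register; the store/reload is what performs the rounding. */
float narrow_to_float(double d) {
    volatile float f = (float)d;   /* forces a 32-bit store and reload */
    return f;
}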
llvm-svn: 14266
mov REG, C
sub REG, X
generate:
neg X
add X, C
which uses one less reg
llvm-svn: 14213
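This pattern shows up when a variable is subtracted from a constant; a hypothetical C example (the constant 10 is arbitrary):

/* "C - X": rewriting it as neg/add works in X's register directly
   instead of first materializing the constant in a second register. */
int sub_from_const(int x) {
    return 10 - x;   /* old: mov reg, 10; sub reg, x   new: neg x; add x, 10 */
}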
the setcc.
llvm-svn: 14212
| |
we do not want to fold the load in cases like this:
X = load
= add A, X
= add B, X
llvm-svn: 14204
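A hypothetical source fragment with that shape (names made up) would be:

/* X has two uses, so folding the load into either add would either
   duplicate the load or leave the other use dangling; keep the load. */
int two_uses(int *p, int a, int b) {
    int x = *p;               /* X = load */
    return (a + x) + (b + x);
}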
llvm-svn: 14201
llvm-svn: 14189
needs to go
llvm-svn: 14185
comparisons. In an 'isunordered' predicate, which looks like this at
the LLVM level:
%a = call bool %llvm.isnan(double %X)
%b = call bool %llvm.isnan(double %Y)
%COM = or bool %a, %b
We used to generate this code:
fxch %ST(1)
fucomip %ST(0), %ST(0)
setp %AL
fucomip %ST(0), %ST(0)
setp %AH
or %AL, %AH
With this patch, we generate this code:
fucomip %ST(0), %ST(1)
fstp %ST(0)
setp %AL
Which should make alkis happy. Tested as X86/compare_folding.llx:test1
llvm-svn: 14148
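The predicate corresponds to C source along these lines (illustrative; NaN is the only value that compares unequal to itself):

/* isunordered(X, Y): true if either operand is NaN.  The or of the two
   llvm.isnan results above is what this expression boils down to. */
int is_unordered(double x, double y) {
    return (x != x) | (y != y);   /* x != x  <=>  isnan(x) */
}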
instructions,
we can get rid of the FpUCOM/FpUCOMi pseudo instructions, which makes stuff simpler
and faster.
llvm-svn: 14144
test/Regression/CodeGen/X86/isnan.llx testcase
llvm-svn: 14141
that cast to bool.
llvm-svn: 14096
llvm-svn: 13952
llvm-svn: 13695
LLVM BasicBlock operands.
llvm-svn: 13566
and passing a null pointer into a function.
For this testcase:
void %test(int** %X) {
store int* null, int** %X
call void %test(int** null)
ret void
}
we now generate this:
test:
sub %ESP, 12
mov %EAX, DWORD PTR [%ESP + 16]
mov DWORD PTR [%EAX], 0
mov DWORD PTR [%ESP], 0
call test
add %ESP, 12
ret
instead of this:
test:
sub %ESP, 12
mov %EAX, DWORD PTR [%ESP + 16]
mov %ECX, 0
mov DWORD PTR [%EAX], %ECX
mov %EAX, 0
mov DWORD PTR [%ESP], %EAX
call test
add %ESP, 12
ret
llvm-svn: 13558
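The testcase is the direct lowering of C code like the following; the recursive call is never meant to run, it only exercises storing and passing a null pointer:

/* Illustrative C equivalent of the %test function above. */
void test(int **X) {
    *X = 0;      /* store int* null  ->  mov DWORD PTR [%EAX], 0 */
    test(0);     /* pass a null arg  ->  mov DWORD PTR [%ESP], 0 */
}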
the alloca address into common operations like loads/stores.
In a simple testcase like this (which is just designed to exercise the
alloca A, nothing more):
int %test(int %X, bool %C) {
%A = alloca int
store int %X, int* %A
store int* %A, int** %G
br bool %C, label %T, label %F
T:
call int %test(int 1, bool false)
%V = load int* %A
ret int %V
F:
call int %test(int 123, bool true)
%V2 = load int* %A
ret int %V2
}
We now generate:
test:
sub %ESP, 12
mov %EAX, DWORD PTR [%ESP + 16]
mov %CL, BYTE PTR [%ESP + 20]
*** mov DWORD PTR [%ESP + 8], %EAX
mov %EAX, OFFSET G
lea %EDX, DWORD PTR [%ESP + 8]
mov DWORD PTR [%EAX], %EDX
test %CL, %CL
je .LBB2 # PC rel: F
.LBB1: # T
mov DWORD PTR [%ESP], 1
mov DWORD PTR [%ESP + 4], 0
call test
*** mov %EAX, DWORD PTR [%ESP + 8]
add %ESP, 12
ret
.LBB2: # F
mov DWORD PTR [%ESP], 123
mov DWORD PTR [%ESP + 4], 1
call test
*** mov %EAX, DWORD PTR [%ESP + 8]
add %ESP, 12
ret
Instead of:
test:
sub %ESP, 20
mov %EAX, DWORD PTR [%ESP + 24]
mov %CL, BYTE PTR [%ESP + 28]
*** lea %EDX, DWORD PTR [%ESP + 16]
*** mov DWORD PTR [%EDX], %EAX
mov %EAX, OFFSET G
mov DWORD PTR [%EAX], %EDX
test %CL, %CL
*** mov DWORD PTR [%ESP + 12], %EDX
je .LBB2 # PC rel: F
.LBB1: # T
mov DWORD PTR [%ESP], 1
mov %EAX, 0
mov DWORD PTR [%ESP + 4], %EAX
call test
*** mov %EAX, DWORD PTR [%ESP + 12]
*** mov %EAX, DWORD PTR [%EAX]
add %ESP, 20
ret
.LBB2: # F
mov DWORD PTR [%ESP], 123
mov %EAX, 1
mov DWORD PTR [%ESP + 4], %EAX
call test
*** mov %EAX, DWORD PTR [%ESP + 12]
*** mov %EAX, DWORD PTR [%EAX]
add %ESP, 20
ret
llvm-svn: 13557
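In C terms the testcase is roughly the following, assuming an external global int *G (the IR stores into it and the assembly references it as OFFSET G, but its declaration is not shown above):

int *G;   /* assumed global, matching the OFFSET G reference */

/* A's address escapes through G, so A needs a real stack slot; the win
   is folding [%ESP + 8] directly into the load and store of A. */
int test(int X, int C) {
    int A = X;
    G = &A;
    if (C) {
        test(1, 0);
        return A;
    } else {
        test(123, 1);
        return A;
    }
}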
sized allocas in the entry block). Instead of generating code like this:
entry:
reg1024 = ESP+1234
... (much later)
*reg1024 = 17
Generate code that looks like this:
entry:
(no code generated)
... (much later)
t = ESP+1234
*t = 17
The advantage being that we DRAMATICALLY reduce the register pressure for these
silly temporaries (they were all being spilled to the stack, resulting in very
silly code). This is actually a manual implementation of rematerialization :)
I have a patch to fold the alloca address computation into loads & stores, which
will make this much better still, but just getting this right took way too much time
and I'm sleepy.
llvm-svn: 13554
mov DWORD PTR [%ESP + 4], 1
instead of:
mov %EAX, 1
mov DWORD PTR [%ESP + 4], %EAX
llvm-svn: 13494
compiling things like 'add long %X, 1'. The problem is that we were switching
the order of the operands for longs even though we can't fold them yet.
llvm-svn: 13451
llvm-svn: 13440
extension required)
llvm-svn: 13439
allows us to compile:
store float 10.0, float* %P
into:
mov DWORD PTR [%EAX], 1092616192
instead of:
.CPItest_0: # float 0x4024000000000000
.long 1092616192 # float 10
...
fld DWORD PTR [.CPItest_0]
fstp DWORD PTR [%EAX]
llvm-svn: 13409
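The immediate is just the IEEE-754 single-precision bit pattern of 10.0; a quick illustrative C check:

#include <stdio.h>
#include <string.h>

int main(void) {
    float f = 10.0f;
    unsigned int bits;
    memcpy(&bits, &f, sizeof bits);        /* reinterpret the four bytes */
    printf("%u (0x%08X)\n", bits, bits);   /* 1092616192 (0x41200000) */
    return 0;
}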
against zero. In particular, don't emit:
mov %ESI, 0
cmp %ECX, %ESI
instead, emit:
test %ECX, %ECX
llvm-svn: 13407
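Any equality or relational test against zero hits this path, e.g. (name illustrative):

/* Comparing with zero needs no constant register: "test %ECX, %ECX"
   sets the flags this comparison needs, just like "cmp %ECX, 0",
   without materializing the 0. */
int is_zero(int x) {
    return x == 0;
}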
llvm-svn: 13355
div:
mov %EDX, DWORD PTR [%ESP + 4]
mov %ECX, 64
mov %EAX, %EDX
sar %EDX, 31
idiv %ECX
ret
to this:
div:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, %EAX
sar %ECX, 5
shr %ECX, 26
mov %EDX, %EAX
add %EDX, %ECX
sar %EAX, 6
ret
Note that the Intel compiler is currently making this:
div:
movl 4(%esp), %edx #3.5
movl %edx, %eax #4.14
sarl $5, %eax #4.14
shrl $26, %eax #4.14
addl %edx, %eax #4.14
sarl $6, %eax #4.14
ret #4.14
Which has one less register->register copy. (hint hint alkis :)
llvm-svn: 13354
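The function being compiled is a signed divide by a power of two, roughly the following C (named differently here so it does not clash with the C library's div()):

/* Signed x/64 rounds toward zero, so the shift sequence first adds a
   bias of 63 to negative inputs (extracted from the sign bit by the
   shift pair) before the final arithmetic shift right by 6. */
int div_by_64(int x) {
    return x / 64;
}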
llvm-svn: 13342
llvm-svn: 13304
In InsertFPRegKills(), just check the MachineBasicBlock for successors
instead of its corresponding BasicBlock.
llvm-svn: 13213
LLVM CFG when trying to find the successors of BB.
llvm-svn: 13212
llvm-svn: 13211
The iterator is pointing at the next instruction which should not disappear
when doing the load/store replacement.
llvm-svn: 12954
On x86, memory operations occur in-order, so these are just lowered into
volatile loads and stores.
llvm-svn: 12936
X86/2004-04-13-FPCMOV-Crash.llx
A more robust fix is to follow.
llvm-svn: 12935
Fix several bugs in the intrinsics:
1. Make sure to copy the input registers before the instructions that use them
2. Make sure to copy the value returned by 'in' out of EAX into the register
it is supposed to be in.
This fixes assertions when using in/out and linear scan.
llvm-svn: 12896
implicitly use ST(0)
llvm-svn: 12855
of the fucom[p][p] instructions. This allows us to code generate this function
bool %test(double %X, double %Y) {
%C = setlt double %Y, %X
ret bool %C
}
... into:
test:
fld QWORD PTR [%ESP + 4]
fld QWORD PTR [%ESP + 12]
fucomip %ST(1)
fstp %ST(0)
setb %AL
movsx %EAX, %AL
ret
where before we generated:
test:
fld QWORD PTR [%ESP + 4]
fld QWORD PTR [%ESP + 12]
fucompp
** fnstsw
** sahf
setb %AL
movsx %EAX, %AL
ret
The two marked instructions (which are the ones eliminated) are very bad,
because they serialize execution of the processor. These instructions are
available on the PPRO and later, but since we already use cmov's we aren't
losing any portability.
I retained the old code for the day when we decide we want to support back
to the 386.
llvm-svn: 12852
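The IR above is what a C comparison like this lowers to (illustrative):

/* setlt double %Y, %X  is  Y < X; returning the bool as an int is the
   movsx %EAX, %AL widening in the generated code. */
int test(double X, double Y) {
    return Y < X;
}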
llvm-svn: 12849
llvm-svn: 12848
If the source of the cast is a load, we can just use the source memory location,
without having to create a temporary stack slot entry.
Before we code generated this:
double %int(int* %P) {
%V = load int* %P
%V2 = cast int %V to double
ret double %V2
}
into:
int:
sub %ESP, 4
mov %EAX, DWORD PTR [%ESP + 8]
mov %EAX, DWORD PTR [%EAX]
mov DWORD PTR [%ESP], %EAX
fild DWORD PTR [%ESP]
add %ESP, 4
ret
Now we produce this:
int:
mov %EAX, DWORD PTR [%ESP + 4]
fild DWORD PTR [%EAX]
ret
... which is nicer.
llvm-svn: 12846
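The IR corresponds to C source like this (the function is literally named 'int' in the IR, so a legal C name is used here):

/* int -> double conversion of a loaded value: the load now folds
   straight into "fild DWORD PTR [%EAX]" with no stack temporary. */
double int_to_double(int *P) {
    return (double)*P;
}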
test/Regression/CodeGen/X86/fp_load_fold.llx
llvm-svn: 12844
llvm-svn: 12842
for mul and div.
Instead of generating this:
test_divr:
fld QWORD PTR [%ESP + 4]
fld QWORD PTR [.CPItest_divr_0]
fdivrp %ST(1)
ret
We now generate this:
test_divr:
fld QWORD PTR [%ESP + 4]
fdivr QWORD PTR [.CPItest_divr_0]
ret
This code desperately needs refactoring, which will come in the next
patch.
llvm-svn: 12841
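The testcase has the shape "constant / argument"; the real constant lives in the .CPItest_divr_0 pool entry and is not shown above, so 10.0 below is only a stand-in:

/* Constant / variable: the constant is the dividend, so the reversed
   divide (fdivr) with a memory operand is the one-instruction form. */
double test_divr(double x) {
    return 10.0 / x;
}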
instructions use. This doesn't change any functionality except that long
constant expressions of these operations will now magically start working.
llvm-svn: 12840
fld QWORD PTR [%ESP + 4]
fadd QWORD PTR [.CPItest_add_0]
instead of:
fld QWORD PTR [%ESP + 4]
fld QWORD PTR [.CPItest_add_0]
faddp %ST(1)
I also intend to do this for mul & div, but it appears that I have to
refactor a bit of code before I can do so.
This is tested by: test/Regression/CodeGen/X86/fp_constant_op.llx
llvm-svn: 12839
1. If an incoming argument is dead, don't load it from the stack
2. Do not code gen noop copies at all (i.e., cast int -> uint), not even to
   a move. This should reduce register pressure for allocators that are
   unable to coalesce away these copies in some cases.
llvm-svn: 12835
llvm-svn: 12815
is listed first and the address is listed second.
llvm-svn: 12795