| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Fix an illegal argument to getClassB when deciding whether or not to
sign extend a byte load.
2. Initial addition of isLoad and isStore flags to the instruction .td file
for eventual use in a scheduler.
3. Rewrite of how constants are handled in emitSimpleBinaryOperation so
that we can emit the PowerPC shifted immediate instructions far more
often. This allows us to emit the following code:
int foo(int x) { return x | 0x00F0000; }
_foo:
.LBB_foo_0: ; entry
; IMPLICIT_DEF
oris r3, r3, 15
blr
llvm-svn: 16826
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
loading a 32bit constant into a register whose low halfword is all zeroes.
We now omit the ori after the lis for the following C code:
int bar(int y) { return y * 0x00F0000; }
_bar:
.LBB_bar_0: ; entry
; IMPLICIT_DEF
lis r2, 15
mullw r3, r3, r2
blr
llvm-svn: 16825
|
| |
|
|
| |
llvm-svn: 16824
|
| |
|
|
| |
llvm-svn: 16814
|
| |
|
|
|
|
|
| |
a map. This caused problems if a later object happened to be allocated at
the free'd object's address.
llvm-svn: 16813
|
| |
|
|
|
|
|
|
|
|
| |
exponential behavior (bork!). This patch processes stuff with an
explicit SCC finder, allowing the algorithm to be more clear,
efficient, and also (as a bonus) correct! This gets us back to taking
0.6s to disassemble my horrible .bc file that previously took something
> 30 mins.
llvm-svn: 16811
|
| |
|
|
|
|
| |
speedup, but has the advantage of not breaking a bunch of programs!
llvm-svn: 16806
|
| |
|
|
| |
llvm-svn: 16804
|
| |
|
|
| |
llvm-svn: 16803
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Instead of handling dead functions specially, just nuke them.
* Be more aggressive about cleaning up after constification, in
particular, handle getelementptr instructions and constantexprs.
* Be a little bit more structured about how we process globals.
*** Delete globals that are only stored to, and never read. These are
clearly not useful, so they should go. This implements deadglobal.llx
This last one triggers quite a few times. In particular, 2208 in the
external tests, 1865 of which are in 252.eon. This shrinks eon from
1995094 to 1732341 bytes of bytecode.
llvm-svn: 16802
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
simplifications of the resultant program to avoid making later passes
do it all.
This allows us to constify globals that just have the same constant that
they are initialized stored into them.
Suprisingly this comes up ALL of the freaking time, dozens of times in
SPEC, 30 times in vortex alone.
For example, on 256.bzip2, it allows us to constify these two globals:
%smallMode = internal global ubyte 0 ; <ubyte*> [#uses=8]
%verbosity = internal global int 0 ; <int*> [#uses=49]
Which (with later optimizations) results in the bytecode file shrinking
from 82286 to 69686 bytes! Lets hear it for IPO :)
For the record, it's nuking lots of "if (verbosity > 2) { do lots of stuff }"
code.
llvm-svn: 16793
|
| |
|
|
| |
llvm-svn: 16779
|
| |
|
|
|
|
|
|
|
|
|
|
| |
(PromoteAbstractToConcrete), and to use a set to avoid recomputation.
In particular, this set eliminates the potentially exponential cases
from this little recursive algorithm.
On a particularly nasty testcase, llvm-dis on the .bc file went from 34
minutes (which is when I killed it, it still hadn't finished) to 0.57s.
Remember kids, exponential algorithms are bad.
llvm-svn: 16772
|
| |
|
|
| |
llvm-svn: 16770
|
| |
|
|
| |
llvm-svn: 16769
|
| |
|
|
|
|
| |
the JIT had last night.
llvm-svn: 16766
|
| |
|
|
| |
llvm-svn: 16765
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instruction.
Now, rather than emitting the following loop out of bisect:
.LBB_main_19: ; no_exit.0.i
rlwinm r3, r2, 3, 0, 28
lfdx f1, r3, r27
addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
lfd f2, lo16(.CPI_main_1-"L00000$pb")(r3)
fsub f2, f2, f1
addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
lfd f4, lo16(.CPI_main_1-"L00000$pb")(r3)
fcmpu cr0, f1, f4
bge .LBB_main_64 ; no_exit.0.i
.LBB_main_63: ; no_exit.0.i
b .LBB_main_65 ; no_exit.0.i
.LBB_main_64: ; no_exit.0.i
fmr f2, f1
.LBB_main_65: ; no_exit.0.i
addi r3, r2, 1
rlwinm r3, r3, 3, 0, 28
lfdx f1, r3, r27
addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
lfd f4, lo16(.CPI_main_1-"L00000$pb")(r3)
fsub f4, f4, f1
addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
lfd f5, lo16(.CPI_main_1-"L00000$pb")(r3)
fcmpu cr0, f1, f5
bge .LBB_main_67 ; no_exit.0.i
.LBB_main_66: ; no_exit.0.i
b .LBB_main_68 ; no_exit.0.i
.LBB_main_67: ; no_exit.0.i
fmr f4, f1
.LBB_main_68: ; no_exit.0.i
fadd f1, f2, f4
addis r3, r30, ha16(.CPI_main_2-"L00000$pb")
lfd f2, lo16(.CPI_main_2-"L00000$pb")(r3)
fmul f1, f1, f2
rlwinm r3, r2, 3, 0, 28
lfdx f2, r3, r28
fadd f4, f2, f1
fcmpu cr0, f4, f0
bgt .LBB_main_70 ; no_exit.0.i
.LBB_main_69: ; no_exit.0.i
b .LBB_main_71 ; no_exit.0.i
.LBB_main_70: ; no_exit.0.i
fmr f0, f4
.LBB_main_71: ; no_exit.0.i
fsub f1, f2, f1
addi r2, r2, -1
fcmpu cr0, f1, f3
blt .LBB_main_73 ; no_exit.0.i
.LBB_main_72: ; no_exit.0.i
b .LBB_main_74 ; no_exit.0.i
.LBB_main_73: ; no_exit.0.i
fmr f3, f1
.LBB_main_74: ; no_exit.0.i
cmpwi cr0, r2, -1
fmr f16, f0
fmr f17, f3
bgt .LBB_main_19 ; no_exit.0.i
We emit this instead:
.LBB_main_19: ; no_exit.0.i
rlwinm r3, r2, 3, 0, 28
lfdx f1, r3, r27
addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
lfd f2, lo16(.CPI_main_1-"L00000$pb")(r3)
fsub f2, f2, f1
fsel f1, f1, f1, f2
addi r3, r2, 1
rlwinm r3, r3, 3, 0, 28
lfdx f2, r3, r27
addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
lfd f4, lo16(.CPI_main_1-"L00000$pb")(r3)
fsub f4, f4, f2
fsel f2, f2, f2, f4
fadd f1, f1, f2
addis r3, r30, ha16(.CPI_main_2-"L00000$pb")
lfd f2, lo16(.CPI_main_2-"L00000$pb")(r3)
fmul f1, f1, f2
rlwinm r3, r2, 3, 0, 28
lfdx f2, r3, r28
fadd f4, f2, f1
fsub f5, f0, f4
fsel f0, f5, f0, f4
fsub f1, f2, f1
addi r2, r2, -1
fsub f2, f1, f3
fsel f3, f2, f3, f1
cmpwi cr0, r2, -1
fmr f16, f0
fmr f17, f3
bgt .LBB_main_19 ; no_exit.0.i
llvm-svn: 16764
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
t:
mov %EDX, DWORD PTR [%ESP + 4]
mov %ECX, 2
mov %EAX, %EDX
sar %EDX, 31
idiv %ECX
mov %EAX, %EDX
ret
Generate:
t:
mov %ECX, DWORD PTR [%ESP + 4]
*** mov %EAX, %ECX
cdq
and %ECX, 1
xor %ECX, %EDX
sub %ECX, %EDX
*** mov %EAX, %ECX
ret
Note that the two marked moves are redundant, and should be eliminated by the
register allocator, but aren't.
Compare this to GCC, which generates:
t:
mov %eax, DWORD PTR [%esp+4]
mov %edx, %eax
shr %edx, 31
lea %ecx, [%edx+%eax]
and %ecx, -2
sub %eax, %ecx
ret
or ICC 8.0, which generates:
t:
movl 4(%esp), %ecx #3.5
movl $-2147483647, %eax #3.25
imull %ecx #3.25
movl %ecx, %eax #3.25
sarl $31, %eax #3.25
addl %ecx, %edx #3.25
subl %edx, %eax #3.25
addl %eax, %eax #3.25
negl %eax #3.25
subl %eax, %ecx #3.25
movl %ecx, %eax #3.25
ret #3.25
We would be in great shape if not for the moves.
llvm-svn: 16763
|
| |
|
|
|
|
| |
Thanks to Jeff Cohen for pointing out my goof.
llvm-svn: 16762
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
s: ;; X / 4
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, %EAX
sar %ECX, 1
shr %ECX, 30
mov %EDX, %EAX
add %EDX, %ECX
sar %EAX, 2
ret
When we really meant:
s:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, %EAX
sar %ECX, 1
shr %ECX, 30
add %EAX, %ECX
sar %EAX, 2
ret
Hey, this also reduces register pressure too :)
llvm-svn: 16761
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instead of:
s: ;; X / 2
movl 4(%esp), %eax
movl %eax, %ecx
shrl $31, %ecx
movl %eax, %edx
addl %ecx, %edx
sarl $1, %eax
ret
t: ;; X / -2
movl 4(%esp), %eax
movl %eax, %ecx
shrl $31, %ecx
movl %eax, %edx
addl %ecx, %edx
sarl $1, %eax
negl %eax
ret
Emit:
s:
movl 4(%esp), %eax
cmpl $-2147483648, %eax
sbbl $-1, %eax
sarl $1, %eax
ret
t:
movl 4(%esp), %eax
cmpl $-2147483648, %eax
sbbl $-1, %eax
sarl $1, %eax
negl %eax
ret
llvm-svn: 16760
|
| |
|
|
| |
llvm-svn: 16759
|
| |
|
|
|
|
|
| |
an instruction if it can be hoisted to a common dominator of the block.
This implements: test/Regression/Transforms/TailDup/MergeTest.ll
llvm-svn: 16758
|
| |
|
|
| |
llvm-svn: 16756
|
| |
|
|
| |
llvm-svn: 16728
|
| |
|
|
|
|
|
| |
of disagreeing constness. This fixes
test/Regression/Linker/ConstantGlobals[123].ll
llvm-svn: 16692
|
| |
|
|
| |
llvm-svn: 16686
|
| |
|
|
| |
llvm-svn: 16685
|
| |
|
|
| |
llvm-svn: 16682
|
| |
|
|
|
|
|
|
| |
previously temporary NULLCOMP implementation that merely copies the data
verbatim without compression. Also, don't warn if there's no compression
library as that is taken care of during configuration time.
llvm-svn: 16654
|
| |
|
|
|
|
| |
distinguished. Tidy up documentation. Thanks, Chris.
llvm-svn: 16652
|
| |
|
|
| |
llvm-svn: 16650
|
| |
|
|
|
|
|
|
| |
mapping of files. This first version uses mmap where its available. The
class needs to implement an alternate mechanism based on malloc'd memory
and file reading/writing for platforms without virtual memory.
llvm-svn: 16649
|
| |
|
|
|
|
| |
LLVM that handles availability and unavailability of bzip2 and zlib.
llvm-svn: 16648
|
| |
|
|
|
|
|
|
| |
* Update comments
* Rearrange code a bit
* Finally ELIMINATE the GAS workaround emitter for Intel mode. woot!
llvm-svn: 16647
|
| |
|
|
|
|
| |
may now choose their output format with the -x86-asm-syntax={intel|att} flag.
llvm-svn: 16646
|
| |
|
|
| |
llvm-svn: 16645
|
| |
|
|
|
|
|
|
|
| |
old and broken AT&T syntax assemblers. The problem with this hack is that
*SOME* forms of the fdiv and fsub instructions have the 'r' bit inverted.
This was a real pain to figure out, but is trivially easy to support: thus
we are now bug compatible with gas and gcc.
llvm-svn: 16644
|
| |
|
|
| |
llvm-svn: 16642
|
| |
|
|
| |
llvm-svn: 16641
|
| |
|
|
| |
llvm-svn: 16640
|
| |
|
|
|
|
|
|
| |
Intel and AT&T style assembly language. The ultimate goal of this is to
eliminate the GasBugWorkaroundEmitter class, but for now AT&T style emission
is not fully operational.
llvm-svn: 16639
|
| |
|
|
|
|
|
|
|
| |
hopefully lead to the death of the 'GasBugWorkaroundEmitter'. This also
includes changes to wrap the whole file to 80 columns! Woot! :)
Note that the AT&T style output has not been tested at all.
llvm-svn: 16638
|
| |
|
|
| |
llvm-svn: 16635
|
| |
|
|
| |
llvm-svn: 16633
|
| |
|
|
|
|
|
|
| |
it was a use, def, or both. This allows us to be less pessimistic in our
analysis of them. In practice, this doesn't make a big difference, but it
doesn't hurt either.
llvm-svn: 16632
|
| |
|
|
|
|
|
|
|
|
| |
and delete them if they turn out to be dead. This is a useful little hack
that even speeds up some programs. For example, it speeds up Ptrdist/ks
from 17.53s to 15.59s, and 188.ammp from 149s to 146s.
This also speeds up llc :)
llvm-svn: 16630
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
generated code over the simple spiller. The new local spiller generates
substantially better code than the simple one in some cases, by reusing
values that are loaded out of stack slots and kept available in registers.
This primarily helps programs that are spilling a lot, and there is still
stuff that can be done to improve it. This patch makes the local spiller
the default, as it's only a tiny bit slower than the simple spiller (it
increases the runtime of llc by < 1%).
Here are some numbers with speedups.
Program #reuse old(s) new(s) Speedup
Povray: 3452, 16.87 -> 15.93 (5.5%)
177.mesa: 2176, 2.77 -> 2.76 (0%)
179.art: 35, 28.43 -> 28.01 (1.5%)
183.equake: 55, 61.44 -> 61.41 (0%)
188.ammp: 869, 174 -> 149 (15%)
164.gzip: 43, 40.73 -> 40.71 (0%)
175.vpr: 351, 18.54 -> 17.34 (6.5%)
176.gcc: 2471, 5.01 -> 4.92 (1.8%)
181.mcf 42, 79.30 -> 75.20 (5.2%)
186.crafty: 484, 29.73 -> 30.04 (-1%)
197.parser: 251, 10.47 -> 10.67 (-1%)
252.eon: 1501, 1.98 -> 1.75 (12%)
253.perlbm: 1183, 14.83 -> 14.42 (2.8%)
254.gap: 825, 7.46 -> 7.29 (2.3%)
255.vortex: 285, 10.51 -> 10.27 (2.3%)
256.bzip2: 63, 55.70 -> 55.20 (0.9%)
300.twolf: 830, 21.63 -> 22.00 (-1%)
PtrDist/ks 14, 32.75 -> 17.53 (46.5%)
Olden/tsp 46, 8.71 -> 8.24 (5.4%)
Free/distray 70, 1.09 -> 0.99 (9.2%)
llvm-svn: 16629
|
| |
|
|
| |
llvm-svn: 16628
|