the order of the bytes in the data stream is flipped around.
llvm-svn: 121215
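
As a point of reference, a minimal sketch (hypothetical helper, not the emitter's actual code) of the intended byte order: a 32-bit Thumb-2 encoding goes out as two 16-bit halfwords, upper halfword first, each little-endian:

    #include <cstdint>
    #include <vector>

    // Emit a 32-bit Thumb-2 encoding as two 16-bit halfwords, upper
    // halfword first, each in little-endian byte order.
    void emitThumb2Word(uint32_t Insn, std::vector<uint8_t> &Stream) {
      const uint16_t Halves[2] = {uint16_t(Insn >> 16), uint16_t(Insn & 0xFFFF)};
      for (uint16_t H : Halves) {
        Stream.push_back(uint8_t(H & 0xFF)); // low byte of each halfword first
        Stream.push_back(uint8_t(H >> 8));
      }
    }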
llvm-svn: 121206
llvm-svn: 121198
vpush instructions to save / restore VFP / NEON registers like this:
vpush {d8,d10,d11}
vpop {d8,d10,d11}
vpush and vpop do not allow gaps in the register list.
rdar://8728956
llvm-svn: 121197
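
One way to honor that constraint, sketched with hypothetical names (the actual fix lives in ARM frame lowering), is to break a sorted D-register list into contiguous runs and emit one vpush/vpop per run:

    #include <vector>

    // Split a sorted list of D-register numbers into gap-free runs,
    // since each vpush/vpop must name a contiguous register range.
    std::vector<std::vector<unsigned>>
    splitIntoRuns(const std::vector<unsigned> &Regs) {
      std::vector<std::vector<unsigned>> Runs;
      for (unsigned R : Regs) {
        if (Runs.empty() || Runs.back().back() + 1 != R)
          Runs.push_back({}); // gap: start a new vpush/vpop group
        Runs.back().push_back(R);
      }
      return Runs;
    }
    // {8, 10, 11} -> {8} and {10, 11}, i.e. vpush {d8} then vpush {d10,d11}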
functionality change.
llvm-svn: 121195
llvm-svn: 121186
llvm-svn: 121182
possible. They were duplicates of everything except the source pattern
before.
llvm-svn: 121179
llvm-svn: 121176
llvm-svn: 121172
conditional moves are directly matched using tablegen patterns. If there's a need in the future, we can introduce it again.
llvm-svn: 121164
(select (load (load tga0)) (load tga1)) => (load (select (load tga0) tga1))
Thanks to Akira for pointing that out.
llvm-svn: 121163
llvm-svn: 121153
llvm-svn: 121142
zextOrTrunc(), and APSInt methods extend(), extOrTrunc() and new method
trunc(), to be const and to return a new value instead of modifying the
object in place.
llvm-svn: 121120
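
In caller terms, the new API is value-returning rather than mutating; a small usage sketch against the post-change APInt interface:

    #include "llvm/ADT/APInt.h"

    void example() {
      // trunc(), extend() and the *OrTrunc() methods are now const and
      // return a new value instead of resizing the object in place.
      llvm::APInt V(32, 42);
      llvm::APInt Wide = V.zextOrTrunc(64); // V is left unchanged at 32 bits
      llvm::APInt Narrow = Wide.trunc(16);  // likewise, Wide is untouched
    }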
PrivateGlobalPrefix as ".L".
Otherwise, global symbols like @Lxxxx might be treated as temporary symbols by MCSymbol.
llvm-svn: 121103
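
The point of the prefix choice, in a hedged sketch (hypothetical check, much simplified from MC's real logic): only names carrying the full ".L" prefix count as assembler temporaries, so a global that merely starts with "L" is left alone:

    #include <string>

    // With PrivateGlobalPrefix == ".L", a name is an assembler-temporary
    // label only if it carries the full prefix; "Lxxxx" no longer matches.
    bool looksLikeTemporary(const std::string &Name) {
      const std::string Prefix = ".L"; // PrivateGlobalPrefix per this change
      return Name.compare(0, Prefix.size(), Prefix) == 0;
    }
    // looksLikeTemporary(".Ltmp0") == true; looksLikeTemporary("Lxxxx") == false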
gazillion places that need to know about it.
llvm-svn: 121082
llvm-svn: 121072
Patch contributed by Jack Whitham!
llvm-svn: 121049
Use BRAD instead of BRD for indirect branches in the MBlaze backend.
Patch contributed by Jack Whitham!
llvm-svn: 121044
Address more hazards in the MBlaze delay slot filler.
Patch contributed by Jack Whitham!
llvm-svn: 121037
freed data to be read. I will open a bug to track it being reenabled.
llvm-svn: 121028
llvm-svn: 121026
llvm-svn: 121024
friends) to Pseudos.
llvm-svn: 121021
the instruction is predicated, reg0 otherwise.
llvm-svn: 121020
not an immediate. It stores either ARM::CPSR or reg0.
llvm-svn: 121018
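
A sketch of how such an operand reads after the change (hypothetical helper; assumes the ARM backend's internal headers are available for ARM::CPSR):

    #include "llvm/CodeGen/MachineInstr.h"

    // The optional-def operand is now a register operand holding either
    // ARM::CPSR (the instruction sets flags) or reg0 (it does not).
    bool definesCPSR(const llvm::MachineInstr &MI, unsigned OptDefIdx) {
      const llvm::MachineOperand &MO = MI.getOperand(OptDefIdx);
      return MO.isReg() && MO.getReg() == llvm::ARM::CPSR; // reg0 compares false
    }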
as llc + llvm-mc. This time ELF is not changed, and I verified that llvm-gcc
bootstraps on darwin10 using darwin9's assembler and linker.
llvm-svn: 121006
llvm-svn: 120982
llvm-svn: 120971
llvm-svn: 120966
llvm-svn: 120965
llvm-svn: 120964
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
of vmul + vadd, a RAW hazard during the first few cycles (4? on Cortex-A8) can
cause an additional pipeline stall. So it's frequently better to simply codegen
vmul + vadd.
2. A vmla followed by a vmul, vadd, or vsub causes the second fp instruction to
stall for 4 cycles. We need to schedule them apart.
3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back
RAW vmla + vmla is very bad. But this isn't ideal either:
vmul
vadd
vmla
Instead, we want to expand the second vmla:
vmla
vmul
vadd
Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
faster.
Up to now, isel has simply avoided codegen'ing fp vmla / vmls. This works well
enough, but it isn't the optimal solution. This patch attempts to make it
possible to use vmla / vmls in cases where it is profitable.
A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
compute a fmul and a fmla.
C. Add additional isel checks for vmla; avoid cases where vmla is feeding into
fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
vmla / vmls will trigger one of the special hazards.
Work in progress, only A+B are enabled.
llvm-svn: 120960
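
A minimal sketch of the shape of the hazard check in item D (hypothetical names; the real recognizer models the pipeline in far more detail):

    // Flag the back-to-back RAW vmla + vmla case from point 3: a fp
    // multiply-accumulate issued right behind another one it depends on.
    bool isBackToBackMLAHazard(bool PrevWasMLA, bool CurIsMLA,
                               bool CurReadsPrevResult) {
      return PrevWasMLA && CurIsMLA && CurReadsPrevResult;
    }
    // When it fires, expanding the second vmla into vmul + vadd is still
    // 2 cycles faster than eating the stall, per the analysis above.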
result. This allows us to compile:
void *test12(long count) {
return new int[count];
}
into:
test12:
movl $4, %ecx
movq %rdi, %rax
mulq %rcx
movq $-1, %rdi
cmovnoq %rax, %rdi
jmp __Znam ## TAILCALL
instead of:
test12:
movl $4, %ecx
movq %rdi, %rax
mulq %rcx
seto %cl
testb %cl, %cl
movq $-1, %rdi
cmoveq %rax, %rdi
jmp __Znam
Of course it would be even better if the regalloc inverted the cmov to 'cmovoq',
which would eliminate the need for the 'movq %rdi, %rax'.
llvm-svn: 120936
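
For reference, the size computation `new int[count]` implies is an overflow-checked multiply; a rough source-level equivalent (a sketch using the GCC/Clang checked-arithmetic builtin, which postdates this commit):

    #include <cstddef>
    #include <new>

    // count * sizeof(int) with an overflow check; on overflow the size is
    // forced to -1 so operator new[] reliably fails, matching the cmov above.
    void *test12_equivalent(long count) {
      size_t Bytes;
      if (__builtin_mul_overflow(static_cast<size_t>(count), sizeof(int), &Bytes))
        Bytes = static_cast<size_t>(-1);
      return operator new[](Bytes);
    }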
backend that they were all implemented except umul. This one fell back
to the default implementation that did a hi/lo multiply and compared the
top. Fix this to check the overflow flag that the 'mul' instruction
sets, so we can avoid an explicit test. Now we compile:
void *func(long count) {
return new int[count];
}
into:
__Z4funcl: ## @_Z4funcl
movl $4, %ecx ## encoding: [0xb9,0x04,0x00,0x00,0x00]
movq %rdi, %rax ## encoding: [0x48,0x89,0xf8]
mulq %rcx ## encoding: [0x48,0xf7,0xe1]
seto %cl ## encoding: [0x0f,0x90,0xc1]
testb %cl, %cl ## encoding: [0x84,0xc9]
movq $-1, %rdi ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
cmoveq %rax, %rdi ## encoding: [0x48,0x0f,0x44,0xf8]
jmp __Znam ## TAILCALL
instead of:
__Z4funcl: ## @_Z4funcl
movl $4, %ecx ## encoding: [0xb9,0x04,0x00,0x00,0x00]
movq %rdi, %rax ## encoding: [0x48,0x89,0xf8]
mulq %rcx ## encoding: [0x48,0xf7,0xe1]
testq %rdx, %rdx ## encoding: [0x48,0x85,0xd2]
movq $-1, %rdi ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
cmoveq %rax, %rdi ## encoding: [0x48,0x0f,0x44,0xf8]
jmp __Znam ## TAILCALL
Other than the silly seto+test, this is using the o bit directly, so it's going in the right
direction.
llvm-svn: 120935
select, inserting a not to compensate. Add a missing isZero check
that I lost somehow.
This improves codegen of:
void *func(long count) {
return new int[count];
}
from:
__Z4funcl: ## @_Z4funcl
movl $4, %ecx ## encoding: [0xb9,0x04,0x00,0x00,0x00]
movq %rdi, %rax ## encoding: [0x48,0x89,0xf8]
mulq %rcx ## encoding: [0x48,0xf7,0xe1]
testq %rdx, %rdx ## encoding: [0x48,0x85,0xd2]
movq $-1, %rdi ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
cmoveq %rax, %rdi ## encoding: [0x48,0x0f,0x44,0xf8]
jmp __Znam ## TAILCALL
## encoding: [0xeb,A]
to:
__Z4funcl: ## @_Z4funcl
movl $4, %ecx ## encoding: [0xb9,0x04,0x00,0x00,0x00]
movq %rdi, %rax ## encoding: [0x48,0x89,0xf8]
mulq %rcx ## encoding: [0x48,0xf7,0xe1]
cmpq $1, %rdx ## encoding: [0x48,0x83,0xfa,0x01]
sbbq %rdi, %rdi ## encoding: [0x48,0x19,0xff]
notq %rdi ## encoding: [0x48,0xf7,0xd7]
orq %rax, %rdi ## encoding: [0x48,0x09,0xc7]
jmp __Znam ## TAILCALL
## encoding: [0xeb,A]
llvm-svn: 120932
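
In source terms, the new sequence computes a branch-free select on the high half of the multiply (a sketch of the identity, not the DAG combine itself):

    // Branch-free equivalent of the cmp/sbb/not/or sequence above:
    // (hi != 0) ? -1 : lo
    long selectOnOverflow(long hi, long lo) {
      return -static_cast<long>(hi != 0) | lo; // all-ones when hi != 0
    }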
1. generalize
(select (x == 0), -1, 0) -> (sign_bit (x - 1))
to:
(select (x == 0), -1, y) -> (sign_bit (x - 1)) | y
2. Handle the identical pattern that happens with !=:
(select (x != 0), y, -1) -> (sign_bit (x - 1)) | y
cmov is often high latency and can't fold immediates or
memory operands. For example for (x == 0) ? -1 : 1, before
we got:
< testb %sil, %sil
< movl $-1, %ecx
< movl $1, %eax
< cmovel %ecx, %eax
now we get:
> cmpb $1, %sil
> sbbl %eax, %eax
> orl $1, %eax
llvm-svn: 120929
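
Both forms reduce to the same branch-free identity in C++ (a sketch; the `-(x == 0)` term is what lowers to the cmp/sbb pair shown above):

    // (x == 0) ? -1 : y  and  (x != 0) ? y : -1  are the same function:
    int selectNegOneOrY(int x, int y) {
      return -static_cast<int>(x == 0) | y; // x == 0 -> all-ones; else y
    }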
llvm-svn: 120923
- Also adds a new POPCNT subtarget feature that is currently enabled if the target
supports SSE4.2 (Nehalem) or SSE4A (Barcelona).
llvm-svn: 120917
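
From user code the feature is reached via the population-count builtin (illustrative; assumes Clang/GCC):

    // With POPCNT available (e.g. SSE4.2 targets), this compiles to a
    // single popcnt instruction instead of a bit-twiddling sequence.
    int countSetBits(unsigned V) {
      return __builtin_popcount(V);
    }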
llvm-svn: 120907
Check for that and try narrowing it to tADDspi instead. Radar 8724703.
llvm-svn: 120892
foo = a - b
.long foo
instead of just
.long a - b
First, on darwin9 in 64-bit mode the assembler produces the wrong result. Second,
if "a" is the end of the section, all darwin assemblers (9, 10 and mc) will not
consider a - b to be a constant, but will if the dummy foo is created.
Split how we handle these cases. The first one is something MC should take care
of. The second one has to be handled by the caller.
llvm-svn: 120889
llvm-svn: 120865
tCMPzhir has undefined behavior when both source registers are low registers.
rdar://8728577
llvm-svn: 120858
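
A sketch of the constraint (hypothetical helper using raw register numbers; the real check is expressed via the instruction's register classes):

    // The Thumb hi-register CMP forms are unpredictable when both source
    // registers are low (r0-r7); require at least one high register.
    bool isValidHiRegCmp(unsigned Reg1, unsigned Reg2) {
      auto IsLow = [](unsigned R) { return R <= 7; }; // r0-r7 by number
      return !(IsLow(Reg1) && IsLow(Reg2));
    }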
llvm-svn: 120857
operand encoding ordering of the instruction.
llvm-svn: 120852
ARM instruction). Add encoding of bits 13 and 11.
llvm-svn: 120849
halfword being emitted to the stream first. rdar://8728174
llvm-svn: 120848
I'm unsure whether the tests are actually correct, but reverting for now.
llvm-svn: 120847