llvm-svn: 122247

llvm-svn: 122246

... the same as setcc. Optimize ADDC(0,0,FLAGS) -> SET_CARRY(FLAGS). This is
a step towards finishing off PR5443. In the testcase in that bug we now get:

  movq  %rdi, %rax
  addq  %rsi, %rax
  sbbq  %rcx, %rcx
  testb $1, %cl
  setne %dl
  ret

instead of:

  movq  %rdi, %rax
  addq  %rsi, %rax
  movl  $0, %ecx
  adcq  $0, %rcx
  testq %rcx, %rcx
  setne %dl
  ret

llvm-svn: 122219

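The mask-from-carry idiom behind that sbbq is easy to state in C. A minimal
sketch (the function name is invented, not from the commit):

  #include <stdint.h>

  /* The carry out of a 64-bit add is the "sum wrapped around" condition;
     negating that bit yields the all-ones/all-zeros mask that a single
     sbb reg,reg materializes. */
  static uint64_t add_carry_mask(uint64_t a, uint64_t b) {
      uint64_t sum = a + b;     /* addq: defines the carry flag */
      uint64_t carry = sum < a; /* 1 iff the add carried */
      return 0 - carry;         /* sbbq %rcx, %rcx: ~0 on carry, else 0 */
  }
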
... doesn't, match it back to setb.

On a 64-bit version of the testcase, before we'd get:

  movq %rdi, %rax
  addq %rsi, %rax
  sbbb %dl, %dl
  andb $1, %dl
  ret

now we get:

  movq %rdi, %rax
  addq %rsi, %rax
  setb %dl
  ret

llvm-svn: 122217

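The condition setb captures can be written directly in C. A minimal sketch
(the function name is invented, not from the commit):

  #include <stdint.h>

  /* Returning the carry of an unsigned add as a bool is exactly what a
     single setb materializes, replacing the old sbb+and pair. */
  static uint8_t add_carries(uint64_t a, uint64_t b) {
      return (a + b) < a;       /* addq; setb */
  }
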
llvm-svn: 122214

... their carry dependencies with MVT::Flag operands) and use clean and
beautiful EFLAGS dependences instead.

We do this by changing the modelling of SBB/ADC to have EFLAGS input and
outputs (which is what requires the previous scheduler change) and changing
X86 ISelLowering to custom lower ADDC and friends down to
X86ISD::ADD/ADC/SUB/SBB nodes.

With the previous series of changes, this causes no changes in the testsuite,
woo.

llvm-svn: 122213

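Double-word addition is the classic consumer of the carry chain that ADC
models. A minimal sketch in C (names invented, not from the commit):

  #include <stdint.h>

  /* The low half's carry (in EFLAGS) feeds the high half's adc; this is
     exactly the dependence the new modelling makes explicit. */
  static void add128(const uint64_t a[2], const uint64_t b[2], uint64_t r[2]) {
      uint64_t lo = a[0] + b[0];   /* addq: defines CF */
      uint64_t carry = lo < a[0];  /* carry out of the low half */
      r[0] = lo;
      r[1] = a[1] + b[1] + carry;  /* adcq: consumes CF */
  }
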
... after legalize types has run, e.g., prevent creating an i64 node from a
v2i64 when i64 is not a legal type.

llvm-svn: 122206

... consistently by moving it out of lowering into dag combine.

Add some missing patterns for matching away extended versions of setcc_c.

llvm-svn: 122201

... going through the CSE maps to get it.

llvm-svn: 122196

... we don't need -disable-mmx anymore.

llvm-svn: 122189

llvm-svn: 122187

... generate them. Now we compile:

  define zeroext i8 @X(i8 signext %a, i8 signext %b) nounwind ssp {
  entry:
    %0 = tail call %0 @llvm.sadd.with.overflow.i8(i8 %a, i8 %b)
    %cmp = extractvalue %0 %0, 1
    br i1 %cmp, label %if.then, label %if.end

into:

  _X:                       ## @X
  ## BB#0:                  ## %entry
    subl $12, %esp
    movb 16(%esp), %al
    addb 20(%esp), %al
    jo   LBB0_2

Before we were generating:

  _X:                       ## @X
  ## BB#0:                  ## %entry
    pushl %ebp
    movl  %esp, %ebp
    subl  $8, %esp
    movb  12(%ebp), %al
    testb %al, %al
    setge %cl
    movb  8(%ebp), %dl
    testb %dl, %dl
    setge %ah
    cmpb  %cl, %ah
    sete  %cl
    addb  %al, %dl
    testb %dl, %dl
    setge %al
    cmpb  %al, %ah
    setne %al
    andb  %cl, %al
    testb %al, %al
    jne   LBB0_2

llvm-svn: 122186

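From C, one way to reach llvm.sadd.with.overflow is the GCC/Clang
__builtin_add_overflow builtin. A minimal sketch (the function name is
invented, not from the commit):

  #include <stdint.h>

  /* Signed i8 add that reports overflow; this is the kind of input that
     now lowers to an addb followed by a jo on the overflow flag. */
  static int add_overflows_i8(int8_t a, int8_t b, int8_t *sum) {
      return __builtin_add_overflow(a, b, sum);  /* addb; jo */
  }
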
llvm-svn: 122147

llvm-svn: 122134

llvm-svn: 122121

Remove unnecessary pandn patterns; the 'vnot' patfrag looks through bitcasts.

llvm-svn: 122098

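For reference, pandn computes (~a) & b per element, which is why looking
through bitcasts matters: the not may be written in a different vector type.
A scalar stand-in in C (the function name is invented):

  #include <stdint.h>

  static uint64_t andn(uint64_t a, uint64_t b) {
      return ~a & b;   /* pandn semantics, element-wise in the vector case */
  }
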
llvm-svn: 122067

llvm-svn: 122064

... IsSymbolRefDifferenceFullyResolved; it turns out this does change behavior
on enough cases for x86-32 that I would rather wait a bit on it.

- In practice, we will want to change this eventually, because it only means
  we generate fewer relocations (it also eliminates the need for the horrible
  '.set' hack that Darwin requires in some places).

llvm-svn: 122042

... superseded and was effectively dead.

llvm-svn: 122024

llvm-svn: 122005

... interface.

llvm-svn: 121981

llvm-svn: 121973

llvm-svn: 121971

... the MCCodeEmitter, which seems like a better organization.

- Also cleaned up some magic constants while in the area.

llvm-svn: 121953

llvm-svn: 121908

llvm-svn: 121677

... to catch cases where n != m with a shift.

llvm-svn: 121608

llvm-svn: 121471

llvm-svn: 121461

llvm-svn: 121445

... the compiler's point of view. Per email discussion, we either want to
always use VEX-prefixed instructions or never use them, and are taking
"HasAVX" to mean "Always use VEX". Passing -mattr=-avx,+sse42 should serve
to restore legacy SSE support when desirable.

llvm-svn: 121439

...

  f:
    .cfi_startproc
    nop
    .cfi_endproc

assembled (on ELF).

llvm-svn: 121434

llvm-svn: 121415

... the output to the correct register. Fixes a hidden problem uncovered
by the last patch where we'd try to DAG combine our MVT::Other node oddly.

llvm-svn: 121358

llvm-svn: 121340

... popping up at O0 when it wasn't folded and the fast allocator would
complain.

llvm-svn: 121330

llvm-svn: 121328

llvm-svn: 121320

... PrivateGlobalPrefix as ".L". Otherwise, global symbols named @Lxxxx might
be treated as temporary symbols by MCSymbol.

llvm-svn: 121103

... freed data to be read. I will open a bug to track it being reenabled.

llvm-svn: 121028

... as llc + llvm-mc. This time ELF is not changed, and I tested that llvm-gcc
bootstraps on darwin10 using darwin9's assembler and linker.

llvm-svn: 121006

... result. This allows us to compile:

  void *test12(long count) {
    return new int[count];
  }

into:

  test12:
    movl    $4, %ecx
    movq    %rdi, %rax
    mulq    %rcx
    movq    $-1, %rdi
    cmovnoq %rax, %rdi
    jmp     __Znam          ## TAILCALL

instead of:

  test12:
    movl   $4, %ecx
    movq   %rdi, %rax
    mulq   %rcx
    seto   %cl
    testb  %cl, %cl
    movq   $-1, %rdi
    cmoveq %rax, %rdi
    jmp    __Znam

Of course it would be even better if the regalloc inverted the cmov to
'cmovoq', which would eliminate the need for the 'movq %rdi, %rax'.

llvm-svn: 120936

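The computation being selected on is the size for 'new int[count]'. A minimal
C sketch of the same logic (names invented; __builtin_umull_overflow is a
GCC/Clang builtin, not from the commit):

  #include <stdlib.h>

  /* On multiply overflow the requested size is forced to -1 so the
     allocation fails; with the overflow flag still live, the choice below
     can become a single cmovno instead of seto+test+cmove. */
  static void *alloc_ints(unsigned long count) {
      unsigned long bytes;
      if (__builtin_umull_overflow(count, sizeof(int), &bytes))
          bytes = (unsigned long)-1;
      return malloc(bytes);
  }
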
... backend that they were all implemented except umul. This one fell back to
the default implementation that did a hi/lo multiply and compared the top.
Fix this to check the overflow flag that the 'mul' instruction sets, so we
can avoid an explicit test. Now we compile:

  void *func(long count) {
    return new int[count];
  }

into:

  __Z4funcl:                ## @_Z4funcl
    movl  $4, %ecx          ## encoding: [0xb9,0x04,0x00,0x00,0x00]
    movq  %rdi, %rax        ## encoding: [0x48,0x89,0xf8]
    mulq  %rcx              ## encoding: [0x48,0xf7,0xe1]
    seto  %cl               ## encoding: [0x0f,0x90,0xc1]
    testb %cl, %cl          ## encoding: [0x84,0xc9]
    movq  $-1, %rdi         ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
    cmoveq %rax, %rdi       ## encoding: [0x48,0x0f,0x44,0xf8]
    jmp   __Znam            ## TAILCALL

instead of:

  __Z4funcl:                ## @_Z4funcl
    movl  $4, %ecx          ## encoding: [0xb9,0x04,0x00,0x00,0x00]
    movq  %rdi, %rax        ## encoding: [0x48,0x89,0xf8]
    mulq  %rcx              ## encoding: [0x48,0xf7,0xe1]
    testq %rdx, %rdx        ## encoding: [0x48,0x85,0xd2]
    movq  $-1, %rdi         ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
    cmoveq %rax, %rdi       ## encoding: [0x48,0x0f,0x44,0xf8]
    jmp   __Znam            ## TAILCALL

Other than the silly seto+test, this is using the o bit directly, so it's
going in the right direction.

llvm-svn: 120935

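The default expansion's condition (hi/lo multiply, compare the top) looks
like this in C; the new lowering reads the same fact from the overflow flag
that mulq already set. A minimal sketch (names invented; unsigned __int128 is
a GCC/Clang extension on 64-bit targets):

  static int umul_overflows_hi(unsigned long a, unsigned long b) {
      unsigned __int128 p = (unsigned __int128)a * b;  /* mulq: rdx:rax */
      return (unsigned long)(p >> 64) != 0;            /* testq %rdx, %rdx */
  }
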
... select, inserting a not to compensate. Add a missing isZero check that I
lost somehow.

This improves codegen of:

  void *func(long count) {
    return new int[count];
  }

from:

  __Z4funcl:                ## @_Z4funcl
    movl  $4, %ecx          ## encoding: [0xb9,0x04,0x00,0x00,0x00]
    movq  %rdi, %rax        ## encoding: [0x48,0x89,0xf8]
    mulq  %rcx              ## encoding: [0x48,0xf7,0xe1]
    testq %rdx, %rdx        ## encoding: [0x48,0x85,0xd2]
    movq  $-1, %rdi         ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
    cmoveq %rax, %rdi       ## encoding: [0x48,0x0f,0x44,0xf8]
    jmp   __Znam            ## TAILCALL
                            ## encoding: [0xeb,A]

to:

  __Z4funcl:                ## @_Z4funcl
    movl  $4, %ecx          ## encoding: [0xb9,0x04,0x00,0x00,0x00]
    movq  %rdi, %rax        ## encoding: [0x48,0x89,0xf8]
    mulq  %rcx              ## encoding: [0x48,0xf7,0xe1]
    cmpq  $1, %rdx          ## encoding: [0x48,0x83,0xfa,0x01]
    sbbq  %rdi, %rdi        ## encoding: [0x48,0x19,0xff]
    notq  %rdi              ## encoding: [0x48,0xf7,0xd7]
    orq   %rax, %rdi        ## encoding: [0x48,0x09,0xc7]
    jmp   __Znam            ## TAILCALL
                            ## encoding: [0xeb,A]

llvm-svn: 120932

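Step by step, the branch-free "to:" sequence in C (the function name is
invented, not from the commit):

  #include <stdint.h>

  /* cmpq $1, x  sets CF iff x == 0
     sbbq m, m   m = (x == 0) ? ~0 : 0
     notq m      m = (x != 0) ? ~0 : 0
     orq  y, m   result = (x != 0) ? ~0 : y */
  static uint64_t select_ne0(uint64_t x, uint64_t y) {
      uint64_t m = 0 - (uint64_t)(x == 0);  /* cmpq $1; sbbq m,m */
      return ~m | y;                        /* notq; orq */
  }
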
...

1. Generalize
     (select (x == 0), -1, 0) -> (sign_bit (x - 1))
   to:
     (select (x == 0), -1, y) -> (sign_bit (x - 1)) | y

2. Handle the identical pattern that happens with !=:
     (select (x != 0), y, -1) -> (sign_bit (x - 1)) | y

cmov is often high latency and can't fold immediates or memory operands. For
example, for (x == 0) ? -1 : 1, before we got:

  < testb  %sil, %sil
  < movl   $-1, %ecx
  < movl   $1, %eax
  < cmovel %ecx, %eax

now we get:

  > cmpb $1, %sil
  > sbbl %eax, %eax
  > orl  $1, %eax

llvm-svn: 120929

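Transform 1 in C, where "sign_bit (x - 1)" is realized as the borrow out of
x - 1 smeared into a full-width mask (the cmp $1 + sbb pair above). A minimal
sketch (the function name is invented, not from the commit):

  #include <stdint.h>

  static uint64_t select_eq0(uint64_t x, uint64_t y) {
      uint64_t mask = 0 - (uint64_t)(x < 1);  /* ~0 iff x == 0 (unsigned) */
      return mask | y;                        /* -1 when x == 0, else y */
  }
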
llvm-svn: 120923

...

- Also adds a new POPCNT subtarget feature that is currently enabled if the
  target supports SSE4.2 (Nehalem) or SSE4A (Barcelona).

llvm-svn: 120917

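From C, the GCC/Clang __builtin_popcountll builtin is the kind of construct
that lowers to a single popcnt when this feature is enabled. A minimal sketch
(the function name is invented, not from the commit):

  static int bits_set(unsigned long long x) {
      return __builtin_popcountll(x);
  }
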
llvm-svn: 120907

...

  foo = a - b
  .long foo

instead of just

  .long a - b

First, on darwin9 (64 bits) the assembler produces the wrong result. Second,
if "a" is the end of the section, all darwin assemblers (9, 10 and mc) will
not consider a - b to be a constant, but will if the dummy foo is created.

Split how we handle these cases. The first one is something MC should take
care of. The second one has to be handled by the caller.

llvm-svn: 120889