llvm-svn: 128870
llvm-svn: 128867
When a virtual register has a single value that is defined as a copy of a
reserved register, permit that copy to be joined. These virtual registers are
usually copies of the stack pointer:

    %vreg75<def> = COPY %ESP; GR32:%vreg75
    MOV32mr %vreg75, 1, %noreg, 0, %noreg, %vreg74<kill>
    MOV32mi %vreg75, 1, %noreg, 8, %noreg, 0
    MOV32mi %vreg75<kill>, 1, %noreg, 4, %noreg, 0
    CALLpcrel32 ...

Coalescing these virtual registers early decreases register pressure.
Previously, they were coalesced by RALinScan::attemptTrivialCoalescing after
register allocation was completed.
The lower register pressure causes the mcinst-lowering-cmp0.ll test case to
fail because it depends on linear scan spilling a particular register.
I am deleting 2008-08-05-SpillerBug.ll because it counts the number of
instructions emitted, and its revision history shows the 'correct' count being
edited many times.
llvm-svn: 128845
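For context, the pattern above shows up whenever outgoing call arguments are
stored relative to a copy of the stack pointer. A minimal C sketch that
provokes it on i386 (the function names are illustrative, not from the
commit):

    /* Hypothetical example: the stores that set up the arguments to
       callee() go through a virtual register that is just a copy of
       %ESP, which can now be coalesced away early. */
    void callee(int a, int b, int c);

    void caller(int x) {
        callee(x, 0, 0);
    }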
The code inserted by PPCTargetLowering::EmitInstrWithCustomInserter for ppc64
is wrong, and I don't know how to fix it. It seems to be using the correct
register classes for pointers, but all of the instructions it inserts are
32-bit.
llvm-svn: 128835
llvm-svn: 128827
llvm-svn: 128767
registers that arise from argument shuffling with the soft-float ABI. These
instructions are particularly slow on Cortex-A8. This fixes one half of
<rdar://problem/8674845>.
llvm-svn: 128759
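Presumably the instructions in question move values between core and VFP
registers. A hedged C sketch of code that triggers such shuffling under a
soft-float calling convention (the function is illustrative):

    /* With a soft-float ABI, the double arguments arrive in core
       registers r0-r3 and must be shuffled into VFP registers (e.g.
       vmov d0, r0, r1) before the multiply -- a transfer that is
       expensive on Cortex-A8. */
    double scale(double x, double y) {
        return x * y;
    }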
llvm-svn: 128736
llvm-svn: 128718
llvm-svn: 128707
llvm-svn: 128706
llvm-svn: 128690
The LocalStackSlotAllocation pass was creating illegal registers.
llvm-svn: 128687
llvm-svn: 128686
llvm-svn: 128680
accumulator forwarding:

    vadd d3, d0, d1
    vmul d3, d3, d2
=>
    vmul d3, d0, d2
    vmla d3, d1, d2

llvm-svn: 128665
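The rewrite relies on distributivity, (d0 + d1) * d2 == d0*d2 + d1*d2, so the
multiply-accumulate can keep its result in the forwarded accumulator. A
minimal sketch with NEON intrinsics, using integer lanes where the
reassociation is always exact (for floating point it would require relaxed
FP semantics):

    #include <arm_neon.h>

    /* (d0 + d1) * d2, which the backend may re-express as
       d0*d2 + d1*d2, i.e. a vmul followed by a vmla. */
    int32x2_t mul_of_sum(int32x2_t d0, int32x2_t d1, int32x2_t d2) {
        return vmul_s32(vadd_s32(d0, d1), d2);
    }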
Add an extra run with -regalloc=basic to keep them honest.
llvm-svn: 128654
handling of FP comparisons.
llvm-svn: 128650
liveness.
Turn them into noop KILL instructions instead. This lets the scavenger know when
super-registers are killed and defined.
llvm-svn: 128645
This way, shrinkToUses() will ignore the instruction that is about to be
deleted, and we avoid leaving invalid live ranges that SplitKit doesn't like.
Fix a misunderstanding in MachineVerifier about <def,undef> operands. The
<undef> flag is valid on def operands where it has the same meaning as <undef>
on a use operand. It only applies to sub-register defines which also read the
full register.
llvm-svn: 128642
llvm-svn: 128633
The rematerialized instruction may require a more constrained register class
than the register being spilled. In the test case, the spilled register has been
inflated to the DPR register class, but we are rematerializing a load of the
ssub_0 sub-register which only exists for DPR_VFP2 registers.
The register class is reinflated after spilling, so the conservative choice is
only temporary.
llvm-svn: 128610
llvm-svn: 128586
can be recognized. This fixes <rdar://problem/9183078>.
llvm-svn: 128584
was lowering them to sext / uxt + mul instructions. Unfortunately the
optimization passes may hoist the extensions out of the loop and separate
them. When that happens, the long multiplication instructions can be broken
into several scalar instructions, causing a significant performance issue.
Note the vmla and vmls intrinsics are not added back. The frontend will
codegen them as the vmull* intrinsics + add / sub. Also note the isel
optimizations for catching mul + sext / zext are not changed either.
First part of rdar://8832507, rdar://9203134
llvm-svn: 128502
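For illustration, the C-level counterpart of one of the re-added widening
multiplies, assuming the standard arm_neon.h spelling (a sketch, not code
from the commit): the intrinsic stays a single vmull no matter where later
passes move the surrounding code.

    #include <arm_neon.h>

    /* A single vmull.s16, rather than a sext of each operand followed
       by a full-width mul that hoisting could later tear apart. */
    int32x4_t widening_mul(int16x4_t a, int16x4_t b) {
        return vmull_s16(a, b);
    }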
<rdar://problem/8875309> and <rdar://problem/9057191>.
llvm-svn: 128492
llvm-svn: 128445
isel lowering to fold the zero-extends and take advantage of no-stall
back-to-back vmul + vmla:

    vmull q0, d4, d6
    vmlal q0, d5, d6

is faster than

    vaddl q0, d4, d5
    vmovl q1, d6
    vmul  q0, q0, q1

This allows us to use vmull + vmlal for:

    f = vmull_u8(vget_high_u8(s), c);
    f = vmlal_u8(f, vget_low_u8(s), c);

rdar://9197392
llvm-svn: 128444
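Wrapped into a self-contained function, the snippet above would look roughly
like this (the function name is made up):

    #include <arm_neon.h>

    uint16x8_t widen_mul_acc(uint8x16_t s, uint8x8_t c) {
        uint16x8_t f = vmull_u8(vget_high_u8(s), c); /* vmull.u8 */
        return vmlal_u8(f, vget_low_u8(s), c);       /* vmlal.u8 */
    }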
becomes reachable when before it wasn't). Check to make sure that it's not null
before trying to use it.
llvm-svn: 128434
Correctly terminate the range of register DBG_VALUEs when the register is
clobbered or when the basic block ends.
The code is now ready to deal with variables that are sometimes in a register
and sometimes on the stack. We just need to teach emitDebugLoc to say 'stack
slot'.
llvm-svn: 128327
masks to match inversely for the code as-is to work. For the example given
we actually want:

    bfi r0, r2, #1, #1

not #0. However, given the way the pattern is written, that's not possible
at the moment.
Fixes rdar://9177502
llvm-svn: 128320
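For reference, BFI implements the classic bitfield-insert idiom. A generic C
sketch of the operation the pattern is trying to match (parameter names are
illustrative; assumes lsb + width < 32):

    /* Insert the low `width` bits of val into base at bit position lsb;
       ARM's BFI encodes exactly this operation. */
    unsigned bitfield_insert(unsigned base, unsigned val,
                             unsigned lsb, unsigned width) {
        unsigned mask = ((1u << width) - 1u) << lsb;
        return (base & ~mask) | ((val << lsb) & mask);
    }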
DBG_VALUEs.
The .dot directives don't need labels; that is a leftover from when we created
line number info manually.
Instructions following a DBG_VALUE can share its label since the DBG_VALUE
doesn't produce any code.
llvm-svn: 128284
llvm-svn: 128245
r119613.
A better approach would be to move source id handling inside MC.
llvm-svn: 128233
FIXME: Some cleanups would be needed.
llvm-svn: 128206
void return type. This fixes PR9487.
llvm-svn: 128197
llvm-svn: 128183
I'm backing this out for the second time. It was supposed to be fixed by r128164, but the mingw self-host must be defeating the fix.
llvm-svn: 128181
    int tries = INT_MAX;
    while (tries > 0) {
        tries--;
    }

The check should be:

    subs r4, #1
    cmp  r4, #0
    bgt  LBB0_1

The subs can set the overflow V bit when r4 is INT_MAX+1, i.e. INT_MIN after
wraparound (which is apparently what loop canonicalization produces in this
case). cmp #0 would have cleared V while not changing the N and Z bits. Since
BGT depends on the V bit, i.e. (N == V) && !Z, it is not safe to eliminate
the cmp #0.
rdar://9172742
llvm-svn: 128179
Also cleaning up some duplicated code while I'm here.
llvm-svn: 128176
(target-specific branchless method for double-width relational comparisons on x86)
llvm-svn: 128175
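As a rough illustration of what a branchless double-width relational
comparison means (a hedged sketch, not necessarily the commit's actual
codegen): a 64-bit signed less-than on a 32-bit x86 target compares the two
halves so that the flags of a wide subtraction decide the result without a
branch, e.g. cmp on the low halves followed by sbb on the high halves.

    #include <stdint.h>

    /* Portable statement of the comparison; branchless x86 codegen can
       lower it to cmp (low halves) + sbb (high halves) + a flag read. */
    int lt64(int32_t hi_a, uint32_t lo_a, int32_t hi_b, uint32_t lo_b) {
        return hi_a < hi_b || (hi_a == hi_b && lo_a < lo_b);
    }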
This will extend the ranges of debug info variables in registers until they are
clobbered.
Fix 1: Don't mistake DBG_VALUE instructions referring to incoming arguments on
the stack for DBG_VALUE instructions referring to variables in the frame
pointer. This fixes the gdb test-suite failure.
Fix 2: Don't trace through copies to physical registers setting up call
arguments. These registers are call clobbered, and the source register is more
likely to be a callee-saved register that can be extended through the call
instruction.
llvm-svn: 128114
Temporarily reverting these to see if we can get llvm-objdump to link. Hopefully this is not the problem.
llvm-svn: 128097
llvm-svn: 128084
These ranges get completely jumbled by the post-ra scheduler, and it is not
really reasonable to expect it to make sense of them.
Instead, teach DwarfDebug to notice when user variables in registers are
clobbered, and terminate the ranges there.
llvm-svn: 128045
outside of the current basic block. This fixes PR9500, rdar://9156159.
llvm-svn: 128041
gnu as does. This makes it a lot easier to compare the output of both
as the addresses are now a lot closer.
llvm-svn: 127972
to canonicalize IR", it broke a lot of things.
llvm-svn: 127954
to have a single return block (at least getting there) for optimizations. This
is general goodness but it would prevent some tailcall optimizations.
One specific case is code like this:

    int f1(void);
    int f2(void);
    int f3(void);
    int f4(void);
    int f5(void);
    int f6(void);
    int foo(int x) {
        switch(x) {
        case 1: return f1();
        case 2: return f2();
        case 3: return f3();
        case 4: return f4();
        case 5: return f5();
        case 6: return f6();
        }
    }

=>

    LBB0_2: ## %sw.bb
        callq _f1
        popq %rbp
        ret
    LBB0_3: ## %sw.bb1
        callq _f2
        popq %rbp
        ret
    LBB0_4: ## %sw.bb3
        callq _f3
        popq %rbp
        ret

This patch teaches codegenprep to duplicate returns when the return value
is a phi and where the phi operands are produced by tail calls followed by
an unconditional branch:

    sw.bb7:                                           ; preds = %entry
        %call8 = tail call i32 @f5() nounwind
        br label %return
    sw.bb9:                                           ; preds = %entry
        %call10 = tail call i32 @f6() nounwind
        br label %return
    return:
        %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ]
        ret i32 %retval.0

This allows codegen to generate better code like this:

    LBB0_2: ## %sw.bb
        jmp _f1 ## TAILCALL
    LBB0_3: ## %sw.bb1
        jmp _f2 ## TAILCALL
    LBB0_4: ## %sw.bb3
        jmp _f3 ## TAILCALL

rdar://9147433
llvm-svn: 127953
not have native support for this operation (such as X86).
The legalized code uses two vector INT_TO_FP operations and is faster
than scalarizing.
llvm-svn: 127951
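A hedged sketch of the kind of source that benefits: a vector integer-to-float
conversion on a type the target cannot convert natively, which now legalizes
to a pair of vector converts instead of one scalar convert per element.

    /* Each element is converted int -> float; on targets without native
       support for the vector conversion, legalization can now use two
       vector INT_TO_FP operations rather than scalarizing. */
    void int_to_float(const int *in, float *out, int n) {
        for (int i = 0; i < n; ++i)
            out[i] = (float)in[i];
    }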