llvm-svn: 139431
have a predicate operand, unlike conditional branches.
llvm-svn: 139415
It appears that our use of the imp-use and imp-def flags with
sub-registers is not yet robust enough to support this.
The failing test case is complicated; I am working on a reduction.
<rdar://problem/10044201>
llvm-svn: 138861
There is no non-writeback store multiple instruction in Thumb1, so
don't define one. As a result, load multiple is the only instantiation of
the multiclass, so refactor that away entirely.
llvm-svn: 138338
llvm-svn: 138177
This pleases the register scavenger and brings
test/CodeGen/ARM/2011-08-12-vmovqqqq-pseudo.ll a little closer to
working with -verify-machineinstrs.
llvm-svn: 138164
Therefore, rather than generate a pseudo instruction, which is later expanded,
generate the necessary instructions in place.
llvm-svn: 138163
register subclasses. Hopefully this fixes some buildbots.
llvm-svn: 137223
On Cortex-A8, we use the NEON v2f32 instructions for f32 arithmetic. For
better latency, we also send D-register copies down the NEON pipeline by
translating them to vorr instructions.

This patch promotes even S-register copies to D-register copies when
possible so they can also go down the NEON pipeline. Example:

        vldr.32 s0, LCPI0_0
  loop:
        vorr d1, d0, d0
  loop2:
        ...
        vadd.f32 d1, d1, d16

The vorr instruction looked like this after regalloc:

  %S2<def> = COPY %S0, %D1<imp-def>

Copies involving odd S-registers, and copies that don't define the full
D-register, are left alone.
llvm-svn: 137182
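To illustrate the even-register rule above, a standalone sketch (hypothetical
names, not the ARM backend's actual code, and it ignores the imp-def
subtlety): dN aliases the pair {s2N, s2N+1}, so only a copy between even
S-registers lands at the bottom of a D-register and can become a full
D-register vorr.

  #include <cstdio>

  // True if the copy DstS = SrcS may be widened to a D-register vorr.
  // dN aliases {s2N, s2N+1}; an odd S-register is the high half of its
  // D-register, so widening such a copy would touch the wrong lane.
  bool canWidenSCopy(unsigned DstS, unsigned SrcS) {
    return DstS % 2 == 0 && SrcS % 2 == 0;
  }

  int main() {
    // s2 = s0 maps onto d1 = d0, i.e. "vorr d1, d0, d0" as above.
    std::printf("s2 = s0: %s\n", canWidenSCopy(2, 0) ? "widen" : "keep");
    std::printf("s3 = s1: %s\n", canWidenSCopy(3, 1) ? "widen" : "keep");
  }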
They improve the verbose assembly.
llvm-svn: 137069
allowing us to distinguish the encodings that use shifted registers from those that use shifted immediates. This is necessary to allow the fixed-length decoder to distinguish things like BICS vs LDRH.
llvm-svn: 135693
ARM MC code from target.
llvm-svn: 135636
to simplify the path towards an auto-generated disassembler.
llvm-svn: 135290
registration and creation code into XXXMCDesc libraries.
llvm-svn: 135184
an opcode. Switch ARM over to using that rather than its own special MCInstrDesc bits.
llvm-svn: 135106
llvm-svn: 134858
llvm-svn: 134244
The tSpill and tRestore instructions are just copies of the tSTRspi and
tLDRspi instructions, respectively. Just use those directly instead.
llvm-svn: 134092
llvm-svn: 134030
llvm-svn: 134024
sink them into MC layer.
- Added MCInstrInfo, which captures the tablegen generated static data. Change
TargetInstrInfo so it's based off MCInstrInfo.
llvm-svn: 134021
There are probably more instances of this floating around.
llvm-svn: 130474
is, it assumes addresses are 64-bit aligned (which should be the more common
case). If the address is found not to be aligned, then getOperandLatency()
would adjust the operand latency computation by one to compensate for it.
rdar://9294833
llvm-svn: 129742
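A minimal sketch of that compensation, assuming a simple model where the base
operand latency presumes a 64-bit aligned address (the function and its
parameters are hypothetical, not the actual ARMBaseInstrInfo code):

  // Charge one extra cycle of operand latency when the load's known
  // alignment is below 8 bytes, i.e. the address may not be 64-bit
  // aligned -- the case getOperandLatency() compensates for above.
  int adjustLoadOperandLatency(int BaseLatency, unsigned AlignInBytes) {
    if (AlignInBytes < 8)
      return BaseLatency + 1;
    return BaseLatency;
  }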
a case involving EOR, so I only added a test for ORR.
llvm-svn: 129610
problem as all of the other instructions we fold with CMPs.
llvm-svn: 129602
fixes <rdar://problem/9287901>.
llvm-svn: 129599
Luis Felipe Strano Moraes!
llvm-svn: 129558
llvm-svn: 129429
folded comparisons, just like ADD and SUB.
llvm-svn: 129038
actually exist.
llvm-svn: 128461
entries being compared may not be ARMConstantPoolValue. Without checking
whether they are ARMConstantPoolValue first, and if the stars and moons
are aligned properly, the equality test may return true (when the first few
words of two Constants' values happen to be identical) and very bad things can
happen.
rdar://9125354
llvm-svn: 128203
  int tries = INT_MAX;
  while (tries > 0) {
    tries--;
  }

The check should be:

  subs r4, #1
  cmp  r4, #0
  bgt  LBB0_1

The subs can set the overflow V bit when r4 is INT_MAX+1 (which loop
canonicalization apparently does in this case). cmp #0 would have cleared
it while not changing the N and Z bits. Since BGT is dependent on the V
bit, i.e. (N == V) && !Z, it is not safe to eliminate the cmp #0.
rdar://9172742
llvm-svn: 128179
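The argument can be checked with a standalone simulation of the N/Z/V flags
(a sketch, not LLVM code): once the counter holds INT_MAX+1 (i.e. INT32_MIN),
subs sets V and BGT falls through, whereas the cmp #0 it would replace leaves
V clear and BGT stays taken.

  #include <cstdint>
  #include <cstdio>

  struct Flags { bool N, Z, V; };

  // Flags an ARM SUBS/CMP would produce for a - b.
  static Flags subFlags(int32_t a, int32_t b) {
    int32_t r = static_cast<int32_t>(static_cast<uint32_t>(a) -
                                     static_cast<uint32_t>(b));
    bool v = ((a ^ b) & (a ^ r)) < 0; // signed overflow of a - b
    return {r < 0, r == 0, v};
  }

  static bool bgt(Flags f) { return f.N == f.V && !f.Z; } // GT condition

  int main() {
    int32_t r4 = INT32_MIN;             // INT_MAX + 1, wrapped
    Flags s = subFlags(r4, 1);          // subs r4, #1  -> sets V
    int32_t r4After = static_cast<int32_t>(static_cast<uint32_t>(r4) - 1u);
    Flags c = subFlags(r4After, 0);     // cmp r4, #0   -> clears V
    // Prints "bgt after subs: 0, bgt after cmp: 1" -- different outcomes.
    std::printf("bgt after subs: %d, bgt after cmp: %d\n", bgt(s), bgt(c));
  }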
This is just a very first approximation of how the stuff should be done
(e.g. ARM-only for now). More to follow.
llvm-svn: 127101
1. Fixed ARM pc adjustment.
2. Fixed dynamic-no-pic codegen.
3. CSE of pc-relative load of global addresses.
It's now enabled by default for Darwin.
llvm-svn: 123991
flags. They are still not enabled in this revision.
Added TargetInstrInfo::isZeroCost() to fix a fundamental problem with
the scheduler's model of operand latency in the selection DAG.
Generalized unit tests to work with sched-cycles.
llvm-svn: 123969
value, the "add pc" must be CSE'ed at the same time. We could follow the same
approach as T2 by adding pseudo instructions that combine the ldr + "add pc".
But the better approach is to use movw + movt (which I will enable soon), so
I'll leave this as a TODO.
llvm-svn: 123949
TargetInstrInfo:
Change produceSameValue() to take MachineRegisterInfo as an optional argument.
When in SSA form, targets can use it to make more aggressive equality analysis.

Machine LICM:
1. Eliminate isLoadFromConstantMemory, use MI.isInvariantLoad instead.
2. Fix a bug which prevented CSE of instructions which are not
re-materializable.
3. Use the improved form of produceSameValue.

ARM:
1. Teach ARM produceSameValue to look past some PIC labels.
2. Look for operands from different loads of different constant pool entries
which have the same values.
3. Re-implement PIC GA materialization using movw + movt. Combine the pair with
an "add pc" or "ldr [pc]" to form pseudo instructions. This makes it possible
to re-materialize the instruction, allows machine LICM to hoist the set of
instructions out of the loop, and makes it possible to CSE them. It's a bit
hacky, but it significantly improves code quality.
4. Some minor bug fixes as well.

With the fixes, using movw + movt to materialize GAs significantly outperforms
the load-from-constantpool method. 186.crafty and 255.vortex improved > 20%,
254.gap and 176.gcc ~10%.
llvm-svn: 123905
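The TargetInstrInfo part of the change, sketched as a compilable interface
(a simplified stand-in, not the verbatim LLVM header):

  class MachineInstr;
  class MachineRegisterInfo;

  class TargetInstrInfoSketch {
  public:
    virtual ~TargetInstrInfoSketch() {}

    // Returns true if MI0 and MI1 compute the same value. When MRI is
    // non-null (machine code still in SSA form), an override may chase
    // the defining instructions of virtual-register operands -- e.g. the
    // ARM override described above looks past PIC labels and compares
    // constant pool entries by value.
    virtual bool produceSameValue(const MachineInstr *MI0,
                                  const MachineInstr *MI1,
                                  const MachineRegisterInfo *MRI = 0) const = 0;
  };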
        movw r0, :lower16:(L_foo$non_lazy_ptr-(LPC0_0+4))
        movt r0, :upper16:(L_foo$non_lazy_ptr-(LPC0_0+4))
  LPC0_0:
        add r0, pc, r0

It's not yet enabled by default as some tests are failing. I suspect bugs in
downstream tools.
llvm-svn: 123619
These functions no longer assert when passed 0, but simply return false instead.
No functional change intended.
llvm-svn: 123155
llvm-svn: 123048
DAG scheduling during isel. Most new functionality is currently
guarded by -enable-sched-cycles and -enable-sched-hazard.

Added InstrItineraryData::IssueWidth field, currently derived from
ARM itineraries, but could be initialized differently on other targets.

Added ScheduleHazardRecognizer::MaxLookAhead to indicate whether it is
active, and if so how many cycles of state it holds.

Added SchedulingPriorityQueue::HasReadyFilter to allow gating entry
into the scheduler's available queue.

ScoreboardHazardRecognizer now accesses the ScheduleDAG in order to
get information about its SUnits, provides RecedeCycle for bottom-up
scheduling, correctly computes scoreboard depth, tracks IssueCount, and
considers potential stall cycles when checking for hazards.

ScheduleDAGRRList now models machine cycles and hazards (under
flags). It tracks MinAvailableCycle, drives the hazard recognizer and
priority queue's ready filter, manages a new PendingQueue, properly
accounts for stall cycles, etc.
llvm-svn: 122541
llvm-svn: 122539
Use the same COPY_TO_REGCLASS approach as for the 2-register *_sfp instructions.
This change made a big difference in the code generated for the
CodeGen/Thumb2/cross-rc-coalescing-2.ll test: The coalescer is still doing
a fine job, but some instructions that were previously moved outside the loop
are not moved now. It's using fewer VFP registers now, which is generally
a good thing, so I think the estimates for register pressure changed and that
affected the LICM behavior. Since that isn't obviously wrong, I've just
changed the test file. This completes the work for Radar 8711675.
llvm-svn: 121730
possible. They were duplicates for everything except the source pattern
before.
llvm-svn: 121179
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
of vmul + vadd, a RAW hazard during the first (4? on Cortex-A8) can cause an
additional pipeline stall. So it's frequently better to simply codegen
vmul + vadd.
2. A vmla followed by a vmul, vmadd, or vsub causes the second fp instruction
to stall for 4 cycles. We need to schedule them apart.
3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back
RAW vmla + vmla is very bad. But this isn't ideal either:
  vmul
  vadd
  vmla
Instead, we want to expand the second vmla:
  vmla
  vmul
  vadd
Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
faster.

Up to now, isel simply avoided codegen'ing fp vmla / vmls. This works well
enough but it isn't the optimal solution. This patch attempts to make it
possible to use vmla / vmls in cases where it is profitable:
A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoiding cases where vmla is feeding
into fp instructions (except for the #3 exceptional case).
D. Add an ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
vmla / vmls will trigger one of the special hazards.

Work in progress, only A+B are enabled.
llvm-svn: 120960
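Point D can be pictured with a toy hazard check (hypothetical types, not the
real itinerary-driven ARM hazard recognizer): an fp instruction issued right
after a vmla/vmls reports a hazard so the scheduler keeps them apart.

  #include <string>

  enum HazardType { NoHazard, Hazard };

  // Toy model of hazards #1-#3 above: an fp arithmetic instruction
  // issued right after a vmla/vmls stalls (about 4 cycles), so report
  // a hazard and let the scheduler separate the two.
  HazardType checkFPHazard(const std::string &Prev, const std::string &Cur) {
    bool PrevIsMLA = Prev == "vmla" || Prev == "vmls";
    bool CurIsFP   = Cur == "vmul" || Cur == "vadd" || Cur == "vsub" ||
                     Cur == "vmla" || Cur == "vmls";
    return (PrevIsMLA && CurIsFP) ? Hazard : NoHazard;
  }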
data. Next up, pseudo-izing them.
llvm-svn: 120320
llvm-svn: 120228
Remove movePastCSLoadStoreOps and associated code for simple pointer
increments. Update routines that depended upon other opcodes for save/restore.
Adjust all testcases accordingly.
llvm-svn: 119725
llvm-svn: 119610
and xor. The 32-bit move immediates can be hoisted out of loops by machine
LICM but the isel hacks were preventing them.

Instead, let the peephole optimization pass recognize registers that are
defined by immediates, and the ARM target hook will fold the immediates in.

Other changes include:
1) Do not fold and / xor into cmp to isel TST / TEQ instructions if there are
multiple uses. This happens when the 'and' is live out; machine sink would
have sunk the computation, and that ends up pessimizing code. The peephole
pass would recognize situations where the 'and' can be toggled to define
CPSR and eliminate the comparison anyway.
2) Move the peephole pass to after machine LICM, sink, and CSE to avoid
blocking important optimizations.
rdar://8663787, rdar://8241368
llvm-svn: 119548
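The CPSR-toggling rewrite in change 1) can be sketched on a toy instruction
list (hypothetical data structures, not the actual peephole pass, which must
also verify that CPSR is not read or clobbered in between):

  #include <cstddef>
  #include <string>
  #include <vector>

  struct Inst {
    std::string Op;   // "and", "cmp", ...
    int Dst;          // destination register (ignored for cmp)
    int Src0;
    int Src1;         // register, or the immediate for "cmp r, #imm"
    bool SetsFlags;   // the 's' suffix: instruction defines CPSR
  };

  // If an 'and' is immediately followed by "cmp <same reg>, #0", make
  // the and flag-setting (and -> ands) and drop the redundant cmp.
  void foldCmpZero(std::vector<Inst> &Block) {
    for (std::size_t i = 0; i + 1 < Block.size(); ++i) {
      Inst &Def = Block[i];
      if (Def.Op == "and" && Block[i + 1].Op == "cmp" &&
          Block[i + 1].Src0 == Def.Dst && Block[i + 1].Src1 == 0) {
        Def.SetsFlags = true;               // and -> ands
        Block.erase(Block.begin() + i + 1); // cmp r, #0 is now redundant
      }
    }
  }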