| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
rdar://problem/8893967
llvm-svn: 124137
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DAG. Disable using "-disable-sched-cycles".
For ARM, this enables a framework for modeling the cpu pipeline and
counting stalls. It also activates several heuristics to drive
scheduling based on the model. Scheduling is inherently imprecise at
this stage, and until spilling is improved it may defeat attempts to
schedule. However, this framework provides greater control over
tuning codegen.
Although the flag is not target-specific, it should have very little
affect on the default scheduler used by x86. The only two changes that
affect x86 are:
- scheduling a high-latency operation bumps the current cycle so independent
operations can have their latency covered. i.e. two independent 4
cycle operations can produce results in 4 cycles, not 8 cycles.
- Two operations with equal register pressure impact and no
latency-based stalls on their uses will be prioritized by depth before height
(height is irrelevant if no stalls occur in the schedule below this point).
llvm-svn: 123971
|
|
|
|
|
|
|
|
|
|
|
| |
flags. They are still not enable in this revision.
Added TargetInstrInfo::isZeroCost() to fix a fundamental problem with
the scheduler's model of operand latency in the selection DAG.
Generalized unit tests to work with sched-cycles.
llvm-svn: 123969
|
|
|
|
|
|
|
|
| |
Added a check for already live regs before claiming HighRegPressure.
Fixed a few cases of checking the wrong number of successors.
Added some tracing until these heuristics are better understood.
llvm-svn: 123892
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
disabled in this checkin. Sorry for the large diffs due to
refactoring. New functionality is all guarded by EnableSchedCycles.
Scheduling the isel DAG is inherently imprecise, but we give it a best
effort:
- Added MayReduceRegPressure to allow stalled nodes in the queue only
if there is a regpressure need.
- Added BUHasStall to allow checking for either dependence stalls due to
latency or resource stalls due to pipeline hazards.
- Added BUCompareLatency to encapsulate and standardize the heuristics
for minimizing stall cycles (vs. reducing register pressure).
- Modified the bottom-up heuristic (now in BUCompareLatency) to
prioritize nodes by their depth rather than height. As long as it
doesn't stall, height is irrelevant. Depth represents the critical
path to the DAG root.
- Added hybrid_ls_rr_sort::isReady to filter stalled nodes before
adding them to the available queue.
Related Cleanup: most of the register reduction routines do not need
to be templates.
llvm-svn: 123468
|
|
|
|
| |
llvm-svn: 122545
|
|
|
|
|
|
| |
scheduling node may have a NULL DAG node, yuck.
llvm-svn: 122544
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DAG scheduling during isel. Most new functionality is currently
guarded by -enable-sched-cycles and -enable-sched-hazard.
Added InstrItineraryData::IssueWidth field, currently derived from
ARM itineraries, but could be initialized differently on other targets.
Added ScheduleHazardRecognizer::MaxLookAhead to indicate whether it is
active, and if so how many cycles of state it holds.
Added SchedulingPriorityQueue::HasReadyFilter to allowing gating entry
into the scheduler's available queue.
ScoreboardHazardRecognizer now accesses the ScheduleDAG in order to
get information about it's SUnits, provides RecedeCycle for bottom-up
scheduling, correctly computes scoreboard depth, tracks IssueCount, and
considers potential stall cycles when checking for hazards.
ScheduleDAGRRList now models machine cycles and hazards (under
flags). It tracks MinAvailableCycle, drives the hazard recognizer and
priority queue's ready filter, manages a new PendingQueue, properly
accounts for stall cycles, etc.
llvm-svn: 122541
|
|
|
|
| |
llvm-svn: 122509
|
|
|
|
|
|
| |
and instruction issue.
llvm-svn: 122491
|
|
|
|
|
|
| |
multiple nodes per cycle.
llvm-svn: 122474
|
|
|
|
| |
llvm-svn: 122473
|
|
|
|
|
|
|
|
| |
In the bottom-up selection DAG scheduling, handle two-address
instructions that read/write unspillable registers. Treat
the entire chain of two-address nodes as a single live range.
llvm-svn: 122472
|
|
|
|
|
|
|
| |
the same physical register. Simplifies the fix from the previous
checkin r122211.
llvm-svn: 122370
|
|
|
|
| |
llvm-svn: 122368
|
|
|
|
|
|
|
| |
something that just glues two nodes together, even if it is
sometimes used for flags.
llvm-svn: 122310
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Imagine we see:
EFLAGS = inst1
EFLAGS = inst2 FLAGS
gpr = inst3 EFLAGS
Previously, we would refuse to schedule inst2 because it clobbers
the EFLAGS of the predecessor. However, it also uses the EFLAGS
of the predecessor, so it is safe to emit. SDep edges ensure that
the right order happens already anyway.
This fixes 2 testsuite crashes with the X86 patch I'm going to
commit next.
llvm-svn: 122211
|
|
|
|
| |
llvm-svn: 122209
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to
"optimize for latency". Call instructions don't have the right latency and
this is more likely to use introduce spills.
2. Fix if-converter cost function. For ARM, it should use instruction latencies,
not # of micro-ops since multi-latency instructions is completely executed
even when the predicate is false. Also, some instruction will be "slower"
when they are predicated due to the register def becoming implicit input.
rdar://8598427
llvm-svn: 118135
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
operand and one of them has a single use that is a live out copy, favor the
one that is live out. Otherwise it will be difficult to eliminate the copy
if the instruction is a loop induction variable update. e.g.
BB:
sub r1, r3, #1
str r0, [r2, r3]
mov r3, r1
cmp
bne BB
=>
BB:
str r0, [r2, r3]
sub r3, r3, #1
cmp
bne BB
This fixed the recent 256.bzip2 regression.
llvm-svn: 117675
|
|
|
|
|
|
| |
enough to factor into scheduling priority. Eliminate it and add early exits to speed up scheduling.
llvm-svn: 109449
|
|
|
|
|
|
|
|
| |
parameter)
may be used uninitialized in the callers of HighRegPressure.
llvm-svn: 109393
|
|
|
|
| |
llvm-svn: 109383
|
|
|
|
|
|
| |
those. Radar 8231572.
llvm-svn: 109367
|
|
|
|
|
|
|
|
|
|
|
|
| |
appropriate for targets without detailed instruction iterineries.
The scheduler schedules for increased instruction level parallelism in
low register pressure situation; it schedules to reduce register pressure
when the register pressure becomes high.
On x86_64, this is a win for all tests in CFP2000. It also sped up 256.bzip2
by 16%.
llvm-svn: 109300
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
it's too late to start backing off aggressive latency scheduling when most
of the registers are in use so the threshold should be a bit tighter.
- Correctly handle live out's and extract_subreg etc.
- Enable register pressure aware scheduling by default for hybrid scheduler.
For ARM, this is almost always a win on # of instructions. It's runtime
neutral for most of the tests. But for some kernels with high register
pressure it can be a huge win. e.g. 464.h264ref reduced number of spills by
54 and sped up by 20%.
llvm-svn: 109279
|
|
|
|
| |
llvm-svn: 109083
|
|
|
|
| |
llvm-svn: 109082
|
|
|
|
| |
llvm-svn: 109079
|
|
|
|
| |
llvm-svn: 109064
|
|
|
|
| |
llvm-svn: 108991
|
|
|
|
|
|
|
|
|
| |
of getPhysicalRegisterRegClass with it.
If we want to make a copy (or estimate its cost), it is better to use the
smallest class as more efficient operations might be possible.
llvm-svn: 107140
|
|
|
|
| |
llvm-svn: 105168
|
|
|
|
|
|
| |
just return zero.
llvm-svn: 105061
|
|
|
|
|
|
|
|
| |
implementing pop with a linear search for a "best" element. The priority
queue was a neat idea, but in practice the comparison functions depend
on dynamic information.
llvm-svn: 104718
|
|
|
|
| |
llvm-svn: 104716
|
|
|
|
|
|
| |
base class, since all the implementations are the same.
llvm-svn: 104659
|
|
|
|
| |
llvm-svn: 104306
|
|
|
|
|
|
| |
what for latency in hybrid mode.
llvm-svn: 104293
|
|
|
|
|
|
|
|
|
| |
pipeline stall. It's useful for targets like ARM cortex-a8. NEON has a lot
of long latency instructions so a strict register pressure reduction
scheduler does not work well.
Early experiments show this speeds up some NEON loops by over 30%.
llvm-svn: 104216
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Introduce some enums and accessors in the InlineAsm class
that eliminate a ton of magic numbers when handling inline
asm SDNode.
2. Add a new MDNodeSDNode selection dag node type that holds
a MDNode (shocking!)
3. Add a new argument to ISD::INLINEASM nodes that hold !srcloc
metadata, propagating it to the instruction emitter, which
drops it.
No functionality change.
llvm-svn: 100605
|
|
|
|
|
|
|
|
|
| |
into TargetOpcodes.h. #include the new TargetOpcodes.h
into MachineInstr. Add new inline accessors (like isPHI())
to MachineInstr, and start using them throughout the
codebase.
llvm-svn: 95687
|
|
|
|
|
|
|
|
| |
predecessors to the unfolded load. It decides what gets moved to the load by checking whether the new load is using the predecessor as an operand. The check neglects the cases whether the predecessor is a flagged scheduling unit.
rdar://7604000
llvm-svn: 95339
|
|
|
|
|
|
|
|
|
| |
the '-pre-RA-sched' flag. It actually makes more sense to do it this way. Also,
keep track of the SDNode ordering by default. Eventually, we would like to make
this ordering a way to break a "tie" in the scheduler. However, doing that now
breaks the "CodeGen/X86/abi-isel.ll" test for 32-bit Linux.
llvm-svn: 94308
|
|
|
|
|
|
| |
order.
llvm-svn: 92810
|
|
|
|
| |
llvm-svn: 92807
|
|
|
|
|
|
| |
bottom-up scheduler. We prefer the lower order number.
llvm-svn: 92806
|
|
|
|
| |
llvm-svn: 92576
|
|
|
|
|
|
| |
VISIBILITY_HIDDEN removal.
llvm-svn: 85043
|
|
|
|
|
|
|
| |
Chris claims we should never have visibility_hidden inside any .cpp file but
that's still not true even after this commit.
llvm-svn: 85042
|