| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ScheduleDAG is responsible for the DAG: SUnits and SDeps. It provides target hooks for latency computation.
ScheduleDAGInstrs extends ScheduleDAG and defines the current scheduling region in terms of MachineInstr iterators. It has access to the target's scheduling itinerary data. ScheduleDAGInstrs provides the logic for building the ScheduleDAG for the sequence of MachineInstrs in the current region. Target's can implement highly custom schedulers by extending this class.
ScheduleDAGPostRATDList provides the driver and diagnostics for current postRA scheduling. It maintains a current Sequence of scheduled machine instructions and logic for splicing them into the block. During scheduling, it uses the ScheduleHazardRecognizer provided by the target.
Specific changes:
- Removed driver code from ScheduleDAG. clearDAG is the only interface needed.
- Added enterRegion/exitRegion hooks to ScheduleDAGInstrs to delimit the scope of each scheduling region and associated DAG. They should be used to setup and cleanup any region-specific state in addition to the DAG itself. This is necessary because we reuse the same ScheduleDAG object for the entire function. The target may extend these hooks to do things at regions boundaries, like bundle terminators. The hooks are called even if we decide not to schedule the region. So all instructions in a block are "covered" by these calls.
- Added ScheduleDAGInstrs::begin()/end() public API.
- Moved Sequence into the driver layer, which is specific to the scheduling algorithm.
llvm-svn: 152208
|
| |
|
|
|
|
| |
ScheduleDAG has nothing to do with how the instructions are scheduled.
llvm-svn: 152206
|
| |
|
|
|
|
| |
Soon, ScheduleDAG will not refer to the BB.
llvm-svn: 152177
|
| |
|
|
| |
llvm-svn: 152001
|
| |
|
|
| |
llvm-svn: 151348
|
| |
|
|
|
|
|
|
| |
Ignore undef uses completely.
Use a more explicit SlotIndex API.
Add more explicit comments.
llvm-svn: 151233
|
| |
|
|
|
|
|
| |
Added array subscript to SparseSet for convenience.
Slight reorg to make it easier to manage the def/use sets.
llvm-svn: 151228
|
| |
|
|
| |
llvm-svn: 151211
|
| |
|
|
| |
llvm-svn: 151205
|
| |
|
|
| |
llvm-svn: 151178
|
| |
|
|
|
|
|
| |
The vast majority of virtual register definitions don't need an entry
in the DAG builder's VRegDefs set.
llvm-svn: 151136
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Affect on SD scheduling and postRA scheduling:
Printing the DAG will display the nodes in top-down topological order.
This matches the order within the MBB and makes my life much easier in general.
Affect on misched:
We don't need to track virtual register uses at all. This is awesome.
I also intend to rely on the SUnit ID as a topo-sort index. So if A < B then we cannot have an edge B -> A.
llvm-svn: 151135
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Passes after RegAlloc should be able to rely on MRI->getNumVirtRegs() == 0.
This makes sharing code for pre/postRA passes more robust.
Now, to check if a pass is running before the RA pipeline begins, use MRI->isSSA().
To check if a pass is running after the RA pipeline ends, use !MRI->getNumVirtRegs().
PEI resets virtual regs when it's done scavenging.
PTX will either have to provide its own PEI pass or assign physregs.
llvm-svn: 151032
|
| |
|
|
| |
llvm-svn: 148174
|
| |
|
|
| |
llvm-svn: 148173
|
| |
|
|
| |
llvm-svn: 148172
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
opportunities that only present themselves after late optimizations
such as tail duplication .e.g.
## BB#1:
movl %eax, %ecx
movl %ecx, %eax
ret
The register allocator also leaves some of them around (due to false
dep between copies from phi-elimination, etc.)
This required some changes in codegen passes. Post-ra scheduler and the
pseudo-instruction expansion passes have been moved after branch folding
and tail merging. They were before branch folding before because it did
not always update block livein's. That's fixed now. The pass change makes
independently since we want to properly schedule instructions after
branch folding / tail duplication.
rdar://10428165
rdar://10640363
llvm-svn: 147716
|
| |
|
|
| |
llvm-svn: 147605
|
| |
|
|
|
|
| |
antidependence latency on ARM in exceedingly rare cases.
llvm-svn: 147594
|
| |
|
|
|
|
|
|
|
|
|
| |
r0 = mov #0
r0 = moveq #1
Then the second instruction has an implicit data dependency on the first
instruction. Sadly I have yet to come up with a small test case that
demonstrate the post-ra scheduler taking advantage of this.
llvm-svn: 146583
|
| |
|
|
| |
llvm-svn: 146547
|
| |
|
|
|
|
|
|
|
|
| |
to finalize MI bundles (i.e. add BUNDLE instruction and computing register def
and use lists of the BUNDLE instruction) and a pass to unpack bundles.
- Teach more of MachineBasic and MachineInstr methods to be bundle aware.
- Switch Thumb2 IT block to MI bundles and delete the hazard recognizer hack to
prevent IT blocks from being broken apart.
llvm-svn: 146542
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
generator to it. For non-bundle instructions, these behave exactly the same
as the MC layer API.
For properties like mayLoad / mayStore, look into the bundle and if any of the
bundled instructions has the property it would return true.
For properties like isPredicable, only return true if *all* of the bundled
instructions have the property.
For properties like canFoldAsLoad, isCompare, conservatively return false for
bundles.
llvm-svn: 146026
|
| |
|
|
|
|
|
|
|
| |
1. Added opcode BUNDLE
2. Taught MachineInstr class to deal with bundled MIs
3. Changed MachineBasicBlock iterator to skip over bundled MIs; added an iterator to walk all the MIs
4. Taught MachineBasicBlock methods about bundled MIs
llvm-svn: 145975
|
| |
|
|
|
|
| |
instruction in Sequence is a Noop
llvm-svn: 145677
|
| |
|
|
|
|
| |
Fixes <rdar://problem/10235725>
llvm-svn: 141357
|
| |
|
|
| |
llvm-svn: 141356
|
| |
|
|
| |
llvm-svn: 134259
|
| |
|
|
|
|
| |
MCInstrItineraries) into MC.
llvm-svn: 134049
|
| |
|
|
|
|
|
|
| |
sink them into MC layer.
- Added MCInstrInfo, which captures the tablegen generated static data. Chang
TargetInstrInfo so it's based off MCInstrInfo.
llvm-svn: 134021
|
| |
|
|
| |
llvm-svn: 132488
|
| |
|
|
| |
llvm-svn: 132487
|
| |
|
|
|
|
|
|
| |
DBG_VALUEs. This approach has several downsides, for example, it does not work when dbg value is a constant integer, it does not work if reg is defined more than once, it places end of debug value range markers in the wrong place. It even causes misleading incorrect debug info when duplicate DBG_VALUE instructions point to same reg def.
Instead, use simpler approach and let DBG_VALUE follow its predecessor instruction. After live debug value analysis pass, all DBG_VALUE instruction are placed at the right place. Thanks Jakob for the hint!
llvm-svn: 132483
|
| |
|
|
| |
llvm-svn: 131022
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
BuildSchedGraph was quadratic in the number of calls in the basic
block. After this fix, it keeps only a single call at the top of the
DefList so compile time doesn't blow up on large blocks. This reduces
postRA sched time on an external test case from 81s to 0.3s. Although
r130800 (reduced ARM register alias defs) also partially fixes the
issue by reducing the constant overhead of checking call interference
by an order of magnitude.
Fixes <rdar://problem/7662664> very poor compile time with post RA scheduling.
llvm-svn: 130943
|
| |
|
|
| |
llvm-svn: 130942
|
| |
|
|
|
|
| |
Luis Felipe Strano Moraes!
llvm-svn: 129558
|
| |
|
|
|
|
|
|
|
|
| |
Instead encode llvm IR level property "HasSideEffects" in an operand (shared
with IsAlignStack). Added MachineInstrs::hasUnmodeledSideEffects() to check
the operand when the instruction is an INLINEASM.
This allows memory instructions to be moved around INLINEASM instructions.
llvm-svn: 123044
|
| |
|
|
|
|
|
| |
function so that it can live in Analysis instead of
VMCore.
llvm-svn: 121885
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to
"optimize for latency". Call instructions don't have the right latency and
this is more likely to use introduce spills.
2. Fix if-converter cost function. For ARM, it should use instruction latencies,
not # of micro-ops since multi-latency instructions is completely executed
even when the predicate is false. Also, some instruction will be "slower"
when they are predicated due to the register def becoming implicit input.
rdar://8598427
llvm-svn: 118135
|
| |
|
|
|
|
| |
fallthroughs uses all registers, just gather the union of all successor liveins.
llvm-svn: 117506
|
| |
|
|
|
|
| |
regression.
llvm-svn: 117329
|
| |
|
|
|
|
|
|
|
|
|
| |
2) live-outs.
Previously the post-RA schedulers completely ignore these dependencies since
returns, branches, etc. are all scheduling barriers. This patch model the
latencies between instructions being scheduled and the barriers. It also
handle calls by marking their register uses.
llvm-svn: 117193
|
| |
|
|
| |
llvm-svn: 116119
|
| |
|
|
|
|
|
|
|
|
| |
implicit. e.g.
%D6<def>, %D7<def> = VLD1q16 %R2<kill>, 0, ..., %Q3<imp-def>
%Q1<def> = VMULv8i16 %Q1<kill>, %Q3<kill>, ...
The real definition indices are 0,1.
llvm-svn: 116080
|
| |
|
|
| |
llvm-svn: 115802
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
allow target to correctly compute latency for cases where static scheduling
itineraries isn't sufficient. e.g. variable_ops instructions such as
ARM::ldm.
This also allows target without scheduling itineraries to compute operand
latencies. e.g. X86 can return (approximated) latencies for high latency
instructions such as division.
- Compute operand latencies for those defined by load multiple instructions,
e.g. ldm and those used by store multiple instructions, e.g. stm.
llvm-svn: 115755
|
| |
|
|
|
|
| |
pipeline forwarding path.
llvm-svn: 115098
|
| |
|
|
|
|
|
|
|
|
|
| |
take multiple cycles to decode.
For the current if-converter clients (actually only ARM), the instructions that
are predicated on false are not nops. They would still take machine cycles to
decode. Micro-coded instructions such as LDM / STM can potentially take multiple
cycles to decode. If-converter should take treat them as non-micro-coded
simple instructions.
llvm-svn: 113570
|
| |
|
|
|
|
|
| |
instead of fixed size arrays, so that increasing FirstVirtualRegister to 16K
won't cause a compile time performance regression.
llvm-svn: 109330
|