|  | Commit message (Collapse) | Author | Age | Files | Lines | 
|---|
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | sequence -  target independent framework
 When the DAGcombiner selects instruction sequences
 it could increase the critical path or resource len.
 For example, on arm64 there are multiply-accumulate instructions (madd,
 msub). If e.g. the equivalent  multiply-add sequence is not on the
 crictial path it makes sense to select it instead of  the combined,
 single accumulate instruction (madd/msub). The reason is that the
 conversion from add+mul to the madd could lengthen the critical path
 by the latency of the multiply.
 But the DAGCombiner would always combine and select the madd/msub
 instruction.
 This patch uses machine trace metrics to estimate critical path length
 and resource length of an original instruction sequence vs a combined
 instruction sequence and picks the faster code based on its estimates.
 This patch only commits the target independent framework that evaluates
 and selects code sequences. The machine instruction combiner is turned
 off for all targets and expected to evolve over time by gradually
 handling DAGCombiner pattern in the target specific code.
 This framework lays the groundwork for fixing
 rdar://16319955
llvm-svn: 214666 | 
| | 
| 
| 
| | llvm-svn: 207714 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | define below all header includes in the lib/CodeGen/... tree. While the
current modules implementation doesn't check for this kind of ODR
violation yet, it is likely to grow support for it in the future. It
also removes one layer of macro pollution across all the included
headers.
Other sub-trees will follow.
llvm-svn: 206837 | 
| | 
| 
| 
| 
| 
| | instead of comparing to nullptr.
llvm-svn: 206142 | 
| | 
| 
| 
| 
| 
| | While there make array_lengthof constexpr if we have support for it.
llvm-svn: 206112 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | operator* on the by-operand iterators to return a MachineOperand& rather than
a MachineInstr&.  At this point they almost behave like normal iterators!
Again, this requires making some existing loops more verbose, but should pave
the way for the big range-based for-loop cleanups in the future.
llvm-svn: 203865 | 
| | 
| 
| 
| 
| 
| | class.
llvm-svn: 203220 | 
| | 
| 
| 
| 
| 
| | The old implementation is no longer needed in C++11.
llvm-svn: 202644 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Replace the ill-defined MinLatency and ILPWindow properties with
with straightforward buffer sizes:
MCSchedMode::MicroOpBufferSize
MCProcResourceDesc::BufferSize
These can be used to more precisely model instruction execution if desired.
Disabled some misched tests temporarily. They'll be reenabled in a few commits.
llvm-svn: 184032 | 
| | 
| 
| 
| 
| 
| 
| | Naturally, we should be able to pass in extra instructions, not just
extra blocks.
llvm-svn: 180667 | 
| | 
| 
| 
| 
| 
| 
| | It it still possible to extract information from itineraries, for
example.
llvm-svn: 178582 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The new instruction scheduling models provide information about the
number of cycles consumed on each processor resource. This makes it
possible to estimate ILP more accurately than simply counting
instructions / issue width.
The functions getResourceDepth() and getResourceLength() now identify
the limiting processor resource, and return a cycle count based on that.
This gives more precise resource information, particularly in traces
that use one resource a lot more than others.
llvm-svn: 178553 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | In very rare cases caused by irreducible control flow, the dominating
block can have the same trace head without actually being part of the
trace.
As long as such a dominator still has valid instruction depths, it is OK
to use it for computing instruction depths.
Rename the function to avoid lying, and add a check that instruction
depths are computed for the dominator.
llvm-svn: 176668 | 
| | 
| 
| 
| 
| 
| | Let targets use it.
llvm-svn: 172688 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | Not all instructions define a virtual register in their first operand.
Specifically, INLINEASM has a different format.
<rdar://problem/12472811>
llvm-svn: 165721 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | When the CFG contains a loop with multiple entry blocks, the traces
computed by MachineTraceMetrics don't always have the same nice
properties. Loop back-edges are normally excluded from traces, but
MachineLoopInfo doesn't recognize loops with multiple entry blocks, so
those back-edges may be included.
Avoid asserting when that happens by adding an isEarlierInSameTrace()
function that accurately determines if a dominating block is part of the
same trace AND is above the currrent block in the trace.
llvm-svn: 165434 | 
| | 
| 
| 
| | llvm-svn: 165235 | 
| | 
| 
| 
| | llvm-svn: 161712 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Trace::getResourceLength() computes the number of cycles required to
execute the trace when ignoring data dependencies. The number can be
compared to the critical path to estimate the trace ILP.
Trace::getPHIDepth() computes the data dependency depth of a PHI in a
trace successor that isn't necessarily part of the trace.
llvm-svn: 161711 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | When a trace ends with a back-edge, include PHIs in the loop header in
the height computations. This makes the critical path through a loop
more accurate by including the latencies of the last instructions in the
loop.
llvm-svn: 161688 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | We filter out MachineLoop back-edges during the trace-building PO
traversals, but it is possible to have CFG cycles that aren't natural
loops, and MachineLoopInfo doesn't include such cycles.
Use a standard visited set to detect such CFG cycles, and completely
ignore them when picking traces.
llvm-svn: 161532 | 
| | 
| 
| 
| | llvm-svn: 161437 | 
| | 
| 
| 
| 
| 
| 
| 
| | Compare the critical paths of the two traces through an if-conversion
candidate. If the difference is larger than the branch brediction
penalty, reject the if-conversion. If would never pay.
llvm-svn: 161433 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Whenever both instruction depths and instruction heights are known in a
block, it is possible to compute the length of the critical path as
max(depth+height) over the instructions in the block.
The stored live-in lists make it possible to accurately compute the
length of a critical path that bypasses the current (small) block.
llvm-svn: 161197 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The height on an instruction is the minimum number of cycles from the
instruction is issued to the end of the trace. Heights are computed for
all instructions in and below the trace center block.
The method for computing heights is different from the depth
computation. As we visit instructions in the trace bottom-up, heights of
used instructions are pushed upwards. This way, we avoid scanning long
use lists, looking for uses in the current trace.
At each basic block boundary, a list of live-in registers and their
minimum heights is saved in the trace block info. These live-in lists
are used when restarting depth computations on a trace that
converges with an already computed trace. They will also be used to
accurately compute the critical path length.
llvm-svn: 161138 | 
| | 
| 
| 
| | llvm-svn: 161115 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Assuming infinite issue width, compute the earliest each instruction in
the trace can issue, when considering the latency of data dependencies.
The issue cycle is record as a 'depth' from the beginning of the trace.
This is half the computation required to find the length of the critical
path through the trace. Heights are next.
llvm-svn: 161074 | 
| | 
| 
| 
| | llvm-svn: 161072 | 
| | 
| 
| 
| | llvm-svn: 161004 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This lets traces include the final iteration of a nested loop above the
center block, and the first iteration of a nested loop below the center
block.
We still don't allow traces to contain backedges, and traces are
truncated where they would leave a loop, as seen from the center block.
llvm-svn: 161003 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | When computing a trace, all the candidates for pred/succ must have been
visited. Filter out back-edges first, though. The PO traversal ignores
them.
Thanks to Andy for spotting this in review.
llvm-svn: 160995 | 
| | 
| 
| 
| 
| 
| 
| | By overriding Pass::verifyAnalysis(), the pass contents will be verified
by the pass manager.
llvm-svn: 160994 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This is a cleaned up version of the isFree() function in
MachineTraceMetrics.cpp.
Transient instructions are very unlikely to produce any code in the
final output. Either because they get eliminated by RegisterCoalescing,
or because they are pseudo-instructions like labels and debug values.
llvm-svn: 160977 | 
| | 
| 
| 
| 
| 
| 
| | This function verifies the consistency of cached data in the
MachineTraceMetrics analysis.
llvm-svn: 160976 | 
| | 
| 
| 
| 
| 
| 
| | The MachineTraceMetrics analysis must be invalidated before modifying
the CFG. This will catch some of the violations of that rule.
llvm-svn: 160969 | 
| | 
| 
| 
| | llvm-svn: 160905 | 
| | 
| 
| 
| 
| 
| 
| | This makes it possible to quickly detect blocks that are outside the
trace.
llvm-svn: 160904 | 
| | 
| 
| 
| | llvm-svn: 160798 | 
|  | This is still a work in progress.
Out-of-order CPUs usually execute instructions from multiple basic
blocks simultaneously, so it is necessary to look at longer traces when
estimating the performance effects of code transformations.
The MachineTraceMetrics analysis will pick a typical trace through a
given basic block and provide performance metrics for the trace. Metrics
will include:
- Instruction count through the trace.
- Issue count per functional unit.
- Critical path length, and per-instruction 'slack'.
These metrics can be used to determine the performance limiting factor
when executing the trace, and how it will be affected by a code
transformation.
Initially, this will be used by the early if-conversion pass.
llvm-svn: 160796 |