| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows a target to use MI-Sched as an in-order scheduler that
will model strict resource conflicts without defining a processor
itinerary. Instead, the target can now use the new per-operand machine
model and define in-order resources with BufferSize=0. For example,
this would allow restricting the type of operations that can be formed
into a dispatch group. (Normally NumMicroOps is sufficient to enforce
dispatch groups).
If the intent is to model latency in in-order pipeline, as opposed to
resource conflicts, then a resource with BufferSize=1 should be
defined instead.
This feature is only casually tested as there are no in-tree targets
using it yet. However, Hal will be experimenting with POWER7.
llvm-svn: 196517
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The per-operand machine model allows the target to define "unbuffered"
processor resources. This change is a quick, cheap way to model stalls
caused by the latency of operations that use such resources. This only
applies when the processor's micro-op buffer size is non-zero
(Out-of-Order). We can't precisely model in-order stalls during
out-of-order execution, but this is an easy and effective
heuristic. It benefits cortex-a9 scheduling when using the new
machine model, which is not yet on by default.
MI-Sched for armv7 was evaluated on Swift (and only not enabled because
of a performance bug related to predication). However, we never
evaluated Cortex-A9 performance on MI-Sched in its current form. This
change adds MI-Sched functionality to reach performance goals on
A9. The only remaining change is to allow MI-Sched to run as a PostRA
pass.
I evaluated performance using a set of options to estimate the performance impact once MI sched is default on armv7:
-mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false
For a simple saxpy loop I see a 1.7x speedup. Here are the llvm-testsuite results:
(min run time over 2 runs, filtering tiny changes)
Speedups:
| Benchmarks/BenchmarkGame/recursive | 52.39% |
| Benchmarks/VersaBench/beamformer | 20.80% |
| Benchmarks/Misc/pi | 19.97% |
| Benchmarks/Misc/mandel-2 | 19.95% |
| SPEC/CFP2000/188.ammp | 18.72% |
| Benchmarks/McCat/08-main/main | 18.58% |
| Benchmarks/Misc-C++/Large/sphereflake | 18.46% |
| Benchmarks/Olden/power | 17.11% |
| Benchmarks/Misc-C++/mandel-text | 16.47% |
| Benchmarks/Misc/oourafft | 15.94% |
| Benchmarks/Misc/flops-7 | 14.99% |
| Benchmarks/FreeBench/distray | 14.26% |
| SPEC/CFP2006/470.lbm | 14.00% |
| mediabench/mpeg2/mpeg2dec/mpeg2decode | 12.28% |
| Benchmarks/SmallPT/smallpt | 10.36% |
| Benchmarks/Misc-C++/Large/ray | 8.97% |
| Benchmarks/Misc/fp-convert | 8.75% |
| Benchmarks/Olden/perimeter | 7.10% |
| Benchmarks/Bullet/bullet | 7.03% |
| Benchmarks/Misc/mandel | 6.75% |
| Benchmarks/Olden/voronoi | 6.26% |
| Benchmarks/Misc/flops-8 | 5.77% |
| Benchmarks/Misc/matmul_f64_4x4 | 5.19% |
| Benchmarks/MiBench/security-rijndael | 5.15% |
| Benchmarks/Misc/flops-6 | 5.10% |
| Benchmarks/Olden/tsp | 4.46% |
| Benchmarks/MiBench/consumer-lame | 4.28% |
| Benchmarks/Misc/flops-5 | 4.27% |
| Benchmarks/mafft/pairlocalalign | 4.19% |
| Benchmarks/Misc/himenobmtxpa | 4.07% |
| Benchmarks/Misc/lowercase | 4.06% |
| SPEC/CFP2006/433.milc | 3.99% |
| Benchmarks/tramp3d-v4 | 3.79% |
| Benchmarks/FreeBench/pifft | 3.66% |
| Benchmarks/Ptrdist/ks | 3.21% |
| Benchmarks/Adobe-C++/loop_unroll | 3.12% |
| SPEC/CINT2000/175.vpr | 3.12% |
| Benchmarks/nbench | 2.98% |
| SPEC/CFP2000/183.equake | 2.91% |
| Benchmarks/Misc/perlin | 2.85% |
| Benchmarks/Misc/flops-1 | 2.82% |
| Benchmarks/Misc-C++-EH/spirit | 2.80% |
| Benchmarks/Misc/flops-2 | 2.77% |
| Benchmarks/NPB-serial/is | 2.42% |
| Benchmarks/ASC_Sequoia/CrystalMk | 2.33% |
| Benchmarks/BenchmarkGame/n-body | 2.28% |
| Benchmarks/SciMark2-C/scimark2 | 2.27% |
| Benchmarks/Olden/bh | 2.03% |
| skidmarks10/skidmarks | 1.81% |
| Benchmarks/Misc/flops | 1.72% |
Slowdowns:
| Benchmarks/llubenchmark/llu | -14.14% |
| Benchmarks/Polybench/stencils/seidel-2d | -5.67% |
| Benchmarks/Adobe-C++/functionobjects | -5.25% |
| Benchmarks/Misc-C++/oopack_v1p8 | -5.00% |
| Benchmarks/Shootout/hash | -2.35% |
| Benchmarks/Prolangs-C++/ocean | -2.01% |
| Benchmarks/Polybench/medley/floyd-warshall | -1.98% |
| Polybench/linear-algebra/kernels/3mm | -1.95% |
| Benchmarks/McCat/09-vor/vor | -1.68% |
llvm-svn: 196516
|
|
|
|
|
|
|
| |
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.
llvm-svn: 196471
|
|
|
|
|
|
|
| |
This makes the API a bit more natural to use and makes it easier to make
LiveRanges implementation details private.
llvm-svn: 192394
|
|
|
|
| |
llvm-svn: 189988
|
|
|
|
|
|
|
| |
This removes all expensive pressure tracking logic from the scheduling
critical path of node comparison.
llvm-svn: 189643
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
liveness later.
Created SUPressureDiffs array to hold the per node PDiff computed during DAG building.
Added a getUpwardPressureDelta API that will soon replace the old
one. Compute PressureDelta here from the precomputed PressureDiffs.
Updating for liveness will come next.
llvm-svn: 189640
|
|
|
|
|
|
| |
This should be much more clear now. It's still disabled pending testing.
llvm-svn: 189597
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are several optional (off-by-default) features in CodeGen that can make
use of alias analysis. These features are important for generating code for
some kinds of cores (for example the (in-order) PPC A2 core). This adds a
useAA() function to TargetSubtargetInfo to allow these features to be enabled
by default on a per-subtarget basis.
Here is the first use of this function: To control the default of the
-enable-aa-sched-mi feature.
llvm-svn: 189563
|
|
|
|
|
|
|
|
|
|
|
| |
Estimate the cyclic critical path within a single block loop. If the
acyclic critical path is longer, then the loop will exhaust OOO
resources after some number of iterations. If lag between the acyclic
critical path and cyclic critical path is longer the the time it takes
to issue those loop iterations, then aggressively schedule for
latency.
llvm-svn: 189120
|
|
|
|
|
|
|
|
|
| |
This will be used to compute the cyclic critical path and to
update precomputed per-node pressure differences.
In the longer term, it could also be used to speed up LiveInterval
update by avoiding visiting all global vreg users.
llvm-svn: 189118
|
|
|
|
|
|
|
|
|
| |
count.
This fixes a pathological compile time problem with very large blocks
and lots of scheduling boundaries.
llvm-svn: 189116
|
|
|
|
|
|
| |
avoid specifying the vector size unnecessarily.
llvm-svn: 185512
|
|
|
|
| |
llvm-svn: 185266
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace the ill-defined MinLatency and ILPWindow properties with
with straightforward buffer sizes:
MCSchedMode::MicroOpBufferSize
MCProcResourceDesc::BufferSize
These can be used to more precisely model instruction execution if desired.
Disabled some misched tests temporarily. They'll be reenabled in a few commits.
llvm-svn: 184032
|
|
|
|
|
|
|
|
|
|
|
| |
The register allocator expects minimal physreg live ranges. Schedule
physreg copies accordingly. This is slightly tricky when they occur in
the middle of the scheduling region. For now, this is handled by
rescheduling the copy when its associated instruction is
scheduled. Eventually we may instead bundle them, but only if we can
preserve the bundles as parallel copies during regalloc.
llvm-svn: 179449
|
|
|
|
|
|
|
| |
MI sched DAG construction allows targets to include terminators into scheduling DAG.
Extend this functionality to labels as well.
llvm-svn: 174977
|
|
|
|
| |
llvm-svn: 173431
|
|
|
|
|
|
| |
This fixes DAG subtree analysis at the boundary.
llvm-svn: 173427
|
|
|
|
|
|
|
|
|
| |
Maintain separate per-node and per-tree book-keeping.
Track all instructions above a DAG node including nested subtrees.
Seperately track instructions within a subtree.
Record subtree parents.
llvm-svn: 173426
|
|
|
|
|
|
|
| |
Allow the strategy to select SchedDFS. Allow the results of SchedDFS
to affect initialization of the scheduler state.
llvm-svn: 173425
|
|
|
|
|
|
|
| |
This is mostly refactoring, along with adding an instruction count
within the subtrees and ensuring we only look at data edges.
llvm-svn: 173420
|
|
|
|
|
|
|
|
|
|
| |
For sanity, create a root when NumDataSuccs >= 4. Splitting large
subtrees will no longer be detrimental after my next checkin to handle
nested tree. A magic number of 4 is fine because single subtrees
seldom rejoin more than this. It makes subtrees easier to visualize
and heuristics more sane.
llvm-svn: 173399
|
|
|
|
|
|
|
|
| |
scheduler to use it.
A SparseMultiSet adds multiset behavior to SparseSet, while retaining SparseSet's desirable properties. Essentially, SparseMultiSet provides multiset behavior by storing its dense data in doubly linked lists that are inlined into the dense vector. This allows it to provide good data locality as well as vector-like constant-time clear() and fast constant time find(), insert(), and erase(). It also allows SparseMultiSet to have a builtin recycler rather than keeping SparseSet's behavior of always swapping upon removal, which allows it to preserve more iterators. It's often a better alternative to a SparseSet of a growable container or vector-of-vector.
llvm-svn: 173064
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
llvm-svn: 171366
|
|
|
|
| |
llvm-svn: 170454
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
misched used GetUnderlyingObject in order to break false load/store
dependencies, and the -enable-aa-sched-mi feature similarly relied on
GetUnderlyingObject in order to ensure it is safe to use the aliasing analysis.
Unfortunately, GetUnderlyingObject does not recurse through phi nodes, and so
(especially due to LSR) all of these mechanisms failed for
induction-variable-dependent loads and stores inside loops.
This change replaces uses of GetUnderlyingObject with GetUnderlyingObjects
(which will recurse through phi and select instructions) in misched.
Andy reviewed, tested and simplified this patch; Thanks!
llvm-svn: 169744
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
|
|
|
|
|
|
|
| |
Assertion failed: (TopRPTracker.getPos() == RegionBegin && "bad initial Top tracker").
rdar://12790302.
llvm-svn: 169072
|
|
|
|
|
|
|
| |
Assertion failed: (VNI && "No value to read by operand")
rdar://12790267.
llvm-svn: 169071
|
|
|
|
|
|
|
|
|
|
|
| |
This is a simple, cheap infrastructure for analyzing the shape of a
DAG. It recognizes uniform DAGs that take the shape of bottom-up
subtrees, such as the included matrix multiplication example. This is
useful for heuristics that balance register pressure with ILP. Two
canonical expressions of the heuristic are implemented in scheduling
modes: -misched-ilpmin and -misched-ilpmax.
llvm-svn: 168773
|
|
|
|
| |
llvm-svn: 168772
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a hole in the "cheap" alias analysis logic implemented within
the DAG builder itself, regardless of whether proper alias analysis is
enabled. It now handles this pattern produced by LSR+CodeGenPrepare.
%sunkaddr1 = ptrtoint * %obj to i64
%sunkaddr2 = add i64 %sunkaddr1, %lsr.iv
%sunkaddr3 = inttoptr i64 %sunkaddr2 to i32*
store i32 %v, i32* %sunkaddr3
llvm-svn: 168768
|
|
|
|
|
|
|
| |
Similarly to several recent fixes throughout the code replace std::map use with the MapVector.
Add find() method to the MapVector.
llvm-svn: 168051
|
|
|
|
|
|
|
|
| |
This adds support for weak DAG edges to the general scheduling
infrastructure in preparation for MachineScheduler support for
heuristics based on weak edges.
llvm-svn: 167738
|
|
|
|
|
|
|
| |
This is in preparation for adding "weak" DAG edges, but generally
simplifies the design.
llvm-svn: 167435
|
|
|
|
|
|
|
|
|
|
| |
the MachineInstr MayLoad/MayLoad flags are based on the tablegen implementation.
For inline assembly, however, we need to compute these based on the constraints.
Revert r166929 as this is no longer needed, but leave the test case in place.
rdar://12033048 and PR13504
llvm-svn: 167040
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
incorrect instruction sequence due to it not being aware that an
inline assembly instruction may reference memory.
This patch fixes the problem by causing the scheduler to always assume that any
inline assembly code instruction could access memory. This is necessary because
the internal representation of the inline instruction does not include
any information about memory accesses.
This should fix PR13504.
llvm-svn: 166929
|
|
|
|
| |
llvm-svn: 166750
|
|
|
|
| |
llvm-svn: 165950
|
|
|
|
|
|
|
|
| |
Allows the new machine model to be used for NumMicroOps and OutputLatency.
Allows the HazardRecognizer to be disabled along with itineraries.
llvm-svn: 165603
|
|
|
|
|
|
| |
This wasn't contributing anything significant to postRA heuristics except compile time (by my measurements) and will be replaced by a more general heuristic for cross-region dependencies within the scheduler itself.
llvm-svn: 165563
|
|
|
|
| |
llvm-svn: 165418
|
|
|
|
|
|
| |
SchedModel
llvm-svn: 165417
|
|
|
|
| |
llvm-svn: 164153
|
|
|
|
|
|
|
|
| |
"#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)"
No functional change. Update r163339.
llvm-svn: 163653
|
|
|
|
|
|
| |
No functional change.
llvm-svn: 163339
|
|
|
|
|
|
|
|
| |
Ordered memory operations are more constrained than volatile loads and
stores because they must be ordered with respect to all other memory
operations.
llvm-svn: 162861
|
|
|
|
|
|
|
|
|
| |
The logic for recomputing latency based on a ScheduleDAG edge was
shady. This bypasses the problem by requiring the client to provide
operand indices. This ensures consistent use of the machine model's
API.
llvm-svn: 162420
|
|
|
|
| |
llvm-svn: 161010
|