| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now the scheduler updates a node's ready time as soon as it is
scheduled, before releasing dependent nodes. There was a reason I
didn't do this initially but it no longer applies.
A53 is in-order and was running into an issue where nodes where added
to the readyQ too early. That's now fixed.
This also makes it easier for custom scheduling strategies to build
heuristics based on the actual cycles that the node was scheduled at.
The only impact on OOO (sandybridge/cyclone) is that ready times will
be slightly more accurate. I didn't measure any significant regressions.
llvm-svn: 210390
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As requested by AArch64 subtargets.
Note that this will have no effect until the
AArch64 target actually enables the pass like this:
substitutePass(&PostRASchedulerID, &PostMachineSchedulerID);
As soon as armv7 switches over, PostMachineScheduler will become the
default postRA scheduler, so this won't be necessary any more.
Targets using the old postRA schedule would then do:
substitutePass(&PostMachineSchedulerID, &PostRASchedulerID);
llvm-svn: 210167
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These were not exposed previously because I didn't want out-of-tree
targets to be too dependent on their internals. They can be reused for
a very wide variety of processors with casual scheduling needs without
exposing the classes by instead using hooks defined in
MachineSchedPolicy (we can add more if needed). When targets are more
aggressively tuned or want to provide custom heuristics, they can
define their own MachineSchedStrategy. I tend to think this is better
once you start customizing heuristics because you can copy over only
what you need. I don't think that layering heuristics generally works
well.
However, Arch64 targets now want to reuse the Generic scheduling logic
but also provide extensions. I don't see much harm in exposing the
Generic scheduling classes with a major caveat: these scheduling
strategies may change in the future without validating performance on
less mainstream processors. If you want to be immune from changes,
just define your own MachineSchedStrategy.
llvm-svn: 210166
|
|
|
|
|
|
| |
'final' and leave 'virtual' on some methods that are marked virtual without overriding anything and have no obvious overrides themselves.
llvm-svn: 207511
|
|
|
|
|
|
|
|
|
|
|
|
| |
define below all header includes in the lib/CodeGen/... tree. While the
current modules implementation doesn't check for this kind of ODR
violation yet, it is likely to grow support for it in the future. It
also removes one layer of macro pollution across all the included
headers.
Other sub-trees will follow.
llvm-svn: 206837
|
|
|
|
| |
llvm-svn: 206784
|
|
|
|
|
|
| |
instead of comparing to nullptr.
llvm-svn: 206142
|
|
|
|
|
|
|
| |
pass normally runs at optimization level None, or is part of the
register allocation pipeline.
llvm-svn: 205228
|
|
|
|
| |
llvm-svn: 203444
|
|
|
|
|
|
| |
No functionality change.
llvm-svn: 203288
|
|
|
|
|
|
| |
class.
llvm-svn: 203220
|
|
|
|
|
|
|
|
|
|
| |
This compiles with no changes to clang/lld/lldb with MSVC and includes
overloads to various functions which are used by those projects and llvm
which have OwningPtr's as parameters. This should allow out of tree
projects some time to move. There are also no changes to libs/Target,
which should help out of tree targets have time to move, if necessary.
llvm-svn: 203083
|
|
|
|
|
|
| |
Remove the old functions.
llvm-svn: 202636
|
|
|
|
| |
llvm-svn: 202621
|
|
|
|
|
|
|
| |
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.
llvm-svn: 200018
|
|
|
|
| |
llvm-svn: 199788
|
|
|
|
|
|
|
| |
Generalized the heuristic that looks at the (very rough) size of the
register file before enabling regpressure tracking.
llvm-svn: 199766
|
|
|
|
| |
llvm-svn: 198133
|
|
|
|
| |
llvm-svn: 198131
|
|
|
|
| |
llvm-svn: 198124
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PostGenericScheduler uses either the new machine model or the hazard
checker for top-down scheduling. Most of the infrastructure for PreRA
machine scheduling is reused.
With a some tuning, this should allow MachineScheduler to be default
for all ARM targets, including cortex-A9, using the new machine
model. Likewise, with additional tuning, it should be able to replace
PostRAScheduler for all targets.
The PostMachineScheduler pass does not currently run the
AntiDepBreaker. There is less need for it on targets that are already
running preRA MachineScheduler. I want to prove it's necessary before
committing to the maintenance burden.
The PostMachineScheduler also currently removes kill flags and adds
them all back later. This is a bit ridiculous. I'd prefer passes to
directly use a liveness utility than rely on flags.
A test case that enables this scheduler will be included in a
subsequent checkin that updates the A9 model.
llvm-svn: 198122
|
|
|
|
|
|
| |
Placeholder and boilerplate for a PostRA MachineScheduler pass.
llvm-svn: 198120
|
|
|
|
|
|
|
|
| |
Factor the MachineFunctionPass into MachineSchedulerBase.
Split the DAG class into ScheduleDAGMI and SchedulerDAGMILive.
llvm-svn: 198119
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These helper classes take care of the book-keeping the drives the
GenericScheduler heuristics. It is likely that developers writing
target-specific schedulers that work similarly to GenericScheduler
will want to use these helpers too. The immediate goal is to develop a
GenericPostScheduler that can run in place of the old PostRAScheduler,
but will use the new machine model.
No functionality change intended.
llvm-svn: 196643
|
|
|
|
| |
llvm-svn: 196585
|
|
|
|
|
|
|
|
|
|
| |
Not only does it trigger -Wparentheses, I think the assert actually
relies on incorrect operator precedence.
Also, the grammar as questionable, but I might not know enough about the
problem at hand.
llvm-svn: 196567
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows a target to use MI-Sched as an in-order scheduler that
will model strict resource conflicts without defining a processor
itinerary. Instead, the target can now use the new per-operand machine
model and define in-order resources with BufferSize=0. For example,
this would allow restricting the type of operations that can be formed
into a dispatch group. (Normally NumMicroOps is sufficient to enforce
dispatch groups).
If the intent is to model latency in in-order pipeline, as opposed to
resource conflicts, then a resource with BufferSize=1 should be
defined instead.
This feature is only casually tested as there are no in-tree targets
using it yet. However, Hal will be experimenting with POWER7.
llvm-svn: 196517
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The per-operand machine model allows the target to define "unbuffered"
processor resources. This change is a quick, cheap way to model stalls
caused by the latency of operations that use such resources. This only
applies when the processor's micro-op buffer size is non-zero
(Out-of-Order). We can't precisely model in-order stalls during
out-of-order execution, but this is an easy and effective
heuristic. It benefits cortex-a9 scheduling when using the new
machine model, which is not yet on by default.
MI-Sched for armv7 was evaluated on Swift (and only not enabled because
of a performance bug related to predication). However, we never
evaluated Cortex-A9 performance on MI-Sched in its current form. This
change adds MI-Sched functionality to reach performance goals on
A9. The only remaining change is to allow MI-Sched to run as a PostRA
pass.
I evaluated performance using a set of options to estimate the performance impact once MI sched is default on armv7:
-mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false
For a simple saxpy loop I see a 1.7x speedup. Here are the llvm-testsuite results:
(min run time over 2 runs, filtering tiny changes)
Speedups:
| Benchmarks/BenchmarkGame/recursive | 52.39% |
| Benchmarks/VersaBench/beamformer | 20.80% |
| Benchmarks/Misc/pi | 19.97% |
| Benchmarks/Misc/mandel-2 | 19.95% |
| SPEC/CFP2000/188.ammp | 18.72% |
| Benchmarks/McCat/08-main/main | 18.58% |
| Benchmarks/Misc-C++/Large/sphereflake | 18.46% |
| Benchmarks/Olden/power | 17.11% |
| Benchmarks/Misc-C++/mandel-text | 16.47% |
| Benchmarks/Misc/oourafft | 15.94% |
| Benchmarks/Misc/flops-7 | 14.99% |
| Benchmarks/FreeBench/distray | 14.26% |
| SPEC/CFP2006/470.lbm | 14.00% |
| mediabench/mpeg2/mpeg2dec/mpeg2decode | 12.28% |
| Benchmarks/SmallPT/smallpt | 10.36% |
| Benchmarks/Misc-C++/Large/ray | 8.97% |
| Benchmarks/Misc/fp-convert | 8.75% |
| Benchmarks/Olden/perimeter | 7.10% |
| Benchmarks/Bullet/bullet | 7.03% |
| Benchmarks/Misc/mandel | 6.75% |
| Benchmarks/Olden/voronoi | 6.26% |
| Benchmarks/Misc/flops-8 | 5.77% |
| Benchmarks/Misc/matmul_f64_4x4 | 5.19% |
| Benchmarks/MiBench/security-rijndael | 5.15% |
| Benchmarks/Misc/flops-6 | 5.10% |
| Benchmarks/Olden/tsp | 4.46% |
| Benchmarks/MiBench/consumer-lame | 4.28% |
| Benchmarks/Misc/flops-5 | 4.27% |
| Benchmarks/mafft/pairlocalalign | 4.19% |
| Benchmarks/Misc/himenobmtxpa | 4.07% |
| Benchmarks/Misc/lowercase | 4.06% |
| SPEC/CFP2006/433.milc | 3.99% |
| Benchmarks/tramp3d-v4 | 3.79% |
| Benchmarks/FreeBench/pifft | 3.66% |
| Benchmarks/Ptrdist/ks | 3.21% |
| Benchmarks/Adobe-C++/loop_unroll | 3.12% |
| SPEC/CINT2000/175.vpr | 3.12% |
| Benchmarks/nbench | 2.98% |
| SPEC/CFP2000/183.equake | 2.91% |
| Benchmarks/Misc/perlin | 2.85% |
| Benchmarks/Misc/flops-1 | 2.82% |
| Benchmarks/Misc-C++-EH/spirit | 2.80% |
| Benchmarks/Misc/flops-2 | 2.77% |
| Benchmarks/NPB-serial/is | 2.42% |
| Benchmarks/ASC_Sequoia/CrystalMk | 2.33% |
| Benchmarks/BenchmarkGame/n-body | 2.28% |
| Benchmarks/SciMark2-C/scimark2 | 2.27% |
| Benchmarks/Olden/bh | 2.03% |
| skidmarks10/skidmarks | 1.81% |
| Benchmarks/Misc/flops | 1.72% |
Slowdowns:
| Benchmarks/llubenchmark/llu | -14.14% |
| Benchmarks/Polybench/stencils/seidel-2d | -5.67% |
| Benchmarks/Adobe-C++/functionobjects | -5.25% |
| Benchmarks/Misc-C++/oopack_v1p8 | -5.00% |
| Benchmarks/Shootout/hash | -2.35% |
| Benchmarks/Prolangs-C++/ocean | -2.01% |
| Benchmarks/Polybench/medley/floyd-warshall | -1.98% |
| Polybench/linear-algebra/kernels/3mm | -1.95% |
| Benchmarks/McCat/09-vor/vor | -1.68% |
llvm-svn: 196516
|
|
|
|
| |
llvm-svn: 196513
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes most of the trivial cases of weak vtables by pinning them to
a single object file. The memory leaks in this version have been fixed. Thanks
Alexey for pointing them out.
Differential Revision: http://llvm-reviews.chandlerc.com/D2068
Reviewed by Andy
llvm-svn: 195064
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change is incorrect. If you delete virtual destructor of both a base class
and a subclass, then the following code:
Base *foo = new Child();
delete foo;
will not cause the destructor for members of Child class. As a result, I observe
plently of memory leaks. Notable examples I investigated are:
ObjectBuffer and ObjectBufferStream, AttributeImpl and StringSAttributeImpl.
llvm-svn: 194997
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes most of the trivial cases of weak vtables by pinning them to
a single object file.
Differential Revision: http://llvm-reviews.chandlerc.com/D2068
Reviewed by Andy
llvm-svn: 194865
|
|
|
|
|
|
|
| |
This makes the API a bit more natural to use and makes it easier to make
LiveRanges implementation details private.
llvm-svn: 192394
|
|
|
|
| |
llvm-svn: 191312
|
|
|
|
|
|
|
|
|
|
|
| |
interface.
The global registry is used to allow command line override of the
scheduler selection, but does not work well as the normal selection
API. For example, the same LLVM process should be able to target
multiple targets or subtargets.
llvm-svn: 191071
|
|
|
|
|
|
|
|
|
|
| |
This was an experimental scheduler a year ago. It's now used by
several subtargets, both in-order and out-of-order, and it
is about to be enabled by default for x86 and armv7. It will be the
new GenericScheduler for subtargets that don't provide their own
SchedulingStrategy.
llvm-svn: 191051
|
|
|
|
| |
llvm-svn: 190367
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Arnold's idea.
I generally try to avoid stateful heuristics because it can make
debugging harder. However, we need a way to prevent the latency
priority from dominating, and it somewhat makes sense to schedule
aggressively for latency only within an issue group.
Swift in particular likes this, and it doesn't hurt anyone else:
| Benchmarks/MiBench/consumer-lame | 10.39% |
| Benchmarks/Misc/himenobmtxpa | 9.63% |
llvm-svn: 190360
|
|
|
|
| |
llvm-svn: 190181
|
|
|
|
| |
llvm-svn: 190180
|
|
|
|
| |
llvm-svn: 190179
|
|
|
|
| |
llvm-svn: 190178
|
|
|
|
|
|
| |
The latency based scheduling could induce spills in some cases.
llvm-svn: 190177
|
|
|
|
|
|
|
|
| |
Allow subtargets to customize the generic scheduling strategy.
This is convenient for targets that don't need to add new heuristics
by specializing the strategy.
llvm-svn: 190176
|
|
|
|
|
|
|
|
|
| |
Fast register pressure tracking currently only takes effect during
bottom up scheduling. Forcing this is a bit faster and simpler for
targets that don't have many scheduling constraints and don't need
top-down scheduling.
llvm-svn: 190014
|
|
|
|
| |
llvm-svn: 189997
|
|
|
|
| |
llvm-svn: 189995
|
|
|
|
| |
llvm-svn: 189994
|
|
|
|
| |
llvm-svn: 189993
|
|
|
|
| |
llvm-svn: 189992
|