path: root/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
Commit message | Author | Age | Files | Lines
...
* Reenable use of TBAA during CodeGen | Hal Finkel | 2014-04-12 | 1 | -7/+1
  We had disabled use of TBAA during CodeGen (even when otherwise using AA) because the ptrtoint/inttoptr used by CGP for address sinking caused BasicAA to miss basic type punning that it should catch (and, thus, we'd fail to override TBAA when we should).
  However, when AA is in use during CodeGen, CGP now uses normal GEPs and bitcasts, instead of ptrtoint/inttoptr, when doing address sinking. As a result, BasicAA should be able to make us do the right thing in the face of type-punning, and it seems safe to enable use of TBAA again. Self-hosting seems fine on PPC64/Linux on the P7, with TBAA enabled and -misched=shuffle.
  Note: We still don't update TBAA when merging stack slots, although because BasicAA should now catch all such cases, this is no longer a blocking issue. Nevertheless, I plan to commit code to deal with this properly in the near future.
  llvm-svn: 206093
* Replace PROLOG_LABEL with a new CFI_INSTRUCTION. | Rafael Espindola | 2014-03-07 | 1 | -2/+3
  The old system was fairly convoluted:
    * A temporary label was created.
    * A single PROLOG_LABEL was created with it.
    * A few MCCFIInstructions were created with the same label.
  The semantics were that the cfi instructions were mapped to the PROLOG_LABEL via the temporary label. The output position was that of the PROLOG_LABEL. The temporary label itself was used only for doing the mapping.
  The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to one by holding an index into the CFI instructions of this function.
  I did consider removing MMI.getFrameInstructions completely and having CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non-trivial constructors and destructors and are somewhat big, so this setup is probably better.
  The net result is that we don't create temporary labels that are never used.
  llvm-svn: 203204
* [C++11] Replace llvm::next and llvm::prior with std::next and std::prev. | Benjamin Kramer | 2014-03-02 | 1 | -3/+4
  Remove the old functions.
  llvm-svn: 202636
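  As a small illustration of the replacement described in this commit, the sketch below uses std::next and std::prev from <iterator> in the places where llvm::next and llvm::prior would previously have been called. The helper names valueAfter and lastValue are hypothetical and are not taken from ScheduleDAGInstrs.cpp.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical helper: value of the element that follows `It`.
// Before the change this would have been spelled llvm::next(It).
inline int valueAfter(std::list<int>::const_iterator It) {
  return *std::next(It);
}

// Hypothetical helper: last element of a non-empty list.
// Before the change this would have been spelled llvm::prior(L.end()).
inline int lastValue(const std::list<int> &L) {
  assert(!L.empty() && "list must be non-empty");
  return *std::prev(L.end());
}
```

  Both standard functions return a new iterator and leave their argument untouched, which is why they work on temporaries such as L.end().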
* Indent this continued line. | Nick Lewycky | 2014-02-25 | 1 | -4/+4
  llvm-svn: 202096
* Fix change in behaviour accidentally introduced in r201754. | Nick Lewycky | 2014-02-20 | 1 | -2/+4
  llvm-svn: 201758
* Simplify the implementation of getUnderlyingObjectsForInstr, without intending to change the semantics at all. | Nick Lewycky | 2014-02-20 | 1 | -13/+12
  llvm-svn: 201754
* Disable the use of TBAA when using AA in CodeGen | Hal Finkel | 2014-01-25 | 1 | -2/+11
  There are currently two issues, of which I currently know, that prevent TBAA from being correctly usable in CodeGen:
    1. Stack coloring does not update TBAA when merging allocas. This is easy enough to fix, but is not the largest problem.
    2. CGP inserts ptrtoint/inttoptr pairs when sinking address computations. Because BasicAA does not handle inttoptr, we'll often miss basic type punning idioms that we need to catch so we don't miscompile real-world code (like LLVM).
  I don't yet have a small test case for this, but this fixes self hosting a non-asserts build of LLVM on PPC64 when using -enable-aa-sched-mi and -misched=shuffle.
  llvm-svn: 200093
* Track multiple stores per object when using AA in ScheduleDAGInstrs | Hal Finkel | 2014-01-20 | 1 | -21/+38
  When using AA to break false chain dependencies, we need to track multiple stores per object in ScheduleDAGInstrs. Historically, we tracked potential alias chains at the object level, and so all loads of an object would retain dependencies on any store to that object. With AA, however, this is not sufficient: non-overlapping stores and loads to the same object all need to be tested for dependencies separately; we cannot test all loads to an object against only the last store (see PR18497 for an explicit example).
  To mitigate any unwelcome compile-time impact when not using AA, only one store is kept in the list per object when not using AA.
  This, along with a stack coloring change to come shortly, will provide a test case, fix PR18497 (and allow LLVM to compile itself using -enable-aa-sched-mi on x86-64).
  llvm-svn: 199657
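  The bookkeeping described in this commit can be pictured with a toy model. The sketch below is not the code from ScheduleDAGInstrs.cpp: the Access struct, the StoreTracker class and the byte-range overlap test are all hypothetical stand-ins (a real build would query alias analysis instead). It only shows why a load has to be checked against every pending store when AA is on, and why a single store per object is enough when it is off.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical memory access: DAG node id plus byte range [Offset, Offset + Size).
struct Access {
  int Node;
  std::int64_t Offset;
  std::int64_t Size;
};

// Stand-in for an alias query: do the two byte ranges intersect?
inline bool overlaps(const Access &A, const Access &B) {
  return A.Offset < B.Offset + B.Size && B.Offset < A.Offset + A.Size;
}

class StoreTracker {
  bool UseAA;
  std::map<int, std::vector<Access>> Stores; // object id -> pending stores

public:
  explicit StoreTracker(bool UseAA) : UseAA(UseAA) {}

  void addStore(int Object, const Access &S) {
    std::vector<Access> &List = Stores[Object];
    if (!UseAA)
      List.clear(); // without AA, keep only the most recent store per object
    List.push_back(S);
  }

  // Nodes that a load of `Object` must be ordered after.
  std::vector<int> depsForLoad(int Object, const Access &L) const {
    std::vector<int> Deps;
    auto It = Stores.find(Object);
    if (It == Stores.end())
      return Deps;
    for (const Access &S : It->second)
      if (!UseAA || overlaps(S, L)) // without AA, depend conservatively
        Deps.push_back(S.Node);
    return Deps;
  }
};
```

  With AA enabled, two stores to bytes [0, 4) and [8, 12) of the same object followed by a load of [0, 4) yield a dependence on the first store only. Had only the last store been remembered, the load would have been tested against the non-overlapping store and the real dependence would have been lost.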
* Conservatively handle multiple MMOs in MIsNeedChainEdge | Hal Finkel | 2014-01-08 | 1 | -4/+4
  MIsNeedChainEdge, which is used by -enable-aa-sched-mi (AA in misched), had an llvm_unreachable when -enable-aa-sched-mi is enabled and we reach an instruction with multiple MMOs. Instead, return a conservative answer. This allows testing -enable-aa-sched-mi on x86.
  Also, this moves the check above the isUnsafeMemoryObject checks. isUnsafeMemoryObject is currently correct only for instructions with one MMO (as noted in the comment in isUnsafeMemoryObject):
    // We purposefully do no check for hasOneMemOperand() here
    // in hope to trigger an assert downstream in order to
    // finish implementation.
  The problem with this is that, had the candidate edge passed the "!MIa->mayStore() && !MIb->mayStore()" check, the hoped-for assert would never happen (which could, in theory, lead to incorrect behavior if one of these secondary MMOs was volatile, for example).
  llvm-svn: 198795
* Move the PostRA scheduler's fixupKills function for reuse. | Andrew Trick | 2013-12-28 | 1 | -3/+147
  llvm-svn: 198121
* MI-Sched: Model "reserved" processor resources. | Andrew Trick | 2013-12-05 | 1 | -1/+7
  This allows a target to use MI-Sched as an in-order scheduler that will model strict resource conflicts without defining a processor itinerary. Instead, the target can now use the new per-operand machine model and define in-order resources with BufferSize=0. For example, this would allow restricting the type of operations that can be formed into a dispatch group. (Normally NumMicroOps is sufficient to enforce dispatch groups).
  If the intent is to model latency in an in-order pipeline, as opposed to resource conflicts, then a resource with BufferSize=1 should be defined instead.
  This feature is only casually tested as there are no in-tree targets using it yet. However, Hal will be experimenting with POWER7.
  llvm-svn: 196517
* MI-Sched: handle latency of in-order operations with the new machine model. | Andrew Trick | 2013-12-05 | 1 | -0/+16
  The per-operand machine model allows the target to define "unbuffered" processor resources. This change is a quick, cheap way to model stalls caused by the latency of operations that use such resources. This only applies when the processor's micro-op buffer size is non-zero (Out-of-Order). We can't precisely model in-order stalls during out-of-order execution, but this is an easy and effective heuristic. It benefits cortex-a9 scheduling when using the new machine model, which is not yet on by default.
  MI-Sched for armv7 was evaluated on Swift (and only not enabled because of a performance bug related to predication). However, we never evaluated Cortex-A9 performance on MI-Sched in its current form. This change adds MI-Sched functionality to reach performance goals on A9. The only remaining change is to allow MI-Sched to run as a PostRA pass.
  I evaluated performance using a set of options to estimate the performance impact once MI sched is default on armv7:
    -mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false
  For a simple saxpy loop I see a 1.7x speedup.
  Here are the llvm-testsuite results: (min run time over 2 runs, filtering tiny changes)
  Speedups:
    | Benchmarks/BenchmarkGame/recursive | 52.39% |
    | Benchmarks/VersaBench/beamformer | 20.80% |
    | Benchmarks/Misc/pi | 19.97% |
    | Benchmarks/Misc/mandel-2 | 19.95% |
    | SPEC/CFP2000/188.ammp | 18.72% |
    | Benchmarks/McCat/08-main/main | 18.58% |
    | Benchmarks/Misc-C++/Large/sphereflake | 18.46% |
    | Benchmarks/Olden/power | 17.11% |
    | Benchmarks/Misc-C++/mandel-text | 16.47% |
    | Benchmarks/Misc/oourafft | 15.94% |
    | Benchmarks/Misc/flops-7 | 14.99% |
    | Benchmarks/FreeBench/distray | 14.26% |
    | SPEC/CFP2006/470.lbm | 14.00% |
    | mediabench/mpeg2/mpeg2dec/mpeg2decode | 12.28% |
    | Benchmarks/SmallPT/smallpt | 10.36% |
    | Benchmarks/Misc-C++/Large/ray | 8.97% |
    | Benchmarks/Misc/fp-convert | 8.75% |
    | Benchmarks/Olden/perimeter | 7.10% |
    | Benchmarks/Bullet/bullet | 7.03% |
    | Benchmarks/Misc/mandel | 6.75% |
    | Benchmarks/Olden/voronoi | 6.26% |
    | Benchmarks/Misc/flops-8 | 5.77% |
    | Benchmarks/Misc/matmul_f64_4x4 | 5.19% |
    | Benchmarks/MiBench/security-rijndael | 5.15% |
    | Benchmarks/Misc/flops-6 | 5.10% |
    | Benchmarks/Olden/tsp | 4.46% |
    | Benchmarks/MiBench/consumer-lame | 4.28% |
    | Benchmarks/Misc/flops-5 | 4.27% |
    | Benchmarks/mafft/pairlocalalign | 4.19% |
    | Benchmarks/Misc/himenobmtxpa | 4.07% |
    | Benchmarks/Misc/lowercase | 4.06% |
    | SPEC/CFP2006/433.milc | 3.99% |
    | Benchmarks/tramp3d-v4 | 3.79% |
    | Benchmarks/FreeBench/pifft | 3.66% |
    | Benchmarks/Ptrdist/ks | 3.21% |
    | Benchmarks/Adobe-C++/loop_unroll | 3.12% |
    | SPEC/CINT2000/175.vpr | 3.12% |
    | Benchmarks/nbench | 2.98% |
    | SPEC/CFP2000/183.equake | 2.91% |
    | Benchmarks/Misc/perlin | 2.85% |
    | Benchmarks/Misc/flops-1 | 2.82% |
    | Benchmarks/Misc-C++-EH/spirit | 2.80% |
    | Benchmarks/Misc/flops-2 | 2.77% |
    | Benchmarks/NPB-serial/is | 2.42% |
    | Benchmarks/ASC_Sequoia/CrystalMk | 2.33% |
    | Benchmarks/BenchmarkGame/n-body | 2.28% |
    | Benchmarks/SciMark2-C/scimark2 | 2.27% |
    | Benchmarks/Olden/bh | 2.03% |
    | skidmarks10/skidmarks | 1.81% |
    | Benchmarks/Misc/flops | 1.72% |
  Slowdowns:
    | Benchmarks/llubenchmark/llu | -14.14% |
    | Benchmarks/Polybench/stencils/seidel-2d | -5.67% |
    | Benchmarks/Adobe-C++/functionobjects | -5.25% |
    | Benchmarks/Misc-C++/oopack_v1p8 | -5.00% |
    | Benchmarks/Shootout/hash | -2.35% |
    | Benchmarks/Prolangs-C++/ocean | -2.01% |
    | Benchmarks/Polybench/medley/floyd-warshall | -1.98% |
    | Polybench/linear-algebra/kernels/3mm | -1.95% |
    | Benchmarks/McCat/09-vor/vor | -1.68% |
  llvm-svn: 196516
* Correct word hyphenations | Alp Toker | 2013-12-05 | 1 | -1/+1
  This patch tries to avoid unrelated changes other than fixing a few hyphen-related ambiguities and contractions in nearby lines.
  llvm-svn: 196471
* Pass LiveQueryResult by value | Matthias Braun | 2013-10-10 | 1 | -1/+2
  This makes the API a bit more natural to use and makes it easier to make LiveRanges implementation details private.
  llvm-svn: 192394
* mi-sched: bypass heuristic checks when regpressure tracking is disabled. | Andrew Trick | 2013-09-04 | 1 | -3/+3
  llvm-svn: 189988
* mi-sched: update PressureDiffs on-the-fly for liveness. | Andrew Trick | 2013-08-30 | 1 | -2/+8
  This removes all expensive pressure tracking logic from the scheduling critical path of node comparison.
  llvm-svn: 189643
* mi-sched: Precompute a PressureDiff for each instruction, adjust for liveness later. | Andrew Trick | 2013-08-30 | 1 | -5/+10
  Created SUPressureDiffs array to hold the per node PDiff computed during DAG building.
  Added a getUpwardPressureDelta API that will soon replace the old one. Compute PressureDelta here from the precomputed PressureDiffs.
  Updating for liveness will come next.
  llvm-svn: 189640
* Comment and revise the cyclic critical path code. | Andrew Trick | 2013-08-29 | 1 | -59/+0
  This should be much more clear now. It's still disabled pending testing.
  llvm-svn: 189597
* Add useAA() to TargetSubtargetInfo | Hal Finkel | 2013-08-29 | 1 | -13/+19
  There are several optional (off-by-default) features in CodeGen that can make use of alias analysis. These features are important for generating code for some kinds of cores (for example the (in-order) PPC A2 core). This adds a useAA() function to TargetSubtargetInfo to allow these features to be enabled by default on a per-subtarget basis.
  Here is the first use of this function: To control the default of the -enable-aa-sched-mi feature.
  llvm-svn: 189563
* Adds cyclic critical path computation and heuristics, temporarily disabled. | Andrew Trick | 2013-08-23 | 1 | -0/+61
  Estimate the cyclic critical path within a single block loop. If the acyclic critical path is longer, then the loop will exhaust OOO resources after some number of iterations. If the lag between the acyclic critical path and cyclic critical path is longer than the time it takes to issue those loop iterations, then aggressively schedule for latency.
  llvm-svn: 189120
* MI Sched: record local vreg uses. | Andrew Trick | 2013-08-23 | 1 | -3/+5
  This will be used to compute the cyclic critical path and to update precomputed per-node pressure differences.
  In the longer term, it could also be used to speed up LiveInterval update by avoiding visiting all global vreg users.
  llvm-svn: 189118
* mi-sched: Don't call MBB.size() in initSUnits. The driver already has instr count. | Andrew Trick | 2013-08-23 | 1 | -3/+3
  This fixes a pathological compile time problem with very large blocks and lots of scheduling boundaries.
  llvm-svn: 189116
* Use SmallVectorImpl instead of SmallVector for iterators and references to avoid specifying the vector size unnecessarily. | Craig Topper | 2013-07-03 | 1 | -5/+5
  llvm-svn: 185512
* misched: Compress pairs returned by getUnderlyingObjectsForInstr. | Benjamin Kramer | 2013-06-29 | 1 | -12/+15
  llvm-svn: 185266
* Machine Model: Add MicroOpBufferSize and resource BufferSize. | Andrew Trick | 2013-06-15 | 1 | -17/+7
  Replace the ill-defined MinLatency and ILPWindow properties with straightforward buffer sizes:
    MCSchedModel::MicroOpBufferSize
    MCProcResourceDesc::BufferSize
  These can be used to more precisely model instruction execution if desired.
  Disabled some misched tests temporarily. They'll be reenabled in a few commits.
  llvm-svn: 184032
* MI-Sched: schedule physreg copies. | Andrew Trick | 2013-04-13 | 1 | -0/+4
  The register allocator expects minimal physreg live ranges. Schedule physreg copies accordingly. This is slightly tricky when they occur in the middle of the scheduling region. For now, this is handled by rescheduling the copy when its associated instruction is scheduled. Eventually we may instead bundle them, but only if we can preserve the bundles as parallel copies during regalloc.
  llvm-svn: 179449
* Equal treatment of labels and other terminators in MI DAG construction. | Sergei Larin | 2013-02-12 | 1 | -1/+1
  MI sched DAG construction allows targets to include terminators into scheduling DAG. Extend this functionality to labels as well.
  llvm-svn: 174977
* ScheduleDAG: colorize the DOT graph and improve formatting. | Andrew Trick | 2013-01-25 | 1 | -1/+1
  llvm-svn: 173431
* ScheduleDAG: Added isBoundaryNode to conveniently detect a common corner case. | Andrew Trick | 2013-01-25 | 1 | -7/+19
  This fixes DAG subtree analysis at the boundary.
  llvm-svn: 173427
* SchedDFS: Complete support for nested subtrees. | Andrew Trick | 2013-01-25 | 1 | -33/+74
  Maintain separate per-node and per-tree book-keeping. Track all instructions above a DAG node including nested subtrees. Separately track instructions within a subtree. Record subtree parents.
  llvm-svn: 173426
* MIsched: Improve the interface to SchedDFS analysis (subtrees). | Andrew Trick | 2013-01-25 | 1 | -5/+9
  Allow the strategy to select SchedDFS. Allow the results of SchedDFS to affect initialization of the scheduler state.
  llvm-svn: 173425
* SchedDFS: Initial support for nested subtrees. | Andrew Trick | 2013-01-25 | 1 | -37/+73
  This is mostly refactoring, along with adding an instruction count within the subtrees and ensuring we only look at data edges.
  llvm-svn: 173420
* SchedDFS: Refactor and tweak the subtree selection criteria. | Andrew Trick | 2013-01-25 | 1 | -24/+32
  For sanity, create a root when NumDataSuccs >= 4. Splitting large subtrees will no longer be detrimental after my next checkin to handle nested trees. A magic number of 4 is fine because single subtrees seldom rejoin more than this. It makes subtrees easier to visualize and heuristics more sane.
  llvm-svn: 173399
* Introduce a new data structure, the SparseMultiSet, and changes to the MI scheduler to use it. | Michael Ilseman | 2013-01-21 | 1 | -45/+33
  A SparseMultiSet adds multiset behavior to SparseSet, while retaining SparseSet's desirable properties. Essentially, SparseMultiSet provides multiset behavior by storing its dense data in doubly linked lists that are inlined into the dense vector. This allows it to provide good data locality as well as vector-like constant-time clear() and fast constant time find(), insert(), and erase(). It also allows SparseMultiSet to have a builtin recycler rather than keeping SparseSet's behavior of always swapping upon removal, which allows it to preserve more iterators. It's often a better alternative to a SparseSet of a growable container or vector-of-vector.
  llvm-svn: 173064
* Move all of the header files which are involved in modelling the LLVM IR into their new header subdirectory: include/llvm/IR. | Chandler Carruth | 2013-01-02 | 1 | -1/+1
  This matches the directory structure of lib, and begins to correct a long standing point of file layout clutter in LLVM.
  There are still more header files to move here, but I wanted to handle them in separate commits to make tracking what files make sense at each layer easier.
  The only really questionable files here are the target intrinsic tablegen files. But that's a battle I'd rather not fight today.
  I've updated both CMake and Makefile build systems (I think, and my tests think, but I may have missed something).
  I've also re-sorted the includes throughout the project. I'll be committing updates to Clang, DragonEgg, and Polly momentarily.
  llvm-svn: 171366
* MISched: add dependence to ExitSU to model live-out latency. | Andrew Trick | 2012-12-18 | 1 | -1/+16
  llvm-svn: 170454
* Use GetUnderlyingObjects in misched | Hal Finkel | 2012-12-10 | 1 | -92/+143
  misched used GetUnderlyingObject in order to break false load/store dependencies, and the -enable-aa-sched-mi feature similarly relied on GetUnderlyingObject in order to ensure it is safe to use the aliasing analysis. Unfortunately, GetUnderlyingObject does not recurse through phi nodes, and so (especially due to LSR) all of these mechanisms failed for induction-variable-dependent loads and stores inside loops.
  This change replaces uses of GetUnderlyingObject with GetUnderlyingObjects (which will recurse through phi and select instructions) in misched.
  Andy reviewed, tested and simplified this patch; Thanks!
  llvm-svn: 169744
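  The difference between the two lookups can be shown on a toy value graph. The sketch below is an illustration only: the Value struct and both functions are hypothetical and far simpler than LLVM's GetUnderlyingObject and GetUnderlyingObjects, and a Phi node stands in for both phi and select. It shows the single-object walk stopping at a phi while the multi-object walk continues through it and collects every incoming object.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical miniature IR value: a memory object, a cast of one operand,
// or a phi that merges several operands.
struct Value {
  enum Kind { Object, Cast, Phi } K;
  std::vector<const Value *> Ops;
};

// Single-result walk: strips casts but stops at the first phi it meets.
inline const Value *underlyingObject(const Value *V) {
  while (V->K == Value::Cast)
    V = V->Ops[0];
  return V;
}

// Multi-result walk: also recurses through phis. `Visited` guards against
// the cycles that loop-carried phis introduce.
inline void underlyingObjects(const Value *V, std::set<const Value *> &Out,
                              std::set<const Value *> &Visited) {
  V = underlyingObject(V);
  if (!Visited.insert(V).second)
    return; // already handled
  if (V->K == Value::Phi) {
    for (const Value *Op : V->Ops)
      underlyingObjects(Op, Out, Visited);
    return;
  }
  Out.insert(V);
}
```

  For a pointer that is a cast of a phi over two objects, the first function returns the phi itself, which says nothing about the memory being touched, while the second returns both objects.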
* Use the new script to sort the includes of every file under lib. | Chandler Carruth | 2012-12-03 | 1 | -9/+9
  Sooooo many of these had incorrect or strange main module includes. I have manually inspected all of these, and fixed the main module include to be the nearest plausible thing I could find. If you own or care about any of these source files, I encourage you to take some time and check that these edits were sensible. I can't have broken anything (I strictly added headers, and reordered them, never removed), but they may not be the headers you'd really like to identify as containing the API being implemented.
  Many forward declarations and missing includes were added to header files to allow them to parse cleanly when included first. The main module rule does in fact have its merits. =]
  llvm-svn: 169131
* misched: Fix RegisterPressureTracker handling of DebugVals. | Andrew Trick | 2012-12-01 | 1 | -7/+7
  Assertion failed: (TopRPTracker.getPos() == RegionBegin && "bad initial Top tracker").
  rdar://12790302.
  llvm-svn: 169072
* misched: Fix the DAG builder to handle an undef operand at ExitSU. | Andrew Trick | 2012-12-01 | 1 | -1/+2
  Assertion failed: (VNI && "No value to read by operand")
  rdar://12790267.
  llvm-svn: 169071
* misched: Analysis that partitions the DAG into subtrees. | Andrew Trick | 2012-11-28 | 1 | -41/+166
  This is a simple, cheap infrastructure for analyzing the shape of a DAG. It recognizes uniform DAGs that take the shape of bottom-up subtrees, such as the included matrix multiplication example. This is useful for heuristics that balance register pressure with ILP. Two canonical expressions of the heuristic are implemented in scheduling modes: -misched-ilpmin and -misched-ilpmax.
  llvm-svn: 168773
* misched: rename ScheduleDAGILP to ScheduleDFS to prepare for other heuristics. | Andrew Trick | 2012-11-28 | 1 | -1/+1
  llvm-svn: 168772
* misched: better alias analysis. | Andrew Trick | 2012-11-28 | 1 | -2/+3
  This fixes a hole in the "cheap" alias analysis logic implemented within the DAG builder itself, regardless of whether proper alias analysis is enabled. It now handles this pattern produced by LSR+CodeGenPrepare.
    %sunkaddr1 = ptrtoint * %obj to i64
    %sunkaddr2 = add i64 %sunkaddr1, %lsr.iv
    %sunkaddr3 = inttoptr i64 %sunkaddr2 to i32*
    store i32 %v, i32* %sunkaddr3
  llvm-svn: 168768
* Fix indeterminism in MI scheduler DAG construction. | Sergei Larin | 2012-11-15 | 1 | -15/+15
  Similarly to several recent fixes throughout the code replace std::map use with the MapVector. Add find() method to the MapVector.
  llvm-svn: 168051
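  Why swapping the container fixes the indeterminism can be seen from a minimal model. The class below is a hypothetical, cut-down stand-in written for illustration and is not llvm::MapVector: it pairs an index map with a vector so that iteration follows insertion order rather than key order. When keys are pointers, key order depends on addresses that can change from run to run, whereas insertion order depends only on the input program.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical miniature of a map that iterates in insertion order.
template <typename K, typename V> class TinyMapVector {
  std::map<K, std::size_t> Index;       // key -> position in Entries
  std::vector<std::pair<K, V>> Entries; // payload, in insertion order

public:
  using iterator = typename std::vector<std::pair<K, V>>::iterator;

  iterator begin() { return Entries.begin(); }
  iterator end() { return Entries.end(); }

  // Returns the value for Key, default-constructing it on first use.
  V &operator[](const K &Key) {
    auto It = Index.find(Key);
    if (It == Index.end()) {
      It = Index.insert(std::make_pair(Key, Entries.size())).first;
      Entries.push_back(std::make_pair(Key, V()));
    }
    return Entries[It->second].second;
  }

  // Lookup without insertion, analogous to the find() the commit mentions.
  iterator find(const K &Key) {
    auto It = Index.find(Key);
    return It == Index.end() ? Entries.end() : Entries.begin() + It->second;
  }
};
```

  Inserting keys 30, 10, 20 and then iterating visits them in exactly that order, while a plain std::map would visit 10, 20, 30.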
* misched: Infrastructure for weak DAG edges. | Andrew Trick | 2012-11-12 | 1 | -9/+14
  This adds support for weak DAG edges to the general scheduling infrastructure in preparation for MachineScheduler support for heuristics based on weak edges.
  llvm-svn: 167738
* ScheduleDAG interface. Added OrderKind to distinguish nonregister dependencies. | Andrew Trick | 2012-11-06 | 1 | -25/+32
  This is in preparation for adding "weak" DAG edges, but generally simplifies the design.
  llvm-svn: 167435
* [inline asm] Implement mayLoad and mayStore for inline assembly. | Chad Rosier | 2012-10-30 | 1 | -5/+0
  In general, the MachineInstr MayLoad/MayStore flags are based on the tablegen implementation. For inline assembly, however, we need to compute these based on the constraints.
  Revert r166929 as this is no longer needed, but leave the test case in place.
  rdar://12033048 and PR13504
  llvm-svn: 167040
* This patch addresses a problem with the Post RA scheduler generating an incorrect instruction sequence due to it not being aware that an inline assembly instruction may reference memory. | Preston Gurd | 2012-10-29 | 1 | -0/+5
  This patch fixes the problem by causing the scheduler to always assume that any inline assembly code instruction could access memory. This is necessary because the internal representation of the inline instruction does not include any information about memory accesses.
  This should fix PR13504.
  llvm-svn: 166929
* Fix typo in comment. | Nick Lewycky | 2012-10-26 | 1 | -1/+1
  llvm-svn: 166750
* misched: ILP scheduler for experimental heuristics. | Andrew Trick | 2012-10-15 | 1 | -0/+93
  llvm-svn: 165950