summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Improve pre-RA-sched register pressure tracking for duplicate operands.Andrew Trick2011-03-091-1/+5
| | | | | | This helps cases like 2008-07-19-movups-spills.ll, but doesn't have an obvious impact on benchmarks llvm-svn: 127347
* Fix some latent bugs if the nodes are unschedulable. We'd gotten awayEric Christopher2011-03-081-0/+4
| | | | | | | | | | | | with this before since none of the register tracking or nightly tests had unschedulable nodes. This should probably be refixed with a special default Node that just returns some "don't touch me" values. Fixes PR9427 llvm-svn: 127263
* Fix for -sched-high-latency-cycles in sched=list-ilp mode.Andrew Trick2011-03-051-1/+3
| | | | llvm-svn: 127071
* Increased the register pressure limit on x86_64 from 8 to 12Andrew Trick2011-03-051-1/+13
| | | | | | | | | | | | | | | | | | | | | | | regs. This is the only change in this checkin that may affects the default scheduler. With better register tracking and heuristics, it doesn't make sense to artificially lower the register limit so much. Added -sched-high-latency-cycles and X86InstrInfo::isHighLatencyDef to give the scheduler a way to account for div and sqrt on targets that don't have an itinerary. It is currently defaults to 10 (the actual number doesn't matter much), but only takes effect on non-default schedulers: list-hybrid and list-ilp. Added several heuristics that can be individually disabled for the non-default sched=list-ilp mode. This helps us determine how much better we can do on a given benchmark than the default scheduler. Certain compute intensive loops run much faster in this mode with the right set of heuristics, and it doesn't seem to have much negative impact elsewhere. Not all of the heuristics are needed, but we still need to experiment to decide which should be disabled by default for sched=list-ilp. llvm-svn: 127067
* Introducing a new method of tracking register pressure. We can'tAndrew Trick2011-02-041-1/+73
| | | | | | | | | | | | | | | precisely track pressure on a selection DAG, but we can at least keep it balanced. This design accounts for various interesting aspects of selection DAGS: register and subregister copies, glued nodes, dead nodes, unused registers, etc. Added SUnit::NumRegDefsLeft and ScheduleDAGSDNodes::RegDefIter. Note: I disabled PrescheduleNodesWithMultipleUses when register pressure is enabled, based on no evidence other than I don't think it makes sense to have both enabled. llvm-svn: 124853
* whitespaceAndrew Trick2011-02-031-18/+18
| | | | llvm-svn: 124827
* Reapply 124301Devang Patel2011-01-271-1/+5
| | | | llvm-svn: 124339
* Revert 124301.Devang Patel2011-01-261-5/+1
| | | | llvm-svn: 124327
* Process valid SDDbgValues even if the node does not have any order assigned.Devang Patel2011-01-261-1/+5
| | | | llvm-svn: 124301
* Refactor.Devang Patel2011-01-261-19/+30
| | | | llvm-svn: 124300
* This assertion is too restrictive, it does not apply for dangling dbg value ↵Devang Patel2011-01-251-8/+0
| | | | | | nodes (nodes where dbg.value intrinsic preceds use of the value). llvm-svn: 124202
* flags -> glue for selectiondagChris Lattner2010-12-231-49/+48
| | | | llvm-svn: 122509
* rename MVT::Flag to MVT::Glue. "Flag" is a terrible name forChris Lattner2010-12-211-7/+7
| | | | | | | something that just glues two nodes together, even if it is sometimes used for flags. llvm-svn: 122310
* Two sets of changes. Sorry they are intermingled.Evan Cheng2010-11-031-4/+7
| | | | | | | | | | | | | 1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to "optimize for latency". Call instructions don't have the right latency and this is more likely to use introduce spills. 2. Fix if-converter cost function. For ARM, it should use instruction latencies, not # of micro-ops since multi-latency instructions is completely executed even when the predicate is false. Also, some instruction will be "slower" when they are predicated due to the register def becoming implicit input. rdar://8598427 llvm-svn: 118135
* Avoiding overly aggressive latency scheduling. If the two nodes share anEvan Cheng2010-10-291-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | operand and one of them has a single use that is a live out copy, favor the one that is live out. Otherwise it will be difficult to eliminate the copy if the instruction is a loop induction variable update. e.g. BB: sub r1, r3, #1 str r0, [r2, r3] mov r3, r1 cmp bne BB => BB: str r0, [r2, r3] sub r3, r3, #1 cmp bne BB This fixed the recent 256.bzip2 regression. llvm-svn: 117675
* Re-commit 117518 and 117519 now that ARM MC test failures are out of the way.Evan Cheng2010-10-281-0/+3
| | | | llvm-svn: 117531
* Revert 117518 and 117519 for now. They changed scheduling and cause MC tests ↵Evan Cheng2010-10-281-3/+0
| | | | | | to fail. Ugh. llvm-svn: 117520
* Fix a major bug in operand latency computation. The use index must be adjustedEvan Cheng2010-10-281-0/+3
| | | | | | by the number of defs first for it to match the instruction itinerary. llvm-svn: 117518
* - Add TargetInstrInfo::getOperandLatency() to compute operand latencies. ThisEvan Cheng2010-10-061-19/+1
| | | | | | | | | | | | | allow target to correctly compute latency for cases where static scheduling itineraries isn't sufficient. e.g. variable_ops instructions such as ARM::ldm. This also allows target without scheduling itineraries to compute operand latencies. e.g. X86 can return (approximated) latencies for high latency instructions such as division. - Compute operand latencies for those defined by load multiple instructions, e.g. ldm and those used by store multiple instructions, e.g. stm. llvm-svn: 115755
* Model Cortex-a9 load to SUB, RSB, ADD, ADC, SBC, RSC, CMN, MVN, or CMPEvan Cheng2010-09-291-17/+17
| | | | | | pipeline forwarding path. llvm-svn: 115098
* Teach if-converter to be more careful with predicating instructions that wouldEvan Cheng2010-09-101-9/+7
| | | | | | | | | | | take multiple cycles to decode. For the current if-converter clients (actually only ARM), the instructions that are predicated on false are not nops. They would still take machine cycles to decode. Micro-coded instructions such as LDM / STM can potentially take multiple cycles to decode. If-converter should take treat them as non-micro-coded simple instructions. llvm-svn: 113570
* Add missing null check reported by Amaury Pouly.Evan Cheng2010-08-101-2/+3
| | | | llvm-svn: 110649
* Fix a bug in the code which re-inserts DBG_VALUE nodes after scheduling;Dan Gohman2010-07-101-1/+3
| | | | | | | | if a block is split (by a custom inserter), the insert point may be in a different block than it was originally. This fixes 32-bit llvm-gcc bootstrap builds, and I haven't been able to reproduce it otherwise. llvm-svn: 108060
* Reapply bottom-up fast-isel, with several fixes for x86-32:Dan Gohman2010-07-101-8/+5
| | | | | | | | | - Check getBytesToPopOnReturn(). - Eschew ST0 and ST1 for return values. - Fix the PIC base register initialization so that it doesn't ever fail to end up the top of the entry block. llvm-svn: 108039
* --- Reverse-merging r107947 into '.':Bob Wilson2010-07-091-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | U utils/TableGen/FastISelEmitter.cpp --- Reverse-merging r107943 into '.': U test/CodeGen/X86/fast-isel.ll U test/CodeGen/X86/fast-isel-loads.ll U include/llvm/Target/TargetLowering.h U include/llvm/Support/PassNameParser.h U include/llvm/CodeGen/FunctionLoweringInfo.h U include/llvm/CodeGen/CallingConvLower.h U include/llvm/CodeGen/FastISel.h U include/llvm/CodeGen/SelectionDAGISel.h U lib/CodeGen/LLVMTargetMachine.cpp U lib/CodeGen/CallingConvLower.cpp U lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp U lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp U lib/CodeGen/SelectionDAG/FastISel.cpp U lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp U lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp U lib/CodeGen/SelectionDAG/InstrEmitter.cpp U lib/CodeGen/SelectionDAG/TargetLowering.cpp U lib/Target/XCore/XCoreISelLowering.cpp U lib/Target/XCore/XCoreISelLowering.h U lib/Target/X86/X86ISelLowering.cpp U lib/Target/X86/X86FastISel.cpp U lib/Target/X86/X86ISelLowering.h llvm-svn: 107987
* Re-apply bottom-up fast-isel, with fixes. Be very careful to avoid emittingDan Gohman2010-07-091-8/+5
| | | | | | a DBG_VALUE after a terminator, or emitting any instructions before an EH_LABEL. llvm-svn: 107943
* grammar tweak in comment.Jim Grosbach2010-06-301-1/+1
| | | | llvm-svn: 107321
* Add a VT argument to getMinimalPhysRegClass and replace the copy related usesRafael Espindola2010-06-291-1/+1
| | | | | | | | | of getPhysicalRegisterRegClass with it. If we want to make a copy (or estimate its cost), it is better to use the smallest class as more efficient operations might be possible. llvm-svn: 107140
* Remove variables which are assigned to but for which the valueDuncan Sands2010-06-251-8/+1
| | | | | | is not used. Spotted by gcc-4.6. llvm-svn: 106854
* It's possible that a flag is added to the SDNode that points back to theBill Wendling2010-06-241-11/+19
| | | | | | | original SDNode. This is badness. Also, this function allows one SDNode to point multiple flags to another SDNode. Badness as well. llvm-svn: 106793
* MorphNodeTo doesn't preserve the memory operands. Because we're morphing a nodeBill Wendling2010-06-231-0/+21
| | | | | | | into the same node, but with different non-memory operands, we need to replace the memory operands after it's finished morphing. llvm-svn: 106643
* Use A.append(...) instead of A.insert(A.end(), ...) when A is aDan Gohman2010-06-211-1/+1
| | | | | | SmallVector, and other SmallVector simplifications. llvm-svn: 106452
* Code refactoring, no functionality changes.Evan Cheng2010-06-101-82/+83
| | | | llvm-svn: 105775
* Fix some latency computation bugs: if the use is not a machine opcode do not ↵Evan Cheng2010-05-281-6/+15
| | | | | | just return zero. llvm-svn: 105061
* Allow targets more controls on what nodes are scheduled by reg pressure, ↵Evan Cheng2010-05-201-0/+20
| | | | | | what for latency in hybrid mode. llvm-svn: 104293
* Add a hybrid bottom up scheduler that reduce register usage while avoidingEvan Cheng2010-05-201-1/+36
| | | | | | | | | pipeline stall. It's useful for targets like ARM cortex-a8. NEON has a lot of long latency instructions so a strict register pressure reduction scheduler does not work well. Early experiments show this speeds up some NEON loops by over 30%. llvm-svn: 104216
* Code clean up.Evan Cheng2010-05-191-7/+7
| | | | llvm-svn: 104173
* Get rid of the EdgeMapping map. Instead, just check for BasicBlockDan Gohman2010-05-011-4/+3
| | | | | | changes before doing phi lowering for switches. llvm-svn: 102809
* EmitDbgValue doesn't need its EdgeMapping argument.Dan Gohman2010-04-301-7/+6
| | | | llvm-svn: 102742
* Add DBG_VALUE handling for byval parameters; thisDale Johannesen2010-04-261-3/+14
| | | | | | | produces a comment on targets that support it, but the Dwarf writer is not hooked up yet. llvm-svn: 102372
* - Move TargetLowering::EmitTargetCodeForFrameDebugValue to TargetInstrInfo ↵Evan Cheng2010-04-261-12/+17
| | | | | | | | and rename it to emitFrameIndexDebugValue. - Teach spiller to modify DBG_VALUE instructions to reference spill slots. llvm-svn: 102323
* Fix -Wcast-qual warnings.Dan Gohman2010-04-171-2/+2
| | | | llvm-svn: 101655
* Scheduler assumes SDDbgValue nodes are in source order. That's true ↵Evan Cheng2010-03-251-0/+8
| | | | | | currently. But add an assertion to verify it. llvm-svn: 99501
* Remove a fixme that doesn't make sense any more.Evan Cheng2010-03-251-2/+0
| | | | llvm-svn: 99489
* Change how dbg_value sdnodes are converted into machine instructions. Their ↵Evan Cheng2010-03-251-16/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | placement should be determined by the relative order of incoming llvm instructions. The scheduler will now use the SDNode ordering information to determine where to insert them. A dbg_value instruction is inserted after the instruction with the last highest source order and before the instruction with the next highest source order. It will optimize the placement by inserting right after the instruction that produces the value if they have consecutive order numbers. Here is a theoretical example that illustrates why the placement is important. tmp1 = store tmp1 -> x ... tmp2 = add ... ... call ... store tmp2 -> x Now mem2reg comes along: tmp1 = dbg_value (tmp1 -> x) ... tmp2 = add ... ... call ... dbg_value (tmp2 -> x) When the debugger examine the value of x after the add instruction but before the call, it should have the value of tmp1. Furthermore, for dbg_value's that reference constants, they should not be emitted at the beginning of the block (since they do not have "producers"). This patch also cleans up how SDISel manages DbgValue nodes. It allow a SDNode to be referenced by multiple SDDbgValue nodes. When a SDNode is deleted, it uses the information to find the SDDbgValues and invalidate them. They are not deleted until the corresponding SelectionDAG is destroyed. llvm-svn: 99469
* Rename SDDbgValue.h to SDNodeDbgValue.h for consistency.Evan Cheng2010-03-141-1/+1
| | | | llvm-svn: 98513
* Progress towards shepherding debug info through SelectionDAG.Dale Johannesen2010-03-101-0/+19
| | | | | | | No functional effect yet. This is still evolving and should not be viewed as final. llvm-svn: 98195
* Change the scheduler from adding nodes in allnodes orderChris Lattner2010-02-241-2/+14
| | | | | | | | | | | | | | | | | | to adding them in a determinstic order (bottom up from the root) based on the structure of the graph itself. This updates tests for some random changes, interesting bits: CodeGen/Blackfin/promote-logic.ll no longer crashes. I have no idea why, but that's good right? CodeGen/X86/2009-07-16-LoadFoldingBug.ll also fails, but now compiles to have one fewer constant pool entry, making the expected load that was being folded disappear. Since it is an unreduced mass of gnast, I just removed it. This fixes PR6370 llvm-svn: 97023
* Enable pre-regalloc scheduling load clustering by default.Evan Cheng2010-01-221-7/+1
| | | | llvm-svn: 94255
* Teach pre-regalloc scheduler to schedule loads from nearby addresses. It may ↵Evan Cheng2010-01-221-0/+130
| | | | | | improve cache locality. This is controlled by -cluster-loads for now. llvm-svn: 94148
OpenPOWER on IntegriCloud