path: root/llvm/lib/CodeGen/MachineCombiner.cpp
Commit message | Author | Age | Files | Lines
* [X86] Fix several places that weren't passing what they thought they were to ↵Craig Topper2019-06-021-2/+4
  MachineInstr::print
  Over a year ago, MachineInstr gained a fourth boolean parameter that occurs before the TII pointer. When this happened, several places started accidentally passing TII into this boolean parameter instead of the TII parameter.
  llvm-svn: 362312
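The failure mode described here compiles cleanly because a pointer converts implicitly to bool. A minimal sketch, assuming a printer with four boolean flags followed by an optional TII pointer; `FakeTII`, `printInstr`, `buggyCall` and `fixedCall` are hypothetical stand-ins, not the LLVM declarations:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for TargetInstrInfo; not the LLVM class.
struct FakeTII {
  std::string Name = "tii";
};

// Hypothetical printer: four boolean flags, then an optional TII pointer.
// Without a TII the opcode name cannot be resolved.
std::string printInstr(bool IsStandalone = true, bool SkipOpers = false,
                       bool SkipDebugLoc = false, bool AddNewLine = true,
                       const FakeTII *TII = nullptr) {
  (void)IsStandalone;
  (void)SkipOpers;
  (void)SkipDebugLoc;
  std::string Out = TII ? "opcode-from-" + TII->Name : "UNKNOWN";
  if (AddNewLine)
    Out += "\n";
  return Out;
}

// Bug: TII is the fourth argument, so it converts to `true` and binds to
// AddNewLine; the real TII parameter keeps its nullptr default.
std::string buggyCall(const FakeTII *TII) {
  return printInstr(false, false, false, TII);
}

// Fix: spell out the fourth boolean so TII reaches the pointer parameter.
std::string fixedCall(const FakeTII *TII) {
  return printInstr(false, false, false, true, TII);
}
```

With a non-null pointer, the buggy call still compiles but prints without a resolved opcode, which is why this kind of mistake can go unnoticed for a long time.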
* [IR] Refactor attribute methods in Function class (NFC)Evandro Menezes2019-04-041-1/+1
  Rename the functions that query the optimization kind attributes.
  Differential revision: https://reviews.llvm.org/D60287
  llvm-svn: 357731
* [AsmPrinter] Remove hidden flag -print-schedule.Andrea Di Biagio2019-02-041-7/+4
  This patch removes hidden codegen flag -print-schedule effectively reverting the logic originally committed as r300311 (https://llvm.org/viewvc/llvm-project?view=revision&revision=300311).
  Flag -print-schedule was originally introduced by r300311 to address PR32216 (https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better testing of schedule model instruction latencies/throughputs".
  These days, we can use llvm-mca to test scheduling models. So there is no longer a need for flag -print-schedule in LLVM. The main use case for PR32216 is now addressed by llvm-mca.
  Flag -print-schedule is mainly used for debugging purposes, and it is only actually used by x86 specific tests. We already have extensive (latency and throughput) tests under "test/tools/llvm-mca" for X86 processor models. That means, most (if not all) existing -print-schedule tests for X86 are redundant.
  When flag -print-schedule was first added to LLVM, several files had to be modified; a few APIs gained new arguments (see for example method MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained a couple of getSchedInfoStr() methods.
  Method getSchedInfoStr() had to originally work for both MCInst and MachineInstr. The original implementation of getSchedInfoStr() introduced a subtle layering violation (reported as PR37160 and then fixed/worked-around by r330615). In retrospect, that new API could have been designed more optimally. We can always query MCSchedModel to get the latency and throughput. More importantly, the "sched-info" string should not have been generated by the subtarget.
  Note, r317782 fixed an issue where "print-schedule" didn't work very well in the presence of inline assembly. That commit is also reverted by this change.
  Differential Revision: https://reviews.llvm.org/D57244
  llvm-svn: 353043
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
  to reflect the new license.
  We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
  Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
  llvm-svn: 351636
* [MachineCombiner][NFC] Prevent dereferencing past-the-end object in an MRI ↵Gerolf Hoflehner2019-01-101-0/+2
  container
  llvm-svn: 350896
* Rename DEBUG macro to LLVM_DEBUG.Nicola Zaghen2018-05-141-24/+27
  The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows:
  - git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
  - git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
  - Manual change to APInt
  - Manually change DOCS as regex doesn't match it.
  In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one.
  Differential Revision: https://reviews.llvm.org/D43624
  llvm-svn: 332240
* [TargetSchedule] shrink interface for init(); NFCISanjay Patel2018-04-081-1/+1
  The TargetSchedModel is always initialized using the TargetSubtargetInfo's MCSchedModel and TargetInstrInfo, so we don't need to extract those and pass 3 parameters to init().
  Differential Revision: https://reviews.llvm.org/D44789
  llvm-svn: 329540
* Revert r327721 "This patch fixes the invalid usage of OptSize in Machine ↵Reid Kleckner2018-03-161-3/+3
  Combiner."
  It causes asserts when compiling Chromium on Win32 with optimizations. We compile many things with -Os.
  llvm-svn: 327733
* This patch fixes the invalid usage of OptSize in Machine Combiner.Andrew V. Tischenko2018-03-161-3/+3
  Differential Revision: https://reviews.llvm.org/D43813
  llvm-svn: 327721
* The final step to close D41278 [MachineCombiner] Improve debug output (NFC).Andrew V. Tischenko2018-02-261-4/+2
  Differential Revision: https://reviews.llvm.org/D41278
  llvm-svn: 326074
* (NFC)[MachineCombiner] Improve debug output.Andrew V. Tischenko2018-02-151-28/+53
  llvm-svn: 325217
* Fix unused variable warning in release mode. NFC.Alexander Ivchenko2018-02-061-0/+1
  llvm-svn: 324330
* [MachineCombiner] Add check for optimal pattern order.Florian Hahn2018-01-311-16/+82
  In D41587, @mssimpso discovered that the order of some patterns for AArch64 was sub-optimal. I thought a bit about how we could avoid that case in the future. I do not think there is a need for evaluating all patterns for now. But this patch adds an extra (expensive) check, that evaluates the latencies of all patterns, and ensures that the latency saved decreases for subsequent patterns.
  This catches the sub-optimal order fixed in D41587, but I am not entirely happy with the check, as it only applies to sub-optimal patterns seen while building with EXPENSIVE_CHECKS on. It did not discover any other sub-optimal pattern ordering.
  Reviewers: Gerolf, spatel, mssimpso
  Reviewed By: Gerolf, mssimpso
  Differential Revision: https://reviews.llvm.org/D41766
  llvm-svn: 323873
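The ordering check can be pictured as follows. This is only a sketch of the idea, not the code from D41766: `PatternCost`, `latencySaved` and `isOrderedBySavings` are hypothetical names, and the sketch assumes "decreases" means "does not increase" from one pattern to the next.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record: latency of the original sequence and of the
// replacement sequence produced by one pattern.
struct PatternCost {
  long RootLatency;
  long NewRootLatency;
};

// Cycles saved by applying the pattern (negative if it makes things worse).
long latencySaved(const PatternCost &P) {
  return P.RootLatency - P.NewRootLatency;
}

// True when no later pattern saves more latency than an earlier one,
// i.e. the list is already sorted from most to least profitable.
bool isOrderedBySavings(const std::vector<PatternCost> &Patterns) {
  for (std::size_t I = 1; I < Patterns.size(); ++I)
    if (latencySaved(Patterns[I]) > latencySaved(Patterns[I - 1]))
      return false;
  return true;
}
```

A list that saves 6, 3, then 1 cycles passes; a list that saves 1 cycle first and 6 cycles second is flagged, which corresponds to the kind of sub-optimal order the message describes.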
* MachineFunction: Return reference from getFunction(); NFCMatthias Braun2017-12-151-1/+1
  The Function can never be nullptr so we can return a reference.
  llvm-svn: 320884
* Remove redundant includes from lib/CodeGen.Michael Zolotukhin2017-12-131-1/+0
  llvm-svn: 320619
* [MachineCombiner] Add up latencies of all instructions in new pattern.Florian Hahn2017-12-061-2/+9
  Summary:
  When calculating the RootLatency, we add up all the latencies of the deleted instructions. But for NewRootLatency we only add the latency of the new root instructions, ignoring the latencies of the other instructions inserted. This leads the combiner to underestimate the cost of patterns which add multiple instructions. This patch fixes that by summing up the latencies of all new instructions. For NewRootNode, the more complex getLatency function is used.
  Note that we may be slightly more precise than just summing up all latencies. For example, consider a pattern like
    r1 = INS1 ..
    r2 = INS2 ..
    r3 = INS3 r1, r2
  I think in some other places, the total latency of the pattern would be estimated as lat(INS3) + max(lat(INS1), lat(INS2)). If you consider that worth changing, I think it would be best to do in a follow-up patch.
  Reviewers: Gerolf, sebpop, spop, fhahn
  Reviewed By: fhahn
  Subscribers: evandro, llvm-commits
  Differential Revision: https://reviews.llvm.org/D40307
  llvm-svn: 319951
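The two estimates mentioned in this message differ whenever the inserted instructions are independent of each other. A small sketch with made-up latencies; `sumOfLatencies` and `criticalPathEstimate` are illustrative helpers, not functions from MachineCombiner.cpp:

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Estimate used by the patch as described: add up every new instruction.
unsigned sumOfLatencies(const std::vector<unsigned> &Latencies) {
  return std::accumulate(Latencies.begin(), Latencies.end(), 0u);
}

// Alternative from the message for the pattern
//   r1 = INS1 ..; r2 = INS2 ..; r3 = INS3 r1, r2
// INS1 and INS2 are independent, so only the slower one delays INS3.
unsigned criticalPathEstimate(unsigned LatIns1, unsigned LatIns2,
                              unsigned LatIns3) {
  return LatIns3 + std::max(LatIns1, LatIns2);
}
```

With assumed latencies of 3, 4 and 2 cycles for INS1, INS2 and INS3, the plain sum is 9 while the critical-path estimate is 6, so summing is the more pessimistic of the two.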
* Fix a bunch more layering of CodeGen headers that are in TargetDavid Blaikie2017-11-171-2/+2
  All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around).
  llvm-svn: 318490
* Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layeringDavid Blaikie2017-11-081-1/+1
  This header includes CodeGen headers, and is not, itself, included by any Target headers, so move it into CodeGen to match the layering of its implementation.
  llvm-svn: 317647
* [MC] Split out register def/use idx calls to make debugging simpler. NFCI.Simon Pilgrim2017-10-301-3/+4
  llvm-svn: 316927
* [MachineCombiner] Fix initialisation of LastUpdate for incremental update.Florian Hahn2017-10-111-2/+4
  Summary:
  Fixes a bogus iterator resulting from the removal of a block's first instruction at the point that incremental update is enabled.
  Patch by Paul Walker.
  Reviewers: fhahn, Gerolf, efriedma, MatzeB
  Reviewed By: fhahn
  Subscribers: aemerson, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D38734
  llvm-svn: 315502
* Recommit [MachineCombiner] Update instruction depths incrementally for large ↵Florian Hahn2017-09-201-23/+82
  BBs.
  This version of the patch fixes an off-by-one error causing PR34596. We do not need to use std::next(BlockIter) when calling updateDepths, as BlockIter already points to the next element.
  Original commit message:
  > For large basic blocks with lots of combinable instructions, the MachineTraceMetrics computations in MachineCombiner can dominate the compile time, as computing the trace information is quadratic in the number of instructions in a BB and its relevant successors/predecessors.
  > In most cases, knowing the instruction depth should be enough to make combination decisions. As we already iterate over all instructions in a basic block, the instruction depth can be computed incrementally. This reduces the cost of machine-combine drastically in cases where lots of instructions are combined. The major drawback is that AFAIK, computing the critical path length cannot be done incrementally. Therefore we only compute instruction depths incrementally, for basic blocks with more instructions than inc_threshold. The -machine-combiner-inc-threshold option can be used to set the threshold and allows for easier experimenting and checking if using incremental updates for all basic blocks has any impact on the performance.
  >
  > Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
  >
  > Reviewed By: fhahn
  >
  > Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
  >
  > Differential Revision: https://reviews.llvm.org/D36619
  llvm-svn: 313751
* Revert r312719 "[MachineCombiner] Update instruction depths incrementally ↵Hans Wennborg2017-09-131-82/+23
  for large BBs."
  This caused PR34596.
  > [MachineCombiner] Update instruction depths incrementally for large BBs.
  >
  > Summary:
  > For large basic blocks with lots of combinable instructions, the MachineTraceMetrics computations in MachineCombiner can dominate the compile time, as computing the trace information is quadratic in the number of instructions in a BB and its relevant successors/predecessors.
  >
  > In most cases, knowing the instruction depth should be enough to make combination decisions. As we already iterate over all instructions in a basic block, the instruction depth can be computed incrementally. This reduces the cost of machine-combine drastically in cases where lots of instructions are combined. The major drawback is that AFAIK, computing the critical path length cannot be done incrementally. Therefore we only compute instruction depths incrementally, for basic blocks with more instructions than inc_threshold. The -machine-combiner-inc-threshold option can be used to set the threshold and allows for easier experimenting and checking if using incremental updates for all basic blocks has any impact on the performance.
  >
  > Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
  >
  > Reviewed By: fhahn
  >
  > Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
  >
  > Differential Revision: https://reviews.llvm.org/D36619
  llvm-svn: 313213
* [MachineCombiner] Update instruction depths incrementally for large BBs.Florian Hahn2017-09-071-23/+82
  Summary:
  For large basic blocks with lots of combinable instructions, the MachineTraceMetrics computations in MachineCombiner can dominate the compile time, as computing the trace information is quadratic in the number of instructions in a BB and its relevant successors/predecessors.
  In most cases, knowing the instruction depth should be enough to make combination decisions. As we already iterate over all instructions in a basic block, the instruction depth can be computed incrementally. This reduces the cost of machine-combine drastically in cases where lots of instructions are combined. The major drawback is that AFAIK, computing the critical path length cannot be done incrementally. Therefore we only compute instruction depths incrementally, for basic blocks with more instructions than inc_threshold. The -machine-combiner-inc-threshold option can be used to set the threshold and allows for easier experimenting and checking if using incremental updates for all basic blocks has any impact on the performance.
  Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
  Reviewed By: fhahn
  Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
  Differential Revision: https://reviews.llvm.org/D36619
  llvm-svn: 312719
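Why depth lends itself to a single forward pass can be seen in a toy model. The sketch below is an assumption-laden illustration, not the MachineTraceMetrics code: `Instr` and `computeDepths` are hypothetical, each instruction defines one register, and values defined outside the block are treated as available at cycle 0.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical instruction: one defined register, some used registers,
// and a fixed latency in cycles.
struct Instr {
  int Def;
  std::vector<int> Uses;
  unsigned Latency;
};

// One forward pass over the block. The depth of an instruction is the
// earliest cycle at which all of its in-block operands are available.
std::vector<unsigned> computeDepths(const std::vector<Instr> &Block) {
  std::unordered_map<int, unsigned> ReadyAt; // register -> cycle available
  std::vector<unsigned> Depths;
  Depths.reserve(Block.size());
  for (const Instr &I : Block) {
    unsigned Depth = 0;
    for (int Reg : I.Uses) {
      auto It = ReadyAt.find(Reg);
      if (It != ReadyAt.end())
        Depth = std::max(Depth, It->second);
    }
    Depths.push_back(Depth);
    // Depth only depends on earlier instructions, so the table can be
    // extended as the pass walks forward; nothing needs recomputing.
    ReadyAt[I.Def] = Depth + I.Latency;
  }
  return Depths;
}
```

The critical path length, by contrast, also depends on what follows an instruction, which matches the message's remark that it cannot be maintained in the same incremental way.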
* [NFC] Move DEBUG_TYPE macro below includes...Jakub Kuderski2017-07-131-2/+2
  in MachineCombiner.cpp.
  llvm-svn: 307940
* CodeGen: Rename DEBUG_TYPE to match passnamesMatthias Braun2017-05-251-2/+2
  Rename the DEBUG_TYPE to match the names of corresponding passes where it makes sense. Also establish the pattern of simply referencing DEBUG_TYPE instead of repeating the passname where possible.
  llvm-svn: 303921
* Fix up grammar in a comment.Eric Christopher2017-03-151-1/+1
  llvm-svn: 297898
* Compile time decreasing in the case we're dealing with Machine Combiner. Andrew V. Tischenko2017-02-131-15/+27
  Before this patch compile time was about 21s (see below). After this patch we have less than 2s (see below).
  Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
  DAGCombiner - trunk
    time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
    real 0m1.685s
  DAGCombiner + Speed patch
    time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
    real 0m1.655s
  MachineCombiner w/o Speed patch
    time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
    real 0m21.614s
  MachineCombiner + Speed patch
    time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
    real 0m1.593s
  The test spill_fdiv.ll is attached to D29627
  D29627 should be closed.
  llvm-svn: 294936
* MachineInstr: Remove parameter from dump()Matthias Braun2017-01-291-1/+3
  The primary use of the dump() functions in LLVM is for use in a debugger. Unfortunately lldb does not seem to handle default arguments so using `p SomeMI.dump()` fails and you have to type the longer `p SomeMI.dump(nullptr)`. Remove the parameter to make the most common use easy. (You can always construct something like `p SomeMI.print(dbgs(),MyTII)` if you need more features).
  Differential Revision: https://reviews.llvm.org/D29241
  llvm-svn: 293440
* machine combiner: fix pretty printerSebastian Pop2016-12-211-1/+1
  We used to print UNKNOWN instructions when the instruction to be printed was not yet inserted in any BB: in that case the pretty printer would not be able to compute a TII as the instruction does not belong to any BB or function yet. This patch explicitly passes the TII to the pretty-printer.
  Differential Revision: https://reviews.llvm.org/D27645
  llvm-svn: 290228
* instr-combiner: sum up all latencies of the transformed instructionsSebastian Pop2016-12-111-2/+9
  We have found that -- when the selected subarchitecture has a scheduling model and we are not optimizing for size -- the machine-instruction combiner uses a too-simple algorithm to compute the cost of one of the two alternatives [before and after running a combining pass on a section of code], and therefore it throws away the combination results too often.
  This fix has the potential to help any ISA with the potential to combine instructions and for which at least one subarchitecture has a scheduling model. As of now, this is only known to definitely affect AArch64 subarchitectures with a scheduling model.
  Regression tested on AMD64/GNU-Linux, new test case tested to fail on an unpatched compiler and pass on a patched compiler.
  Patch by Abe Skolnik and Sebastian Pop.
  llvm-svn: 289399
* Use StringRef in Pass/PassManager APIs (NFC)Mehdi Amini2016-10-011-1/+1
  llvm-svn: 283004
* [MachineCombiner] Support for floating-point FMA on ARM64 (re-commit r267098)Gerolf Hoflehner2016-04-241-1/+11
  The original patch caused crashes because it could dereference a null pointer for SelectionDAGTargetInfo for targets that do not define it.
  Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are:
  - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64.
  - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The target decides whether the machine combiner should optimize for throughput or latency.
  - Support for fmadd, f(n)msub, fmla, fmls patterns
  - On by default at O3 ffast-math
  llvm-svn: 267328
* Revert r267098 - [MachineCombiner] Support for floating-point FMA on ARM64Daniel Sanders2016-04-221-11/+1
  It introduced buildbot failures on clang-cmake-mips, clang-ppc64le-linux, among others.
  llvm-svn: 267127
* [MachineCombiner] Support for floating-point FMA on ARM64Gerolf Hoflehner2016-04-221-1/+11
  Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are:
  - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64.
  - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The target decides whether the machine combiner should optimize for throughput or latency.
  - Support for fmadd, f(n)msub, fmla, fmls patterns
  - On by default at O3 ffast-math
  llvm-svn: 267098
* [NFC] Header cleanupMehdi Amini2016-04-181-2/+1
  Removed some unused headers, replaced some headers with forward class declarations.
  Found using simple scripts like this one:
    clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'
  Patch by Eugene Kosov <claprix@yandex.ru>
  Differential Revision: http://reviews.llvm.org/D19219
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 266595
* Minor code cleanup. NFC.Junmo Park2016-02-271-1/+1
  llvm-svn: 262096
* Reapply "CodeGen: Use references in MachineTraceMetrics::Trace, NFC"Duncan P. N. Exon Smith2016-02-221-4/+4
  This reverts commit r261510, effectively reapplying r261509. The original commit missed a caller in AArch64ConditionalCompares.
  Original commit message:
  Pass non-null arguments by reference in MachineTraceMetrics::Trace, simplifying future work to remove implicit iterator => pointer conversions.
  llvm-svn: 261511
* Revert "CodeGen: Use references in MachineTraceMetrics::Trace, NFC"Duncan P. N. Exon Smith2016-02-221-4/+4
  This reverts commit r261509. I'm not sure how this compiled locally, but something was out of whack.
  llvm-svn: 261510
* CodeGen: Use references in MachineTraceMetrics::Trace, NFCDuncan P. N. Exon Smith2016-02-221-4/+4
  Pass non-null arguments by reference in MachineTraceMetrics::Trace, simplifying future work to remove implicit iterator => pointer conversions.
  llvm-svn: 261509
* less indent; NFCISanjay Patel2015-11-101-46/+47
  llvm-svn: 252643
* add 'MustReduceDepth' as an objective/cost-metric for the MachineCombinerSanjay Patel2015-11-101-29/+53
  This is one of the problems noted in PR25016:
  https://llvm.org/bugs/show_bug.cgi?id=25016
  and:
  http://lists.llvm.org/pipermail/llvm-dev/2015-October/090998.html
  The spilling problem is independent and not addressed by this patch.
  The MachineCombiner was doing reassociations that don't improve or even worsen the critical path. This is caused by inclusion of the "slack" factor when calculating the critical path of the original code sequence. If we don't add that, then we have a more conservative cost comparison of the old code sequence vs. a new sequence. The more liberal calculation must be preserved, however, for the AArch64 MULADD patterns because benchmark regressions were observed without that.
  The two failing test cases now have identical asm that does what we want:
    a + b + c + d ---> (a + b) + (c + d)
  Differential Revision: http://reviews.llvm.org/D13417
  llvm-svn: 252616
* replace MachineCombinerPattern namespace and enum with enum class; NFCISanjay Patel2015-11-051-1/+1
  Also, remove an enum hack where enum values were used as indexes into an array.
  We may want to make this a real class to allow pattern-based queries/customization (D13417).
  llvm-svn: 252196
* Fix Clang-tidy modernize-use-nullptr warnings in source directories and ↵Hans Wennborg2015-10-061-4/+3
  generated files; other minor cleanups.
  Patch by Eugene Zelenko!
  Differential Revision: http://reviews.llvm.org/D13321
  llvm-svn: 249482
* include equal sign in debug equations; NFCSanjay Patel2015-10-031-2/+2
  llvm-svn: 249248
* fix minsize detection: minsize attribute implies optimizing for sizeSanjay Patel2015-08-111-3/+1
  llvm-svn: 244604
* [MachineCombiner] Don't use the opcode-only form of computeInstrLatencyHal Finkel2015-08-051-1/+1
  In r242277, I updated the MachineCombiner to work with itineraries, but I missed a call that is scheduling-model-only (the opcode-only form of computeInstrLatency). Using the form that takes an MI* allows this to work with itineraries (and should be NFC for subtargets with scheduling models).
  llvm-svn: 244020
* wrap OptSize and MinSize attributes for easier and consistent access (NFCI)Sanjay Patel2015-08-041-0/+1
  Create wrapper methods in the Function class for the OptimizeForSize and MinSize attributes. We want to hide the logic of "or'ing" them together when optimizing just for size (-Os).
  Currently, we are not consistent about this and rely on a front-end to always set OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here that should be added as follow-on patches with regression tests.
  This patch is NFC-intended: it just replaces existing direct accesses of the attributes by the equivalent wrapper call.
  Differential Revision: http://reviews.llvm.org/D11734
  llvm-svn: 243994
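The "or'ing" that the wrappers are meant to hide can be sketched as below. `FakeFunction` and its members are hypothetical stand-ins chosen for illustration; the sketch assumes the size query should also be true when only the minimum-size attribute is set, and it does not reproduce the actual Function class.

```cpp
#include <cassert>

// Hypothetical stand-in for a function carrying two optimization attributes.
struct FakeFunction {
  bool OptimizeForSize = false; // assumed to be set for -Os
  bool MinSize = false;         // assumed to be set for -Oz

  // Query for the stricter goal only.
  bool optForMinSize() const { return MinSize; }

  // Query for "optimizing for size" in the broad sense: callers no longer
  // need to remember that MinSize implies it.
  bool optForSize() const { return OptimizeForSize || MinSize; }
};
```

A caller that previously tested only the OptimizeForSize attribute would miss a function marked MinSize alone, which is the inconsistency across front-ends that the message points out.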
* [MachineCombiner] Work with itinerariesHal Finkel2015-07-151-4/+9
  MachineCombiner predicated its use of scheduling-based metrics on hasInstrSchedModel(), but useful conclusions can be drawn from pipeline itineraries as well. Almost all of the logic (except for resource tracking in preservesResourceLen) can be used if we have an itinerary, so enable it in that case as well.
  This will be used by the PowerPC backend in an upcoming commit.
  llvm-svn: 242277
* Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)Alexander Kornienko2015-06-231-1/+1
  Apparently, the style needs to be agreed upon first.
  llvm-svn: 240390
* [x86] generalize reassociation optimization in machine combiner to 2 ↵Sanjay Patel2015-06-231-18/+31
  instructions
  Currently ( D10321, http://reviews.llvm.org/rL239486 ), we can use the machine combiner pass to reassociate the following sequence to reduce the critical path:
    A = ? op ?
    B = A op X
    C = B op Y
    -->
    A = ? op ?
    B = X op Y
    C = A op B
  'op' is currently limited to x86 AVX scalar FP adds (with fast-math on), but in theory, it could be any associative math/logic op (see TODO in code comment).
  This patch generalizes the pattern match to ignore the instruction that defines 'A'. So instead of a sequence of 3 adds, we now only need to find 2 dependent adds and decide if it's worth reassociating them.
  This generalization has a compile-time cost because we can now match more instruction sequences and we rely more heavily on the machine combiner to discard sequences where reassociation doesn't improve the critical path.
  For example, in the new test case:
    A = M div N
    B = A add X
    C = B add Y
  We'll match 2 reassociation patterns, but this transform doesn't reduce the critical path:
    A = M div N
    B = A add Y
    C = B add X
  We need the combiner to reject that pattern but select this:
    A = M div N
    B = X add Y
    C = B add A
  Differential Revision: http://reviews.llvm.org/D10460
  llvm-svn: 240361
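The accept/reject decision in the div example comes down to comparing two path lengths. A back-of-the-envelope sketch under simplifying assumptions: X and Y are available at cycle 0, A becomes available at cycle `ReadyA`, and both adds share one latency `OpLat`. The function names are illustrative only.

```cpp
#include <algorithm>
#include <cassert>

// Serial form: B = A op X; C = B op Y. Both ops wait behind A.
unsigned chainLength(unsigned ReadyA, unsigned OpLat) {
  return ReadyA + 2 * OpLat;
}

// Reassociated form: B = X op Y; C = B op A. The first op overlaps with
// whatever produces A, so only one op sits behind the later of A and B.
unsigned reassociatedLength(unsigned ReadyA, unsigned OpLat) {
  return std::max(ReadyA, OpLat) + OpLat;
}
```

With an assumed 10-cycle producer for A and 3-cycle adds, the serial chain finishes at cycle 16 and the reassociated form at cycle 13. If A is available immediately, both forms finish at cycle 6, so there is nothing to gain. Merely swapping X and Y in the serial chain leaves its length unchanged, which is why that pattern should be rejected.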