summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [PM/LoopUnswitch] Introduce a new, simpler loop unswitch pass.Chandler Carruth2017-04-271-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, this pass only focuses on *trivial* loop unswitching. At that reduced problem it remains significantly better than the current loop unswitch: - Old pass is worse than cubic complexity. New pass is (I think) linear. - New pass is much simpler in its design by focusing on full unswitching. (See below for details on this). - New pass doesn't carry state for thresholds between pass iterations. - New pass doesn't carry state for correctness (both miscompile and infloop) between pass iterations. - New pass produces substantially better code after unswitching. - New pass can handle more trivial unswitch cases. - New pass doesn't recompute the dominator tree for the entire function and instead incrementally updates it. I've ported all of the trivial unswitching test cases from the old pass to the new one to make sure that major functionality isn't lost in the process. For several of the test cases I've worked to improve the precision and rigor of the CHECKs, but for many I've just updated them to handle the new IR produced. My initial motivation was the fact that the old pass carried state in very unreliable ways between pass iterations, and these mechansims were incompatible with the new pass manager. However, I discovered many more improvements to make along the way. This pass makes two very significant assumptions that enable most of these improvements: 1) Focus on *full* unswitching -- that is, completely removing whatever control flow construct is being unswitched from the loop. In the case of trivial unswitching, this means removing the trivial (exiting) edge. In non-trivial unswitching, this means removing the branch or switch itself. This is in opposition to *partial* unswitching where some part of the unswitched control flow remains in the loop. Partial unswitching only really applies to switches and to folded branches. These are very similar to full unrolling and partial unrolling. The full form is an effective canonicalization, the partial form needs a complex cost model, cannot be iterated, isn't canonicalizing, and should be a separate pass that runs very late (much like unrolling). 2) Leverage LLVM's Loop machinery to the fullest. The original unswitch dates from a time when a great deal of LLVM's loop infrastructure was missing, ineffective, and/or unreliable. As a consequence, a lot of complexity was added which we no longer need. With these two overarching principles, I think we can build a fast and effective unswitcher that fits in well in the new PM and in the canonicalization pipeline. Some of the remaining functionality around partial unswitching may not be relevant today (not many test cases or benchmarks I can find) but if they are I'd like to add support for them as a separate layer that runs very late in the pipeline. Purely to make reviewing and introducing this code more manageable, I've split this into first a trivial-unswitch-only pass and in the next patch I'll add support for full non-trivial unswitching against a *fixed* threshold, exactly like full unrolling. I even plan to re-use the unrolling thresholds, as these are incredibly similar cost tradeoffs: we're cloning a loop body in order to end up with simplified control flow. We should only do that when the total growth is reasonably small. One of the biggest changes with this pass compared to the previous one is that previously, each individual trivial exiting edge from a switch was unswitched separately as a branch. Now, we unswitch the entire switch at once, with cases going to the various destinations. This lets us unswitch multiple exiting edges in a single operation and also avoids numerous extremely bad behaviors, where we would introduce 1000s of branches to test for thousands of possible values, all of which would take the exact same exit path bypassing the loop. Now we will use a switch with 1000s of cases that can be efficiently lowered into a jumptable. This avoids relying on somehow forming a switch out of the branches or getting horrible code if that fails for any reason. Another significant change is that this pass actively updates the CFG based on unswitching. For trivial unswitching, this is actually very easy because of the definition of loop simplified form. Doing this makes the code coming out of loop unswitch dramatically more friendly. We still should run loop-simplifycfg (at the least) after this to clean up, but it will have to do a lot less work. Finally, this pass makes much fewer attempts to simplify instructions based on the unswitch. Something like loop-instsimplify, instcombine, or GVN can be used to do increasingly powerful simplifications based on the now dominating predicate. The old simplifications are things that something like loop-instsimplify should get today or a very, very basic loop-instcombine could get. Keeping that logic separate is a big simplifying technique. Most of the code in this pass that isn't in the old one has to do with achieving specific goals: - Updating the dominator tree as we go - Unswitching all cases in a switch in a single step. I think it is still shorter than just the trivial unswitching code in the old pass despite having this functionality. Differential Revision: https://reviews.llvm.org/D32409 llvm-svn: 301576
* Disable GVN Hoist due to still more bugs being found in it. There isChandler Carruth2017-04-271-2/+2
| | | | | | | | | | also a discussion about exactly what we should do prior to re-enabling it. The current bug is http://llvm.org/PR32821 and the discussion about this is in the review thread for r300200. llvm-svn: 301505
* Simplify the CFG after loop pass cleanup.Filipe Cabecinhas2017-04-261-0/+5
| | | | | | | | | | | | | | | | Summary: Otherwise we might end up with some empty basic blocks or single-entry-single-exit basic blocks. This fixes PR32085 Reviewers: chandlerc, danielcdh Subscribers: mehdi_amini, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D30468 llvm-svn: 301395
* [PM] Run IndirectCallPromotion only when PGO is enabled.Davide Italiano2017-04-251-9/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D32465 llvm-svn: 301327
* [ThinLTO/Summary] Rename anonymous globals as last action ...Davide Italiano2017-04-231-3/+6
| | | | | | | | | | | | | | ... in the per-TU -O0 pipeline. The problem is that there could be passes registered using `addExtensionsToPM()` introducing unnamed globals. Asan is an example, but there may be others. Building cppcheck with `-flto=thin` and `-fsanitize=address` triggers an assertion while we're reading bitcode (in lib/LTO), as the BitcodeReader assumes there are no unnamed globals (because the namer has run). Unfortunately I wasn't able to find an easy way to test this. I added a comment in the hope nobody moves this again. llvm-svn: 301102
* Re-apply "[GVNHoist] Move GVNHoist to function simplification part of pipeline."Geoff Berry2017-04-131-2/+2
| | | | | | This reverts commit r296872 now that PR32153 has been fixed. llvm-svn: 300200
* [GVNHoist] Re-enable GVNHoist by defaultGeoff Berry2017-04-111-3/+3
| | | | | | Turn GVNHoist back on by default now that PR32153 has been fixed. llvm-svn: 299944
* [PGO] Memory intrinsic calls optimization based on profiled sizeRong Xu2017-04-041-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | This patch optimizes two memory intrinsic operations: memset and memcpy based on the profiled size of the operation. The high level transformation is like: mem_op(..., size) ==> switch (size) { case s1: mem_op(..., s1); goto merge_bb; case s2: mem_op(..., s2); goto merge_bb; ... default: mem_op(..., size); goto merge_bb; } merge_bb: Differential Revision: http://reviews.llvm.org/D28966 llvm-svn: 299446
* Split the SimplifyCFG pass into two variants.Joerg Sonnenberger2017-03-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | The first variant contains all current transformations except transforming switches into lookup tables. The second variant contains all current transformations. The switch-to-lookup-table conversion results in code that is more difficult to analyze and optimize by other passes. Most importantly, it can inhibit Dead Code Elimination. As such it is often beneficial to only apply this transformation very late. A common example is inlining, which can often result in range restrictions for the switch expression. Changes in execution time according to LNT: SingleSource/Benchmarks/Misc/fp-convert +3.03% MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk -11.20% MultiSource/Benchmarks/Olden/perimeter/perimeter -10.43% and a couple of smaller changes. For perimeter it also results 2.6% a smaller binary. Differential Revision: https://reviews.llvm.org/D30333 llvm-svn: 298799
* Disable loop unrolling and icp in SamplePGO ThinLTO compile phaseDehao Chen2017-03-231-1/+12
| | | | | | | | | | | | | | | | Summary: loop unrolling and icp will make the sample profile annotation much harder in the backend. So disable these 2 optimization in the ThinLTO compile phase. Will add a test in cfe in a separate patch. Reviewers: tejohnson Reviewed By: tejohnson Subscribers: mehdi_amini, llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D31217 llvm-svn: 298646
* IPO: Const correctness for summaries passed into passes.Peter Collingbourne2017-03-221-7/+5
| | | | | | | | | Pass const qualified summaries into importers and unqualified summaries into exporters. This lets us const-qualify the summary argument to thinBackend. Differential Revision: https://reviews.llvm.org/D31230 llvm-svn: 298534
* Only unswitch loops with uniform conditionsStanislav Mekhanoshin2017-03-171-2/+3
| | | | | | | | | | | | | | | | | | Loop unswitching can be extremely harmful for a SIMT target. In case if hoisted condition is not uniform a SIMT machine will execute both clones of a loop sequentially. Therefor LoopUnswitch checks if the condition is non-divergent. Since DivergenceAnalysis adds an expensive PostDominatorTree analysis not needed for non-SIMT targets a new option is added to avoid unneded analysis initialization. The method getAnalysisUsage is called when TargetTransformInfo is not yet available and we cannot use it here. For that reason a new field DivergentTarget is added to PassManagerBuilder to control the behavior and set this field from a target. Differential Revision: https://reviews.llvm.org/D30796 llvm-svn: 298104
* Disable gvn-hoist (PR32153)Hans Wennborg2017-03-061-2/+2
| | | | llvm-svn: 297075
* Revert "Re-apply "[GVNHoist] Move GVNHoist to function simplification part ↵Benjamin Kramer2017-03-031-2/+2
| | | | | | | | of pipeline."" This reverts commit r296759. Miscompiles bash. llvm-svn: 296872
* Re-apply "[GVNHoist] Move GVNHoist to function simplification part of pipeline."Geoff Berry2017-03-021-2/+2
| | | | | | | | | This re-applies r289696, which caused TSan perf regression, which has since been addressed in separate changes (see PR for details). See PR31382. llvm-svn: 296759
* Add call branch annotation for ICP promoted direct call in SamplePGO mode.Dehao Chen2017-02-231-3/+6
| | | | | | | | | | | | | | Summary: SamplePGO uses branch_weight annotation to represent callsite hotness. When ICP promotes an indirect call to direct call, we need to make sure the direct call is annotated with branch_weight in SamplePGO mode, so that downstream function inliner can use hot callsite heuristic. Reviewers: davidxl, eraman, xur Reviewed By: davidxl, xur Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D30282 llvm-svn: 296028
* Increases full-unroll threshold.Dehao Chen2017-02-181-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The default threshold for fully unroll is too conservative. This patch doubles the full-unroll threshold This change will affect the following speccpu2006 benchmarks (performance numbers were collected from Intel Sandybridge): Performance: 403 0.11% 433 0.51% 445 0.48% 447 3.50% 453 1.49% 464 0.75% Code size: 403 0.56% 433 0.96% 445 2.16% 447 2.96% 453 0.94% 464 8.02% The compiler time overhead is similar with code size. Reviewers: davidxl, mkuper, mzolotukhin, hfinkel, chandlerc Reviewed By: hfinkel, chandlerc Subscribers: mehdi_amini, zzheng, efriedma, haicheng, hfinkel, llvm-commits Differential Revision: https://reviews.llvm.org/D28368 llvm-svn: 295538
* PMB: Add an importing WPD pass to the start of the ThinLTO backend pipeline.Peter Collingbourne2017-02-151-1/+15
| | | | | | Differential Revision: https://reviews.llvm.org/D30008 llvm-svn: 295260
* IR: Type ID summary extensions for WPD; thread summary into WPD pass.Peter Collingbourne2017-02-131-1/+2
| | | | | | | | | | Make the whole thing testable by adding YAML I/O support for the WPD summary information and adding some negative tests that exercise the YAML support. Differential Revision: https://reviews.llvm.org/D29782 llvm-svn: 294981
* Rename LowerTypeTestsSummaryAction to PassSummaryAction. NFCI.Peter Collingbourne2017-02-091-5/+3
| | | | | | | | | I intend to use the same type with the same semantics in the WholeProgramDevirt pass. Differential Revision: https://reviews.llvm.org/D29746 llvm-svn: 294629
* [PM] MLSM has been enabled for a way. Reclaim a cl::opt.Davide Italiano2017-01-281-8/+2
| | | | llvm-svn: 293401
* Add loop pass insertion point EP_LateLoopOptimizationsKrzysztof Parzyszek2017-01-251-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D28694 llvm-svn: 293067
* IPO, LTO: Plumb the summary from the LTO API into the pass manager.Peter Collingbourne2017-01-201-2/+7
| | | | | | Differential Revision: https://reviews.llvm.org/D28840 llvm-svn: 292661
* LowerTypeTests: Thread summary and action from the API and command line into ↵Peter Collingbourne2017-01-071-1/+2
| | | | | | | | | | | the pass. Also move command line handling out of the pass constructor and into a separate function. Differential Revision: https://reviews.llvm.org/D28422 llvm-svn: 291323
* [PMBuilder] Remove RunFloat2Int cl::opt.Davide Italiano2017-01-021-6/+1
| | | | | | The pass has been on by default for a long time without problems. llvm-svn: 290814
* [NewGVN] Add a flag to enable the pass via `-mllvm`.Davide Italiano2016-12-261-4/+14
| | | | | | | | NewGVN can be tested passing `-mllvm -enable-newgvn` to clang. Differential Revision: https://reviews.llvm.org/D28059 llvm-svn: 290548
* [LDist] Match behavior between invoking via optimization pipeline or opt ↵Adam Nemet2016-12-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | -loop-distribute In r267672, where the loop distribution pragma was introduced, I tried it hard to keep the old behavior for opt: when opt is invoked with -loop-distribute, it should distribute the loop (it's off by default when ran via the optimization pipeline). As MichaelZ has discovered this has the unintended consequence of breaking a very common developer work-flow to reproduce compilations using opt: First you print the pass pipeline of clang with -debug-pass=Arguments and then invoking opt with the returned arguments. clang -debug-pass will include -loop-distribute but the pass is invoked with default=off so nothing happens unless the loop carries the pragma. While through opt (default=on) we will try to distribute all loops. This changes opt's default to off as well to match clang. The tests are modified to explicitly enable the transformation. llvm-svn: 290235
* IPO: Remove the ModuleSummary argument to the FunctionImport pass. NFCI.Peter Collingbourne2016-12-211-7/+0
| | | | | | | | | No existing client is passing a non-null value here. This will come back in a slightly different form as part of the type identifier summary work. Differential Revision: https://reviews.llvm.org/D28006 llvm-svn: 290222
* Revert "[GVNHoist] Move GVNHoist to function simplification part of pipeline."Evgeniy Stepanov2016-12-171-2/+2
| | | | | | | | This reverts r289696, which caused TSan perf regression. See PR31382. llvm-svn: 290030
* Create SampleProfileLoader pass in llvm instead of clangDehao Chen2016-12-141-0/+5
| | | | | | | | | | | | Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder. Reviewers: tejohnson, davidxl, dnovillo Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D27743 llvm-svn: 289714
* [GVNHoist] Move GVNHoist to function simplification part of pipeline.Geoff Berry2016-12-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Move GVNHoist to later in the optimization pipeline, specifically, to the function simplification part of the pipeline. The new pipeline location allows GVNHoist to run on a function after its callees have been inlined but before the function has been considered for inlining into its callers, exposing more opportunities for hoisting. Performance results on AArch64 kryo: Improvements: Benchmarks/CoyoteBench/fftbench -24.952% spec2006/bzip2 -4.071% internal bmark -3.177% Benchmarks/PAQ8p/paq8p -1.754% spec2000/perlbmk -1.328% spec2006/h264ref -1.140% Regressions: internal bmark +1.818% Benchmarks/mafft/pairlocalalign +1.084% Reviewers: sebpop, dberlin, hiraditya Subscribers: aemerson, mehdi_amini, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D27722 llvm-svn: 289696
* revert r289669 which breaks botsDehao Chen2016-12-141-5/+0
| | | | llvm-svn: 289676
* Create SampleProfileLoader pass in llvm instead of clangDehao Chen2016-12-141-0/+5
| | | | | | | | | | | | Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder. Reviewers: tejohnson, davidxl, dnovillo Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D27743 llvm-svn: 289669
* Introduce GlobalSplit pass.Peter Collingbourne2016-11-161-0/+5
| | | | | | | | | This pass splits globals into elements using inrange annotations on getelementptr indices. Differential Revision: https://reviews.llvm.org/D22295 llvm-svn: 287178
* Add comments about why we put LoopSink pass at the very late stage.Dehao Chen2016-11-101-0/+4
| | | | llvm-svn: 286480
* Enable Loop Sink pass for functions that has profile.Dehao Chen2016-11-091-4/+4
| | | | | | | | | | | | Summary: For functions with profile data, we are confident that loop sink will be optimal in sinking code. Reviewers: davidxl, hfinkel Subscribers: mehdi_amini, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D26155 llvm-svn: 286325
* Conditionally eliminate library calls where the result value is not usedRong Xu2016-10-181-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This pass shrink-wraps a condition to some library calls where the call result is not used. For example: sqrt(val); is transformed to if (val < 0) sqrt(val); Even if the result of library call is not being used, the compiler cannot safely delete the call because the function can set errno on error conditions. Note in many functions, the error condition solely depends on the incoming parameter. In this optimization, we can generate the condition can lead to the errno to shrink-wrap the call. Since the chances of hitting the error condition is low, the runtime call is effectively eliminated. These partially dead calls are usually results of C++ abstraction penalty exposed by inlining. This optimization hits 108 times in 19 C/C++ programs in SPEC2006. Reviewers: hfinkel, mehdi_amini, davidxl Subscribers: modocache, mgorny, mehdi_amini, xur, llvm-commits, beanz Differential Revision: https://reviews.llvm.org/D24414 llvm-svn: 284542
* Turn cl::values() (for enum) from a vararg function to using C++ variadic ↵Mehdi Amini2016-10-081-2/+1
| | | | | | | | | | | | | | | template The core of the change is supposed to be NFC, however it also fixes what I believe was an undefined behavior when calling: va_start(ValueArgs, Desc); with Desc being a StringRef. Differential Revision: https://reviews.llvm.org/D25342 llvm-svn: 283671
* [ThinLTO] Ensure anonymous globals renamed even at -O0Teresa Johnson2016-09-171-1/+9
| | | | | | | | | | | | | | | | | | | | | | Summary: This fixes an issue when files are compiled with -flto=thin at default -O0. We need to rename anonymous globals before attempting to write the module summary because all values need names for the summary. This was happening at -O1 and above, but not before the early exit when constructing the pipeline for -O0. Also add an internal -prepare-for-thinlto option to enable this to be tested via opt. Fixes PR30419. Reviewers: mehdi_amini Subscribers: probinson, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D24701 llvm-svn: 281840
* Rename NameAnonFunctions to NameAnonGlobals to match what it is doing (NFC)Mehdi Amini2016-09-161-2/+2
| | | | llvm-svn: 281745
* [ThinLTO] Indirect call promotion fixes for promoted local functionsTeresa Johnson2016-08-291-3/+14
| | | | | | | | | | | | | | | | | | | Summary: Fix a couple issues limiting the application of indirect call promotion in ThinLTO mode: - Invoke indirect call promotion before globalopt, since it may eliminate imported functions which appear unreferenced. - Invoke indirect call promotion with InLTO=true so that the PGOFuncName metadata is used to get the name for locals which would have been renamed during promotion. Reviewers: davidxl, mehdi_amini Subscribers: Prazek, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D24004 llvm-svn: 280024
* Add a new method to create SimpleInliner instance and make pre-inliner use this.Easwaran Raman2016-08-111-2/+12
| | | | | | | | This adds a createFunctionInliningPass pass that takes an InlineParams object and use this to create the pre-inliner pass. This prevents the regular inliner's threshold flag from influencing the preinliner. Differential revision: https://reviews.llvm.org/D23377 llvm-svn: 278377
* GVN-hoist: enable by defaultSebastian Pop2016-08-081-2/+2
| | | | llvm-svn: 278010
* Revert "(refs/bisect/bad) GVN-hoist: enable by default"Matthias Braun2016-08-061-2/+2
| | | | | | | | | | | GVN-Hoist appears to miscompile llvm-testsuite SingleSource/Benchmarks/Misc/fbench.c at the moment. I filed http://llvm.org/PR28880 This reverts commit r277786. llvm-svn: 277909
* GVN-hoist: enable by defaultSebastian Pop2016-08-041-2/+2
| | | | llvm-svn: 277786
* Revert "GVN-hoist: enable by default" & "Make GVN Hoisting obey optnone/bisect."Bruno Cardoso Lopes2016-08-041-2/+2
| | | | | | | | This reverts commits r277685 & r277688. r277685 broke compiler-rt compilation http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/23335 and r277685 is a followup from it. llvm-svn: 277690
* GVN-hoist: enable by defaultSebastian Pop2016-08-041-2/+2
| | | | | | | | | | As we addressed all compilation time problems with GVN-hoist https://llvm.org/bugs/show_bug.cgi?id=28670 this patch turns GVN-hoist back by default. Differential Revision: https://reviews.llvm.org/D23136 llvm-svn: 277685
* Add EP_CGSCCOptimizerLate extension point to PassManagerBuilderDavid Majnemer2016-07-281-0/+1
| | | | | | | | | | | | The EP_CGSCCOptimizerLate extension point allows adding CallGraphSCC passes at the end of the main CallGraphSCC passes and before any function simplification passes run by CGPassManager. Patch by Gor Nishanov! Differential Revision: https://reviews.llvm.org/D22897 llvm-svn: 276953
* [Profile] Use explicit flag to enable IR PGOXinliang David Li2016-07-231-8/+13
| | | | | | | | Patch by Jake VanAdrighem Differential Revision: http://reviews.llvm.org/D22607 llvm-svn: 276516
* Add flag to PassManagerBuilder to disable GVN Hoist Pass.Alina Sbirlea2016-07-221-1/+6
| | | | | | | | | | | | | | Summary: Adding a flag to diable GVN Hoisting by default. Note: The GVN Hoist Pass causes some Halide tests to hang. Halide will disable the pass while investigating. Reviewers: llvm-commits, chandlerc, spop, dberlin Subscribers: mehdi_amini Differential Revision: https://reviews.llvm.org/D22639 llvm-svn: 276479
OpenPOWER on IntegriCloud