summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/IPO
Commit message (Collapse)AuthorAgeFilesLines
* Change some dyn_cast to more apropriate isa. NFCFangrui Song2019-04-051-1/+1
| | | | llvm-svn: 357773
* [IR] Refactor attribute methods in Function class (NFC)Evandro Menezes2019-04-044-6/+6
| | | | | | | | Rename the functions that query the optimization kind attributes. Differential revision: https://reviews.llvm.org/D60287 llvm-svn: 357731
* [IR] Create new method in `Function` class (NFC)Evandro Menezes2019-04-034-8/+6
| | | | | | | | | Create method `optForNone()` testing for the function level equivalent of `-O0` and refactor appropriately. Differential revision: https://reviews.llvm.org/D59852 llvm-svn: 357638
* [ArgPromotion] Set debug location at updated callsitesVedant Kumar2019-04-021-7/+9
| | | | | | | | | | | | Set the correct debug location on instructions which load arguments in preparation for a call to an arg-promoted function. This prevents location cascade from misattributing the line/scope of one of these loads to the location of the instruction preceding the call. Differential Revision: https://reviews.llvm.org/D60113 llvm-svn: 357500
* [SampleProfile] Repeat indirect call promotion only when the target is ↵Taewook Oh2019-04-021-0/+3
| | | | | | | | | | | | | | | | actually hot. Summary: It is possible that multiple indirect call targets have been promoted for a single callsite from the profiled binary. Current implementation repeats promotion for all these targets as far as the callsite itself is hot (the callsite is assumed to be hot if any one of these targets was "hot" during the profiling). However, even when one of the ICPed target is hot other targets may not, and we should not repeat promotion for "cold" targets. Reviewers: danielcdh, wmi Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59940 llvm-svn: 357484
* [PruneEH] Don't split musttail call from retJoseph Tremoulet2019-04-021-1/+2
| | | | | | | | | | | | | | | | | | | Summary: When inserting an `unreachable` after a noreturn call, we must ensure that it's not a musttail call to avoid breaking the IR invariants for musttail calls. Reviewers: fedor.sergeev, majnemer Reviewed By: majnemer Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60079 llvm-svn: 357483
* [Internalize] Replace uses of std::set with DenseSetFangrui Song2019-04-021-4/+3
| | | | | | This makes it faster and saves 104 bytes for my build. llvm-svn: 357458
* [Internalize] Replace fstream with line_iterator for ↵Fangrui Song2019-04-021-9/+7
| | | | | | | | -internalize-public-api-file This makes my libLLVMipo.so.9svn smaller by 360 bytes. llvm-svn: 357457
* Resubmit r356511 "[TailCallElim] Add tailcall elimination pass to LTO pipelines"Robert Lougher2019-03-201-0/+4
| | | | | | Failing LLD tests have been fixed in r356593. llvm-svn: 356594
* Revert r356511 "[TailCallElim] Add tailcall elimination pass to LTO pipelines"Robert Lougher2019-03-191-4/+0
| | | | | | Due to buildbot failures (LLD tests). llvm-svn: 356516
* [TailCallElim] Add tailcall elimination pass to LTO pipelinesRobert Lougher2019-03-191-0/+4
| | | | | | | | | | LTO provides additional opportunities for tailcall elimination due to link-time inlining and visibility of nocapture attribute. Testing showed negligible impact on compilation times. Differential Revision: https://reviews.llvm.org/D58391 llvm-svn: 356511
* [ThinLTO] Restructure AliasSummary to contain ValueInfo of AliaseeTeresa Johnson2019-03-151-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The AliasSummary previously contained the AliaseeGUID, which was only populated when reading the summary from bitcode. This patch changes it to instead hold the ValueInfo of the aliasee, and always populates it. This enables more efficient access to the ValueInfo (specifically in the recent patch r352438 which needed to perform an index hash table lookup using the aliasee GUID). As noted in the comments in AliasSummary, we no longer technically need to keep a pointer to the corresponding aliasee summary, since it could be obtained by walking the list of summaries on the ValueInfo looking for the summary in the same module. However, I am concerned that this would be inefficient when walking through the index during the thin link for various analyses. That can be reevaluated in the future. By always populating this new field, we can remove the guard and special handling for a 0 aliasee GUID when dumping the dot graph of the summary. An additional improvement in this patch is when reading the summaries from LLVM assembly we now set the AliaseeSummary field to the aliasee summary in that same module, which makes it consistent with the behavior when reading the summary from bitcode. Reviewers: pcc, mehdi_amini Subscribers: inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D57470 llvm-svn: 356268
* [PGO] Context sensitive PGO (part 3)Rong Xu2019-03-041-9/+34
| | | | | | | | Part 3 of CSPGO changes (mostly related to PassMananger). Differential Revision: https://reviews.llvm.org/D54175 llvm-svn: 355330
* Add a module pass for order file instrumentationManman Ren2019-02-281-0/+7
| | | | | | | | | | | | | | | The basic idea of the pass is to use a circular buffer to log the execution ordering of the functions. We only log the function when it is first executed. We use a 8-byte hash to log the function symbol name. In this pass, we add three global variables: (1) an order file buffer: a circular buffer at its own llvm section. (2) a bitmap for each module: one byte for each function to say if the function is already executed. (3) a global index to the order file buffer. At the function prologue, if the function has not been executed (by checking the bitmap), log the function hash, then atomically increase the index. Differential Revision: https://reviews.llvm.org/D57463 llvm-svn: 355133
* [PGO] Context sensitive PGO (part 2)Rong Xu2019-02-282-3/+4
| | | | | | | | | | | Part 2 of CSPGO changes (mostly related to ProfileSummary). Note that I use a default parameter in setProfileSummary() and getSummary(). This is to break the dependency in clang. I will make the parameter explicit after changing clang in a separated patch. Differential Revision: https://reviews.llvm.org/D54175 llvm-svn: 355131
* Temporarily revert "ArgumentPromotion should copy all metadata to new ↵Eric Christopher2019-02-281-3/+4
| | | | | | | | Function" and the dependent patch "Refine ArgPromotion metadata handling" as they're causing segfaults in argument promotion. This reverts commits r354032 and r353537. llvm-svn: 355060
* [HotColdSplit] Disable splitting for sanitized functionsVedant Kumar2019-02-261-2/+9
| | | | | | | | | | Splitting can make sanitizer errors harder to understand, as the trapping instruction may not be in the function where the bug was detected. rdar://48142697 llvm-svn: 354931
* [Inliner] Pass nullptr for the ORE param of getInlineCost if RemarkEnabledWei Mi2019-02-212-3/+10
| | | | | | | | | | | | | | | | | | | is false. Right now for inliner and partial inliner, we always pass the address of a valid ORE object to getInlineCost even if RemarkEnabled is false because of no -Rpass is specified. Since ComputeFullInlineCost will be set to true if ORE is non-null in getInlineCost, this introduces the problem that in getInlineCost we cannot return early even if we already know the cost is definitely higher than the threshold. It is a general problem for compile time. This patch fixes that by pass nullptr as the ORE argument if RemarkEnabled is false. Differential Revision: https://reviews.llvm.org/D58399 llvm-svn: 354542
* [HotColdSplit] Schedule splitting late to fix perf regressionVedant Kumar2019-02-151-5/+10
| | | | | | | | | | | | | | | With or without PGO data applied, splitting early in the pipeline (either before the inliner or shortly after it) regresses performance across SPEC variants. The cause appears to be that splitting hides context for subsequent optimizations. Schedule splitting late again, in effect reversing r352080, which scheduled the splitting pass early for code size benefits (documented in https://reviews.llvm.org/D57082). Differential Revision: https://reviews.llvm.org/D58258 llvm-svn: 354158
* [ThinLTO] Detect partially split modules during the thin linkTeresa Johnson2019-02-142-18/+16
| | | | | | | | | | | | | | | | | | | | | | | | Summary: The changes to disable LTO unit splitting by default (r350949) and detect inconsistently split LTO units (r350948) are causing some crashes when the inconsistency is detected in multiple threads simultaneously. Fix that by having the code always look for the inconsistently split LTO units during the thin link, by checking for the presence of type tests recorded in the summaries. Modify test added in r350948 to remove single threading required to fix a bot failure due to this issue (and some debugging options added in the process of diagnosing it). Reviewers: pcc Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57561 llvm-svn: 354062
* Refine ArgPromotion metadata handlingTeresa Johnson2019-02-141-0/+2
| | | | | | | | | | | | | | | | | | | | Summary: In r353537 we now copy all metadata to the new function, with the old being removed when the old function is eliminated. In some cases the old function is dropped to a declaration (seems to only occur with the old PM). Go ahead and clear all metadata from the old function to handle that case, since verification will complain otherwise. This is consistent with what was being done for debug metadata before r353537. Reviewers: davidxl, uabelho Subscribers: jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58215 llvm-svn: 354032
* [GlobalOpt] Simplify __cxa_atexit eliminationFangrui Song2019-02-091-39/+9
| | | | | | | | | | | | | cxxDtorIsEmpty checks callers recursively to determine if the __cxa_atexit-registered function is empty, and eliminates the __cxa_atexit call accordingly. This recursive check is unnecessary as redundant instructions and function calls can be removed by early-cse and inliner. In addition, cxxDtorIsEmpty does not mark visited function and it may visit a function exponential times (multiplication principle). llvm-svn: 353603
* [NFC] Avoid passing blocks vector to the OutlineRegionInfo constructor by value.Sergey Dmitriev2019-02-081-4/+3
| | | | | | | | | | | | | | Reviewers: vsk, fhahn, davidxl Reviewed By: vsk Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57957 llvm-svn: 353582
* ArgumentPromotion should copy all metadata to new FunctionTeresa Johnson2019-02-081-4/+1
| | | | | | | | | | | | | | | | | | | Summary: ArgumentPromotion had code to specifically move the dbg metadata over to the new function, but other metadata such as the function_entry_count !prof metadata was not. Replace code that moved dbg metadata with a call to copyMetadata. The old metadata is automatically removed when the old Function is removed. Reviewers: davidxl Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57846 llvm-svn: 353537
* [CodeExtractor] Update function's assumption cache after extracting blocks ↵Sergey Dmitriev2019-02-083-26/+60
| | | | | | | | | | | | | | | | | | from it Summary: Assumption cache's self-updating mechanism does not correctly handle the case when blocks are extracted from the function by the CodeExtractor. As a result function's assumption cache may have stale references to the llvm.assume calls that were moved to the outlined function. This patch fixes this problem by removing extracted llvm.assume calls from the function’s assumption cache. Reviewers: hfinkel, vsk, fhahn, davidxl, sanjoy Reviewed By: hfinkel, vsk Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57215 llvm-svn: 353500
* [HotColdSplit] With PGO add profile entry metadata to split cold functionTeresa Johnson2019-02-071-2/+11
| | | | | | | | | | | | | | | | | | Summary: When compiling with profile data, ensure the split cold function gets cold function_entry_count metadata (just use 0 since it should be cold). Otherwise with function sections it will not be placed in the unlikely text section with other cold code. Reviewers: vsk Subscribers: sebpop, hiraditya, davidxl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57900 llvm-svn: 353434
* [HotColdSplit] Move splitting after instrumented PGO useTeresa Johnson2019-02-061-5/+5
| | | | | | | | | | | | | | | | | | | | | Summary: Follow up to D57082 which moved splitting earlier in the pipeline, in order to perform it before inlining. However, it was moved too early, before the IR is annotated with instrumented PGO data. This caused the splitting to incorrectly determine cold functions. Move it to just after PGO annotation (still before inlining), in both pass managers. Reviewers: vsk, hiraditya, sebpop Subscribers: mehdi_amini, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57805 llvm-svn: 353270
* [HotColdSplit] Do not split out `resume` instructionsVedant Kumar2019-02-051-2/+6
| | | | | | | | | Resumes that are not reachable from a cleanup landing pad are considered to be unreachable. It’s not safe to split them out. rdar://47808235 llvm-svn: 353242
* Fix narrowing issue from r353129Richard Trieu2019-02-051-1/+2
| | | | llvm-svn: 353134
* [SamplePGO][NFC] Minor improvement to replace a temporary vector with aWei Mi2019-02-051-3/+2
| | | | | | | | brace-enclosed init list. Differential Revision: https://reviews.llvm.org/D57726 llvm-svn: 353129
* [opaque pointer types] Pass value type to GetElementPtr creation.James Y Knight2019-02-011-1/+2
| | | | | | | | | This cleans up all GetElementPtr creation in LLVM to explicitly pass a value type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57173 llvm-svn: 352913
* [opaque pointer types] Pass value type to LoadInst creation.James Y Knight2019-02-014-15/+23
| | | | | | | | | This cleans up all LoadInst creation in LLVM to explicitly pass the value type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57172 llvm-svn: 352911
* [opaque pointer types] Pass function types to InvokeInst creation.James Y Knight2019-02-011-1/+1
| | | | | | | | | This cleans up all InvokeInst creation in LLVM to explicitly pass a function type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57171 llvm-svn: 352910
* [opaque pointer types] Pass function types to CallInst creation.James Y Knight2019-02-012-6/+7
| | | | | | | | | This cleans up all CallInst creation in LLVM to explicitly pass a function type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57170 llvm-svn: 352909
* Provide reason messages for unviable inliningYevgeny Rouban2019-02-011-6/+15
| | | | | | | | | | | | | InlineCost's isInlineViable() is changed to return InlineResult instead of bool. This provides messages for failure reasons and allows to get more specific messages for cases where callsites are not viable for inlining. Reviewed By: xbolva00, anemet Differential Revision: https://reviews.llvm.org/D57089 llvm-svn: 352849
* [opaque pointer types] Add a FunctionCallee wrapper type, and use it.James Y Knight2019-02-012-9/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc doesn't choke on it, hopefully. Original Message: The FunctionCallee type is effectively a {FunctionType*,Value*} pair, and is a useful convenience to enable code to continue passing the result of getOrInsertFunction() through to EmitCall, even once pointer types lose their pointee-type. Then: - update the CallInst/InvokeInst instruction creation functions to take a Callee, - modify getOrInsertFunction to return FunctionCallee, and - update all callers appropriately. One area of particular note is the change to the sanitizer code. Previously, they had been casting the result of `getOrInsertFunction` to a `Function*` via `checkSanitizerInterfaceFunction`, and storing that. That would report an error if someone had already inserted a function declaraction with a mismatching signature. However, in general, LLVM allows for such mismatches, as `getOrInsertFunction` will automatically insert a bitcast if needed. As part of this cleanup, cause the sanitizer code to do the same. (It will call its functions using the expected signature, however they may have been declared.) Finally, in a small number of locations, callers of `getOrInsertFunction` actually were expecting/requiring that a brand new function was being created. In such cases, I've switched them to Function::Create instead. Differential Revision: https://reviews.llvm.org/D57315 llvm-svn: 352827
* Revert "[opaque pointer types] Add a FunctionCallee wrapper type, and use it."James Y Knight2019-01-312-15/+9
| | | | | | | | | This reverts commit f47d6b38c7a61d50db4566b02719de05492dcef1 (r352791). Seems to run into compilation failures with GCC (but not clang, where I tested it). Reverting while I investigate. llvm-svn: 352800
* [opaque pointer types] Add a FunctionCallee wrapper type, and use it.James Y Knight2019-01-312-9/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The FunctionCallee type is effectively a {FunctionType*,Value*} pair, and is a useful convenience to enable code to continue passing the result of getOrInsertFunction() through to EmitCall, even once pointer types lose their pointee-type. Then: - update the CallInst/InvokeInst instruction creation functions to take a Callee, - modify getOrInsertFunction to return FunctionCallee, and - update all callers appropriately. One area of particular note is the change to the sanitizer code. Previously, they had been casting the result of `getOrInsertFunction` to a `Function*` via `checkSanitizerInterfaceFunction`, and storing that. That would report an error if someone had already inserted a function declaraction with a mismatching signature. However, in general, LLVM allows for such mismatches, as `getOrInsertFunction` will automatically insert a bitcast if needed. As part of this cleanup, cause the sanitizer code to do the same. (It will call its functions using the expected signature, however they may have been declared.) Finally, in a small number of locations, callers of `getOrInsertFunction` actually were expecting/requiring that a brand new function was being created. In such cases, I've switched them to Function::Create instead. Differential Revision: https://reviews.llvm.org/D57315 llvm-svn: 352791
* [IPCP] Don't crash due to arg count/type mismatch between caller/calleeBjorn Pettersson2019-01-291-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch avoids an assert in IPConstantPropagation when there is a argument count/type mismatch between the caller and the callee. While this is actually UB on C-level (clang emits a warning), the IR verifier seems to accept it. I'm not sure what other frontends/languages might think about this, so simply bailing out to avoid hitting an assert (in CallSiteBase<>::getArgOperand or Value::doRAUW) seems like a simple solution. The problem is exposed by the fact that AbstractCallSites will look through a bitcast at the callee position of a call/invoke. Reviewers: jdoerfert, reames, efriedma Reviewed By: jdoerfert, efriedma Subscribers: eli.friedman, efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D57052 llvm-svn: 352469
* [ThinLTO] Refine reachability check to fix compile time increaseTeresa Johnson2019-01-281-8/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: A recent fix to the ThinLTO whole program dead code elimination (D56117) increased the thin link time on a large MSAN'ed binary by 2x. It's likely that the time increased elsewhere, but was more noticeable here since it was already large and ended up timing out. That change made it so we would repeatedly scan all copies of linkonce symbols for liveness every time they were encountered during the graph traversal. This was needed since we only mark one copy of an aliasee as live when we encounter a live alias. This patch fixes the issue in a more efficient manner by simply proactively visiting the aliasee (thus marking all copies live) when we encounter a live alias. Two notes: One, this requires a hash table lookup (finding the aliasee summary in the index based on aliasee GUID). However, the impact of this seems to be small compared to the original pre-D56117 thin link time. It could be addressed if we keep the aliasee ValueInfo in the alias summary instead of the aliasee GUID, which I am exploring in a separate patch. Second, we only populate the aliasee GUID field when reading summaries from bitcode (whether we are reading individual summaries and merging on the fly to form the compiled index, or reading in a serialized combined index). Thankfully, that's currently the only way we can get to this code as we don't yet support reading summaries from LLVM assembly directly into a tool that performs the thin link (they must be converted to bitcode first). I added a FIXME, however I have the fix under test already. The easiest fix is to simply populate this field always, which isn't hard, but more likely the change I am exploring to store the ValueInfo instead as described above will subsume this. I don't want to hold up the regression fix for this though. Reviewers: trentxintong Subscribers: mehdi_amini, inglorion, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D57203 llvm-svn: 352438
* [HotColdSplit] Introduce a cost model to control splitting behaviorVedant Kumar2019-01-251-36/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The main goal of the model is to avoid *increasing* function size, as that would eradicate any memory locality benefits from splitting. This happens when: - There are too many inputs or outputs to the cold region. Argument materialization and reloads of outputs have a cost. - The cold region has too many distinct exit blocks, causing a large switch to be formed in the caller. - The code size cost of the split code is less than the cost of a set-up call. A secondary goal is to prevent excessive overall binary size growth. With the cost model in place, I experimented to find a splitting threshold that works well in practice. To make warm & cold code easily separable for analysis purposes, I moved split functions to a "cold" section. I experimented with thresholds between [0, 4] and set the default to the threshold which minimized geomean __text size. Experiment data from building LNT+externals for X86 (N = 639 programs, all sizes in bytes): | Configuration | __text geom size | __cold geom size | TEXT geom size | | **-Os** | 1736.3 | 0, n=0 | 10961.6 | | -Os, thresh=0 | 1740.53 | 124.482, n=134 | 11014 | | -Os, thresh=1 | 1734.79 | 57.8781, n=90 | 10978.6 | | -Os, thresh=2 | ** 1733.85 ** | 65.6604, n=61 | 10977.6 | | -Os, thresh=3 | 1733.85 | 65.3071, n=61 | 10977.6 | | -Os, thresh=4 | 1735.08 | 67.5156, n=54 | 10965.7 | | **-Oz** | 1554.4 | 0, n=0 | 10153 | | -Oz, thresh=2 | ** 1552.2 ** | 65.633, n=61 | 10176 | | **-O3** | 2563.37 | 0, n=0 | 13105.4 | | -O3, thresh=2 | ** 2559.49 ** | 71.1072, n=61 | 13162.4 | Picking thresh=2 reduces the geomean __text section size by 0.14% at -Os, -Oz, and -O3 and causes ~0.2% growth in the TEXT segment. Note that TEXT size is page-aligned, whereas section sizes are byte-aligned. Experiment data from building LNT+externals for ARM64 (N = 558 programs, all sizes in bytes): | Configuration | __text geom size | __cold geom size | TEXT geom size | | **-Os** | 1763.96 | 0, n=0 | 42934.9 | | -Os, thresh=2 | ** 1760.9 ** | 76.6755, n=61 | 42934.9 | Picking thresh=2 reduces the geomean __text section size by 0.17% at -Os and causes no growth in the TEXT segment. Measurements were done with D57082 (r352080) applied. Differential Revision: https://reviews.llvm.org/D57125 llvm-svn: 352228
* [HotColdSplit] Describe the pass in more detail, NFCVedant Kumar2019-01-251-5/+18
| | | | llvm-svn: 352161
* [HotColdSplit] Split more aggressively before/after cold invokesVedant Kumar2019-01-251-39/+55
| | | | | | | | | | | While a cold invoke itself and its unwind destination can't be extracted, code which unconditionally executes before/after the invoke may still be profitable to extract. With cost model changes from D57125 applied, this gives a 3.5% increase in split text across LNT+externals on arm64 at -Os. llvm-svn: 352160
* [HotColdSplit] Move splitting earlier in the pipelineVedant Kumar2019-01-241-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Performing splitting early has several advantages: - Inhibiting inlining of cold code early improves code size. Compared to scheduling splitting at the end of the pipeline, this cuts code size growth in half within the iOS shared cache (0.69% to 0.34%). - Inhibiting inlining of cold code improves compile time. There's no need to inline split cold functions, or to inline as much *within* those split functions as they are marked `minsize`. - During LTO, extra work is only done in the pre-link step. Less code must be inlined during cross-module inlining. An additional motivation here is that the most common cold regions identified by the static/conservative splitting heuristic can (a) be found before inlining and (b) do not grow after inlining. E.g. __assert_fail, os_log_error. The disadvantages are: - Some opportunities for splitting out cold code may be missed. This gap can potentially be narrowed by adding a worklist algorithm to the splitting pass. - Some opportunities to reduce code size may be lost (e.g. store sinking, when one side of the CFG diamond is split). This does not outweigh the code size benefits of splitting earlier. On net, splitting early in the pipeline has substantial code size benefits, and no major effects on memory locality or performance. We measured memory locality using ktrace data, and consistently found that 10% fewer pages were needed to capture 95% of text page faults in key iOS benchmarks. We measured performance on frequency-stabilized iOS devices using LNT+externals. This reverses course on the decision made to schedule splitting late in r344869 (D53437). Differential Revision: https://reviews.llvm.org/D57082 llvm-svn: 352080
* Revert "[Sanitizers] UBSan unreachable incompatible with ASan in the ↵Julian Lettner2019-01-241-1/+0
| | | | | | | | presence of `noreturn` calls" This reverts commit cea84ab93aeb079a358ab1c8aeba6d9140ef8b47. llvm-svn: 352069
* Revert "[HotColdSplitting] Get DT and PDT from the pass manager."Florian Hahn2019-01-241-32/+9
| | | | | | | | | This reverts commit a6982414edf315c39ae93f3c3322476217119e99 (llvm-svn: 352036), because it causes a memory leak in the pass manager. Failing bot http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/10351/steps/check-llvm%20asan/logs/stdio llvm-svn: 352041
* [HotColdSplitting] Get DT and PDT from the pass manager.Florian Hahn2019-01-241-9/+32
| | | | | | | | | | | | | | | | | | | Instead of manually computing DT and PDT, we can get the from the pass manager, which ideally has them already cached. With the new pass manager, we could even preserve DT/PDT on a per function basis in a module pass. I think this also addresses the TODO about re-using the computed DTs for BFI. IIUC, GetBFI will fetch the DT from the pass manager and when we will fetch the cached version later. Reviewers: vsk, hiraditya, tejohnson, thegameg, sebpop Reviewed By: vsk Differential Revision: https://reviews.llvm.org/D57092 llvm-svn: 352036
* [Sanitizers] UBSan unreachable incompatible with ASan in the presence of ↵Julian Lettner2019-01-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `noreturn` calls Summary: UBSan wants to detect when unreachable code is actually reached, so it adds instrumentation before every `unreachable` instruction. However, the optimizer will remove code after calls to functions marked with `noreturn`. To avoid this UBSan removes `noreturn` from both the call instruction as well as from the function itself. Unfortunately, ASan relies on this annotation to unpoison the stack by inserting calls to `_asan_handle_no_return` before `noreturn` functions. This is important for functions that do not return but access the the stack memory, e.g., unwinder functions *like* `longjmp` (`longjmp` itself is actually "double-proofed" via its interceptor). The result is that when ASan and UBSan are combined, the `noreturn` attributes are missing and ASan cannot unpoison the stack, so it has false positives when stack unwinding is used. Changes: # UBSan now adds the `expect_noreturn` attribute whenever it removes the `noreturn` attribute from a function # ASan additionally checks for the presence of this attribute Generated code: ``` call void @__asan_handle_no_return // Additionally inserted to avoid false positives call void @longjmp call void @__asan_handle_no_return call void @__ubsan_handle_builtin_unreachable unreachable ``` The second call to `__asan_handle_no_return` is redundant. This will be cleaned up in a follow-up patch. rdar://problem/40723397 Reviewers: delcypher, eugenis Tags: #sanitizers Differential Revision: https://reviews.llvm.org/D56624 llvm-svn: 352003
* Update entry count for cold callsDavid Callahan2019-01-241-2/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Profile sample files include the number of times each entry or inlined call site is sampled. This is translated into the entry count metadta on functions. When sample data is being read, if a call site that was inlined in the sample program is considered cold and not inlined, then the entry count of the out-of-line functions does not reflect the current compilation. In this patch, we note call sites where the function was not inlined and as a last action of the sample profile loading, we update the called function's entry count to reflect the calls from these call sites which are not included in the profile file. Reviewers: danielcdh, wmi, Kader, modocache Reviewed By: wmi Subscribers: davidxl, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D52845 llvm-svn: 352001
* [HotColdSplitting] Remove unused SSAUpdater.h include (NFC).Florian Hahn2019-01-231-1/+0
| | | | llvm-svn: 351945
OpenPOWER on IntegriCloud