path: root/llvm/lib/Transforms/IPO
Commit message | Author | Age | Files | Lines
* Set the prof weight correctly for call instructions in DeadArgumentElimination. | Dehao Chen | 2017-03-23 | 1 | -0/+6
    Summary: In DeadArgumentElimination, the call instructions will be replaced. We also need to set the prof weights so that function inlining can find the correct profile.
    Reviewers: eraman
    Reviewed By: eraman
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D31143
    llvm-svn: 298660
* Disable loop unrolling and icp in SamplePGO ThinLTO compile phase | Dehao Chen | 2017-03-23 | 1 | -1/+12
    Summary: Loop unrolling and icp make the sample profile annotation much harder in the backend, so disable these two optimizations in the ThinLTO compile phase. Will add a test in cfe in a separate patch.
    Reviewers: tejohnson
    Reviewed By: tejohnson
    Subscribers: mehdi_amini, llvm-commits, Prazek
    Differential Revision: https://reviews.llvm.org/D31217
    llvm-svn: 298646
* [ThinLTO] Add support for emitting minimized bitcode for thin link | Teresa Johnson | 2017-03-23 | 1 | -17/+57
    Summary: The cumulative size of the bitcode files for a very large application can be huge, particularly with -g. In a distributed build environment, all of these files must be sent to the remote build node that performs the thin link step, and this can exceed size limits.

    The thin link actually only needs the summary along with a bitcode symbol table. Until we have a proper bitcode symbol table, simply stripping the debug metadata results in significant size reduction.

    Add support for an option to additionally emit minimized bitcode modules, just for use in the thin link step, which for now just strips all debug metadata. I plan to add a cc1 option so this can be invoked easily during the compile step.

    However, care must be taken to ensure that these minimized thin link bitcode files produce the same index as with the original bitcode files, as these original bitcode files will be used in the backends. Specifically:

    1) The module hash used for caching is typically produced by hashing the written bitcode, and we want to include the hash that would correspond to the original bitcode file. This is because we want to ensure that changes in the stripped portions affect caching. Added plumbing to emit the same module hash in the minimized thin link bitcode file.

    2) The module paths in the index are constructed from the module ID of each thin linked bitcode, and typically is automatically generated from the input file path. This is the path used for finding the modules to import from, and obviously we need this to point to the original bitcode files. Added gold-plugin support to take a suffix replacement during the thin link that is used to override the identifier on the MemoryBufferRef constructed from the loaded thin link bitcode file. The assumption is that the build system can specify that the minimized bitcode file has a name that is similar but uses a different suffix (e.g. out.thinlink.bc instead of out.o).

    Added various tests to ensure that we get identical index files out of the thin link step.
    Reviewers: mehdi_amini, pcc
    Subscribers: Prazek, llvm-commits
    Differential Revision: https://reviews.llvm.org/D31027
    llvm-svn: 298638
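    The suffix replacement described in item 2 can be pictured with a small string helper. This is only a sketch: `replaceSuffix` is a hypothetical name, not the gold-plugin option or any LLVM API, and the file names are the ones used as examples in the commit message.

```cpp
#include <string>

// Hypothetical helper (not an LLVM API): if `path` ends with `oldSuffix`,
// return it with `newSuffix` swapped in; otherwise return `path` unchanged.
std::string replaceSuffix(const std::string &path, const std::string &oldSuffix,
                          const std::string &newSuffix) {
  if (path.size() >= oldSuffix.size() &&
      path.compare(path.size() - oldSuffix.size(), oldSuffix.size(),
                   oldSuffix) == 0)
    return path.substr(0, path.size() - oldSuffix.size()) + newSuffix;
  return path;
}
```

    Under this model, the identifier recorded for a module loaded from out.thinlink.bc would be rewritten to out.o, so the index still points at the original bitcode file.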
* Do not set branch weight if the branch weight annotation is present. | Dehao Chen | 2017-03-23 | 1 | -1/+5
    Summary: ThinLTO will annotate the CFG twice. If the branch weight is set by the first annotation, we should not set the branch weight again in the second annotation because the first annotation is more accurate as there is less optimization that could affect debug info accuracy.
    Reviewers: tejohnson, davidxl
    Reviewed By: tejohnson
    Subscribers: mehdi_amini, aprantl, llvm-commits
    Differential Revision: https://reviews.llvm.org/D31228
    llvm-svn: 298602
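    The "first annotation wins" rule can be sketched in a few lines. The `Branch` struct and `annotateIfUnset` function are hypothetical stand-ins for illustration; the real pass works on LLVM branch-weight metadata, which is not modelled here.

```cpp
#include <cstdint>

// Hypothetical stand-in for a branch that may already carry a weight.
struct Branch {
  bool hasWeight = false;
  uint32_t weight = 0;
};

// Set the weight only if no earlier annotation did so.
// Returns true when this call actually set the weight.
bool annotateIfUnset(Branch &b, uint32_t w) {
  if (b.hasWeight)
    return false; // keep the earlier, more accurate annotation
  b.hasWeight = true;
  b.weight = w;
  return true;
}
```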
* IPO: Const correctness for summaries passed into passes. | Peter Collingbourne | 2017-03-22 | 3 | -59/+77
    Pass const qualified summaries into importers and unqualified summaries into exporters. This lets us const-qualify the summary argument to thinBackend.
    Differential Revision: https://reviews.llvm.org/D31230
    llvm-svn: 298534
* IR: Fix a race condition in type id clients of ModuleSummaryIndex. | Peter Collingbourne | 2017-03-22 | 2 | -10/+18
    Add a const version of the getTypeIdSummary accessor that avoids mutating the TypeIdMap.
    Differential Revision: https://reviews.llvm.org/D31226
    llvm-svn: 298531
* After r286814, CallPenalty can be subtracted twice: | Evgeny Astigeevich | 2017-03-22 | 1 | -1/+1
    - First time, during calculation of the cost in InlineCost.cpp
    - Second time, during calculation of the cost in Inliner.cpp
    This patch fixes this.
    Differential Revision: https://reviews.llvm.org/D31137
    llvm-svn: 298496
* Do not inline hot callsites for samplepgo in thinlto compile phase. | Dehao Chen | 2017-03-21 | 1 | -2/+6
    Summary: SamplePGO passes are invoked twice in a ThinLTO build: once in the compile phase and once in the backend. We want to make sure the IR in the second phase matches the hot part of the profile, so we do not want to inline hot callsites in the first phase.
    Reviewers: tejohnson, eraman
    Reviewed By: tejohnson
    Subscribers: mehdi_amini, llvm-commits, Prazek
    Differential Revision: https://reviews.llvm.org/D31201
    llvm-svn: 298428
* Rename AttributeSet to AttributeList | Reid Kleckner | 2017-03-21 | 5 | -71/+73
    Summary: This class is a list of AttributeSetNodes corresponding to the function prototype of a call or function declaration. This class used to be called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is typically accessed by parameter and return value index, so "AttributeList" seems like a more intuitive name.

    Rename AttributeSetImpl to AttributeListImpl to follow suit.

    It's useful to rename this class so that we can rename AttributeSetNode to AttributeSet later. AttributeSet is the set of attributes that apply to a single function, argument, or return value.
    Reviewers: sanjoy, javed.absar, chandlerc, pete
    Reviewed By: pete
    Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits
    Differential Revision: https://reviews.llvm.org/D31102
    llvm-svn: 298393
* Revert r298158. | Evgeniy Stepanov | 2017-03-20 | 1 | -2/+42
    Revert "[asan] Fix dead stripping of globals on Linux."
    OOM in gold linker.
    llvm-svn: 298288
* [asan] Fix dead stripping of globals on Linux. | Evgeniy Stepanov | 2017-03-17 | 1 | -42/+2
    Use a combination of !associated, comdat, @llvm.compiler.used and custom sections to allow dead stripping of globals and their asan metadata. Sometimes.

    Currently this works on LLD, which supports SHF_LINK_ORDER with sh_link pointing to the associated section.

    This also works on BFD, which seems to treat comdats as all-or-nothing with respect to linker GC. There is a weird quirk where the "first" global in each link is never GC-ed because of the section symbols.

    At this moment it does not work on Gold (as in the globals are never stripped).
    Differential Revision: https://reviews.llvm.org/D30121
    llvm-svn: 298158
* Only unswitch loops with uniform conditions | Stanislav Mekhanoshin | 2017-03-17 | 1 | -2/+3
    Loop unswitching can be extremely harmful for a SIMT target. If the hoisted condition is not uniform, a SIMT machine will execute both clones of a loop sequentially. Therefore LoopUnswitch checks if the condition is non-divergent.

    Since DivergenceAnalysis adds an expensive PostDominatorTree analysis not needed for non-SIMT targets, a new option is added to avoid unneeded analysis initialization. The method getAnalysisUsage is called when TargetTransformInfo is not yet available, so we cannot use it here. For that reason a new field DivergentTarget is added to PassManagerBuilder to control the behavior, and this field is set from a target.
    Differential Revision: https://reviews.llvm.org/D30796
    llvm-svn: 298104
* [PM/Inliner] Fix a bug in r297374 where we would leave stale calls in | Chandler Carruth | 2017-03-16 | 1 | -0/+6
    the work queue and crash when trying to visit them after deleting the function containing those calls.
    llvm-svn: 297940
* SamplePGO ThinLTO ICP fix for local functions. | Dehao Chen | 2017-03-14 | 2 | -1/+10
    Summary: In SamplePGO, if the profile is collected from a non-LTO binary and used to drive ThinLTO, the indirect call promotion may fail because ThinLTO adjusts local function names to avoid conflicts. There are two places where the mismatch can happen:

    1. thin-link prepends SourceFileName to the front of FuncName to build the GUID (GlobalValue::getGlobalIdentifier). Unlike instrumentation FDO, SamplePGO does not use the PGOFuncName scheme and therefore the indirect call target profile data contains a hash of the OriginalName.
    2. backend compiler promotes some local functions to global and appends .llvm.{$ModuleHash} to the end of the FuncName to derive PromotedFunctionName

    This patch makes a best-effort attempt to find the GUID from the original local function name (in profile), and uses that in ICP promotion and in the SamplePGO matching that happens in the backend after importing/inlining:

    1. in thin-link, it builds the map from OriginalName to GUID so that when thin-link reads in indirect call target profile (represented by OriginalName), it knows which GUID to import.
    2. in backend compiler, if sample profile reader cannot find a profile match for PromotedFunctionName, it will try to find if there is a match for OriginalFunctionName.
    3. in backend compiler, we build symbol table entry for OriginalFunctionName and pointer to the same symbol of PromotedFunctionName, so that ICP can find the correct target to promote.
    Reviewers: mehdi_amini, tejohnson
    Reviewed By: tejohnson
    Subscribers: llvm-commits, Prazek
    Differential Revision: https://reviews.llvm.org/D30754
    llvm-svn: 297757
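    The backend fallback (try PromotedFunctionName first, then OriginalFunctionName) can be sketched as below. This is an illustration under assumptions: `originalName` and `lookupSamples` are hypothetical helpers, the profile is modelled as a plain map from name to sample count, and the only fact taken from the commit message is that promotion appends `.llvm.` followed by a module hash.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper: drop a trailing ".llvm.<hash>" suffix, if any.
std::string originalName(const std::string &promoted) {
  const std::string marker = ".llvm.";
  std::string::size_type pos = promoted.rfind(marker);
  if (pos == std::string::npos)
    return promoted; // name was never promoted
  return promoted.substr(0, pos);
}

// Look up the promoted name first; on a miss, retry with the original name.
// Returns nullptr when neither name has a profile entry.
const uint64_t *lookupSamples(const std::map<std::string, uint64_t> &profile,
                              const std::string &name) {
  auto it = profile.find(name);
  if (it == profile.end())
    it = profile.find(originalName(name));
  return it == profile.end() ? nullptr : &it->second;
}
```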
* WholeProgramDevirt: Implement export/import support for VCP. | Peter Collingbourne | 2017-03-10 | 1 | -4/+36
    Differential Revision: https://reviews.llvm.org/D30017
    llvm-svn: 297503
* WholeProgramDevirt: Implement export/import support for unique ret val opt. | Peter Collingbourne | 2017-03-10 | 1 | -13/+80
    Differential Revision: https://reviews.llvm.org/D29917
    llvm-svn: 297502
* WholeProgramDevirt: Fixed compilation error under MSVS2015. | George Rimar | 2017-03-10 | 1 | -9/+18
    It was introduced in:
    r296945 WholeProgramDevirt: Implement exporting for single-impl devirtualization.
    ---------------------
    r296939 WholeProgramDevirt: Add any unsuccessful llvm.type.checked.load devirtualizations to the list of llvm.type.test users.
    ---------------------

    Microsoft Visual Studio Community 2015 Version 14.0.23107.0 D14REL does not compile that code without additional brackets, showing multiple errors like the ones below:

    WholeProgramDevirt.cpp(1216): error C2958: the left bracket '[' found at 'c:\access_softek\llvm\lib\transforms\ipo\wholeprogramdevirt.cpp(1216)' was not matched correctly
    WholeProgramDevirt.cpp(1216): error C2143: syntax error: missing ']' before '}'
    WholeProgramDevirt.cpp(1216): error C2143: syntax error: missing ';' before '}'
    WholeProgramDevirt.cpp(1216): error C2059: syntax error: ']'
    llvm-svn: 297451
* [PM/Inliner] Make the new PM's inliner process call edges across an | Chandler Carruth | 2017-03-09 | 1 | -29/+62
    entire SCC before iterating on newly-introduced call edges resulting from any inlined function bodies.

    This more closely matches the behavior of the old PM's inliner. While it wasn't really clear to me initially, this behavior is actually essential to the inliner behaving reasonably in its current design.

    Because the inliner is fundamentally a bottom-up inliner and all of its cost modeling is designed around that it often runs into trouble within an SCC where we don't have any meaningful bottom-up ordering to use. In addition to potentially cyclic, infinite inlining that we block with the inline history mechanism, it can also take seemingly simple call graph patterns within an SCC and turn them into *insanely* large functions by accidentally working top-down across the SCC without any of the threshold limitations that traditional top-down inliners use.

    Consider this diabolical monster.cpp file that Richard Smith came up with to help demonstrate this issue:
    ```
    template <int N> extern const char *str;

    void g(const char *);

    template <bool K, int N> void f(bool *B, bool *E) {
      if (K)
        g(str<N>);
      if (B == E)
        return;
      if (*B)
        f<true, N + 1>(B + 1, E);
      else
        f<false, N + 1>(B + 1, E);
    }
    template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); }
    template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); }

    extern bool *arr, *end;
    void test() { f<false, 0>(arr, end); }
    ```

    When compiled with '-DMAX=N' for various values of N, this will create an SCC with a reasonably large number of functions. Previously, the inliner would try to exhaust the inlining candidates in a single function before moving on. This, unfortunately, turns it into a top-down inliner within the SCC. Because our thresholds were never built for that, we will incrementally decide that it is always worth inlining and proceed to flatten the entire SCC into that one function.

    What's worse, we'll then proceed to the next function, and do the exact same thing except we'll skip the first function, and so on. And at each step, we'll also make some of the constant factors larger, which is awesome.

    The fix in this patch is the obvious one which makes the new PM's inliner use the same technique used by the old PM: consider all the call edges across the entire SCC before beginning to process call edges introduced by inlining. The result of this is essentially to distribute the inlining across the SCC so that every function incrementally grows toward the inline thresholds rather than allowing the inliner to grow one of the functions vastly beyond the threshold. The code for this is a bit awkward, but it works out OK.

    We could consider in the future doing something more powerful here such as prioritized order (via lowest cost and/or profile info) and/or a code-growth budget per SCC. However, both of those would require really substantial work both to design the system in a way that wouldn't break really useful abstraction decomposition properties of the current inliner and to be tuned across a reasonably diverse set of code and workloads. It also seems really risky in many ways. I have only found a single real-world file that triggers the bad behavior here and it is generated code that has a pretty pathological pattern. I'm not worried about the inliner not doing an *awesome* job here as long as it does *ok*. On the other hand, the cases that will be tricky to get right in a prioritized scheme with a budget will be more common and idiomatic for at least some frontends (C++ and Rust at least). So while these approaches are still really interesting, I'm not in a huge rush to go after them. Staying even closer to the existing PM's behavior, especially when this is easy to do, seems like the right short to medium term approach.

    I don't really have a test case that makes sense yet... I'll try to find a variant of the IR produced by the monster template metaprogram that is both small enough to be sane and large enough to clearly show when we get this wrong in the future. But I'm not confident this exists. And the behavior change here *should* be unobservable without snooping on debug logging. So there isn't really much to test.

    The test case updates come from two incidental changes:
    1) We now visit functions in an SCC in the opposite order. I don't think there really is a "right" order here, so I just update the test cases.
    2) We no longer compute some analyses when an SCC has no call instructions that we consider for inlining.
    llvm-svn: 297374
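    The queue discipline described above (seed the work queue with every call edge in the SCC, then append edges exposed by inlining at the back) can be modelled with a toy simulation. This is a sketch under heavy assumptions, not the inliner's code: `Call`, `visitOrder`, and the `maxVisits` cap are invented for illustration, every call is treated as inlinable, and "inlining" simply exposes the callee's own calls in the caller.

```cpp
#include <cstddef>
#include <vector>

// Toy call edge: function `caller` calls function `callee`.
struct Call {
  int caller;
  int callee;
};

// callees[f] lists the functions that f calls. Returns the order in which
// call edges are visited; stops after maxVisits visits so cycles terminate.
std::vector<Call> visitOrder(const std::vector<std::vector<int>> &callees,
                             std::size_t maxVisits) {
  // Seed the queue with the calls of every function in the SCC up front.
  std::vector<Call> queue;
  for (std::size_t f = 0; f < callees.size(); ++f)
    for (int c : callees[f])
      queue.push_back({static_cast<int>(f), c});

  std::vector<Call> visited;
  for (std::size_t i = 0; i < queue.size() && visited.size() < maxVisits; ++i) {
    Call call = queue[i]; // copy: push_back below may reallocate the queue
    visited.push_back(call);
    // "Inlining" exposes the callee's calls inside the caller; they go to
    // the back of the queue, behind every original edge in the SCC.
    for (int next : callees[call.callee])
      queue.push_back({call.caller, next});
  }
  return visited;
}
```

    For a three-function cycle 0 -> 1 -> 2 -> 0, this model visits one original edge from each function before it returns to function 0 for an edge introduced by inlining, which is the "distribute the inlining across the SCC" effect the message describes.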
* WholeProgramDevirt: Implement importing for uniform ret val opt. | Peter Collingbourne | 2017-03-09 | 1 | -0/+18
    Differential Revision: https://reviews.llvm.org/D29854
    llvm-svn: 297350
* WholeProgramDevirt: Implement importing for single-impl devirtualization. | Peter Collingbourne | 2017-03-09 | 1 | -11/+47
    Differential Revision: https://reviews.llvm.org/D29844
    llvm-svn: 297333
* Perform symbol binding for .symver versioned symbols | Teresa Johnson | 2017-03-09 | 1 | -1/+1
    Summary: In a .symver assembler directive like:
        .symver name, name2@@nodename
    "name2@@nodename" should get the same symbol binding as "name".

    While the ELF object writer is updating the symbol binding for .symver aliases before emitting the object file, not doing so when the module inline assembly is handled by the RecordStreamer is causing the wrong behavior in *LTO mode.

    E.g. when "name" is global, "name2@@nodename" must also be marked as global. Otherwise, the symbol is skipped when iterating over the LTO InputFile symbols (InputFile::Symbol::shouldSkip). So, for example, when performing any *LTO via the gold-plugin, the versioned symbol definition is not recorded by the plugin and passed back to the linker. If the object was in an archive, and there were no other symbols needed from that object, the object would not be included in the final link and references to the versioned symbol are undefined.

    The llvm-lto2 tests added will give an error about an unused symbol resolution without the fix.
    Reviewers: rafael, pcc
    Reviewed By: pcc
    Subscribers: mehdi_amini, llvm-commits
    Differential Revision: https://reviews.llvm.org/D30485
    llvm-svn: 297332
* Don't merge global constants with non-dbg metadata. | Evgeniy Stepanov | 2017-03-09 | 1 | -0/+26
    !type metadata cannot be dropped. An alternative to this is adding !type metadata from the replaced globals to the replacement, but that may weaken type tests and make them slower at the same time.

    The merged global gets !dbg metadata from replaced globals, and can end up with multiple debug locations.
    llvm-svn: 297327
* Fix one-after-the-end type metadata handling in globalsplit. | Evgeniy Stepanov | 2017-03-07 | 1 | -1/+10
    Itanium ABI may have an address point one byte after the end of a vtable. When such a vtable global is split, the !type metadata needs to follow the right vtable.
    Differential Revision: https://reviews.llvm.org/D30716
    llvm-svn: 297236
* Disable gvn-hoist (PR32153) | Hans Wennborg | 2017-03-06 | 1 | -2/+2
    llvm-svn: 297075
* Remove the sample pgo annotation heuristic that uses call count to annotate basic block count. | Dehao Chen | 2017-03-06 | 1 | -5/+3
    Summary: We do not need that special handling because the debug info is more accurate now. Performance testing shows no regression on google internal benchmarks.
    Reviewers: davidxl, aprantl
    Reviewed By: aprantl
    Subscribers: llvm-commits, aprantl
    Differential Revision: https://reviews.llvm.org/D30658
    llvm-svn: 297038
* Fix build. | Peter Collingbourne | 2017-03-04 | 1 | -1/+1
    llvm-svn: 296949
* WholeProgramDevirt: Implement exporting for uniform ret val opt. | Peter Collingbourne | 2017-03-04 | 1 | -6/+19
    Differential Revision: https://reviews.llvm.org/D29846
    llvm-svn: 296948
* WholeProgramDevirt: Implement exporting for single-impl devirtualization. | Peter Collingbourne | 2017-03-04 | 1 | -6/+54
    Differential Revision: https://reviews.llvm.org/D29811
    llvm-svn: 296945
* WholeProgramDevirt: Add any unsuccessful llvm.type.checked.load devirtualizations to the list of llvm.type.test users. | Peter Collingbourne | 2017-03-04 | 1 | -12/+88
    Any unsuccessful llvm.type.checked.load devirtualizations will be translated into uses of llvm.type.test, so we need to add the resulting llvm.type.test intrinsics to the function summaries so that the LowerTypeTests pass will export them.
    Differential Revision: https://reviews.llvm.org/D29808
    llvm-svn: 296939
* Revert "Re-apply "[GVNHoist] Move GVNHoist to function simplification part of pipeline."" | Benjamin Kramer | 2017-03-03 | 1 | -2/+2
    This reverts commit r296759. Miscompiles bash.
    llvm-svn: 296872
* ThinLTOBitcodeWriter: Do not follow operand edges of type GlobalValue when looking for virtual functions. | Peter Collingbourne | 2017-03-02 | 1 | -0/+2
    Such edges may otherwise result in infinite recursion if a pointer to a vtable is reachable from the vtable itself. This can happen in practice if a TU defines the ABI types used to implement RTTI, and is itself compiled with RTTI.

    Fixes PR32121.
    llvm-svn: 296839
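    Why skipping GlobalValue operands breaks the cycle can be seen in a toy model. Everything here is an assumption made for illustration: `Value` and `countVisited` are invented names, values are indices into a vector rather than LLVM IR objects, and the model assumes that any cycle passes through a global.

```cpp
#include <vector>

// Toy value: `isGlobal` stands in for "is a GlobalValue"; `operands` holds
// indices of the values this one refers to.
struct Value {
  bool isGlobal;
  std::vector<int> operands;
};

// Walk operand edges from `root`, never stepping into a global operand.
// Returns the number of values visited, counting `root` itself.
int countVisited(const std::vector<Value> &values, int root) {
  int visited = 1;
  for (int op : values[root].operands)
    if (!values[op].isGlobal) // the guard that stops the vtable self-cycle
      visited += countVisited(values, op);
  return visited;
}
```

    With value 0 as a vtable global whose initializer refers to constant 1, and constant 1 referring back to the vtable, the walk from 0 visits two values and stops; without the guard it would recurse forever.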
* Re-apply "[GVNHoist] Move GVNHoist to function simplification part of pipeline." | Geoff Berry | 2017-03-02 | 1 | -2/+2
    This re-applies r289696, which caused TSan perf regression, which has since been addressed in separate changes (see PR for details).

    See PR31382.
    llvm-svn: 296759
* Add function importing info from samplepgo profile to the module summary. | Dehao Chen | 2017-02-28 | 1 | -8/+19
    Summary: For SamplePGO, the profile may contain cross-module inline stacks. As we need to make sure the profile annotation happens when all the hot inline stacks are expanded, we need to pass this info to the module importer so that it can import proper functions if necessary. This patch implements this feature by emitting cross-module targets as part of function entry metadata. In the module-summary phase, the metadata is used to build call edges that point to functions that need to be imported.
    Reviewers: mehdi_amini, tejohnson
    Reviewed By: tejohnson
    Subscribers: davidxl, llvm-commits
    Differential Revision: https://reviews.llvm.org/D30053
    llvm-svn: 296498
* [OptDiag] Hide legacy remark ctors | Adam Nemet | 2017-02-23 | 1 | -1/+5
    These are only used when emitting remarks without ORE directly using the free functions emitOptimizationRemark*.
    llvm-svn: 296037
* Add call branch annotation for ICP promoted direct call in SamplePGO mode. | Dehao Chen | 2017-02-23 | 2 | -4/+7
    Summary: SamplePGO uses branch_weight annotation to represent callsite hotness. When ICP promotes an indirect call to direct call, we need to make sure the direct call is annotated with branch_weight in SamplePGO mode, so that downstream function inliner can use hot callsite heuristic.
    Reviewers: davidxl, eraman, xur
    Reviewed By: davidxl, xur
    Subscribers: mehdi_amini, llvm-commits
    Differential Revision: https://reviews.llvm.org/D30282
    llvm-svn: 296028
* Use base discriminator in sample pgo profile matching. | Dehao Chen | 2017-02-23 | 1 | -7/+8
    Summary: The discriminator has been encoded, and only the base discriminator should be used during profile matching.
    Reviewers: dblaikie, davidxl
    Reviewed By: dblaikie, davidxl
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D30218
    llvm-svn: 295999
* Increases full-unroll threshold. | Dehao Chen | 2017-02-18 | 1 | -6/+6
    Summary: The default threshold for full unrolling is too conservative. This patch doubles the full-unroll threshold.

    This change will affect the following speccpu2006 benchmarks (performance numbers were collected from Intel Sandybridge):

    Benchmark   Performance   Code size
    403         0.11%         0.56%
    433         0.51%         0.96%
    445         0.48%         2.16%
    447         3.50%         2.96%
    453         1.49%         0.94%
    464         0.75%         8.02%

    The compile time overhead is similar to the code size increase.
    Reviewers: davidxl, mkuper, mzolotukhin, hfinkel, chandlerc
    Reviewed By: hfinkel, chandlerc
    Subscribers: mehdi_amini, zzheng, efriedma, haicheng, hfinkel, llvm-commits
    Differential Revision: https://reviews.llvm.org/D28368
    llvm-svn: 295538
* OptDiag: Allow constructing DiagnosticLocation from DISubprograms | Justin Bogner | 2017-02-18 | 1 | -2/+1
    This avoids creating a DILocation just to represent a line number, since creating Metadata is expensive. Creating a DiagnosticLocation directly is much cheaper.
    llvm-svn: 295531
* WholeProgramDevirt: For VCP use a 32-bit ConstantInt for the byte offset. | Peter Collingbourne | 2017-02-17 | 1 | -1/+1
    A future change will cause this byte offset to be inttoptr'd and then exported via an absolute symbol. On the importing end we will expect the symbol to be in range [0,2^32) so that it will fit into a 32-bit relocation. The problem is that on 64-bit architectures if the offset is negative it will not be in the correct range once we inttoptr it.

    This change causes us to use a 32-bit integer so that it can be inttoptr'd (which zero extends) into the correct range.
    Differential Revision: https://reviews.llvm.org/D30016
    llvm-svn: 295487
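    The range argument can be checked with plain integer arithmetic. The sketch below only illustrates the zero-extension reasoning; the function names are hypothetical and nothing here is the WholeProgramDevirt code itself.

```cpp
#include <cstdint>

// A negative offset held in 32 bits, then zero-extended to 64 bits.
uint64_t zextFrom32(int32_t offset) {
  return static_cast<uint64_t>(static_cast<uint32_t>(offset));
}

// The same offset held in 64 bits and reinterpreted as unsigned.
uint64_t asUnsigned64(int64_t offset) { return static_cast<uint64_t>(offset); }

// True when the value lies in [0, 2^32), i.e. fits a 32-bit relocation.
bool fitsIn32BitRange(uint64_t value) { return value < (uint64_t{1} << 32); }
```

    For an offset of -8, the 32-bit route yields 0xFFFFFFF8, which is inside [0, 2^32), while the 64-bit route yields 0xFFFFFFFFFFFFFFF8, which is not.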
* WholeProgramDevirt: Examine the function body when deciding whether functions are readnone. | Peter Collingbourne | 2017-02-17 | 1 | -12/+41
    The goal is to get an analysis result even for de-refineable functions.
    Differential Revision: https://reviews.llvm.org/D29803
    llvm-svn: 295472
* PMB: Add an importing WPD pass to the start of the ThinLTO backend pipeline. | Peter Collingbourne | 2017-02-15 | 1 | -1/+15
    Differential Revision: https://reviews.llvm.org/D30008
    llvm-svn: 295260
* Re-apply r295110 and r295144 with a fix for the ASan issue. | Peter Collingbourne | 2017-02-15 | 1 | -98/+156
    llvm-svn: 295241
* Revert r295110 and r295144. | Daniel Jasper | 2017-02-15 | 1 | -156/+98
    This fails under ASAN:
    http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/798/steps/check-llvm%20asan/logs/stdio
    llvm-svn: 295162
* WholeProgramDevirt: Separate the code that applies optzns from the code that decides whether to apply them. NFCI. | Peter Collingbourne | 2017-02-15 | 1 | -48/+86
    The idea is that the apply* functions will also be called when importing devirt optimizations.
    Differential Revision: https://reviews.llvm.org/D29745
    llvm-svn: 295144
* WholeProgramDevirt: Change internal vcall data structures to match summary. | Peter Collingbourne | 2017-02-14 | 1 | -74/+94
    Group calls into constant and non-constant arguments up front, and use uint64_t instead of ConstantInt to represent constant arguments. The goal is to allow the information from the summary to fit naturally into this data structure in a future change (specifically, it will be added to CallSiteInfo).

    This has two side effects:
    - We disallow VCP for constant integer arguments of width >64 bits.
    - We remove the restriction that the bitwidth of a vcall's argument and return types must match those of the vfunc definitions.

    I don't expect either of these to matter in practice. The first case is uncommon, and the second one will lead to UB (so we can do anything we like).
    Differential Revision: https://reviews.llvm.org/D29744
    llvm-svn: 295110
* Do not apply redundant LastCallToStaticBonus | Taewook Oh | 2017-02-14 | 1 | -1/+1
    Summary: As written in the comments above, LastCallToStaticBonus is already applied to the cost if Caller has only one user, so it is redundant to reapply the bonus here.

    If the only user is not a caller, TotalSecondaryCost will not be adjusted anyway because callerWillBeRemoved is false. If there's no caller at all, we don't need to care about TotalSecondaryCost because inliningPreventsSomeOuterInline is false.
    Reviewers: chandlerc, eraman
    Reviewed By: eraman
    Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini
    Differential Revision: https://reviews.llvm.org/D29169
    llvm-svn: 295075
* ThinLTOBitcodeWriter: Write available_externally copies of VCP eligible functions to merged module. | Peter Collingbourne | 2017-02-14 | 1 | -13/+79
    Differential Revision: https://reviews.llvm.org/D29701
    llvm-svn: 295021
* FunctionAttrs: Factor out a function for querying memory access of a specific copy of a function. NFC. | Peter Collingbourne | 2017-02-14 | 1 | -16/+21
    This will later be used by ThinLTOBitcodeWriter to add copies of readnone functions to the regular LTO module.
    Differential Revision: https://reviews.llvm.org/D29695
    llvm-svn: 295008
* [FunctionAttrs] try to extend nonnull-ness of arguments from a callsite back to its parent function | Sanjay Patel | 2017-02-13 | 1 | -0/+53
    As discussed here:
    http://lists.llvm.org/pipermail/llvm-dev/2016-December/108182.html
    ...we should be able to propagate 'nonnull' info from a callsite back to its parent.

    The original motivation for this patch is our botched optimization of "dyn_cast" (PR28430), but this won't solve that problem.

    The transform is currently disabled by default while we wait for clang to work around potential security problems:
    http://lists.llvm.org/pipermail/cfe-dev/2017-January/052066.html
    Differential Revision: https://reviews.llvm.org/D27855
    llvm-svn: 294998
* IR: Type ID summary extensions for WPD; thread summary into WPD pass. | Peter Collingbourne | 2017-02-13 | 2 | -9/+86
    Make the whole thing testable by adding YAML I/O support for the WPD summary information and adding some negative tests that exercise the YAML support.
    Differential Revision: https://reviews.llvm.org/D29782
    llvm-svn: 294981