| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc
doesn't choke on it, hopefully.
Original Message:
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.
Then:
- update the CallInst/InvokeInst instruction creation functions to
take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.
One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.
However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)
Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.
Differential Revision: https://reviews.llvm.org/D57315
llvm-svn: 352827
|
|
|
|
|
|
|
|
|
| |
This reverts commit f47d6b38c7a61d50db4566b02719de05492dcef1 (r352791).
Seems to run into compilation failures with GCC (but not clang, where
I tested it). Reverting while I investigate.
llvm-svn: 352800
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.
Then:
- update the CallInst/InvokeInst instruction creation functions to
take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.
One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.
However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)
Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.
Differential Revision: https://reviews.llvm.org/D57315
llvm-svn: 352791
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch avoids an assert in IPConstantPropagation when
there is a argument count/type mismatch between the caller and
the callee.
While this is actually UB on C-level (clang emits a warning),
the IR verifier seems to accept it. I'm not sure what other
frontends/languages might think about this, so simply bailing out
to avoid hitting an assert (in CallSiteBase<>::getArgOperand or
Value::doRAUW) seems like a simple solution.
The problem is exposed by the fact that AbstractCallSites will look
through a bitcast at the callee position of a call/invoke.
Reviewers: jdoerfert, reames, efriedma
Reviewed By: jdoerfert, efriedma
Subscribers: eli.friedman, efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D57052
llvm-svn: 352469
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
A recent fix to the ThinLTO whole program dead code elimination (D56117)
increased the thin link time on a large MSAN'ed binary by 2x.
It's likely that the time increased elsewhere, but was more noticeable
here since it was already large and ended up timing out.
That change made it so we would repeatedly scan all copies of linkonce
symbols for liveness every time they were encountered during the graph
traversal. This was needed since we only mark one copy of an aliasee as
live when we encounter a live alias. This patch fixes the issue in a
more efficient manner by simply proactively visiting the aliasee (thus
marking all copies live) when we encounter a live alias.
Two notes: One, this requires a hash table lookup (finding the aliasee
summary in the index based on aliasee GUID). However, the impact of this
seems to be small compared to the original pre-D56117 thin link time. It
could be addressed if we keep the aliasee ValueInfo in the alias summary
instead of the aliasee GUID, which I am exploring in a separate patch.
Second, we only populate the aliasee GUID field when reading summaries
from bitcode (whether we are reading individual summaries and merging on
the fly to form the compiled index, or reading in a serialized combined
index). Thankfully, that's currently the only way we can get to this
code as we don't yet support reading summaries from LLVM assembly
directly into a tool that performs the thin link (they must be converted
to bitcode first). I added a FIXME, however I have the fix under test
already. The easiest fix is to simply populate this field always, which
isn't hard, but more likely the change I am exploring to store the
ValueInfo instead as described above will subsume this. I don't want to
hold up the regression fix for this though.
Reviewers: trentxintong
Subscribers: mehdi_amini, inglorion, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D57203
llvm-svn: 352438
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The main goal of the model is to avoid *increasing* function size, as
that would eradicate any memory locality benefits from splitting. This
happens when:
- There are too many inputs or outputs to the cold region. Argument
materialization and reloads of outputs have a cost.
- The cold region has too many distinct exit blocks, causing a large
switch to be formed in the caller.
- The code size cost of the split code is less than the cost of a
set-up call.
A secondary goal is to prevent excessive overall binary size growth.
With the cost model in place, I experimented to find a splitting
threshold that works well in practice. To make warm & cold code easily
separable for analysis purposes, I moved split functions to a "cold"
section. I experimented with thresholds between [0, 4] and set the
default to the threshold which minimized geomean __text size.
Experiment data from building LNT+externals for X86 (N = 639 programs,
all sizes in bytes):
| Configuration | __text geom size | __cold geom size | TEXT geom size |
| **-Os** | 1736.3 | 0, n=0 | 10961.6 |
| -Os, thresh=0 | 1740.53 | 124.482, n=134 | 11014 |
| -Os, thresh=1 | 1734.79 | 57.8781, n=90 | 10978.6 |
| -Os, thresh=2 | ** 1733.85 ** | 65.6604, n=61 | 10977.6 |
| -Os, thresh=3 | 1733.85 | 65.3071, n=61 | 10977.6 |
| -Os, thresh=4 | 1735.08 | 67.5156, n=54 | 10965.7 |
| **-Oz** | 1554.4 | 0, n=0 | 10153 |
| -Oz, thresh=2 | ** 1552.2 ** | 65.633, n=61 | 10176 |
| **-O3** | 2563.37 | 0, n=0 | 13105.4 |
| -O3, thresh=2 | ** 2559.49 ** | 71.1072, n=61 | 13162.4 |
Picking thresh=2 reduces the geomean __text section size by 0.14% at
-Os, -Oz, and -O3 and causes ~0.2% growth in the TEXT segment. Note that
TEXT size is page-aligned, whereas section sizes are byte-aligned.
Experiment data from building LNT+externals for ARM64 (N = 558 programs,
all sizes in bytes):
| Configuration | __text geom size | __cold geom size | TEXT geom size |
| **-Os** | 1763.96 | 0, n=0 | 42934.9 |
| -Os, thresh=2 | ** 1760.9 ** | 76.6755, n=61 | 42934.9 |
Picking thresh=2 reduces the geomean __text section size by 0.17% at
-Os and causes no growth in the TEXT segment.
Measurements were done with D57082 (r352080) applied.
Differential Revision: https://reviews.llvm.org/D57125
llvm-svn: 352228
|
|
|
|
| |
llvm-svn: 352161
|
|
|
|
|
|
|
|
|
|
|
| |
While a cold invoke itself and its unwind destination can't be
extracted, code which unconditionally executes before/after the invoke
may still be profitable to extract.
With cost model changes from D57125 applied, this gives a 3.5% increase
in split text across LNT+externals on arm64 at -Os.
llvm-svn: 352160
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Performing splitting early has several advantages:
- Inhibiting inlining of cold code early improves code size. Compared
to scheduling splitting at the end of the pipeline, this cuts code
size growth in half within the iOS shared cache (0.69% to 0.34%).
- Inhibiting inlining of cold code improves compile time. There's no
need to inline split cold functions, or to inline as much *within*
those split functions as they are marked `minsize`.
- During LTO, extra work is only done in the pre-link step. Less code
must be inlined during cross-module inlining.
An additional motivation here is that the most common cold regions
identified by the static/conservative splitting heuristic can (a) be
found before inlining and (b) do not grow after inlining. E.g.
__assert_fail, os_log_error.
The disadvantages are:
- Some opportunities for splitting out cold code may be missed. This
gap can potentially be narrowed by adding a worklist algorithm to the
splitting pass.
- Some opportunities to reduce code size may be lost (e.g. store
sinking, when one side of the CFG diamond is split). This does not
outweigh the code size benefits of splitting earlier.
On net, splitting early in the pipeline has substantial code size
benefits, and no major effects on memory locality or performance. We
measured memory locality using ktrace data, and consistently found that
10% fewer pages were needed to capture 95% of text page faults in key
iOS benchmarks. We measured performance on frequency-stabilized iOS
devices using LNT+externals.
This reverses course on the decision made to schedule splitting late in
r344869 (D53437).
Differential Revision: https://reviews.llvm.org/D57082
llvm-svn: 352080
|
|
|
|
|
|
|
|
| |
presence of `noreturn` calls"
This reverts commit cea84ab93aeb079a358ab1c8aeba6d9140ef8b47.
llvm-svn: 352069
|
|
|
|
|
|
|
|
|
| |
This reverts commit a6982414edf315c39ae93f3c3322476217119e99 (llvm-svn: 352036),
because it causes a memory leak in the pass manager. Failing bot
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/10351/steps/check-llvm%20asan/logs/stdio
llvm-svn: 352041
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of manually computing DT and PDT, we can get the from the pass
manager, which ideally has them already cached. With the new pass
manager, we could even preserve DT/PDT on a per function basis in a
module pass.
I think this also addresses the TODO about re-using the computed DTs for
BFI. IIUC, GetBFI will fetch the DT from the pass manager and when we
will fetch the cached version later.
Reviewers: vsk, hiraditya, tejohnson, thegameg, sebpop
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D57092
llvm-svn: 352036
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`noreturn` calls
Summary:
UBSan wants to detect when unreachable code is actually reached, so it
adds instrumentation before every `unreachable` instruction. However,
the optimizer will remove code after calls to functions marked with
`noreturn`. To avoid this UBSan removes `noreturn` from both the call
instruction as well as from the function itself. Unfortunately, ASan
relies on this annotation to unpoison the stack by inserting calls to
`_asan_handle_no_return` before `noreturn` functions. This is important
for functions that do not return but access the the stack memory, e.g.,
unwinder functions *like* `longjmp` (`longjmp` itself is actually
"double-proofed" via its interceptor). The result is that when ASan and
UBSan are combined, the `noreturn` attributes are missing and ASan
cannot unpoison the stack, so it has false positives when stack
unwinding is used.
Changes:
# UBSan now adds the `expect_noreturn` attribute whenever it removes
the `noreturn` attribute from a function
# ASan additionally checks for the presence of this attribute
Generated code:
```
call void @__asan_handle_no_return // Additionally inserted to avoid false positives
call void @longjmp
call void @__asan_handle_no_return
call void @__ubsan_handle_builtin_unreachable
unreachable
```
The second call to `__asan_handle_no_return` is redundant. This will be
cleaned up in a follow-up patch.
rdar://problem/40723397
Reviewers: delcypher, eugenis
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D56624
llvm-svn: 352003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Profile sample files include the number of times each entry or inlined
call site is sampled. This is translated into the entry count metadta
on functions.
When sample data is being read, if a call site that was inlined
in the sample program is considered cold and not inlined, then
the entry count of the out-of-line functions does not reflect
the current compilation.
In this patch, we note call sites where the function was not inlined
and as a last action of the sample profile loading, we update the
called function's entry count to reflect the calls from these
call sites which are not included in the profile file.
Reviewers: danielcdh, wmi, Kader, modocache
Reviewed By: wmi
Subscribers: davidxl, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D52845
llvm-svn: 352001
|
|
|
|
| |
llvm-svn: 351945
|
|
|
|
|
|
|
|
|
|
| |
The splitting pass does not need BFI unless the Module actually has a profile
summary. Do not calcualte BFI unless the summary is present.
For the sqlite3 amalgamation, this reduces time spent in the splitting pass
from 0.4% of the total to under 0.1%.
llvm-svn: 351894
|
|
|
|
|
|
|
|
|
|
| |
The splitting pass does not need (post)domtrees until after it's found a
cold block. Defer domtree calculation until a cold block is found.
For the sqlite3 amalgamation, this reduces time spent in the splitting
pass from 0.8% of the total to 0.4%.
llvm-svn: 351892
|
|
|
|
| |
llvm-svn: 351671
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This modification of the currently unused inter-procedural constant
propagation pass (IPConstantPropagation) shows how abstract call sites
enable optimization of callback calls alongside direct and indirect
calls. Through minimal changes, mostly dealing with the partial mapping
of callbacks, inter-procedural constant propagation was enabled for
callbacks, e.g., OpenMP runtime calls or pthreads_create.
Differential Revision: https://reviews.llvm.org/D56447
llvm-svn: 351628
|
|
|
|
|
|
|
|
|
|
|
| |
Thanks to Nikita Popov for pointing out this missed case.
This is a follow-up to r351411, which disabled function merging for
vararg functions outright due to a miscompile (see llvm.org/PR40345).
Differential Revision: https://reviews.llvm.org/D56865
llvm-svn: 351624
|
|
|
|
|
|
|
|
|
|
| |
If an inherently cold function is found, mark it as cold. For now this
means applying the `cold` and `minsize` attributes.
As a drive-by, revisit and clean up the criteria for considering a
function for splitting. Add tests.
llvm-svn: 351623
|
|
|
|
|
|
|
| |
Use the begin/end iterator idiom to avoid visiting split functions,
instead of doing a set lookup.
llvm-svn: 351622
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prior to r348205, extracting code regions with live output values was
disabled because of a miscompilation (PR39433). Lift the restriction as
PR39433 has been addressed.
Tested on LNT+externals, on a run of check-llvm in a stage2 build, and
with a full build of iOS (with hot/cold splitting enabled).
As a drive-by, remove an errant TODO.
llvm-svn: 351492
|
|
|
|
|
|
|
|
|
|
| |
Resuming exception unwinding is roughly as unlikely as throwing an
exception.
Tested on LNT+externals (in particular, the C++ EH regression tests
provide end-to-end test coverage), as well as with a full build of iOS.
llvm-svn: 351491
|
|
|
|
|
|
|
|
|
| |
Relaxing this requirement creates opportunities to split code dominated
by an EH pad.
Tested on LNT+externals.
llvm-svn: 351483
|
|
|
|
|
|
|
|
| |
This gets rid of the brittle/mysterious calls to @sink()/@sideeffect()
peppered throughout the test cases. They are no longer needed to force
splitting to occur.
llvm-svn: 351480
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the sample profile has no inlining hierachy information included, we call
the sample profile is flattened. For flattened profile, in ThinLTO postlink
phase, SampleProfileLoader's hot function inlining and profile annotation will
do nothing, so it is better to save the effort to read in the profile and run
the sample profile loader pass. It is helpful for reducing compile time when
the flattened profile is huge.
Differential Revision: https://reviews.llvm.org/D54819
llvm-svn: 351476
|
|
|
|
|
|
|
|
| |
Mistaken commit of something still under review!
This reverts commit r351453.
llvm-svn: 351455
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If LTOUnit splitting is disabled, the module summary analysis computes
the summary information necessary to perform single implementation
devirtualization during the thin link with the index and no IR. The
information collected from the regular LTO IR in the current hybrid WPD
algorithm is summarized, including:
1) For vtable definitions, record the function pointers and their offset
within the vtable initializer (subsumes the information collected from
IR by tryFindVirtualCallTargets).
2) A record for each type metadata summarizing the vtable definitions
decorated with that metadata (subsumes the TypeIdentiferMap collected
from IR).
Also added are the necessary bitcode records, and the corresponding
assembly support.
The index-based WPD will be sent as a follow-on.
Depends on D53890.
Reviewers: pcc
Subscribers: mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54815
llvm-svn: 351453
|
|
|
|
|
|
|
|
|
|
|
|
| |
The function merging pass miscompiles identical vararg functions. The
forwarding thunk it emits doesn't forward the full variable-length list
of arguments. Disable merging for vararg functions for now.
I've filed llvm.org/PR40345 to track the issue.
rdar://47326238
llvm-svn: 351411
|
|
|
|
|
|
| |
PGOInstrGen should be initialized to "" instead of false.
llvm-svn: 351397
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we have pgo options defined in PassManagerBuilder.cpp only for
instrument pgo, but not for sample pgo. We also have pgo options defined
in NewPMDriver.cpp in opt only for new pass manager and for all kinds of
pgo. They have some inconsistency.
To make the options more consistent and make tests writing easier, the
patch let old pass manager to share the same pgo options with new pass
manager in opt, and removes the options in PassManagerBuilder.cpp.
Differential Revision: https://reviews.llvm.org/D56749
llvm-svn: 351392
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Check to make sure that the caller and the callee have compatible
function arguments before promoting arguments. This uses the same
TargetTransformInfo queries that are used to determine if attributes
are compatible for inlining.
The goal here is to avoid breaking ABI when a called function's ABI
depends on a target feature that is not enabled in the caller.
This is a very conservative fix for PR37358. Ideally we would have a more
sophisticated check for ABI compatiblity rather than checking if the
attributes are compatible for inlining.
Reviewers: echristo, chandlerc, eli.friedman, craig.topper
Reviewed By: echristo, chandlerc
Subscribers: nikic, xbolva00, rkruppe, alexcrichton, llvm-commits
Differential Revision: https://reviews.llvm.org/D53554
llvm-svn: 351296
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
debug location to its function summary.
Summary:
Here are timings (as reported by "opt -time-passes") for
sample-profile pass for some files holding hot functions from a major
service©r. Average 17% reduction. Delta column is 100*(old-new)/old.
```
Old New Delta
0.0537 0.0538 -0.2%
0.8155 0.6522 20.0%
0.0779 0.0751 3.6%
0.0727 0.0913 -25.6%
0.1622 0.1302 19.7%
0.0627 0.0594 5.3%
0.0766 0.0744 2.9%
0.6426 0.4387 31.7%
0.3521 0.2776 21.2%
0.3549 0.2721 23.3%
0.0912 0.0904 0.9%
0.1236 0.1059 14.3%
0.0854 0.0866 -1.4%
0.0757 0.0722 4.6%
0.1293 0.1147 11.3%
0.1354 0.1122 17.1%
0.0767 0.0770 -0.4%
0.1135 0.0968 14.7%
0.0524 0.0608 -16.0%
0.1279 0.1106 13.5%
==========
3.6820 3.0520 17.1% Total
```
Reviewers: twoh, Kader, danielcdh, wmi
Reviewed By: wmi
Subscribers: dblaikie, llvm-commits
Differential Revision: https://reviews.llvm.org/D56435
llvm-svn: 351211
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Like branch instructions, phi nodes frequently do not have debug information related to the block they are in and so they should be ignored.
Reviewers: danielcdh, twoh, Kader, wmi
Reviewed By: wmi
Subscribers: aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D55094
llvm-svn: 351102
|
|
|
|
|
|
|
|
|
| |
This reverts commit a9788dd6587d67c856df74eedff5a6ad34ce8320, reversing
changes made to f1309ffebf718d16aec4fab83380556c660e2825.
unintended merge pushed
llvm-svn: 351095
|
|
|
|
| |
llvm-svn: 351092
|
|
|
|
| |
llvm-svn: 351016
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Records in the module summary index whether the bitcode was compiled
with the option necessary to enable splitting the LTO unit
(e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).
The information is passed down to the ModuleSummaryIndex builder via a
new module flag "EnableSplitLTOUnit", which is propagated onto a flag
on the summary index.
This is then used during the LTO link to check whether all linked
summaries were built with the same value of this flag. If not, an error
is issued when we detect a situation requiring whole program visibility
of the class hierarchy. This is the case when both of the following
conditions are met:
1) We are performing LowerTypeTests or Whole Program Devirtualization.
2) There are type tests or type checked loads in the code.
Note I have also changed the ThinLTOBitcodeWriter to also gate the
module splitting on the value of this flag.
Reviewers: pcc
Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D53890
llvm-svn: 350948
|
|
|
|
|
|
|
|
|
|
| |
MergeFunc only deletes unused duplicate functions if they have local
linkage, but it should be safe to relax this to any "discardable if
unused" linkage type.
Differential Revision: https://reviews.llvm.org/D56574
llvm-svn: 350939
|
|
|
|
| |
llvm-svn: 350938
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Instead of using two separate callbacks to return the entry count and the
relative block frequency, use a single callback to return callsite
count. This would allow better supporting hybrid mode in the future as
the count of callsite need not always be derived from entry count (as in
sample PGO).
Reviewers: davidxl
Subscribers: mehdi_amini, steven_wu, dexonsmith, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D56464
llvm-svn: 350755
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
minted `CallBase` class instead of the `CallSite` wrapper.
This moves the largest interwoven collection of APIs that traffic in
`CallSite`s. While a handful of these could have been migrated with
a minorly more shallow migration by converting from a `CallSite` to
a `CallBase`, it hardly seemed worth it. Most of the APIs needed to
migrate together because of the complex interplay of AA APIs and the
fact that converting from a `CallBase` to a `CallSite` isn't free in its
current implementation.
Out of tree users of these APIs can fairly reliably migrate with some
combination of `.getInstruction()` on the `CallSite` instance and
casting the resulting pointer. The most generic form will look like `CS`
-> `cast_or_null<CallBase>(CS.getInstruction())` but in most cases there
is a more elegant migration. Hopefully, this migrates enough APIs for
users to fully move from `CallSite` to the base class. All of the
in-tree users were easily migrated in that fashion.
Thanks for the review from Saleem!
Differential Revision: https://reviews.llvm.org/D55641
llvm-svn: 350503
|
|
|
|
|
|
|
|
|
|
| |
This has some minor optimizations to shouldBeDeferred. This is not
strictly NFC because the early exit inside the loop assumes
TotalSecondaryCost is monotonically non-decreasing, which is not true if
the threshold used by CostAnalyzer is negative. AFAICT the thresholds do
not go below 0 for the default values of the various options we use.
llvm-svn: 350456
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At -O0, globalopt is not run during the compile step, and we can have a
chain of an alias having an immediate aliasee of another alias. The
summaries are constructed assuming aliases in a canonical form
(flattened chains), and as a result only the base object but no
intermediate aliases were preserved.
Fix by adding a pass that canonicalize aliases, which ensures each
alias is a direct alias of the base object.
Reviewers: pcc, davidxl
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54507
llvm-svn: 350423
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Alias can make one (but not all) live, we still need to scan all others if this symbol is reachable
from somewhere else.
Reviewers: tejohnson, grimar
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D56117
llvm-svn: 350269
|
|
|
|
|
|
|
|
|
|
|
| |
Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an
llvm.lifetime.start or an llvm.lifetime.end intrinsic.
This was suggested as a cleanup in D55967.
Differential Revision: https://reviews.llvm.org/D56019
llvm-svn: 349964
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rename:
NoUnrolling to InterleaveOnlyWhenForced
and
AlwaysVectorize to !VectorizeOnlyWhenForced
Contrary to what the name 'AlwaysVectorize' suggests, it does not
unconditionally vectorize all loops, but applies a cost model to
determine whether vectorization is profitable to all loops. Hence,
passing false will disable the cost model, except when a loop is marked
with llvm.loop.vectorize.enable. The 'OnlyWhenForced' suffix (suggested
by @hfinkel in D55716) better matches this behavior.
Similarly, 'NoUnrolling' disables the profitability cost model for
interleaving (a term to distinguish it from unrolling by the
LoopUnrollPass); rename it for consistency.
Differential Revision: https://reviews.llvm.org/D55785
llvm-svn: 349513
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using clang with `-fno-unroll-loops` (implicitly added with `-O1`),
the LoopUnrollPass is not not added to the (legacy) pass pipeline. This
also means that it will not process any loop metadata such as
llvm.loop.unroll.enable (which is generated by #pragma unroll or
WarnMissedTransformationsPass emits a warning that a forced
transformation has not been applied (see
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610833.html).
Such explicit transformations should take precedence over disabling
heuristics.
This patch unconditionally adds LoopUnrollPass to the optimizing
pipeline (that is, it is still not added with `-O0`), but passes a flag
indicating whether automatic unrolling is dis-/enabled. This is the same
approach as LoopVectorize uses.
The new pass manager's pipeline builder has no option to disable
unrolling, hence the problem does not apply.
Differential Revision: https://reviews.llvm.org/D55716
llvm-svn: 349509
|