| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
llvm-svn: 340619
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Sometimes reading an output *.ll file it is not easy to understand why some callsites are not inlined. We can read output of inline remarks (option --pass-remarks-missed=inline) and try correlating its messages with the callsites.
An easier way proposed by this patch is to add to every callsite processed by Inliner an attribute with the latest message that describes the cause of not inlining this callsite. The attribute is called //inline-remark//. By default this feature is off. It can be switched on by the option //-inline-remark-attribute//.
For example in the provided test the result method //@test1// has two callsites //@bar// and inline remarks report different inlining missed reasons:
remark: <unknown>:0:0: bar not inlined into test1 because too costly to inline (cost=-5, threshold=-6)
remark: <unknown>:0:0: bar not inlined into test1 because it should never be inlined (cost=never): recursive
It is not clear which remark correspond to which callsite. With the inline remark attribute enabled we get the reasons attached to their callsites:
define void @test1() {
call void @bar(i1 true) #0
call void @bar(i1 false) #2
ret void
}
attributes #0 = { "inline-remark"="(cost=-5, threshold=-6)" }
..
attributes #2 = { "inline-remark"="(cost=never): recursive" }
Patch by: yrouban (Yevgeny Rouban)
Reviewers: xbolva00, tejohnson, apilipenko
Reviewed By: xbolva00, tejohnson
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D50435
llvm-svn: 340618
|
|
|
|
|
|
|
|
|
|
|
|
| |
These changes expand the FunctionAttr logic in order to mark functions as
WriteOnly when appropriate. This is done through an additional bool variable
and extended logic.
Reviewers: hfinkel, jdoerfert
Differential Revision: https://reviews.llvm.org/D48387
llvm-svn: 340537
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instructions.
This version of the patch fixes cleaning up ssa_copy intrinsics, so it does not
crash for instructions in blocks that have been marked unreachable.
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.
As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.
Reviewers: davide, mssimpso, dberlin, efriedma
Reviewed By: davide, dberlin
Differential Revision: https://reviews.llvm.org/D45330
llvm-svn: 340525
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adds the option for the printing of summary information about functions
considered but rejected for importing during the thin link.
Reviewers: davidxl
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D50881
llvm-svn: 340047
|
|
|
|
|
|
| |
Increment existing NumInlined and NumDeleted stats in InlinerPass::run.
llvm-svn: 339682
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When WPD is performed in a ThinLTO backend, the function may be created
if it isn't already in that module. Module::getOrInsertFunction may
add a bitcast, in which case the returned Constant is not a Function and
doesn't have a name. Invoke stripPointerCasts() on the returned value
where we access its name.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D49959
llvm-svn: 339640
|
|
|
|
|
|
|
| |
My previous change moved some code upwards which caused an assert in debug mode
because the global value didn't necessarily have an initializer. Don't do that.
llvm-svn: 339485
|
|
|
|
|
|
| |
Sanitizers seem unhappy.
llvm-svn: 339480
|
|
|
|
|
|
| |
This makes my upcoming patch much easier to read.
llvm-svn: 339478
|
|
|
|
|
|
| |
This makes the code easier to read and will make an upcoming patch I have easier to review because that patch needed this refactoring to reuse some of the functions.
llvm-svn: 339391
|
|
|
|
|
|
| |
It was always false, which is obviously wrong.
llvm-svn: 339390
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The inalloca parameter has to be the only parameter passed in memory.
Changing the convention to fastcc can break that.
At some point we should teach global opt how to optimize ABI attributes
like inalloca and maybe byval. These attributes are mainly used to match
C ABIs. They are harder for LLVM to optimize and they don't always
generate the best code.
Fixes PR38487
llvm-svn: 339360
|
|
|
|
|
|
|
|
|
|
| |
Summary: DenseMap's operator[] performs an insertion if the entry isn't found. The second phase of ConstantMerge isn't trying to insert anything: it's just looking to see if the first phased performed an insertion. Use find instead, avoiding insertion of every single global initializer in the map of constants. This has the side-effect of making all entries in CMap non-null (because only global declarations would have null initializers, and that would be a bug).
Subscribers: dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D50476
llvm-svn: 339309
|
|
|
|
| |
llvm-svn: 339021
|
|
|
|
| |
llvm-svn: 339020
|
|
|
|
| |
llvm-svn: 339019
|
|
|
|
| |
llvm-svn: 338986
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.
Patch by: yrouban (Yevgeny Rouban)
Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
Reviewed By: tejohnson, xbolva00
Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
Differential Revision: https://reviews.llvm.org/D49412
llvm-svn: 338969
|
|
|
|
|
|
|
| |
- It's possible for 'Changed' to return as false even if we did
partial inline something. Fixed to accumulate return values
llvm-svn: 338896
|
|
|
|
|
|
|
|
| |
This patch just extract code into a separate function to remove some
duplication between the old and new pass manager pipeline. Due to the
different CGSCC iterators used, not all code duplication was eliminated.
llvm-svn: 338585
|
|
|
|
| |
llvm-svn: 338496
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.
Patch by: yrouban (Yevgeny Rouban)
Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
Reviewed By: tejohnson, xbolva00
Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
Differential Revision: https://reviews.llvm.org/D49412
llvm-svn: 338494
|
|
|
|
| |
llvm-svn: 338389
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.
Patch by: yrouban (Yevgeny Rouban)
Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
Reviewed By: tejohnson, xbolva00
Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
Differential Revision: https://reviews.llvm.org/D49412
llvm-svn: 338387
|
|
|
|
|
|
|
| |
This reverts commit r338240 because it was causing OOMs on the UBSan
buildbot when building clang/lib/Sema/SemaChecking.cpp
llvm-svn: 338297
|
|
|
|
|
|
| |
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}
llvm-svn: 338293
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
My initial motivation for this came from https://reviews.llvm.org/D48122,
where it was pointed out that my change didn't fit well in SimplifyCFG and
therefore using GVNHoist was a better way to go. GVNHoist has been disabled
for a while as there was a list of bugs related to it.
I have fixed the following bugs:
https://bugs.llvm.org/show_bug.cgi?id=37808 -> https://reviews.llvm.org/D48372 (rL337149)
https://bugs.llvm.org/show_bug.cgi?id=36787 -> https://reviews.llvm.org/D49555 (rL337674)
https://bugs.llvm.org/show_bug.cgi?id=37445 -> https://reviews.llvm.org/D49425 (rL337680)
The next two bugs no longer occur, and it's unclear which commit fixed them:
https://bugs.llvm.org/show_bug.cgi?id=36635
https://bugs.llvm.org/show_bug.cgi?id=37791
I investigated this one and proved to be unrelated to GVNHoist, but a genuine bug in NewGvn:
https://bugs.llvm.org/show_bug.cgi?id=37660
To convince myself GVNHoist is in a good state I made a successful bootstrap build of LLVM.
Merging this change now in order to make it to the LLVM 7.0.0 branch.
Differential Revision: https://reviews.llvm.org/D49858
llvm-svn: 338240
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We now, from clang, can turn arrays of
static short g_data[] = {16, 16, 16, 16, 16, 16, 16, 16, 0, 0, 0, 0, 0, 0, 0, 0};
into structs of the form
@g_data = internal global <{ [8 x i16], [8 x i16] }> ...
GlobalOpt will incorrectly SROA it, not realising that the access to the first
element may overflow into the second. This fixes it by checking geps more
thoroughly.
I believe this makes the globalsra-partial.ll test case invalid as the %i value
could be out of bounds. I've re-purposed it as a negative test for this case.
Differential Revision: https://reviews.llvm.org/D49816
llvm-svn: 338192
|
|
|
|
|
|
|
|
| |
instructions.
I suspect it is causing the clang-stage2-Rthinlto failures.
llvm-svn: 337956
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instructions.
r337828 resolves a PredicateInfo issue with unnamed types.
Original message:
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.
As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.
Reviewers: davide, mssimpso, dberlin, efriedma
Reviewed By: davide, dberlin
llvm-svn: 337904
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Without this change, the WholeProgramDevirt pass, which requires the
TargetLibraryInfo, will construct one from the default triple.
Fixes PR38139.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D49278
llvm-svn: 337750
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(previously it was 128-byte)
We tested different cap values with a recent commit of Chromium. Our results show that the 32-byte cap yields the smallest binary and all the caps yield similar performance.
Based on the results, we propose to change the cap value to 32-byte.
Patch by Zhaomo Yang!
Differential Revision: https://reviews.llvm.org/D49405
llvm-svn: 337622
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Enable these passes for CFI and WPD in ThinLTO and LTO with the new pass
manager. Add a couple of tests for both PMs based on the clang tests
tools/clang/test/CodeGen/thinlto-distributed-cfi*.ll, but just test
through llvm-lto2 and not with distributed ThinLTO.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D49429
llvm-svn: 337461
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit r337081, therefore restoring r337050 (and fix in
r337059), with test fix for bot failure described after the original
description below.
In order to always import the same copy of a linkonce function,
even when encountering it with different thresholds (a higher one then a
lower one), keep track of the summary we decided to import.
This ensures that the backend only gets a single definition to import
for each GUID, so that it doesn't need to choose one.
Move the largest threshold the GUID was considered for import into the
current module out of the ImportMap (which is part of a larger map
maintained across the whole index), and into a new map just maintained
for the current module we are computing imports for. This saves some
memory since we no longer have the thresholds maintained across the
whole index (and throughout the in-process backends when doing a normal
non-distributed ThinLTO build), at the cost of some additional
information being maintained for each invocation of ComputeImportForModule
(the selected summary pointer for each import).
There is an additional map lookup for each callee being considered for
importing, however, this was able to subsume a map lookup in the
Worklist iteration that invokes computeImportForFunction. We also are
able to avoid calling selectCallee if we already failed to import at the
same or higher threshold.
I compared the run time and peak memory for the SPEC2006 471.omnetpp
benchmark (running in-process ThinLTO backends), as well as for a large
internal benchmark with a distributed ThinLTO build (so just looking at
the thin link time/memory). Across a number of runs with and without
this change there was no significant change in the time and memory.
(I tried a few other variations of the change but they also didn't
improve time or peak memory).
The new commit removes a test that no longer makes sense
(Transforms/FunctionImport/hotness_based_import2.ll), as exposed by the
reverse-iteration bot. The test depends on the order of processing the
summary call edges, and actually depended on the old problematic
behavior of selecting more than one summary for a given GUID when
encountered with different thresholds. There was no guarantee even
before that we would eventually pick the linkonce copy with the hottest
call edges, it just happened to work with the test and the old code, and
there was no guarantee that we would end up importing the selected
version of the copy that had the hottest call edges (since the backend
would effectively import only one of the selected copies).
Reviewers: davidxl
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D48670
llvm-svn: 337184
|
|
|
|
|
|
|
| |
This reverts commits r337050 and r337059. Caused failure in
reverse-iteration bot that needs more investigation.
llvm-svn: 337081
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to always import the same copy of a linkonce function,
even when encountering it with different thresholds (a higher one then a
lower one), keep track of the summary we decided to import.
This ensures that the backend only gets a single definition to import
for each GUID, so that it doesn't need to choose one.
Move the largest threshold the GUID was considered for import into the
current module out of the ImportMap (which is part of a larger map
maintained across the whole index), and into a new map just maintained
for the current module we are computing imports for. This saves some
memory since we no longer have the thresholds maintained across the
whole index (and throughout the in-process backends when doing a normal
non-distributed ThinLTO build), at the cost of some additional
information being maintained for each invocation of ComputeImportForModule
(the selected summary pointer for each import).
There is an additional map lookup for each callee being considered for
importing, however, this was able to subsume a map lookup in the
Worklist iteration that invokes computeImportForFunction. We also are
able to avoid calling selectCallee if we already failed to import at the
same or higher threshold.
I compared the run time and peak memory for the SPEC2006 471.omnetpp
benchmark (running in-process ThinLTO backends), as well as for a large
internal benchmark with a distributed ThinLTO build (so just looking at
the thin link time/memory). Across a number of runs with and without
this change there was no significant change in the time and memory.
(I tried a few other variations of the change but they also didn't
improve time or peak memory).
Reviewers: davidxl
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D48670
llvm-svn: 337050
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently LowerTypeTests emits jumptable entries for all live external
and address-taken functions; however, we could limit the number of
functions that we emit entries for significantly.
For Cross-DSO CFI, we continue to emit jumptable entries for all
exported definitions. In the non-Cross-DSO CFI case, we only need to
emit jumptable entries for live functions that are address-taken in live
functions. This ignores exported functions and functions that are only
address taken in dead functions. This change uses ThinLTO summary data
(now emitted for all modules during ThinLTO builds) to determine
address-taken and liveness info.
The logic for emitting jumptable entries is more conservative in the
regular LTO case because we don't have summary data in the case of
monolithic LTO builds; however, once summaries are emitted for all LTO
builds we can unify the Thin/monolithic LTO logic to only use summaries
to determine the liveness of address taking functions.
This change is a partial fix for PR37474. It reduces the build size for
nacl_helper by ~2-3%, the reduction is due to nacl_helper compiling in
lots of unused code and unused functions that are address taken in dead
functions no longer being being considered live due to emitted jumptable
references. The reduction for chromium is ~0.1-0.2%.
Reviewers: pcc, eugenis, javed.absar
Reviewed By: pcc
Subscribers: aheejin, dexonsmith, dschuff, mehdi_amini, eraman, steven_wu, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D47652
llvm-svn: 337038
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
I noticed that the .imports files emitted for distributed ThinLTO
backends do not have consistent ordering. This is because StringMap
iteration order is not guaranteed to be deterministic. Since we already
have a std::map with this information, used when emitting the individual
index files (ModuleToSummariesForIndex), use it for the imports files as
well.
This issue is likely causing some unnecessary rebuilds of the ThinLTO
backends in our distributed build system as the imports files are inputs
to those backends.
Reviewers: pcc, steven_wu, mehdi_amini
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D48783
llvm-svn: 336721
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.
More details : https://lkml.org/lkml/2018/4/4/601
GCC option description for -fdelete-null-pointer-checks:
This Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.
-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.
This feature is implemented in LLVM IR in this CL as the function attribute
"null-pointer-is-valid"="true" in IR (Under review at D47894).
The CL updates several passes that assumed null pointer dereferencing is
undefined to not optimize when the "null-pointer-is-valid"="true"
attribute is present.
Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv
Reviewed By: efriedma, george.burgess.iv
Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47895
llvm-svn: 336613
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
state for it.
Summary:
Tests: 10
Metric: compile_time
Program unpatch-result patch-result diff
Bullet/bullet 32.39 30.54 -5.7%
SPASS/SPASS 18.14 17.25 -4.9%
mafft/pairlocalalign 12.10 11.64 -3.8%
ClamAV/clamscan 19.21 19.63 2.2%
7zip/7zip-benchmark 49.55 48.85 -1.4%
kimwitu++/kc 15.68 15.87 1.2%
lencod/lencod 21.13 21.34 1.0%
consumer-typeset/consumer-typeset 13.65 13.62 -0.2%
tramp3d-v4/tramp3d-v4 29.88 29.92 0.1%
sqlite3/sqlite3 18.48 18.46 -0.1%
unpatch-result patch-result diff
count 10.000000 10.000000 10.000000
mean 23.022000 22.712400 -0.011671
std 11.362831 11.094183 0.027338
min 12.104000 11.640000 -0.057298
25% 16.299000 16.214000 -0.032282
50% 18.844000 19.048000 -0.001350
75% 27.689000 27.774000 0.007752
max 49.552000 48.852000 0.021861
I also tested only this pass by concatenating all the code from the
llvm/lib/Analysis/ folder and do clang -g followed by opt. I get close to 20% speedup
for the pass. I expect a majority of the gain come from skipping the dbg intrinsics.
Before patch (opt -time-passes -called-value-propagation):
============
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 3.8303 seconds (3.8279 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- ---
Name ---
2.0768 ( 57.3%) 0.0990 ( 48.0%) 2.1757 ( 56.8%) 2.1757 ( 56.8%) Bitcode
Writer
0.8444 ( 23.3%) 0.0600 ( 29.1%) 0.9044 ( 23.6%) 0.9044 ( 23.6%) Called
Value Propagation
0.7031 ( 19.4%) 0.0472 ( 22.9%) 0.7502 ( 19.6%) 0.7478 ( 19.5%) Module
Verifier
3.6242 (100.0%) 0.2062 (100.0%) 3.8303 (100.0%) 3.8279 (100.0%) Total
After patch (opt -time-passes -called-value-propagation):
============
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 3.6605 seconds (3.6579 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- ---
Name ---
2.0716 ( 59.7%) 0.0990 ( 52.5%) 2.1705 ( 59.3%) 2.1706 ( 59.3%) Bitcode
Writer
0.7144 ( 20.6%) 0.0300 ( 15.9%) 0.7444 ( 20.3%) 0.7444 ( 20.4%) Called
Value Propagation
0.6859 ( 19.8%) 0.0596 ( 31.6%) 0.7455 ( 20.4%) 0.7429 ( 20.3%) Module
Verifier
3.4719 (100.0%) 0.1886 (100.0%) 3.6605 (100.0%) 3.6579 (100.0%) Total
Reviewers: davide, mssimpso
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49078
llvm-svn: 336551
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a simple implementation of the unroll-and-jam classical loop
optimisation.
The basic idea is that we take an outer loop of the form:
for i..
ForeBlocks(i)
for j..
SubLoopBlocks(i, j)
AftBlocks(i)
Instead of doing normal inner or outer unrolling, we unroll as follows:
for i... i+=2
ForeBlocks(i)
ForeBlocks(i+1)
for j..
SubLoopBlocks(i, j)
SubLoopBlocks(i+1, j)
AftBlocks(i)
AftBlocks(i+1)
Remainder Loop
So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now jammed loops.
To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).
Differential Revision: https://reviews.llvm.org/D41953
llvm-svn: 336062
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
and diretory.
Also cleans up all the associated naming to be consistent and removes
the public access to the pass ID which was unused in LLVM.
Also runs clang-format over parts that changed, which generally cleans
up a bunch of formatting.
This is in preparation for doing some internal cleanups to the pass.
Differential Revision: https://reviews.llvm.org/D47352
llvm-svn: 336028
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The InlinerFunctionImportStats will collect and dump stats regarding how
many function inlined into the module were imported by ThinLTO.
Reviewers: wmi, dexonsmith
Subscribers: mehdi_amini, inglorion, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D48729
llvm-svn: 335914
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Rather than just print the GUID, when it is available in the index,
print the global name as well in the function import thin link debug
messages. Names will be available when the combined index is being
built by the same process, e.g. a linker or "llvm-lto2 run".
Reviewers: davidxl
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits
Differential Revision: https://reviews.llvm.org/D48612
llvm-svn: 335760
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a function has sample to use, but cannot use them because of no debug
information, currently a warning will be issued to inform the missing
opportunity.
This warning assumes the binary generating the profile and the binary using
the profile are similar enough. It is not always the case. Sometimes even
if the binaries are not quite similar, we may still get some benefit by
using sampleFDO. In those cases, we may still want to apply sampleFDO but
not want to see a lot of such warnings pop up.
The patch adds an option for the warning.
Differential Revision: https://reviews.llvm.org/D48510
llvm-svn: 335484
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since we are now producing a summary also for regular LTO builds, we
need to run the NameAnonGlobals pass in those cases as well (the
summary cannot handle anonymous globals).
See https://reviews.llvm.org/D34156 for details on the original change.
This reverts commit 6c9ee4a4a438a8059aacc809b2dd57128fccd6b3.
llvm-svn: 335385
|
|
|
|
|
|
|
|
|
|
|
| |
This is the first pass in the main pipeline to use the legacy PM's
ability to run function analyses "on demand". Unfortunately, it turns
out there are bugs in that somewhat-hacky approach. At the very least,
it leaks memory and doesn't support -debug-pass=Structure. Unclear if
there are larger issues or not, but this should get the sanitizer bots
back to green by fixing the memory leaks.
llvm-svn: 335320
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for generating a call graph profile from Branch Frequency Info.
The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions. For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.
After scanning all the functions, it generates an appending module flag containing the data. The format looks like:
!llvm.module.flags = !{!0}
!0 = !{i32 5, !"CG Profile", !1}
!1 = !{!2, !3, !4} ; List of edges
!2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32
!3 = !{void (i1)* @freq, void ()* @a, i64 11}
!4 = !{void (i1)* @freq, void ()* @b, i64 20}
Differential Revision: https://reviews.llvm.org/D48105
llvm-svn: 335306
|
|
|
|
|
|
|
|
|
|
|
| |
facts from cmp instructions."
This reverts commit r335206.
As discussed here: https://reviews.llvm.org/rL333740, a fix will come
tomorrow. In the meanwhile, revert this to fix some bots.
llvm-svn: 335272
|