| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: chandlerc
Reviewed By: chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D57494
llvm-svn: 352771
|
|
|
|
|
|
|
|
| |
leader"
Recommit of r352763 with fix for use after free.
llvm-svn: 352770
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit r352763.
Causing a couple bot failures, root cause pointed to by sanitizer bot:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/28909/steps/annotate/logs/stdio
Use after free. I understand the issue but will revert and test with fix
before recommitting.
llvm-svn: 352768
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
COFF requires that COMDAT name match that of the leader. When we promote
and rename an internal leader in ThinLTO due to an import, ensure we
subsequently rename the associated COMDAT. Similar to D31963 which did
this during ThinLTO module splitting.
Fixes PR40414.
Reviewers: pcc, inglorion
Subscribers: mehdi_amini, dexonsmith, dmajor, llvm-commits
Differential Revision: https://reviews.llvm.org/D57395
llvm-svn: 352763
|
|
|
|
|
|
|
|
|
|
|
| |
Introduces a pass that provides default lowering strategy for the
`experimental.widenable.condition` intrinsic, replacing all its uses with
`i1 true`.
Differential Revision: https://reviews.llvm.org/D56096
Reviewed By: reames
llvm-svn: 352739
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
functions
Summary: This patch enables folding following expressions under -ffast-math flag: exp(X) * exp(Y) -> exp(X + Y), exp2(X) * exp2(Y) -> exp2(X + Y). Motivation: https://bugs.llvm.org/show_bug.cgi?id=35594
Reviewers: hfinkel, spatel, efriedma, lebedev.ri
Reviewed By: spatel, lebedev.ri
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D41342
llvm-svn: 352730
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is meant to be used with clang's __builtin_dynamic_object_size.
When 'true' is passed to this parameter, the intrinsic has the
potential to be folded into instructions that will be evaluated
at run time. When 'false', the objectsize intrinsic behaviour is
unchanged.
rdar://32212419
Differential revision: https://reviews.llvm.org/D56761
llvm-svn: 352664
|
|
|
|
|
|
|
|
|
|
| |
The point is that this simplifies integration of new intrinsics into SimplifiedDemandedVectorElts, and ensures we don't miss any existing ones.
This is intended to be NFC-ish, but as seen from the diffs, can produce slightly different output. This is due to order of transforms w/in instcombine resulting in two slightly different fixed points. That's something we should fix, but isn't a problem w/this patch per se.
Differential Revision: https://reviews.llvm.org/D57398
llvm-svn: 352653
|
|
|
|
| |
llvm-svn: 352621
|
|
|
|
| |
llvm-svn: 352619
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Check the bool value of the attribute in getOptionalBoolLoopAttribute
not just its existance.
Eliminates the warning noise generated when vectorization is explicitly disabled.
Reviewers: Meinersbur, hfinkel, dmgreen
Subscribers: jlebar, sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D57260
llvm-svn: 352555
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I'm circling back around to a loose end from D51929.
The backend (either CGP or DAG) doesn't recognize this pattern, so we end up with different asm for these IR variants.
Regardless of any future changes to canonicalize to saturation/overflow intrinsics, we want to get raw IR variations
into the minimal number of raw IR forms. If/when we can canonicalize to intrinsics, that will make that step easier.
Pre: C2 == ~C1
%a = add i32 %x, C1
%c = icmp ugt i32 %x, C2
%r = select i1 %c, i32 -1, i32 %a
=>
%a = add i32 %x, C1
%c2 = icmp ult i32 %x, C2
%r = select i1 %c2, i32 %a, i32 -1
https://rise4fun.com/Alive/pkH
Differential Revision: https://reviews.llvm.org/D57352
llvm-svn: 352536
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch avoids an assert in IPConstantPropagation when
there is a argument count/type mismatch between the caller and
the callee.
While this is actually UB on C-level (clang emits a warning),
the IR verifier seems to accept it. I'm not sure what other
frontends/languages might think about this, so simply bailing out
to avoid hitting an assert (in CallSiteBase<>::getArgOperand or
Value::doRAUW) seems like a simple solution.
The problem is exposed by the fact that AbstractCallSites will look
through a bitcast at the callee position of a call/invoke.
Reviewers: jdoerfert, reames, efriedma
Reviewed By: jdoerfert, efriedma
Subscribers: eli.friedman, efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D57052
llvm-svn: 352469
|
|
|
|
|
|
|
|
| |
GEPs can produce either scalar or vector results. If we're extracting only a subset of the vector lanes, simplifying the operands is helpful in eliminating redundant computation, and (eventually) allowing further optimizations
Differential Revision: https://reviews.llvm.org/D57177
llvm-svn: 352440
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
A recent fix to the ThinLTO whole program dead code elimination (D56117)
increased the thin link time on a large MSAN'ed binary by 2x.
It's likely that the time increased elsewhere, but was more noticeable
here since it was already large and ended up timing out.
That change made it so we would repeatedly scan all copies of linkonce
symbols for liveness every time they were encountered during the graph
traversal. This was needed since we only mark one copy of an aliasee as
live when we encounter a live alias. This patch fixes the issue in a
more efficient manner by simply proactively visiting the aliasee (thus
marking all copies live) when we encounter a live alias.
Two notes: One, this requires a hash table lookup (finding the aliasee
summary in the index based on aliasee GUID). However, the impact of this
seems to be small compared to the original pre-D56117 thin link time. It
could be addressed if we keep the aliasee ValueInfo in the alias summary
instead of the aliasee GUID, which I am exploring in a separate patch.
Second, we only populate the aliasee GUID field when reading summaries
from bitcode (whether we are reading individual summaries and merging on
the fly to form the compiled index, or reading in a serialized combined
index). Thankfully, that's currently the only way we can get to this
code as we don't yet support reading summaries from LLVM assembly
directly into a tool that performs the thin link (they must be converted
to bitcode first). I added a FIXME, however I have the fix under test
already. The easiest fix is to simply populate this field always, which
isn't hard, but more likely the change I am exploring to store the
ValueInfo instead as described above will subsume this. I don't want to
hold up the regression fix for this though.
Reviewers: trentxintong
Subscribers: mehdi_amini, inglorion, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D57203
llvm-svn: 352438
|
|
|
|
|
|
|
| |
When passing a `swifterror` argument or alloca as an input to an
extraction region, mark the input parameter `swifterror`.
llvm-svn: 352408
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If MemorySSA is avaiable, we can skip checking all instructions if block has any Defs.
(volatile loads are also Defs).
We still need to check all instructions for "canThrow", even if no Defs are found.
Reviewers: chandlerc
Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits
Differential Revision: https://reviews.llvm.org/D57129
llvm-svn: 352393
|
|
|
|
| |
llvm-svn: 352241
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Set default value for retrieved attributes to 1, since the check is against 1.
Eliminates the warning noise generated when the attributes are not present.
Reviewers: sanjoy
Subscribers: jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D57253
llvm-svn: 352238
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The main goal of the model is to avoid *increasing* function size, as
that would eradicate any memory locality benefits from splitting. This
happens when:
- There are too many inputs or outputs to the cold region. Argument
materialization and reloads of outputs have a cost.
- The cold region has too many distinct exit blocks, causing a large
switch to be formed in the caller.
- The code size cost of the split code is less than the cost of a
set-up call.
A secondary goal is to prevent excessive overall binary size growth.
With the cost model in place, I experimented to find a splitting
threshold that works well in practice. To make warm & cold code easily
separable for analysis purposes, I moved split functions to a "cold"
section. I experimented with thresholds between [0, 4] and set the
default to the threshold which minimized geomean __text size.
Experiment data from building LNT+externals for X86 (N = 639 programs,
all sizes in bytes):
| Configuration | __text geom size | __cold geom size | TEXT geom size |
| **-Os** | 1736.3 | 0, n=0 | 10961.6 |
| -Os, thresh=0 | 1740.53 | 124.482, n=134 | 11014 |
| -Os, thresh=1 | 1734.79 | 57.8781, n=90 | 10978.6 |
| -Os, thresh=2 | ** 1733.85 ** | 65.6604, n=61 | 10977.6 |
| -Os, thresh=3 | 1733.85 | 65.3071, n=61 | 10977.6 |
| -Os, thresh=4 | 1735.08 | 67.5156, n=54 | 10965.7 |
| **-Oz** | 1554.4 | 0, n=0 | 10153 |
| -Oz, thresh=2 | ** 1552.2 ** | 65.633, n=61 | 10176 |
| **-O3** | 2563.37 | 0, n=0 | 13105.4 |
| -O3, thresh=2 | ** 2559.49 ** | 71.1072, n=61 | 13162.4 |
Picking thresh=2 reduces the geomean __text section size by 0.14% at
-Os, -Oz, and -O3 and causes ~0.2% growth in the TEXT segment. Note that
TEXT size is page-aligned, whereas section sizes are byte-aligned.
Experiment data from building LNT+externals for ARM64 (N = 558 programs,
all sizes in bytes):
| Configuration | __text geom size | __cold geom size | TEXT geom size |
| **-Os** | 1763.96 | 0, n=0 | 42934.9 |
| -Os, thresh=2 | ** 1760.9 ** | 76.6755, n=61 | 42934.9 |
Picking thresh=2 reduces the geomean __text section size by 0.17% at
-Os and causes no growth in the TEXT segment.
Measurements were done with D57082 (r352080) applied.
Differential Revision: https://reviews.llvm.org/D57125
llvm-svn: 352228
|
|
|
|
|
|
|
|
|
|
|
|
| |
2nd part of D57095 with the same reason, just in another place. We never
fold branches that are not immediately in the current loop, but this check
is missing in `IsEdgeLive` As result, it may think that the edge in subloop is
dead while it's live. It's a pessimization in the current stance.
Differential Revision: https://reviews.llvm.org/D57147
Reviewed By: rupprecht
llvm-svn: 352170
|
|
|
|
| |
llvm-svn: 352161
|
|
|
|
|
|
|
|
|
|
|
| |
While a cold invoke itself and its unwind destination can't be
extracted, code which unconditionally executes before/after the invoke
may still be profitable to extract.
With cost model changes from D57125 applied, this gives a 3.5% increase
in split text across LNT+externals on arm64 at -Os.
llvm-svn: 352160
|
|
|
|
|
|
|
|
|
|
|
|
| |
block.
Otherwise they are treated as dynamic allocas, which ends up increasing
code size significantly. This reduces size of Chromium base_unittests
by 2MB (6.7%).
Differential Revision: https://reviews.llvm.org/D57205
llvm-svn: 352152
|
|
|
|
| |
llvm-svn: 352098
|
|
|
|
| |
llvm-svn: 352093
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
MemorySSA needs updating each time an instruction is moved.
LICM and control flow hoisting re-hoists instructions, thus needing another update when re-moving those instructions.
Pending cleanup: the MSSA update is duplicated, should be moved inside moveInstructionBefore.
Reviewers: jnspaulsson
Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits
Differential Revision: https://reviews.llvm.org/D57176
llvm-svn: 352092
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Performing splitting early has several advantages:
- Inhibiting inlining of cold code early improves code size. Compared
to scheduling splitting at the end of the pipeline, this cuts code
size growth in half within the iOS shared cache (0.69% to 0.34%).
- Inhibiting inlining of cold code improves compile time. There's no
need to inline split cold functions, or to inline as much *within*
those split functions as they are marked `minsize`.
- During LTO, extra work is only done in the pre-link step. Less code
must be inlined during cross-module inlining.
An additional motivation here is that the most common cold regions
identified by the static/conservative splitting heuristic can (a) be
found before inlining and (b) do not grow after inlining. E.g.
__assert_fail, os_log_error.
The disadvantages are:
- Some opportunities for splitting out cold code may be missed. This
gap can potentially be narrowed by adding a worklist algorithm to the
splitting pass.
- Some opportunities to reduce code size may be lost (e.g. store
sinking, when one side of the CFG diamond is split). This does not
outweigh the code size benefits of splitting earlier.
On net, splitting early in the pipeline has substantial code size
benefits, and no major effects on memory locality or performance. We
measured memory locality using ktrace data, and consistently found that
10% fewer pages were needed to capture 95% of text page faults in key
iOS benchmarks. We measured performance on frequency-stabilized iOS
devices using LNT+externals.
This reverses course on the decision made to schedule splitting late in
r344869 (D53437).
Differential Revision: https://reviews.llvm.org/D57082
llvm-svn: 352080
|
|
|
|
|
|
|
|
| |
presence of `noreturn` calls"
This reverts commit cea84ab93aeb079a358ab1c8aeba6d9140ef8b47.
llvm-svn: 352069
|
|
|
|
|
|
|
|
| |
After submitting https://reviews.llvm.org/D57138, I realized it was slightly more conservative than needed. The scalar indices don't appear to be a problem on a vector gep, we even had a test for that.
Differential Revision: https://reviews.llvm.org/D57161
llvm-svn: 352061
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is an alternative to https://reviews.llvm.org/D57103. After discussion, we dedicided to check this in as a temporary workaround, and pursue a true fix under the original thread.
The issue at hand is that the base rewriting algorithm doesn't consider the fact that GEPs can turn a scalar input into a vector of outputs. We had handling for scalar GEPs and fully vector GEPs (i.e. all vector operands), but not the scalar-base + vector-index forms. A true fix here requires treating GEP analogously to extractelement or shufflevector.
This patch is merely a workaround. It simply hides the crash at the cost of some ugly code gen for this presumable very rare pattern.
Differential Revision: https://reviews.llvm.org/D57138
llvm-svn: 352059
|
|
|
|
|
|
|
|
|
| |
This reverts commit a6982414edf315c39ae93f3c3322476217119e99 (llvm-svn: 352036),
because it causes a memory leak in the pass manager. Failing bot
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/10351/steps/check-llvm%20asan/logs/stdio
llvm-svn: 352041
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of manually computing DT and PDT, we can get the from the pass
manager, which ideally has them already cached. With the new pass
manager, we could even preserve DT/PDT on a per function basis in a
module pass.
I think this also addresses the TODO about re-using the computed DTs for
BFI. IIUC, GetBFI will fetch the DT from the pass manager and when we
will fetch the cached version later.
Reviewers: vsk, hiraditya, tejohnson, thegameg, sebpop
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D57092
llvm-svn: 352036
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we choose whether or not we should mark block as dead, we have an
inconsistent logic in markup of live blocks.
- We take candidate IF its terminator branches on constant AND it is immediately
in current loop;
- We mark successor live IF its terminator doesn't branch by constant OR it branches
by constant and the successor is its always taken block.
What we are missing here is that when the terminator branches on a constant but is
not taken as a candidate because is it not immediately in the current loop, we will
mark only one (always taken) successor as live. Therefore, we do NOT do the actual
folding but may NOT mark one of the successors as live. So the result of markup is
wrong in this case, and we may then hit various asserts.
Thanks Jordan Rupprech for reporting this!
Differential Revision: https://reviews.llvm.org/D57095
Reviewed By: rupprecht
llvm-svn: 352024
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`noreturn` calls
Summary:
UBSan wants to detect when unreachable code is actually reached, so it
adds instrumentation before every `unreachable` instruction. However,
the optimizer will remove code after calls to functions marked with
`noreturn`. To avoid this UBSan removes `noreturn` from both the call
instruction as well as from the function itself. Unfortunately, ASan
relies on this annotation to unpoison the stack by inserting calls to
`_asan_handle_no_return` before `noreturn` functions. This is important
for functions that do not return but access the the stack memory, e.g.,
unwinder functions *like* `longjmp` (`longjmp` itself is actually
"double-proofed" via its interceptor). The result is that when ASan and
UBSan are combined, the `noreturn` attributes are missing and ASan
cannot unpoison the stack, so it has false positives when stack
unwinding is used.
Changes:
# UBSan now adds the `expect_noreturn` attribute whenever it removes
the `noreturn` attribute from a function
# ASan additionally checks for the presence of this attribute
Generated code:
```
call void @__asan_handle_no_return // Additionally inserted to avoid false positives
call void @longjmp
call void @__asan_handle_no_return
call void @__ubsan_handle_builtin_unreachable
unreachable
```
The second call to `__asan_handle_no_return` is redundant. This will be
cleaned up in a follow-up patch.
rdar://problem/40723397
Reviewers: delcypher, eugenis
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D56624
llvm-svn: 352003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Profile sample files include the number of times each entry or inlined
call site is sampled. This is translated into the entry count metadta
on functions.
When sample data is being read, if a call site that was inlined
in the sample program is considered cold and not inlined, then
the entry count of the out-of-line functions does not reflect
the current compilation.
In this patch, we note call sites where the function was not inlined
and as a last action of the sample profile loading, we update the
called function's entry count to reflect the calls from these
call sites which are not included in the profile file.
Reviewers: danielcdh, wmi, Kader, modocache
Reviewed By: wmi
Subscribers: davidxl, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D52845
llvm-svn: 352001
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Renamed setBaseDiscriminator to cloneWithBaseDiscriminator, to match
similar APIs. Also changed its behavior to copy over the other
discriminator components, instead of eliding them.
Renamed cloneWithDuplicationFactor to
cloneByMultiplyingDuplicationFactor, which more closely matches what
this API does.
Reviewers: dblaikie, wmi
Reviewed By: dblaikie
Subscribers: zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D56220
llvm-svn: 351996
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VPlan-native path
Context: Patch Series #2 for outer loop vectorization support in LV
using VPlan. (RFC:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).
Patch series #2 checks that inner loops are still trivially lock-step
among all vector elements. Non-loop branches are blindly assumed as
divergent.
Changes here implement VPlan based predication algorithm to compute
predicates for blocks that need predication. Predicates are computed
for the VPLoop region in reverse post order. A block's predicate is
computed as OR of the masks of all incoming edges. The mask for an
incoming edge is computed as AND of predecessor block's predicate and
either predecessor's Condition bit or NOT(Condition bit) depending on
whether the edge from predecessor block to the current block is true
or false edge.
Reviewers: fhahn, rengolin, hsaito, dcaballe
Reviewed By: fhahn
Patch by Satish Guggilla, thanks!
Differential Revision: https://reviews.llvm.org/D53349
llvm-svn: 351990
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This saves a cbz+cold call in the interceptor ABI, as well as a realign
in both ABIs, trading off a dcache entry against some branch predictor
entries and some code size.
Unfortunately the functionality is hidden behind a flag because ifunc is
known to be broken on static binaries on Android.
Differential Revision: https://reviews.llvm.org/D57084
llvm-svn: 351989
|
|
|
|
| |
llvm-svn: 351945
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch relaxes restrictions on types of latch condition and range check.
In current implementation, they should match. This patch allows to handle
wide range checks against narrow condition. The motivating example is the
following:
int N = ...
for (long i = 0; (int) i < N; i++) {
if (i >= length) deopt;
}
In this patch, the option that enables this support is turned off by
default. We'll wait until it is switched to true.
Differential Revision: https://reviews.llvm.org/D56837
Reviewed By: reames
llvm-svn: 351926
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Each hwasan check requires emitting a small piece of code like this:
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses
The problem with this is that these code blocks typically bloat code
size significantly.
An obvious solution is to outline these blocks of code. In fact, this
has already been implemented under the -hwasan-instrument-with-calls
flag. However, as currently implemented this has a number of problems:
- The functions use the same calling convention as regular C functions.
This means that the backend must spill all temporary registers as
required by the platform's C calling convention, even though the
check only needs two registers on the hot path.
- The functions take the address to be checked in a fixed register,
which increases register pressure.
Both of these factors can diminish the code size effect and increase
the performance hit of -hwasan-instrument-with-calls.
The solution that this patch implements is to involve the aarch64
backend in outlining the checks. An intrinsic and pseudo-instruction
are created to represent a hwasan check. The pseudo-instruction
is register allocated like any other instruction, and we allow the
register allocator to select almost any register for the address to
check. A particular combination of (register selection, type of check)
triggers the creation in the backend of a function to handle the check
for specifically that pair. The resulting functions are deduplicated by
the linker. The pseudo-instruction (really the function) is specified
to preserve all registers except for the registers that the AAPCS
specifies may be clobbered by a call.
To measure the code size and performance effect of this change, I
took a number of measurements using Chromium for Android on aarch64,
comparing a browser with inlined checks (the baseline) against a
browser with outlined checks.
Code size: Size of .text decreases from 243897420 to 171619972 bytes,
or a 30% decrease.
Performance: Using Chromium's blink_perf.layout microbenchmarks I
measured a median performance regression of 6.24%.
The fact that a perf/size tradeoff is evident here suggests that
we might want to make the new behaviour conditional on -Os/-Oz.
But for now I've enabled it unconditionally, my reasoning being that
hwasan users typically expect a relatively large perf hit, and ~6%
isn't really adding much. We may want to revisit this decision in
the future, though.
I also tried experimenting with varying the number of registers
selectable by the hwasan check pseudo-instruction (which would result
in fewer variants being created), on the hypothesis that creating
fewer variants of the function would expose another perf/size tradeoff
by reducing icache pressure from the check functions at the cost of
register pressure. Although I did observe a code size increase with
fewer registers, I did not observe a strong correlation between the
number of registers and the performance of the resulting browser on the
microbenchmarks, so I conclude that we might as well use ~all registers
to get the maximum code size improvement. My results are below:
Regs | .text size | Perf hit
-----+------------+---------
~all | 171619972 | 6.24%
16 | 171765192 | 7.03%
8 | 172917788 | 5.82%
4 | 177054016 | 6.89%
Differential Revision: https://reviews.llvm.org/D56954
llvm-svn: 351920
|
|
|
|
|
|
|
|
|
|
| |
The splitting pass does not need BFI unless the Module actually has a profile
summary. Do not calcualte BFI unless the summary is present.
For the sqlite3 amalgamation, this reduces time spent in the splitting pass
from 0.4% of the total to under 0.1%.
llvm-svn: 351894
|
|
|
|
|
|
|
|
|
|
| |
The splitting pass does not need (post)domtrees until after it's found a
cold block. Defer domtree calculation until a cold block is found.
For the sqlite3 amalgamation, this reduces time spent in the splitting
pass from 0.8% of the total to 0.4%.
llvm-svn: 351892
|
|
|
|
|
|
| |
This is still causing compilation crashes in some targets. Will follow up shortly with a repro.
llvm-svn: 351845
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support of guards expressed as branches by widenable
conditions in Loop Predication.
Differential Revision: https://reviews.llvm.org/D56081
Reviewed By: reames
llvm-svn: 351805
|
|
|
|
| |
llvm-svn: 351794
|
|
|
|
|
|
|
|
| |
Deopt operands are generally intended to record information about a site in code with minimal perturbation of the surrounding code. Idiomatically, they also tend to appear down rare paths. Putting these together, we have an obvious case for extending CVP w/deopt operand constant folding. Arguably, we should be doing this for all operands on all instructions, but that's definitely a much larger and risky change.
Differential Revision: https://reviews.llvm.org/D55678
llvm-svn: 351774
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As noted in https://bugs.llvm.org/show_bug.cgi?id=36651, the specialization for
isPodLike<std::pair<...>> did not match the expectation of
std::is_trivially_copyable which makes the memcpy optimization invalid.
This patch renames the llvm::isPodLike trait into llvm::is_trivially_copyable.
Unfortunately std::is_trivially_copyable is not portable across compiler / STL
versions. So a portable version is provided too.
Note that the following specialization were invalid:
std::pair<T0, T1>
llvm::Optional<T>
Tests have been added to assert that former specialization are respected by the
standard usage of llvm::is_trivially_copyable, and that when a decent version
of std::is_trivially_copyable is available, llvm::is_trivially_copyable is
compared to std::is_trivially_copyable.
As of this patch, llvm::Optional is no longer considered trivially copyable,
even if T is. This is to be fixed in a later patch, as it has impact on a
long-running bug (see r347004)
Note that GCC warns about this UB, but this got silented by https://reviews.llvm.org/D50296.
Differential Revision: https://reviews.llvm.org/D54472
llvm-svn: 351701
|
|
|
|
|
|
|
|
| |
This causes a couple of changes in the upgrade tests as signed/unsigned eq/ne are equivalent and we constant fold true/false codes, these changes are the same as what we already do for avx512 cmp/ucmp.
Noticed while cleaning up vector integer comparison costs for PR40376.
llvm-svn: 351697
|