| Commit message | Author | Age | Files | Lines |
| |
MemorySSA.
Summary:
Experimentally we found that promotion to scalars carries fewer benefits
than sinking and hoisting in LICM. When using MemorySSA, we build an
AliasSetTracker on demand in order to reuse the current infrastructure.
We only build it if fewer than AccessCapForMSSAPromotion accesses exist in
the loop, a cap that is by default set to 250. This value ensures there are
no runtime regressions, and there are small compile time gains for
pathological cases. A much lower value (20) was found to yield a single
regression in the llvm-test-suite and much higher benefits for compile
times. Conservatively we set the current cap to a high value, but we will
explore lowering it when MemorySSA is enabled by default.
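As an illustration only (not the actual LICM code), here is a minimal self-contained sketch of the gating idea, using hypothetical stand-in types; the cap name mirrors the AccessCapForMSSAPromotion option mentioned above:
```
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the real LICM/MemorySSA types.
struct MemoryAccess {};
struct AliasSetTracker {
  void add(const MemoryAccess &) {}
};

constexpr unsigned AccessCapForMSSAPromotion = 250; // Default cap.

// Build the tracker only when the loop is small enough for promotion to
// be worth the extra analysis cost.
bool shouldAttemptPromotion(const std::vector<MemoryAccess> &LoopAccesses,
                            AliasSetTracker &AST,
                            unsigned Cap = AccessCapForMSSAPromotion) {
  if (LoopAccesses.size() >= Cap)
    return false;                 // Too many accesses: skip promotion.
  for (const MemoryAccess &MA : LoopAccesses)
    AST.add(MA);                  // Populate the AST on demand.
  return true;
}
```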
Reviewers: sanjoy, chandlerc
Subscribers: nemanjai, jlebar, Prazek, george.burgess.iv, jfb, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D56625
llvm-svn: 353339
| |
llvm-svn: 353338
| |
Instructions in GlobalIsel
Reviewers: aditya_nandakumar, volkan
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, volkan, Petar.Avramovic
Differential Revision: https://reviews.llvm.org/D57630
llvm-svn: 353336
| |
Summary:
Pass the alias info to addPointer when available. Will save an alias()
call for must sets when adding a known Must or May alias.
[Part of a series of cleanup patches]
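A minimal, self-contained sketch of the idea (hypothetical simplified types, not the real AliasSetTracker API): addPointer only falls back to an alias() query when the caller does not supply a known result.
```
#include <optional>
#include <vector>

// Hypothetical stand-ins for AliasAnalysis and AliasSet.
enum class AliasResult { NoAlias, MayAlias, MustAlias };

struct Pointer { const void *V; };

struct AAResults {
  // Placeholder for a potentially expensive query.
  AliasResult alias(const Pointer &, const Pointer &) const {
    return AliasResult::MayAlias;
  }
};

struct AliasSet {
  std::vector<Pointer> Ptrs;
  bool IsMust = true;

  // If the caller already knows how NewPtr aliases the set's representative
  // pointer, reuse that result instead of re-querying AA.
  void addPointer(AAResults &AA, const Pointer &NewPtr,
                  std::optional<AliasResult> Known = std::nullopt) {
    if (!Ptrs.empty()) {
      AliasResult R = Known ? *Known : AA.alias(Ptrs.front(), NewPtr);
      if (R != AliasResult::MustAlias)
        IsMust = false;
    }
    Ptrs.push_back(NewPtr);
  }
};
```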
Reviewers: reames, mkazantsev
Subscribers: sanjoy, jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D56613
llvm-svn: 353335
| |
combineExtractWithShuffle may leave a dangling bitcast which may
prevent further optimization in later passes. Avoid constructing it
unless it is used.
llvm-svn: 353333
| |
Since SystemZ supports counting of leading zeros with the FLOGR instruction,
isCheapToSpeculateCtlz() should return true, which it now does.
ISD::CTLZ_ZERO_UNDEF i32 is now handled the same way as ISD::CTLZ is, which
is needed since promotion to i64 is required and CTLZ_ZERO_UNDEF is only
expanded to CTLZ if it is Legal or Custom.
Review: Ulrich Weigand
https://reviews.llvm.org/D57710
llvm-svn: 353330
| |
As far as I can tell, malloc.h is only being used here to provide
a definition of mallinfo (malloc itself is declared in stdlib.h via
cstdlib). We already have a macro for whether mallinfo is available,
so switch to using that instead.
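Roughly the shape of the pattern being described, as a hedged sketch; the HAVE_MALLINFO macro comes from the surrounding text, while the uordblks field used here is an assumption for illustration:
```
// Guard both the include and the use behind the existing feature macro.
#if defined(HAVE_MALLINFO)
#include <malloc.h>
#endif

#include <cstddef>
#include <cstdlib> // malloc/free are declared here, not in malloc.h.

size_t getMallocUsage() {
#if defined(HAVE_MALLINFO)
  struct mallinfo MI = ::mallinfo();
  return MI.uordblks; // Assumed field; bytes currently allocated.
#else
  return 0; // No portable way to query the allocator here.
#endif
}
```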
Differential Revision: https://reviews.llvm.org/D57807
llvm-svn: 353329
| |
Don't lower BUILD_VECTORs to BYTE_MASK, but instead expose the BUILD_VECTORs
to the DAGCombiner and select them to VGBM in Select(). This allows the
DAGCombiner to understand the constant vector values.
For floating point, only all-zeros vectors are now generated with VGBM, as it
turned out to be somewhat complicated to handle arbitrary constants, and in
practice this is very rare and hardly needed.
The SystemZ ISD opcodes z_byte_mask, z_vzero and z_vones have been removed.
Review: Ulrich Weigand
https://reviews.llvm.org/D57152
llvm-svn: 353325
| |
Don't repeat the function name in some doxygen
comments.
(Just a minor cleanup, done while testing pushing
from the git monorepo setup.)
llvm-svn: 353317
| |
There was a lot of repeated code wrt unary math intrinsics in
translateKnownIntrinsic. This factors out the repeated MIRBuilder code into
two functions: translateSimpleUnaryIntrinsic and getSimpleUnaryIntrinsicOpcode.
This simplifies adding simple unary intrinsics, since after this, all you have
to do is add the mapping to SimpleUnaryIntrinsicOpcodes.
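A sketch of the described factoring, with hypothetical enums and a stub builder standing in for Intrinsic::ID, the G_* opcodes, and MIRBuilder; the switch plays the role of the SimpleUnaryIntrinsicOpcodes mapping:
```
#include <optional>

// Hypothetical intrinsic and generic-opcode enums.
enum class IntrinsicID { Sqrt, Fabs, Ceil, Unknown };
enum class Opcode { G_FSQRT, G_FABS, G_FCEIL };

// Single point of truth: the intrinsic -> opcode mapping.
std::optional<Opcode> getSimpleUnaryIntrinsicOpcode(IntrinsicID ID) {
  switch (ID) {
  case IntrinsicID::Sqrt: return Opcode::G_FSQRT;
  case IntrinsicID::Fabs: return Opcode::G_FABS;
  case IntrinsicID::Ceil: return Opcode::G_FCEIL;
  default:                return std::nullopt;
  }
}

struct MIRBuilderStub {
  void buildInstr(Opcode, unsigned /*DstReg*/, unsigned /*SrcReg*/) {}
};

// One shared translation routine instead of one copy per intrinsic.
bool translateSimpleUnaryIntrinsic(MIRBuilderStub &B, IntrinsicID ID,
                                   unsigned DstReg, unsigned SrcReg) {
  if (auto Op = getSimpleUnaryIntrinsicOpcode(ID)) {
    B.buildInstr(*Op, DstReg, SrcReg);
    return true;
  }
  return false; // Not a simple unary intrinsic; fall back elsewhere.
}
```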
Differential Revision: https://reviews.llvm.org/D57774
llvm-svn: 353316
| |
yaml2obj previously only recognised standard STT_* names, and didn't
allow arbitrary numbers. This change allows the user to specify a number
for the type instead. It also adds a test to verify the existing
obj2yaml behaviour for unknown symbol types.
Reviewed by: grimar
Differential Revision: https://reviews.llvm.org/D57822
llvm-svn: 353315
| |
We should canonicalize to one of these forms,
and compare-with-zero could be more conducive
to follow-on transforms. This also leads to
generally better codegen as shown in PR40611:
https://bugs.llvm.org/show_bug.cgi?id=40611
llvm-svn: 353313
| |
ARMv8.1a CASP instructions need the first of the pair to be an even register
(otherwise the encoding is unallocated). We enforced this during assembly, but
not during CodeGen before this patch.
llvm-svn: 353308
| |
Allow custom handling of inline assembly output parameters and add X86
flag parameter support.
llvm-svn: 353307
| |
llvm-svn: 353305
| |
The IPM sequence currently generated to compute the strcmp/memcmp
result will return INT_MIN for the "less than zero" case. While
this is in compliance with the standard, strictly speaking, it
turns out that common applications cannot handle this, e.g. because
they negate a comparison result in order to implement reverse
compares.
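For illustration, a sketch of the kind of caller that breaks when the inlined compare expansion can return INT_MIN: negating that value overflows a signed int.
```
#include <cstring>

// A typical "reverse" comparator simply negates the forward result.
// If the (inlined) compare may return INT_MIN, the negation overflows
// (undefined behavior); a small negative result such as -2 is safe.
int reverseCompare(const char *A, const char *B) {
  return -std::strcmp(A, B);
}
```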
This patch changes code to use a different sequence that will result
in -2 for the "less than zero" case (same as GCC). However, this
requires that the two source operands of the compare instructions
are inverted, which breaks the optimization in removeIPMBasedCompare.
Therefore, I've removed this (and all of optimizeCompareInstr), and
replaced it with a mostly equivalent optimization in combineCCMask
at the DAGCombine level.
llvm-svn: 353304
| |
A quirk of the v8.1a spec is that when the writeback register for an atomic
read-modify-write instruction is wzr/xzr, the instruction no longer enforces
acquire ordering. However, it's still written with the misleading 'a' mnemonic.
So this adds an annotation when disassembling such instructions, mentioning the
change.
llvm-svn: 353303
| |
The proposal in D56796 may cross the line because we're trying to avoid vectorization
transforms in generic DAG combining. So this is an alternate, later, x86-specific
translation of that patch.
There are several potential follow-ups to enhance this:
1. Allow extraction from non-zero element index.
2. Peek through extends of smaller width integers.
3. Support x86-specific conversion opcodes like X86ISD::CVTSI2P
Differential Revision: https://reviews.llvm.org/D56864
llvm-svn: 353302
| |
When a resource unit R is released, the ResourceManager notifies groups that
contain R. Before this patch, the logic in method ResourceManager::release()
implemented a potentially slow iterative search of dependent groups on the
entire set of processor resources.
This patch replaces that logic with a simpler (and often faster) lookup on array
`Resource2Groups`. This patch gives an average speedup of ~3-4% (observed on a
release build when testing for target btver2).
No functional change intended.
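A simplified sketch of the lookup (hypothetical field names and a bit-mask representation, not the actual llvm-mca classes): release() consults a precomputed per-unit group mask instead of scanning every processor resource.
```
#include <cstdint>
#include <vector>

struct ResourceManagerSketch {
  // Resource2Groups[UnitIndex] is the precomputed mask of all groups that
  // contain that resource unit; built once when the manager is constructed.
  std::vector<uint64_t> Resource2Groups;
  std::vector<uint64_t> AvailableUnitsOfGroup; // Per-group availability mask.

  void release(unsigned UnitIndex) {
    uint64_t Groups = Resource2Groups[UnitIndex]; // Direct lookup, no scan.
    while (Groups) {
      unsigned GroupIdx = __builtin_ctzll(Groups); // Lowest set group bit.
      AvailableUnitsOfGroup[GroupIdx] |= (1ULL << UnitIndex);
      Groups &= Groups - 1; // Clear the bit we just handled.
    }
  }
};
```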
llvm-svn: 353301
| |
GatherAllAliases only makes sense for LSBaseSDNode. Enforce it with
static typing instead of runtime cast.
llvm-svn: 353291
| |
llvm-svn: 353290
| |
The wrong variable was being used when printing the address increment in
verbose output of .debug_line. This patch fixes this.
Reviewed by: JDevlieghere
Differential Revision: https://reviews.llvm.org/D57693
llvm-svn: 353288
| |
Summary:
llvm-exegesis uses this functionality to read its benchmark dumps.
This reading of `.yaml`s takes ~60% of runtime for 14656 benchmark points (i.e. one sweep over all x86 instructions),
but only 30% of the time for 3x as many benchmark points.
In particular, this `BinaryRef` appears to be an obvious pain point.
Without patch:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-orig.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-orig.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-orig.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-orig.html' (25 runs):
972.86 msec task-clock # 0.994 CPUs utilized ( +- 0.25% )
30 context-switches # 30.774 M/sec ( +- 21.74% )
0 cpu-migrations # 0.370 M/sec ( +- 67.81% )
11873 page-faults # 12211.512 M/sec ( +- 0.00% )
3898373408 cycles # 4009682.186 GHz ( +- 0.25% ) (83.12%)
360399748 stalled-cycles-frontend # 9.24% frontend cycles idle ( +- 0.54% ) (83.24%)
1099450483 stalled-cycles-backend # 28.20% backend cycles idle ( +- 0.59% ) (33.63%)
4910528820 instructions # 1.26 insn per cycle
# 0.22 stalled cycles per insn ( +- 0.13% ) (50.21%)
1111976775 branches # 1143726625.854 M/sec ( +- 0.10% ) (66.77%)
23248474 branch-misses # 2.09% of all branches ( +- 0.19% ) (83.29%)
0.97850 +- 0.00647 seconds time elapsed ( +- 0.66% )
```
With the patch:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):
905.29 msec task-clock # 0.999 CPUs utilized ( +- 0.11% )
15 context-switches # 16.533 M/sec ( +- 32.27% )
0 cpu-migrations # 0.000 K/sec
11873 page-faults # 13121.789 M/sec ( +- 0.00% )
3627759720 cycles # 4009283.100 GHz ( +- 0.11% ) (83.19%)
370401480 stalled-cycles-frontend # 10.21% frontend cycles idle ( +- 0.22% ) (83.19%)
1007114438 stalled-cycles-backend # 27.76% backend cycles idle ( +- 0.34% ) (33.62%)
4414014304 instructions # 1.22 insn per cycle
# 0.23 stalled cycles per insn ( +- 0.08% ) (50.36%)
1003751700 branches # 1109314021.971 M/sec ( +- 0.07% ) (66.97%)
24611010 branch-misses # 2.45% of all branches ( +- 0.10% ) (83.41%)
0.90593 +- 0.00105 seconds time elapsed ( +- 0.12% )
```
So this decreases the overall run time of llvm-exegesis analysis mode (on one sweep) by roughly 7%.
To be noted, the `BinaryRef::writeAsBinary()` change is the reason for the perf changes;
the use of `llvm::isHexDigit()` instead of `isxdigit()` does not appear to have any perf impact,
I have only changed it "for symmetry".
The `writeAsBinary()` change is correct: it produces an identical de-hex-ified buffer, and the final output is thus identical:
```
$ sha512sum /tmp/clusters-*
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-new.html
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-orig.html
```
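For context, a hedged, self-contained sketch of the general idea behind the speedup (not the actual BinaryRef::writeAsBinary() code): decode two hex digits per output byte into a buffer in one pass, rather than formatting nibble by nibble through the stream.
```
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

uint8_t hexValue(char C) {
  if (C >= '0' && C <= '9') return C - '0';
  if (C >= 'a' && C <= 'f') return C - 'a' + 10;
  assert(C >= 'A' && C <= 'F' && "not a hex digit");
  return C - 'A' + 10;
}

// Decode the whole hex string into one buffer, then hand the buffer to the
// output stream in a single write.
std::vector<uint8_t> decodeHex(const std::string &Hex) {
  assert(Hex.size() % 2 == 0 && "odd-length hex string");
  std::vector<uint8_t> Out(Hex.size() / 2);
  for (size_t I = 0; I != Out.size(); ++I)
    Out[I] = uint8_t(hexValue(Hex[2 * I]) << 4) | hexValue(Hex[2 * I + 1]);
  return Out;
}
```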
Reviewers: silvas, espindola, sbc100, zturner, courbet, gchatelet
Reviewed By: gchatelet
Subscribers: tschuett, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57699
llvm-svn: 353282
| |
llvm-svn: 353277
| |
llvm-svn: 353276
| |
llvm-svn: 353275
| |
llvm-svn: 353274
| |
llvm-svn: 353273
| |
Summary:
Follow up to D57082 which moved splitting earlier in the pipeline, in
order to perform it before inlining. However, it was moved too early,
before the IR is annotated with instrumented PGO data. This caused the
splitting to incorrectly determine cold functions.
Move it to just after PGO annotation (still before inlining), in both
pass managers.
Reviewers: vsk, hiraditya, sebpop
Subscribers: mehdi_amini, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57805
llvm-svn: 353270
| |
ranges [NFC]
llvm-svn: 353267
| |
DomTreeUpdater depends on headers from Analysis, but is in IR. This is a
layering violation since Analysis depends on IR. Relocate this code from IR
to Analysis to fix the layering violation.
llvm-svn: 353265
| |
Summary:
- Delete {} for one-line `let` statements
- Don't indent within `let` blocks
- Add comments after `let` block's closing braces
Reviewers: tlively
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57730
llvm-svn: 353248
| |
Summary:
Use a small cache for Values tested by nonEscapingLocalObject().
Since the calls to PointerMayBeCaptured are fairly expensive, this saves
a good amount of compile time for anything relying heavily on
BasicAA.alias() calls.
This uses the same approach as the AliasCache, i.e. the cache is reset
after each alias() call. The cache is not used or updated by modRefInfo
calls since it's harder to know when to reset the cache.
Testcases that show improvements with this patch are too large to
include. Example compile time improvement: 7s to 6s.
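A minimal sketch of the caching pattern described above, with hypothetical names; the real change lives inside BasicAA and keys off llvm::Value.
```
#include <unordered_map>

struct Value {}; // Stand-in for llvm::Value.

struct EscapeQueryCache {
  // Cleared after every top-level alias() query, mirroring the AliasCache.
  std::unordered_map<const Value *, bool> IsNonEscaping;

  template <typename ExpensiveCheckT>
  bool isNonEscapingLocalObject(const Value *V, ExpensiveCheckT Check) {
    auto It = IsNonEscaping.find(V);
    if (It != IsNonEscaping.end())
      return It->second;           // Cache hit: skip the capture analysis.
    bool Result = Check(V);        // e.g. !PointerMayBeCaptured(V, ...)
    IsNonEscaping.insert({V, Result});
    return Result;
  }

  void resetAfterAliasQuery() { IsNonEscaping.clear(); }
};
```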
Reviewers: chandlerc, sunfish
Subscribers: sanjoy, jlebar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57627
llvm-svn: 353245
| |
Resumes that are not reachable from a cleanup landing pad are considered
to be unreachable. It’s not safe to split them out.
rdar://47808235
llvm-svn: 353242
| |
A fallible iterator is one whose increment or decrement operations may fail.
This would usually be supported by replacing the ++ and -- operators with
methods that return error:
class MyFallibleIterator {
public:
// ...
Error inc();
Error dec();
// ...
};
The downside of this style is that it no longer conforms to the C++ iterator
concept, and cannot make use of standard algorithms and features such as
range-based for loops.
The fallible_iterator wrapper takes an iterator written in the style above
and adapts it to (mostly) conform with the C++ iterator concept. It does this
by providing standard ++ and -- operator implementations, returning any errors
generated via a side channel (an Error reference passed into the wrapper at
construction time), and immediately jumping the iterator to a known 'end'
value upon error. It also marks the Error as checked any time an iterator is
compared with a known end value and found to be unequal, allowing early exit
from loops without redundant error checking*.
Usage looks like:
MyFallibleIterator I = ..., E = ...;
Error Err = Error::success();
for (auto &Elem : make_fallible_range(I, E, Err)) {
// Loop body is only entered when safe.
// Early exits from loop body permitted without checking Err.
if (SomeCondition)
return;
}
if (Err)
// Handle error.
* Since failure causes a fallible iterator to jump to end, testing that a
fallible iterator is not an end value implicitly verifies that the error is a
success value, and so is equivalent to an error check.
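A hedged, self-contained sketch of the operator++ adaptation described above (a hypothetical SimpleError type stands in for llvm::Error; it assumes the wrapped iterator exposes inc(), a static end() sentinel, dereference, and inequality):
```
#include <optional>
#include <string>
#include <utility>

using SimpleError = std::optional<std::string>; // Stand-in for llvm::Error.

template <typename UnderlyingT> class FallibleIteratorSketch {
  UnderlyingT I;
  SimpleError *Err; // Side channel supplied at construction time.

public:
  FallibleIteratorSketch(UnderlyingT It, SimpleError &E)
      : I(std::move(It)), Err(&E) {}

  FallibleIteratorSketch &operator++() {
    if (SimpleError E = I.inc()) { // Underlying increment may fail.
      *Err = std::move(E);         // Report through the side channel...
      I = UnderlyingT::end();      // ...and jump straight to the end value,
    }                              // so the loop terminates on the next test.
    return *this;
  }

  decltype(auto) operator*() { return *I; }

  friend bool operator!=(const FallibleIteratorSketch &LHS,
                         const FallibleIteratorSketch &RHS) {
    return LHS.I != RHS.I;
  }
};
```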
Reviewers: dblaikie, rupprecht
Subscribers: mgorny, dexonsmith, kristina, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57618
llvm-svn: 353237
| |
As discussed in D53037, this can lead to worse codegen, and we
don't generally expect the backend to be able to optimize
arbitrary shuffles. If there's only one use of the 1st shuffle,
that means it's getting removed, so that should always be
safe.
llvm-svn: 353235
| |
Factored out the code for creating the variable for the profile file name into
a function.
llvm-svn: 353230
| |
simplify getOperand(i).getReg()
https://reviews.llvm.org/D57608
It's a common pattern in GISel to have a MachineInstrBuilder from which we get various regs
(commonly MIB->getOperand(0).getReg()). This adds a helper method so that the above can be
replaced with MIB.getReg(0).
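As a toy illustration only (stub types, not the real MachineInstrBuilder), the shape of the convenience accessor:
```
#include <vector>

struct MachineOperandStub {
  unsigned Reg;
  unsigned getReg() const { return Reg; }
};

struct MachineInstrStub {
  std::vector<MachineOperandStub> Operands;
  const MachineOperandStub &getOperand(unsigned I) const { return Operands[I]; }
};

struct MachineInstrBuilderStub {
  MachineInstrStub *MI;
  MachineInstrStub *operator->() const { return MI; }

  // The shortcut: MIB.getReg(0) instead of MIB->getOperand(0).getReg().
  unsigned getReg(unsigned Idx) const { return MI->getOperand(Idx).getReg(); }
};
```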
llvm-svn: 353223
| |
Summary:
Before r349976, MC ignored such directives when producing an object file
and asserted when re-producing textual assembly output. I turned this
assertion into a hard error in both cases in r349976, but this makes it
unnecessarily difficult to write a single assembly file that supports
both MachO and other object formats that support .file. A user reported
this as PR40578, and we decided to go back to ignoring the directive.
Fixes PR40578
Reviewers: mstorsjo
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57772
llvm-svn: 353218
| |
Summary: The lowering is identical to the memcpy lowering.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57727
llvm-svn: 353216
| |
Regroup supported and unsupported functions by precision and C standard.
llvm-svn: 353213
| |
llvm-svn: 353209
| |
Ensure the XOR in the waterfall loop for indirect addressing is considered a terminator.
Differential Revision: https://reviews.llvm.org/D57703
llvm-svn: 353207
| |
Summary:
According to
https://docs.nvidia.com/cuda/archive/10.0/ptx-writers-guide-to-interoperability/index.html#cuda-specific-dwarf,
the compiler should emit the DW_AT_address_class attribute for all
variables and parameters. This means that the DW_AT_address_class attribute
should be used in a non-standard way to support compatibility with the
cuda-gdb debugger.
Clang is able to generate the information about the variable address
class. This information is emitted as the expression sequence
`DW_OP_constu <DWARF Address Space> DW_OP_swap DW_OP_xderef`. The patch
tries to find all such expressions and transform them into
`DW_AT_address_class <DWARF Address Space>` if the target is NVPTX and the debugger is gdb.
If the expression is not found, then default values are used. For the
local variables <DWARF Address Space> is set to ADDR_local_space(6), for
the globals <DWARF Address Space> is set to ADDR_global_space(5). The
values are taken from the table in section 5.2, CUDA-Specific DWARF
Definitions, of the same document.
Reviewers: echristo, probinson
Subscribers: jholewinski, aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D57157
llvm-svn: 353203
| |
The v2i64 argument is lowered to a bitcast of v4i32 build_vector.
This would then attempt to use the i32-element as the source of the
vector truncate. This really would need to collect 2 elements from the
build_vector to produce the intended truncate.
llvm-svn: 353202
| |
rL352997 enabled ZERO_EXTEND from non-shuffle-able value types. I've disabled it for now to fix a regression identified by @asbirlea until I can fix this properly.
llvm-svn: 353198
| |
Summary:
Adds the standard gauntlet of accessors for global indirect functions and updates the echo test.
Now it would be nice to have a target abstraction so one could know if they have access to a suitable ELF linker and runtime.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56177
llvm-svn: 353193
| |
Patch by Kristina Bessonova!
Differential Revision: https://reviews.llvm.org/D56787
llvm-svn: 353192
| |
We can't outline BTI instructions, because they need to be the very first
instruction executed after an indirect call or branch. If we outline them, then
an indirect call might go to the branch to the outlined function, which will
fault.
Differential revision: https://reviews.llvm.org/D57753
llvm-svn: 353190
| |
llvm-svn: 353189