| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Mostly keep the existing functions on scalars, but add versions which
also operate based on the vector element size.
llvm-svn: 353430
|
|
|
|
| |
llvm-svn: 353428
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is intentionally a small step because it's hard to know exactly
where we might introduce a conflicting transform with the code that
tries to form wider shuffles. But I think this is safe - if we have
a wide shuffle with 2 operands, then we should do better with an
extract + narrow shuffle.
Differential Revision: https://reviews.llvm.org/D57867
llvm-svn: 353427
|
|
|
|
| |
llvm-svn: 353426
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When adding one thin archive to another, we currently chop off the relative path to the flattened members. For instance, when adding `foo/child.a` (which contains `x.txt`) to `parent.a`, when flattening it we should add it as `foo/x.txt` (which exists) instead of `x.txt` (which does not exist).
As a note, this also undoes the `IsNew` parameter of handling relative paths in r288280. The unit test there still passes.
This was reported as part of testing the kernel build with llvm-ar: https://patchwork.kernel.org/patch/10767545/ (see the second point).
Reviewers: mstorsjo, pcc, ruiu, davide, david2050
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57842
llvm-svn: 353424
|
|
|
|
| |
llvm-svn: 353417
|
|
|
|
| |
llvm-svn: 353416
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When type streams with forward references were merged using GHashes, cycles
were introduced in the debug info. This was caused by
GlobalTypeTableBuilder::insertRecordAs() not inserting the record on the second
pass, thus yielding an empty ArrayRef at that record slot. Later on, upon PDB
emission, TpiStreamBuilder::commit() would skip that empty record, thus
offseting all indices that came after in the stream.
This solution comes in two steps:
1. Fix the hash calculation, by doing a multiple-step resolution, iff there are
forward references in the input stream.
2. Fix merge by resolving with multiple passes, therefore moving records with
forward references at the end of the stream.
This patch also adds support for llvm-readoj --codeview-ghash.
Finally, fix dumpCodeViewMergedTypes() which previously could reference deleted
memory.
Fixes PR40221
Differential Revision: https://reviews.llvm.org/D57790
llvm-svn: 353412
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Modify GenerateConstantOffsetsImpl to create offsets that can be used
by indexed addressing modes. If formulae can be generated which
result in the constant offset being the same size as the recurrence,
we can generate a pre-indexed access. This allows the pointer to be
updated via the single pre-indexed access so that (hopefully) no
add/subs are required to update it for the next iteration. For small
cores, this can significantly improve performance DSP-like loops.
Differential Revision: https://reviews.llvm.org/D55373
llvm-svn: 353403
|
|
|
|
|
|
|
| |
Mark as legal and use the t2* equivalents of the arm mode instructions,
e.g. t2CMPrr instead of plain CMPrr.
llvm-svn: 353392
|
|
|
|
| |
llvm-svn: 353386
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
JMP32 instructions has been added to eBPF ISA. They are 32-bit variants of
existing BPF conditional jump instructions, but the comparison happens on
low 32-bit sub-register only, therefore some unnecessary extensions could
be saved.
JMP32 instructions will only be available for -mcpu=v3. Host probe hook has
been updated accordingly.
JMP32 instructions will only be enabled in code-gen when -mattr=+alu32
enabled, meaning compiling the program using sub-register mode.
For JMP32 encoding, it is a new instruction class, and is using the
reserved eBPF class number 0x6.
This patch has been tested by compiling and running kernel bpf selftests
with JMP32 enabled.
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 353384
|
|
|
|
|
|
|
| |
When doing 128-bit atomics using CASP we might need to copy a GPRPair to a
different register, but that was unimplemented up to now.
llvm-svn: 353383
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This code tries to handle the case where IBB is an EHPad, but there's an earlier check that uses PBB->hasEHPadSuccessor(). Where PBB is a predecessor of IBB. The hasEHPadSuccessor function would have visited IBB and seen that it was an EHPad and returned false. This would prevent us from reaching this code with IBB as an EHPad.
Looks like this code was originally added in rL37427 (ancient) and made dead in rL143001.
Reviewers: rnk, void, efriedma
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D57358
llvm-svn: 353375
|
|
|
|
|
|
|
|
| |
Moved everything SMT-related to LLVM and updated the cmake scripts.
Differential Revision: https://reviews.llvm.org/D54978
llvm-svn: 353373
|
|
|
|
| |
llvm-svn: 353367
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Rather than add a new attribute
See https://github.com/WebAssembly/tool-conventions/issues/64
Subscribers: dschuff, jgravelle-google, aheejin, sunfish, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57864
llvm-svn: 353360
|
|
|
|
|
|
|
|
| |
This comment is old. The code in question was removed in rL203174
Differential Revision: https://reviews.llvm.org/D57856
llvm-svn: 353352
|
|
|
|
| |
llvm-svn: 353341
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MemorySSA.
Summary:
Experimentally we found that promotion to scalars carries less benefits
than sinking and hoisting in LICM. When using MemorySSA, we build an
AliasSetTracker on demand in order to reuse the current infrastructure.
We only build it if less than AccessCapForMSSAPromotion exist in the
loop, a cap that is by default set to 250. This value ensures there are
no runtime regressions, and there are small compile time gains for
pathological cases. A much lower value (20) was found to yield a single
regression in the llvm-test-suite and much higher benefits for compile
times. Conservatively we set the current cap to a high value, but we will
explore lowering it when MemorySSA is enabled by default.
Reviewers: sanjoy, chandlerc
Subscribers: nemanjai, jlebar, Prazek, george.burgess.iv, jfb, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D56625
llvm-svn: 353339
|
|
|
|
| |
llvm-svn: 353338
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instructions in GlobalIsel
Reviewers: aditya_nandakumar, volkan
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, volkan, Petar.Avramovic
Differential Revision: https://reviews.llvm.org/D57630
llvm-svn: 353336
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Pass the alias info to addPointer when available. Will save an alias()
call for must sets when adding a known Must or May alias.
[Part of a series of cleanup patches]
Reviewers: reames, mkazantsev
Subscribers: sanjoy, jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D56613
llvm-svn: 353335
|
|
|
|
|
|
|
|
| |
combineExtractWithShuffle may leave a dangling bitcast which may
prevent further optimization in later passes. Avoid constructing it
unless it is used.
llvm-svn: 353333
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since SystemZ supports counting of leading zeros with the FLOGR instruction,
isCheapToSpeculateCtlz() should return true, which it now does.
ISD::CTLZ_ZERO_UNDEF i32 is now handled the same way as ISD::CTLZ is, which
is needed since promotion to i64 is required and CTLZ_ZERO_UNDEF is only
expanded to CTLZ if it is Legal or Custom.
Review: Ulrich Weigand
https://reviews.llvm.org/D57710
llvm-svn: 353330
|
|
|
|
|
|
|
|
|
|
|
| |
As far as I can tell, malloc.h is only being used here to provide
a definition of mallinfo (malloc itself is declared in stdlib.h via
cstdlib). We already have a macro for whether mallinfo is available,
so switch to using that instead.
Differential Revision: https://reviews.llvm.org/D57807
llvm-svn: 353329
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Don't lower BUILD_VECTORs to BYTE_MASK, but instead expose the BUILD_VECTORs
to the DAGCombiner and select them to VGBM in Select(). This allows the
DAGCombiner to understand the constant vector values.
For floating point, only all-zeros vectors are now generated with VGBM, as it
turned out to be somewhat complicated to handle any arbitrary constants,
while in practice this is very rare and hardly needed.
The SystemZ ISD opcodes z_byte_mask, z_vzero and z_vones have been removed.
Review: Ulrich Weigand
https://reviews.llvm.org/D57152
llvm-svn: 353325
|
|
|
|
|
|
|
|
|
|
| |
Don't repeat the function name in some doxygen
comments.
(Just a minor cleanup, while testing to push
from the git monorepo setup.)
llvm-svn: 353317
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There was a lot of repeated code wrt unary math intrinsics in
translateKnownIntrinsic. This factors out the repeated MIRBuilder code into
two functions: translateSimpleUnaryIntrinsic and getSimpleUnaryIntrinsicOpcode.
This simplifies adding simple unary intrinsics, since after this, all you have
to do is add the mapping to SimpleUnaryIntrinsicOpcodes.
Differential Revision: https://reviews.llvm.org/D57774
llvm-svn: 353316
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
yaml2obj previously only recognised standard STT_* names, and didn't
allow arbitrary numbers. This change allows the user to specify a number
for the type instead. It also adds a test to verify the existing
behaviour for obj2yaml for unkown symbol types.
Reviewed by: grimar
Differential Revision: https://reviews.llvm.org/D57822
llvm-svn: 353315
|
|
|
|
|
|
|
|
|
|
| |
We should canonicalize to one of these forms,
and compare-with-zero could be more conducive
to follow-on transforms. This also leads to
generally better codegen as shown in PR40611:
https://bugs.llvm.org/show_bug.cgi?id=40611
llvm-svn: 353313
|
|
|
|
|
|
|
|
| |
ARMv8.1a CASP instructions need the first of the pair to be an even register
(otherwise the encoding is unallocated). We enforced this during assembly, but
not CodeGen before.
llvm-svn: 353308
|
|
|
|
|
|
|
| |
Allow custom handling of inline assembly output parameters and add X86
flag parameter support.
llvm-svn: 353307
|
|
|
|
| |
llvm-svn: 353305
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The IPM sequence currently generated to compute the strcmp/memcmp
result will return INT_MIN for the "less than zero" case. While
this is in compliance with the standard, strictly speaking, it
turns out that common applications cannot handle this, e.g. because
they negate a comparison result in order to implement reverse
compares.
This patch changes code to use a different sequence that will result
in -2 for the "less than zero" case (same as GCC). However, this
requires that the two source operands of the compare instructions
are inverted, which breaks the optimization in removeIPMBasedCompare.
Therefore, I've removed this (and all of optimizeCompareInstr), and
replaced it with a mostly equivalent optimization in combineCCMask
at the DAGcombine level.
llvm-svn: 353304
|
|
|
|
|
|
|
|
|
|
|
| |
A quirk of the v8.1a spec is that when the writeback regiser for an atomic
read-modify-write instruction is wzr/xzr, the instruction no longer enforces
acquire ordering. However, it's still written with the misleading 'a' mnemonic.
So this adds an annotation when disassembling such instructions, mentioning the
change.
llvm-svn: 353303
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The proposal in D56796 may cross the line because we're trying to avoid vectorization
transforms in generic DAG combining. So this is an alternate, later, x86-specific
translation of that patch.
There are several potential follow-ups to enhance this:
1. Allow extraction from non-zero element index.
2. Peek through extends of smaller width integers.
3. Support x86-specific conversion opcodes like X86ISD::CVTSI2P
Differential Revision: https://reviews.llvm.org/D56864
llvm-svn: 353302
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a resource unit R is released, the ResourceManager notifies groups that
contain R. Before this patch, the logic in method ResourceManager::release()
implemented a potentially slow iterative search of dependent groups on the
entire set of processor resources.
This patch replaces that logic with a simpler (and often faster) lookup on array
`Resource2Groups`. This patch gives an average speedup of ~3-4% (observed on a
release build when testing for target btver2).
No functional change intended.
llvm-svn: 353301
|
|
|
|
|
|
|
| |
GatherAllAliases only makes sense for LSBaseSDNode. Enforce it with
static typing instead of runtime cast.
llvm-svn: 353291
|
|
|
|
| |
llvm-svn: 353290
|
|
|
|
|
|
|
|
|
|
|
| |
The wrong variable was being used when printing the address increment in
verbose output of .debug_line. This patch fixes this.
Reviewed by: JDevlieghere
Differential Revision: https://reviews.llvm.org/D57693
llvm-svn: 353288
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
llvm-exegesis uses this functionality to read it's benchmark dumps.
This reading of `.yaml`s takes ~60% of runtime for 14656 benchmark points (i.e. one sweep over all x86 instructions),
but only 30% of time for 3x as much benchmark points.
In particular, this `BinaryRef` appears to be an obvious pain point.
Without patch:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-orig.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-orig.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-orig.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-orig.html' (25 runs):
972.86 msec task-clock # 0.994 CPUs utilized ( +- 0.25% )
30 context-switches # 30.774 M/sec ( +- 21.74% )
0 cpu-migrations # 0.370 M/sec ( +- 67.81% )
11873 page-faults # 12211.512 M/sec ( +- 0.00% )
3898373408 cycles # 4009682.186 GHz ( +- 0.25% ) (83.12%)
360399748 stalled-cycles-frontend # 9.24% frontend cycles idle ( +- 0.54% ) (83.24%)
1099450483 stalled-cycles-backend # 28.20% backend cycles idle ( +- 0.59% ) (33.63%)
4910528820 instructions # 1.26 insn per cycle
# 0.22 stalled cycles per insn ( +- 0.13% ) (50.21%)
1111976775 branches # 1143726625.854 M/sec ( +- 0.10% ) (66.77%)
23248474 branch-misses # 2.09% of all branches ( +- 0.19% ) (83.29%)
0.97850 +- 0.00647 seconds time elapsed ( +- 0.66% )
```
With the patch:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):
905.29 msec task-clock # 0.999 CPUs utilized ( +- 0.11% )
15 context-switches # 16.533 M/sec ( +- 32.27% )
0 cpu-migrations # 0.000 K/sec
11873 page-faults # 13121.789 M/sec ( +- 0.00% )
3627759720 cycles # 4009283.100 GHz ( +- 0.11% ) (83.19%)
370401480 stalled-cycles-frontend # 10.21% frontend cycles idle ( +- 0.22% ) (83.19%)
1007114438 stalled-cycles-backend # 27.76% backend cycles idle ( +- 0.34% ) (33.62%)
4414014304 instructions # 1.22 insn per cycle
# 0.23 stalled cycles per insn ( +- 0.08% ) (50.36%)
1003751700 branches # 1109314021.971 M/sec ( +- 0.07% ) (66.97%)
24611010 branch-misses # 2.45% of all branches ( +- 0.10% ) (83.41%)
0.90593 +- 0.00105 seconds time elapsed ( +- 0.12% )
```
So this decreases the overall run time of llvm-exegesis analysis mode (on one sweep) by roughly -7%.
To be noted, `BinaryRef::writeAsBinary()` change is the reason for the perf changes,
usage of `llvm::isHexDigit()` instead of `isxdigit()` does not appear to have any perf impact,
i have only changed it "for symmetry".
`writeAsBinary()` change is correct, it produces identical de-hex-ified buffer, and the final output is thus identical:
```
$ sha512sum /tmp/clusters-*
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-new.html
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-orig.html
```
Reviewers: silvas, espindola, sbc100, zturner, courbet, gchatelet
Reviewed By: gchatelet
Subscribers: tschuett, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57699
llvm-svn: 353282
|
|
|
|
| |
llvm-svn: 353277
|
|
|
|
| |
llvm-svn: 353276
|
|
|
|
| |
llvm-svn: 353275
|
|
|
|
| |
llvm-svn: 353274
|
|
|
|
| |
llvm-svn: 353273
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Follow up to D57082 which moved splitting earlier in the pipeline, in
order to perform it before inlining. However, it was moved too early,
before the IR is annotated with instrumented PGO data. This caused the
splitting to incorrectly determine cold functions.
Move it to just after PGO annotation (still before inlining), in both
pass managers.
Reviewers: vsk, hiraditya, sebpop
Subscribers: mehdi_amini, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57805
llvm-svn: 353270
|
|
|
|
|
|
| |
ranges [NFC]
llvm-svn: 353267
|
|
|
|
|
|
|
|
| |
DomTreeUpdater depends on headers from Analysis, but is in IR. This is a
layering violation since Analysis depends on IR. Relocate this code from IR
to Analysis to fix the layering violation.
llvm-svn: 353265
|