| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
| |
and later.
For v4i8 and v8i8 when the reduction starts with a load we end up
shifting the data in the scalar domain and copying to the vector
domain a second time using a broadcast.
We already copied it to the vector domain once. It's better to
just shuffle it there.
llvm-svn: 368544
|
| |
Support -march=tigerlake for x86.
Compared with Icelake Client, it includes four more features:
avx512vp2intersect, movdiri, movdir64b, and shstk.
Patch by Xiang Zhang (xiangzhangllvm)
Differential Revision: https://reviews.llvm.org/D65840
llvm-svn: 368543
|
| |
Expected to address buildbot failure http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/16285 caused by D65060.
llvm-svn: 368542
|
| |
Reviewers: RKSimon, craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66053
llvm-svn: 368541
|
| |
Summary:
After the commits that changed x86 backend to widen vectors
instead of using promotion some of our downstream tests
started to fail. It was noticed that WidenVectorResult has
been missing support for SMULFIX/UMULFIX/SMULFIXSAT. This
patch adds the missing functionality.
Reviewers: craig.topper, RKSimon
Reviewed By: craig.topper
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66051
llvm-svn: 368540
|
| |
See PR40840
Differential Revision: https://reviews.llvm.org/D66059
llvm-svn: 368539
|
| |
If we have SSE2 we can handle any i8/i16 type and let
type legalization deal with it.
llvm-svn: 368538
|
| |
Target independent type legalization and custom lowering
should be able to handle it.
llvm-svn: 368537
|
| |
one redundant call site
After r367869, VER_NDX_LOCAL can only be assigned to Defined and
CommonSymbol. CommonSymbol becomes Defined after replaceCommonSymbols(),
thus `versionId == VER_NDX_LOCAL` will imply `isDefined()`.
In maybeReportUndefined(), computeBinding() is called when the symbol is
known to be Undefined. computeBinding() != STB_LOCAL will always be
true.
llvm-svn: 368536
|
| |
!isPreemptible was added in r343668 to fix PR39104: symbols redefined by
replaceWithDefined() might be incorrectly considered STB_LOCAL if a
version script specified `local: *;`.
After r367869 (`config->defaultSymbolVersion` was removed), we will
assign VER_NDX_LOCAL to only regular Defined and CommonSymbol, not
Defined created by replaceWithDefined() (because scanVersionScript() is
called before scanRelocations()). The !isPreemptible is thus redundant
and can be deleted.
llvm-svn: 368535
|
| |
llvm-svn: 368534
|
| |
`Symbol::used` is used by Undefined and SharedSymbol to record if a
.symtab entry is needed. It is of no use for Defined.
llvm-svn: 368533
|
| |
MachineBlockPlacement::optimizeBranches()
This will pass EXPENSIVE check.
llvm-svn: 368532
|
| |
llvm-svn: 368531
|
| |
Due to the nature of the beat system in the MVE architecture, along with tail
predication and low-overhead loops, unrolling has less benefit compared to
normal loops. You cannot, for example, hide the latency of a load with other
instructions as you can for scalar code. Preventing unrolling also makes the
code easier to read and reason about.
So if a loop contains vector code, don't enable the runtime unrolling. At least
for the time being.
Differential Revision: https://reviews.llvm.org/D65803
llvm-svn: 368530
|
| |
With enough codegen complete, we can now correctly report the number and size
of vector registers for MVE, allowing auto vectorisation. This also allows FP
auto-vectorization for MVE without -Ofast/-ffast-math, due to support for IEEE
FP arithmetic and parity between scalar and vector FP behaviour.
Patch by David Sherwood.
Differential Revision: https://reviews.llvm.org/D63728
llvm-svn: 368529
|
| |
initialization chains
llvm-svn: 368528
|
| |
Summary:
When exceptions are repeatedly thrown in the middle of handling another
exception, we call `__clang_call_terminate` with the exception pointer
(i32) as an argument. But in case of foreign exceptions, we don't have
the pointer, so we call the function with 0. (This requires that
`__clang_call_terminate` can deal with a 0 argument, which will be done
later.)
But previously the 0 argument was not added as a `i32.const 0` but an
immediate by mistake, causing the `call` instruction to take not an i32
but rather an exnref, because an `exnref` is left on top of the value
stack if `br_on_exn` is not taken.
```
block i32
br_on_exn 0, __cpp_exception
;; exnref is on top of stack now
i32.const 0 ;; This was missing!
call __clang_call_terminate
unreachable
end
call __clang_call_terminate ;; This takes i32 extracted by br_on_exn
```
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65475
llvm-svn: 368527
|
| |
Summary:
Hoisting/sinking an instruction out of a loop isn't always beneficial. Hoisting an instruction from a cold block inside a loop body out of the loop could hurt performance. This change makes LICM profile aware: it now checks block frequency to make sure hoisting/sinking only moves instructions to colder blocks.
Test Plan:
ninja check
Reviewers: asbirlea, sanjoy, reames, nikic, hfinkel, vsk
Reviewed By: asbirlea
Subscribers: fhahn, vsk, davidxl, xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65060
llvm-svn: 368526
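The block-frequency check described in this summary can be sketched roughly as follows. This is only an illustration, not the LICM code: `is_profitable_move` and the integer frequencies are hypothetical, and treating equal frequencies as acceptable is an assumption.

```python
def is_profitable_move(src_freq: int, dst_freq: int) -> bool:
    """Allow hoisting/sinking only if the destination block is no hotter
    than the block the instruction currently lives in (hypothetical helper)."""
    return dst_freq <= src_freq


# Hypothetical profile counts for illustration only.
COLD_BLOCK_IN_LOOP = 1    # rarely executed block inside the loop body
LOOP_PREHEADER = 10       # executed once per loop entry
HOT_LOOP_BODY = 1000      # executed on every iteration

# Hoisting from a cold in-loop block to the preheader would run the
# instruction more often, so it is rejected.
hoist_from_cold = is_profitable_move(COLD_BLOCK_IN_LOOP, LOOP_PREHEADER)

# Hoisting from the hot loop body to the preheader is accepted.
hoist_from_hot = is_profitable_move(HOT_LOOP_BODY, LOOP_PREHEADER)
```

With these made-up counts, only the move out of the hot loop body is considered profitable.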
|
| |
This reverts commit ad92a4a2769425ad0d39ac1dbb6282f6f51a1af7.
llvm-svn: 368525
|
| |
llvm-svn: 368524
|
| |
with widening legalization.
llvm-svn: 368523
|
| |
with widening legalization.
The test case that changed is probably better served through
allowing combineTruncatedArithmetic to create narrow vectors. It
also appears InstCombine would have simplified this test case
to remove the zext and trunc anyway.
llvm-svn: 368522
|
| |
SimplifyBinOp(Instruction::BinaryOps::Add, )
llvm-svn: 368521
|
| |
truncated shl (PR42399)
trunc-of-shl:
https://rise4fun.com/Alive/zGx
https://rise4fun.com/Alive/sl0L
I.e. no extra legality check needed.
https://bugs.llvm.org/show_bug.cgi?id=42399
llvm-svn: 368520
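The exact fold is not spelled out in this entry, so the following is only a brute-force sanity check of one related identity, under the assumption that it is the kind of property the Alive proofs cover: truncation commutes with left shifts, so two left shifts separated by a truncation behave like a single left shift of the truncated value. The helper names and the 8-bit/4-bit widths are hypothetical choices made to keep the search small.

```python
def trunc(value: int, bits: int) -> int:
    """Keep only the low `bits` bits, like an integer truncation."""
    return value & ((1 << bits) - 1)


def shl(value: int, amount: int, bits: int) -> int:
    """Left shift in a fixed-width integer type of `bits` bits."""
    return trunc(value << amount, bits)


WIDE, NARROW = 8, 4  # assumed widths, small enough for exhaustive checking


def trunc_of_shl_identity_holds() -> bool:
    """Check shl(trunc(shl(x, a)), b) == shl(trunc(x), a + b) exhaustively
    for all shift amounts whose sum stays below the narrow width."""
    for x in range(1 << WIDE):
        for a in range(NARROW):
            for b in range(NARROW - a):
                two_shifts = shl(trunc(shl(x, a, WIDE), NARROW), b, NARROW)
                one_shift = shl(trunc(x, NARROW), a + b, NARROW)
                if two_shifts != one_shift:
                    return False
    return True
```

The check passes because bits shifted out of the wide type would have been discarded by the truncation anyway.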
|
| |
when shifting constant
If one of the values being shifted is a constant, since the new shift
amount is known-constant, the new shift will end up being constant-folded
so, we don't need that one-use restriction then.
llvm-svn: 368519
|
| |
restriction
That one-use restriction is not needed for correctness - we have already
ensured that one of the shifts will go away, so we know we won't increase
the instruction count. So there is no need for that restriction.
llvm-svn: 368518
|
| |
shift of const
llvm-svn: 368517
|
| |
Summary:
Because the dynamic linker for 32-bit executables on 64-bit FreeBSD uses
the environment variable `LD_32_LIBRARY_PATH` instead of
`LD_LIBRARY_PATH` to find needed dynamic libraries, running the 32-bit
parts of the dynamic ASan tests will fail with errors similar to:
```
ld-elf32.so.1: Shared object "libclang_rt.asan-i386.so" not found, required by "Asan-i386-inline-Dynamic-Test"
```
This adds support for setting up `LD_32_LIBRARY_PATH` for the unit and
regression tests. It will likely also require a minor change to the
`TestingConfig` class in `llvm/utils/lit/lit`.
Reviewers: emaste, kcc, rnk, arichardson
Reviewed By: arichardson
Subscribers: kubamracek, krytarowski, fedor.sergeev, delcypher, #sanitizers, llvm-commits
Tags: #llvm, #sanitizers
Differential Revision: https://reviews.llvm.org/D65772
llvm-svn: 368516
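A test configuration could choose the variable along these lines. This is a minimal sketch, not the code from the patch: `library_path_variable` and `extend_library_path` are hypothetical helpers, and the real lit configuration may detect the host and target differently.

```python
import os


def library_path_variable(host_os: str, host_is_64bit: bool, target_bits: int) -> str:
    """Pick the environment variable the dynamic linker consults
    (hypothetical helper; only the FreeBSD 32-on-64 case is special)."""
    if host_os == "FreeBSD" and host_is_64bit and target_bits == 32:
        return "LD_32_LIBRARY_PATH"
    return "LD_LIBRARY_PATH"


def extend_library_path(env: dict, variable: str, directory: str) -> dict:
    """Return a copy of `env` with `directory` prepended to `variable`."""
    updated = dict(env)
    existing = updated.get(variable, "")
    if existing:
        updated[variable] = directory + os.pathsep + existing
    else:
        updated[variable] = directory
    return updated


# Example: a 32-bit test binary on a 64-bit FreeBSD host.
variable = library_path_variable("FreeBSD", True, 32)
test_env = extend_library_path({}, variable, "/path/to/runtime/libs")
```

The directory `/path/to/runtime/libs` is a placeholder for wherever the sanitizer runtime libraries are built.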
|
| |
On SSE41+ targets we always lower vector shuffles to ZERO_EXTEND_VECTOR_INREG, even if we don't need the extended bits.
This patch relaxes this so that we lower to ANY_EXTEND_VECTOR_INREG if we can, meaning that shuffle combines have a better idea of what elements need to be kept zero. This helps the multiple reduction code as we can now combine away a lot more of the pack+extend codes.
Differential Revision: https://reviews.llvm.org/D65741
llvm-svn: 368515
|
| |
MachineBlockPlacement::optimizeBranches()
llvm-svn: 368514
|
| |
- Replace the previous 32-bit shift with 64-bit one matching `OpInit`.
llvm-svn: 368513
|
| |
This is an extension of a transform that tries to produce positive floating-point
constants to improve canonicalization (and hopefully lead to more reassociation
and CSE).
The original patches were:
D4904
D5363 (rL221721)
But as the test diffs show, these were limited to basic patterns by walking from
an instruction to its single user rather than recursively moving up the def-use
sequence. No fast-math is required here because we're only rearranging implicit
FP negations in intermediate ops.
A motivating bug is:
https://bugs.llvm.org/show_bug.cgi?id=32939
Differential Revision: https://reviews.llvm.org/D65954
llvm-svn: 368512
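The claim that no fast-math is needed can be checked numerically for one simple shape of expression. The sketch below assumes the pattern `a + b * (-C)` rewritten as `a - b * C`; that pattern and the function names are illustrative assumptions, not taken from the patch. Because IEEE negation is exact, both forms should round identically.

```python
import random


def with_negative_constant(a: float, b: float, c: float) -> float:
    """Form before canonicalization: the constant carries the negation."""
    return a + b * (-c)


def with_positive_constant(a: float, b: float, c: float) -> float:
    """Form after canonicalization: the negation is folded into the subtract."""
    return a - b * c


def forms_agree(trials: int = 10000, seed: int = 0) -> bool:
    """Compare both forms on random finite doubles (NaN inputs are not generated)."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (rng.uniform(-1e6, 1e6) for _ in range(3))
        if with_negative_constant(a, b, c) != with_positive_constant(a, b, c):
            return False
    return True
```

This only samples finite values; it is a spot check of the reasoning, not a proof.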
|
| |
Summary:
Our IR rewriting infrastructure currently fails when it encounters a variable which has no metadata associated.
This causes dynamic_cast to fail as in this case IRForTarget considers the type info pointers ('@_ZTI...') to be
variables without associated metadata. As there are no variables for these internal variables, this is actually
not an error and dynamic_cast would work fine if we didn't throw this error.
This patch fixes this by removing this diagnostics code. In case we would actually hit a variable that has no
metadata (but is supposed to have), we still have the error in the expression log so this shouldn't make it
harder to diagnose any missing metadata errors.
This patch should fix dynamic_cast and also adds a bunch of test coverage to that language feature.
Fixes rdar://10813639
Reviewers: davide, labath
Reviewed By: labath
Subscribers: friss, labath, abidh, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D65932
llvm-svn: 368511
|
| |
Summary:
The signature "Geode by NSC" for NSC vendor is wrong.
In lib/Headers/cpuid.h, signature_NSC_edx and signature_NSC_ecx constants are inverted (cpuid signature order is ebx # edx # ecx).
Reviewers: teemperor, rsmith, craig.topper
Reviewed By: teemperor, craig.topper
Subscribers: craig.topper, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D65978
llvm-svn: 368510
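The register order matters because the 12-byte vendor string is split into three little-endian 32-bit words in the order ebx, edx, ecx. A small sketch of that packing is shown below; `cpuid_vendor_registers` is a hypothetical helper written for this note, not part of `cpuid.h`.

```python
import struct


def cpuid_vendor_registers(vendor: str) -> dict:
    """Split a 12-character CPUID vendor string into the ebx, edx and ecx
    register values (each 4 bytes, little-endian), in that order."""
    raw = vendor.encode("ascii")
    if len(raw) != 12:
        raise ValueError("CPUID vendor strings are exactly 12 bytes long")
    ebx, edx, ecx = struct.unpack("<3I", raw)
    return {"ebx": ebx, "edx": edx, "ecx": ecx}


nsc = cpuid_vendor_registers("Geode by NSC")
# "Geod" -> ebx, "e by" -> edx, " NSC" -> ecx
```

Swapping the edx and ecx values would spell "Geod NSCe by", which is the kind of inversion the summary describes.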
|
| |
blocks
Summary:
The `block-placement` pass creates some patterns with unconditional branches where a simple early return is possible.
But the `early-ret` pass runs before `block-placement`, and we don't want to run it again.
This patch performs the simple early return at the end of `block-placement` to optimize those blocks.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D63972
llvm-svn: 368509
|
| |
Update modulemap with a new textual header.
llvm-svn: 368508
|
| |
See PR40840
Differential Revision: https://reviews.llvm.org/D65925
llvm-svn: 368507
|
| |
isn't legal
Summary:
This patch adds a special DAG combine for SSE1 to recognize the IR pattern InstCombine gives us for movmsk. This only does the recognition for a few cases where it's obvious the input won't be scalarized, which would result in building a vector just to do the movmsk. I've made it separate from our existing matching for movmsk since that's called in multiple places and I didn't spend time to see if the other callers would make sense here. Plus the restrictions and additional checks would complicate that.
This fixes the case from PR42870. But it's probably still broken in the presence of logic ops feeding the movmsk pattern, which would further hide the v4f32 type.
Reviewers: spatel, RKSimon, xbolva00
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65689
llvm-svn: 368506
|
| |
vpermil2pd/ps. Only allow MCConstantExprs.
llvm-svn: 368505
|
| |
and disabling it for Android.
Reviewers: krytarowski, vitalybuka
Reviewed By: krytarowski
Differential Revision: https://reviews.llvm.org/D66027
llvm-svn: 368504
|
| |
Summary:
On Windows, if the frame size exceeds 4096 bytes, the compiler needs to
generate a call to _alloca_probe. The X86CallFrameOptimization pass
changes the reserved stack size, which causes the stack probe function
not to be inserted. This patch fixes the issue by detecting the call
frame size: if the size exceeds 4096 bytes, X86CallFrameOptimization is dropped.
Reviewers: craig.topper, wxiao3, annita.zhang, rnk, RKSimon
Reviewed By: rnk
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65923
llvm-svn: 368503
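The decision described in this summary amounts to a simple guard. The sketch below is only an illustration under stated assumptions: `allow_call_frame_optimization` and `STACK_PROBE_THRESHOLD` are hypothetical names, and the real pass may consult more state than the target OS and the frame size.

```python
STACK_PROBE_THRESHOLD = 4096  # bytes; the limit quoted in the summary


def allow_call_frame_optimization(is_windows: bool, frame_size: int) -> bool:
    """Skip the optimization when a stack probe call would be required,
    so that changing the reserved stack size cannot suppress the probe."""
    if is_windows and frame_size > STACK_PROBE_THRESHOLD:
        return False
    return True


small_frame_ok = allow_call_frame_optimization(True, 1024)
large_frame_ok = allow_call_frame_optimization(True, 8192)
```

Here a 1024-byte frame keeps the optimization, while an 8192-byte frame on Windows does not.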
|
| |
MemoryDependenceAnalysis
Introducing non-global control for default block-scan-limit in MemDep analysis.
Useful when there are many compilations per initialized LLVM instance (e.g. JIT).
Reviewed By: asbirlea
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65806
llvm-svn: 368502
|
| |
llvm-svn: 368501
|
| |
Summary:
I split out the "extract parent instead of this" logic from the "this isn't
worth extracting" logic (now in eligibleForExtraction()), because I found it
hard to reason about.
While here, handle overloaded as well as builtin assignment operators.
Also this uncovered a bug in getCallExpr() which I fixed.
Reviewers: SureYeaah
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D65337
llvm-svn: 368500
|
| |
annotations"
llvm-svn: 368499
|
| |
Summary:
clangd supports a -j option to limit the number of threads to use for parsing
TUs. However, when using -background-index (the default in later versions of
clangd), the parallelism used by clangd defaults to the hardware_parallelism,
i.e. the number of physical cores.
On shared hardware environments, with large projects, this can significantly
affect performance with no way to tune it down.
This change makes the -j parameter apply equally to parsing and background
index. It's not perfect, because the total number of threads is 2x the -j value,
which may still be unexpected. But at least this change allows users to prevent
clangd using all CPU cores.
Reviewers: kadircet, sammccall
Reviewed By: sammccall
Subscribers: javed.absar, jfb, sammccall, ilya-biryukov, MaskRay, jkorous, arphaman, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D66031
llvm-svn: 368498
|
| |
Differential Revision: https://reviews.llvm.org/D66034
llvm-svn: 368497
|
| |
Summary: This fixes lldb build on macOS SDK prior to 10.12.
Reviewers: JDevlieghere
Subscribers: lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D66034
llvm-svn: 368496
|
| |
The default behavior of Clang's indirect function call checker will replace
the address of each CFI-checked function in the output file's symbol table
with the address of a jump table entry which will pass CFI checks. We refer
to this as making the jump table `canonical`. This property allows code that
was not compiled with ``-fsanitize=cfi-icall`` to take a CFI-valid address
of a function, but it comes with a couple of caveats that are especially
relevant for users of cross-DSO CFI:
- There is a performance and code size overhead associated with each
exported function, because each such function must have an associated
jump table entry, which must be emitted even in the common case where the
function is never address-taken anywhere in the program, and must be used
even for direct calls between DSOs, in addition to the PLT overhead.
- There is no good way to take a CFI-valid address of a function written in
assembly or a language not supported by Clang. The reason is that the code
generator would need to insert a jump table in order to form a CFI-valid
address for assembly functions, but there is no way in general for the
code generator to determine the language of the function. This may be
possible with LTO in the intra-DSO case, but in the cross-DSO case the only
information available is the function declaration. One possible solution
is to add a C wrapper for each assembly function, but these wrappers can
present a significant maintenance burden for heavy users of assembly in
addition to adding runtime overhead.
For these reasons, we provide the option of making the jump table non-canonical
with the flag ``-fno-sanitize-cfi-canonical-jump-tables``. When the jump
table is made non-canonical, symbol table entries point directly to the
function body. Any instances of a function's address being taken in C will
be replaced with a jump table address.
This scheme does have its own caveats, however. It does end up breaking
function address equality more aggressively than the default behavior,
especially in cross-DSO mode which normally preserves function address
equality entirely.
Furthermore, it is occasionally necessary for code not compiled with
``-fsanitize=cfi-icall`` to take a function address that is valid
for CFI. For example, this is necessary when a function's address
is taken by assembly code and then called by CFI-checking C code. The
``__attribute__((cfi_jump_table_canonical))`` attribute may be used to make
the jump table entry of a specific function canonical so that the external
code will end up taking an address for the function that will pass CFI checks.
Fixes PR41972.
Differential Revision: https://reviews.llvm.org/D65629
llvm-svn: 368495
|