| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This adds a BranchFusion feature to replace the usage of the MacroFusion
for AMD CPUs.
See D59688 for context.
Reviewers: andreadb, lebedev.ri
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59872
llvm-svn: 357171
|
| |
|
|
|
|
|
| |
Also improve the check for SALU instructions to also ignore
implicit_def and other fake instructions.
llvm-svn: 357170
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Based on llvm-exegesis measurements.
Now that llvm-exegesis is ~2 magnitudes faster, and is a bit smarter,
it is now possible to continue cleanup of the scheduler model.
With this, there are no more latency inconsistencies for the
opcodes that produce stable measurements, and only a few inconsistencies
for unstable measurements (MMX_* opcodes, opcodes that llvm-exegesis
measures by chaining - CMP, TEST, BT, SETcc, CVT, MOV, etc.)
llvm-svn: 357169
|
| |
|
|
|
|
| |
To avoid more spurious clang-format changes when adding features (D59872).
llvm-svn: 357168
|
| |
|
|
|
|
|
|
|
|
|
|
| |
build_vector(truncate(x),truncate(y))
If scalar truncates are free, attempt to pre-truncate build_vectors source operands.
Only attempt to do this before legalization as we often end up with truncations/extensions during build_vector lowering.
Differential Revision: https://reviews.llvm.org/D59654
llvm-svn: 357161
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
symbols.
yaml2obj/obj2yaml does not support the symbols with STB_GNU_UNIQUE yet.
Currently, obj2yaml fails with llvm_unreachable when met such a symbol.
I faced it when investigated the https://bugs.llvm.org/show_bug.cgi?id=41196.
Differential revision: https://reviews.llvm.org/D59875
llvm-svn: 357158
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
-asan-detect-invalid-pointer-sub options.
This is in preparation to a driver patch to add gcc 8's -fsanitize=pointer-compare and -fsanitize=pointer-subtract.
Disabled by default as this is still an experimental feature.
Reviewed By: morehouse, vitalybuka
Differential Revision: https://reviews.llvm.org/D59220
llvm-svn: 357157
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this change, the VPlan native path is triggered with the directive:
#pragma clang loop vectorize(enable)
There is no need to specify the vectorize_width(N) clause.
Patch by Francesco Petrogalli <francesco.petrogalli@arm.com>
Differential Revision: https://reviews.llvm.org/D57598
llvm-svn: 357156
|
| |
|
|
|
|
|
|
| |
Now that D59484 has landed its easier to add these.
Added missing AVX512BW v32i16 equivalents while I was at it.
llvm-svn: 357155
|
| |
|
|
|
|
|
| |
G_STORE for 1-bit values uses a STRBi12, which stores the whole byte.
Zero out the undefined bits before writing.
llvm-svn: 357154
|
| |
|
|
|
|
|
|
|
|
|
| |
G_SELECT uses a 1-bit scalar for the condition, and is currently
implemented with a plain CMPri against 0. This means that values such as
0x1110 are interpreted as true, when instead the higher bits should be
treated as undefined and therefore ignored. Replace the CMPri with a
TSTri against 0x1, which performs an implicit AND, yielding the expected
result.
llvm-svn: 357153
|
| |
|
|
|
|
|
|
|
|
|
| |
Straightforward port of StatepointIRVerifier pass to new Pass Manager framework.
Reviewers: fedor.sergeev, reames
Reviewed By: fedor.sergeev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D59825
llvm-svn: 357147
|
| |
|
|
|
|
|
|
|
|
|
| |
These fixup kinds are not explicitly related to the code section. They
are there to signal how to apply the fixup.
Also, a couple of other minor wasm cleanups.
Differential Revision: https://reviews.llvm.org/D59908
llvm-svn: 357145
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The issue here is that we actually allow CGSCC passes to mutate IR (and
therefore invalidate analyses) outside of the current SCC. At a minimum,
we need to support mutating parent and ancestor SCCs to support the
ArgumentPromotion pass which rewrites all calls to a function.
However, the analysis invalidation infrastructure is heavily based
around not needing to invalidate the same IR-unit at multiple levels.
With Loop passes for example, they don't invalidate other Loops. So we
need to customize how we handle CGSCC invalidation. Doing this without
gratuitously re-running analyses is even harder. I've avoided most of
these by using an out-of-band preserved set to accumulate the cross-SCC
invalidation, but it still isn't perfect in the case of re-visiting the
same SCC repeatedly *but* it coming off the worklist. Unclear how
important this use case really is, but I wanted to call it out.
Another wrinkle is that in order for this to successfully propagate to
function analyses, we have to make sure we have a proxy from the SCC to
the Function level. That requires pre-creating the necessary proxy.
The motivating test case now works cleanly and is added for
ArgumentPromotion.
Thanks for the review from Philip and Wei!
Differential Revision: https://reviews.llvm.org/D59869
llvm-svn: 357137
|
| |
|
|
|
|
|
|
|
| |
The last reference to this function was removed from the ARM
td files in 2015 in rL225266.
Differential Revision: https://reviews.llvm.org/D59868
llvm-svn: 357130
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we know the 2 halves of an oversized zext-in-reg are the same,
don't create those halves independently.
I tried several different approaches to fold this, but it's difficult
to get right during legalization. In the default path, we are creating
a generic shuffle that looks like an unpack high, but it can get
transformed into a different mask (a blend), so it's not
straightforward to match that. If we try to fold after it actually
becomes an X86ISD::UNPCKH node, we can't be sure what the operand node
is - it might be a generic shuffle, or it could be some x86-specific op.
From the test output, we should be doing something like this for SSE4.1
as well, but I'd rather leave that as a follow-up since it involves
changing lowering actions.
Differential Revision: https://reviews.llvm.org/D59777
llvm-svn: 357129
|
| |
|
|
|
|
|
|
|
| |
This is not exactly NFC because it should make further combines
of MOVMSK easier to match, but there should be no outward differences
because we have isel patterns in place specifically to allow this. See:
// Also support integer VTs to avoid a int->fp bitcast in the DAG.
llvm-svn: 357128
|
| |
|
|
|
|
|
|
| |
PreprocessISelDAG to runOnMachineFunction. NFCI
This makes more sense as a place to initialize these. I don't think runOnMachineFunction was overriden when these cached values were originally created.
llvm-svn: 357123
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Lifetime nodes were inhibiting TokenFactor simplification inhibiting chain-based optimizations.
Reviewers: courbet, jyknight
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59897
llvm-svn: 357121
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal
or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable.
(See https://reviews.llvm.org/rL236528 for reference.)
This applies that behaviour to vector types. If the vector type is
TypePromoteInteger, the element type is going to be TypePromoteInteger
as well, which will lead to have a single promoting load rather than N
individual promoting loads. For instance, if we have a v3i1, we would
now have a load of v4i1 instead of 3 loads of i1.
Patch by Guillaume Marques. Thanks!
Differential Revision: https://reviews.llvm.org/D56201
llvm-svn: 357120
|
| |
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D59855
modified: llvm/lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
llvm-svn: 357117
|
| |
|
|
|
|
|
| |
This patch appears to trigger very large compile time increases in
halide builds.
llvm-svn: 357116
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Split off from D59749. This adds isWrappedSet() and
isUpperSignWrapped() set with the same behavior as isSignWrappedSet()
and isUpperWrapped() for the respectively other domain.
The methods isWrappedSet() and isSignWrappedSet() will not consider
ranges of the form [X, Max] == [X, 0) and [X, SignedMax] == [X, SignedMin)
to be wrapping, while isUpperWrapped() and isUpperSignWrapped() will.
Also replace the checks in getUnsignedMin() and friends with method
calls that implement the same logic.
llvm-svn: 357112
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
A recent fix (r355751) caused a compile time regression because setting
the ModifiedDT flag in optimizeSelectInst means that each time a select
instruction is optimized the function walk in runOnFunction stops and
restarts again (which was needed to build a new DT before we started
building it lazily in r356937). Now that the DT is built lazily, a
simple fix is to just reset the DT at this point, rather than restarting
the whole function walk.
In the future other places that set ModifiedDT may want to switch to
just resetting the DT directly. But that will require an evaluation to
ensure that they don't otherwise need to restart the function walk.
Reviewers: spatel
Subscribers: jdoerfert, llvm-commits, xur
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59889
llvm-svn: 357111
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
ARMBaseInstrInfo::getNumLDMAddresses is making bad assumptions about the
memory operands of load and store-multiple operations. This doesn't
really fix the problem properly, but it's enough to prevent crashing,
at least.
Fixes https://bugs.llvm.org/show_bug.cgi?id=41231 .
Differential Revision: https://reviews.llvm.org/D59834
llvm-svn: 357109
|
| |
|
|
| |
llvm-svn: 357108
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Split out from D59749. The current implementation of isWrappedSet()
doesn't do what it says on the tin, and treats ranges like
[X, Max] as wrapping, because they are represented as [X, 0) when
using half-inclusive ranges. This also makes it inconsistent with
the semantics of isSignWrappedSet().
This patch renames isWrappedSet() to isUpperWrapped(), in preparation
for the introduction of a new isWrappedSet() method with corrected
behavior.
llvm-svn: 357107
|
| |
|
|
|
|
|
|
| |
If there were only dbg_values in the block, recede would hit the
beginning of the block and try to use thet dbg_value as a real
instruction.
llvm-svn: 357105
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Start using the uadd.sat and usub.sat intrinsics for the existing
canonicalizations. These intrinsics should optimize better than
expanded IR, have better handling in the X86 backend and should
be no worse than expanded IR in other backends, as far as we know.
rL357012 already introduced use of uadd.sat for the add+umin pattern.
Differential Revision: https://reviews.llvm.org/D58872
llvm-svn: 357103
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instructions.
The artifact combiners push instructions which have been marked for deletion
onto an list for the legalizer to deal with on return. However, for trunc(ext)
combines the combiner routine recursively calls itself. When it does this the
dead instructions list may not be empty, and the other combiners don't expect
to be dealing with essentially invalid MIR (multiple vreg defs etc).
This change fixes it by ensuring that the dead instructions are processed on
entry into tryCombineInstruction.
As a result, this fix exposed a few places in tests where G_TRUNC instructions
were not being deleted even though they were dead.
Differential Revision: https://reviews.llvm.org/D59892
llvm-svn: 357101
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This reapplies r356149, using the correct overload of findUnusedReg
which passes the current iterator.
This worked most of the time, because the scavenger iterator was moved
at the end of the frame index loop in PEI. This would fail if the
spill was the first instruction. This was further hidden by the fact
that the scavenger wasn't passed in for normal frame index
elimination.
llvm-svn: 357098
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Haswell CPUs have special support for SHLD/SHRD with the same register for both sources. Such an instruction will go to the rotate/shift unit on port 0 or 6. This gives it 1 cycle latency and 0.5 cycle reciprocal throughput. When the register is not the same, it becomes a 3 cycle operation on port 1. Sandybridge and Ivybridge always have 1 cyc latency and 0.5 cycle reciprocal throughput for any SHLD.
When FastSHLDRotate feature flag is set, we try to use SHLD for rotate by immediate unless BMI2 is enabled. But MachineCopyPropagation can look through a copy and change one of the sources to be different. This will break the hardware optimization.
This patch adds psuedo instruction to hide the second source input until after register allocation and MachineCopyPropagation. I'm not sure if this is the best way to do this or if there's some other way we can make this work.
Fixes PR41055
Differential Revision: https://reviews.llvm.org/D59391
llvm-svn: 357096
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes an overly conservative check that would prevent
simplifying copies when the value we were tracking would go through
several subregister indices.
Indeed, the intend of this check was to not track values whenever
we have to compose subregister, but actually what the check was
doing was bailing anytime we see a second subreg, even if that
second subreg would actually be the new source of truth (as opposed
to a part of that subreg).
Differential Revision: https://reviews.llvm.org/D59891
llvm-svn: 357095
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes an assembler bug that allowed SVE vector registers to contain a
type suffix when not expected. The SVE unpredicated movprfx instruction is the
only instruction affected.
The following are examples of what was previously valid:
movprfx z0.b, z0.b
movprfx z0.b, z0.s
movprfx z0, z0.s
These instructions are now erroneous.
Patch by Cullen Rhodes (c-rhodes)
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D59636
llvm-svn: 357094
|
| |
|
|
|
|
|
| |
Another test is needed for the case where the scavenge fail, but
there's another issue with that which needs an additional fix.
llvm-svn: 357093
|
| |
|
|
|
|
|
|
|
|
| |
Also includes one example of how this transform is unsound. This isn't
verifying the copies are used in the control flow intrinisic patterns.
Also add option to disable exec mask opt pass. Since this pass is
unsound, it may be useful to turn it off until it is fixed.
llvm-svn: 357091
|
| |
|
|
|
|
|
| |
Based on how these are inserted, I doubt this was causing a problem in
practice.
llvm-svn: 357090
|
| |
|
|
|
|
|
| |
Introduce new helper class to copy properties directly from the base
instruction.
llvm-svn: 357089
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Added methods to check for under-/overflow in additions, subtractions, signed divisions/modulus, negations, and multiplications.
Reviewers: ddcc, gou4shi1
Reviewed By: ddcc, gou4shi1
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59796
llvm-svn: 357088
|
| |
|
|
|
|
|
|
|
|
| |
Currently this is called before the frame size is set on the
function. For AMDGPU, the scavenger is used for large frames where
part of the offset needs to be materialized in a register, so
estimating the frame size is useful for knowing whether the scavenger
is useful.
llvm-svn: 357087
|
| |
|
|
|
|
| |
Waring was introduced at r357074.
llvm-svn: 357085
|
| |
|
|
|
|
|
| |
This shouldn't change anything since the no-ret atomics are selected
later.
llvm-svn: 357084
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The AMDGPU implementation of getReservedRegs depends on
MachineFunctionInfo fields that are parsed from the YAML section. This
was reserving the wrong register since it was setting the reserved
regs before parsing the correct one.
Some tests were relying on the default reserved set for the assumed
default calling convention.
llvm-svn: 357083
|
| |
|
|
|
|
|
| |
This is not a control flow instruction, so should not be marked as
isBarrier. This fixes a verifier error if followed by unreachable.
llvm-svn: 357081
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The .BTF.ext FuncInfoTable and LineInfoTable contain
information organized per ELF section. Current definition
of FuncInfoTable/LineInfoTable is:
std::unordered_map<uint32_t, std::vector<BTFFuncInfo>> FuncInfoTable
std::unordered_map<uint32_t, std::vector<BTFLineInfo>> LineInfoTable
where the key is the section name off in the string table.
The unordered_map may cause the order of section output
different for different platforms.
The same for unordered map definition of
std::unordered_map<std::string, std::unique_ptr<BTFKindDataSec>>
DataSecEntries
where BTF_KIND_DATASEC entries may have different ordering
for different platforms.
This patch fixed the issue by using std::map.
Test static-var-derived-type.ll is modified to generate two
DataSec's which will ensure the ordering is the same for all
supported platforms.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 357077
|
| |
|
|
|
|
|
|
|
|
| |
cycleEnd(). NFCI
There is no reason why stages should be visited in reverse order.
This patch allows the definition of stages that push instructions forward from
their cycleEnd() routine.
llvm-svn: 357074
|
| |
|
|
|
|
| |
The offset operand index is different for atomics.
llvm-svn: 357073
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rework BaseIndexOffset and isAlias to fully work with lifetime nodes
and fold in lifetime alias analysis.
This is mostly NFC.
Reviewers: courbet
Reviewed By: courbet
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59794
llvm-svn: 357070
|
| |
|
|
| |
llvm-svn: 357069
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
lowering switches with unreachable default"
Original commit by Ayonam Ray.
This commit adds a regression test for the issue discovered in the
previous commit: that the range check for the jump table can only be
omitted if the fall-through destination of the jump table is
unreachable, which isn't necessarily true just because the default of
the switch is unreachable.
This addresses the missing optimization in PR41242.
> During the lowering of a switch that would result in the generation of a
> jump table, a range check is performed before indexing into the jump
> table, for the switch value being outside the jump table range and a
> conditional branch is inserted to jump to the default block. In case the
> default block is unreachable, this conditional jump can be omitted. This
> patch implements omitting this conditional branch for unreachable
> defaults.
>
> Differential Revision: https://reviews.llvm.org/D52002
> Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev
llvm-svn: 357067
|