| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Linearing the control flow by placing `try`/`end_try` markers can create
mismatches in unwind destinations. This patch resolves these mismatches
by wrapping those instructions with an incorrect unwind destination with
a nested `try`/`catch`/`end_try` and branching to the right destination
within the new catch block.
Reviewers: dschuff
Subscribers: sunfish, sbc100, jgravelle-google, chrib, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D48345
llvm-svn: 357343
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
While this does not change any final output, this will greatly simplify
ixing unwind destination mismatches in CFGStackify (D48345), because we
have to create some new registers there.
Reviewers: dschuff
Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59652
llvm-svn: 357342
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The SplitF64 node is used on RV32D to convert an f64 directly to a pair of i32
(necessary as bitcasting to i64 isn't legal). When performed on a ConstantFP,
this will result in a FP load from the constant pool followed by a store to
the stack and two integer loads from the stack (necessary as there is no way
to directly move between f64 FPRs and i32 GPRs on RV32D). It's always cheaper
to just materialise integers for the lo and hi parts of the FP constant, so do
that instead.
llvm-svn: 357341
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently we create a routing block to the dispatch block for every
predecessor of every entry. So the total number of routing blocks
created will be (# of preds) * (# of entries). But we don't need to do
this: we need at most 2 routing blocks per loop entry, one for when the
predecessor is inside the loop and one for it is outside the loop. (We
can't merge these into one because this will creates another loop cycle
between blocks inside and blocks outside) This patch fixes this and
creates at most 2 routing blocks per entry.
This also renames variable `Split` to `Routing`, which I think is a bit
clearer.
Reviewers: kripken
Subscribers: sunfish, dschuff, sbc100, jgravelle-google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59462
llvm-svn: 357337
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This feature is not actually used for anything in the WebAssembly
backend, but adding it allows users to get it into the target features
sections of their objects, which makes these objects
future-compatible.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jdoerfert, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D60013
llvm-svn: 357321
|
| |
|
|
|
|
|
|
|
| |
This adds support for v2s32 vector inserts, and updates the selection +
regbankselect tests for G_INSERT_VECTOR_ELT.
Differential Revision: https://reviews.llvm.org/D59910
llvm-svn: 357318
|
| |
|
|
|
|
|
|
| |
We need XMM registers to handle varargs with the Win64 ABI. Before we would
silently generate bad code resulting in an assertion failure elsewhere in the
backend.
llvm-svn: 357317
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This fixes crashes when a BB in which an END_LOOP is to be placed is
unreachable and does not have any predecessors. Fixes PR41307.
Reviewers: dschuff
Subscribers: yurydelendik, sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60004
llvm-svn: 357303
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since this can be set with s_setreg*, it should not be a subtarget
property. Set a default based on the calling convention, and Introduce
a new amdgpu-dx10-clamp attribute to override this if desired.
Also introduce a new amdgpu-ieee attribute to match.
The values need to match to allow inlining. I think it is OK for the
caller's dx10-clamp attribute to override the callee, but there
doesn't appear to be the infrastructure to do this currently without
definining the attribute in the generic Attributes.td.
Eventually the calling convention lowering will need to insert a mode
switch somewhere for these.
llvm-svn: 357302
|
| |
|
|
|
|
| |
from the function attribute. NFCI
llvm-svn: 357297
|
| |
|
|
|
|
|
| |
Refactor the option `max-jump-table-size` to default to the maximum
representable number. Essentially, NFC.
llvm-svn: 357280
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `lowerMSASplatImm` function zero-extends `i32` immediates while
building constant. If target type is `i64`, negative immediate loses
the sign. As a result, for example `__builtin_msa_ldi_d(-1)` lowered
to series of instruction loads incorrect value 0xffffffff to the `$w0`
register instead of single `ldi.d $w0, -1` instruction.
The fix zero-extends unsigned immediates and signed-extend signed
immediates.
Differential Revision: http://reviews.llvm.org/D59884
llvm-svn: 357264
|
| |
|
|
|
|
|
|
|
|
|
|
| |
rules for literals
See bug 40806: https://bugs.llvm.org/show_bug.cgi?id=40806
Reviewers: artem.tamazov, arsenm
Differential Revision: https://reviews.llvm.org/D59786
llvm-svn: 357262
|
| |
|
|
|
|
|
|
|
|
| |
See bug 40917: https://bugs.llvm.org/show_bug.cgi?id=40917
Reviewers: artem.tamazov, arsenm
Differential Revision: https://reviews.llvm.org/D59878
llvm-svn: 357249
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D59718
llvm-svn: 357247
|
| |
|
|
|
|
| |
We currently just have test coverage for PMULUDQ - will add more in the future.
llvm-svn: 357244
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
PowerPC64/PowerPC64le supports the builtin function __builtin_setrnd to set the floating point rounding mode. This function will use the least significant two bits of integer argument to set the floating point rounding mode.
double __builtin_setrnd(int mode);
The effective values for mode are:
0 - round to nearest
1 - round to zero
2 - round to +infinity
3 - round to -infinity
Note that the mode argument will modulo 4, so if the int argument is greater than 3, it will only use the least significant two bits of the mode. Namely, builtin_setrnd(102)) is equal to builtin_setrnd(2).
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D59405
llvm-svn: 357241
|
| |
|
|
|
|
|
|
|
| |
Some DAG mutations can only be applied to `ScheduleDAGMI`, and have to
internally cast a `ScheduleDAGInstrs` to `ScheduleDAGMI`.
There is nothing actually specific to `ScheduleDAGMI` in `Topo`.
llvm-svn: 357239
|
| |
|
|
|
|
|
|
| |
The register index can only really be an SGPR. Lie that a VGPR index
is legal, and then rewrite the instruction in a waterfall loop to
handle the index.
llvm-svn: 357235
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in place
A shift and add/sub sequence combination is faster in place of a multiply by constant.
Because the cycle or latency of multiply is not huge, we only consider such following
worthy patterns.
```
(mul x, 2^N + 1) => (add (shl x, N), x)
(mul x, -(2^N + 1)) => -(add (shl x, N), x)
(mul x, 2^N - 1) => (sub (shl x, N), x)
(mul x, -(2^N - 1)) => (sub x, (shl x, N))
```
And the cycles or latency is subtarget-dependent so that we need consider the
subtarget to determine to do or not do such transformation.
Also data type is considered for different cycles or latency to do multiply.
Differential Revision: https://reviews.llvm.org/D58950
llvm-svn: 357233
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It does not currently make sense to use WebAssembly features in some functions
but not others, so this CL adds an IR pass that takes the union of all used
feature sets and applies it to each function in the module. This allows us to
prevent atomics from being lowered away if some function has opted in to using
them. When atomics is not enabled anywhere, we detect whether there exists any
atomic operations or thread local storage that would be stripped and disallow
linking with objects that contain atomics if and only if atomics or tls are
stripped. When atomics is enabled, mark it as used but do not require it of
other objects in the link. These changes allow libraries that do not use atomics
to be built once and linked into both single-threaded and multithreaded
binaries.
Reviewers: aheejin, sbc100, dschuff
Subscribers: jgravelle-google, hiraditya, sunfish, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59625
llvm-svn: 357226
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For multi-dimensional array like below
int a[2][3];
the previous implementation generates BTF_KIND_ARRAY type
like below:
. element_type: int
. index_type: unsigned int
. number of elements: 6
This is not the best way to represent arrays, esp.,
when converting BTF back to headers and users will see
int a[6];
instead.
This patch generates proper support for multi-dimensional arrays.
For "int a[2][3]", the two BTF_KIND_ARRAY types will be
generated:
Type #n:
. element_type: int
. index_type: unsigned int
. number of elements: 3
Type #(n+1):
. element_type: #n
. index_type: unsigned int
. number of elements: 2
The linux kernel already supports such a multi-dimensional
array representation properly.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D59943
llvm-svn: 357215
|
| |
|
|
|
|
|
|
|
|
|
|
| |
C1 to consider cases where C2>>C1 can fit an unsigned 32-bit immediate
For 64-bit operations we should consider if the immediate can be made to fit
in an unsigned 32-bits immedate. For OR/XOR this allows us to load the immediate
with MOV32ri instead of movabsq. For AND this allows us to fold the immediate.
Differential Revision: https://reviews.llvm.org/D59867
llvm-svn: 357196
|
| |
|
|
|
|
|
| |
This avoids allocating a few KB of heap memory on startup, and instead
allocates these maps lazily. I noticed this while profiling LLD.
llvm-svn: 357192
|
| |
|
|
|
|
|
|
| |
Select 32 and 64 bit float constants for MIPS32.
Differential Revision: https://reviews.llvm.org/D59933
llvm-svn: 357183
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is probably the least important of our movmsk problems, but I'm starting
at the bottom to reduce distractions.
We were creating a select_cc which bypasses the select and bitmask codegen
optimizations that we have now. If we produce a compare+negate instead, we
allow things like neg/sbb carry bit hacks, and in all cases we avoid a cmov.
There's no partial register update danger in these sequences because we always
produce the zero-register xor ahead of the 'set' if needed.
There seems to be a missing fold for sext of a bool bit here:
negl %ecx
movslq %ecx, %rax
...but that's an independent transform.
Differential Revision: https://reviews.llvm.org/D59818
llvm-svn: 357172
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This adds a BranchFusion feature to replace the usage of the MacroFusion
for AMD CPUs.
See D59688 for context.
Reviewers: andreadb, lebedev.ri
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59872
llvm-svn: 357171
|
| |
|
|
|
|
|
| |
Also improve the check for SALU instructions to also ignore
implicit_def and other fake instructions.
llvm-svn: 357170
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Based on llvm-exegesis measurements.
Now that llvm-exegesis is ~2 magnitudes faster, and is a bit smarter,
it is now possible to continue cleanup of the scheduler model.
With this, there are no more latency inconsistencies for the
opcodes that produce stable measurements, and only a few inconsistencies
for unstable measurements (MMX_* opcodes, opcodes that llvm-exegesis
measures by chaining - CMP, TEST, BT, SETcc, CVT, MOV, etc.)
llvm-svn: 357169
|
| |
|
|
|
|
| |
To avoid more spurious clang-format changes when adding features (D59872).
llvm-svn: 357168
|
| |
|
|
|
|
|
|
| |
Now that D59484 has landed its easier to add these.
Added missing AVX512BW v32i16 equivalents while I was at it.
llvm-svn: 357155
|
| |
|
|
|
|
|
| |
G_STORE for 1-bit values uses a STRBi12, which stores the whole byte.
Zero out the undefined bits before writing.
llvm-svn: 357154
|
| |
|
|
|
|
|
|
|
|
|
| |
G_SELECT uses a 1-bit scalar for the condition, and is currently
implemented with a plain CMPri against 0. This means that values such as
0x1110 are interpreted as true, when instead the higher bits should be
treated as undefined and therefore ignored. Replace the CMPri with a
TSTri against 0x1, which performs an implicit AND, yielding the expected
result.
llvm-svn: 357153
|
| |
|
|
|
|
|
|
|
|
|
| |
These fixup kinds are not explicitly related to the code section. They
are there to signal how to apply the fixup.
Also, a couple of other minor wasm cleanups.
Differential Revision: https://reviews.llvm.org/D59908
llvm-svn: 357145
|
| |
|
|
|
|
|
|
|
| |
The last reference to this function was removed from the ARM
td files in 2015 in rL225266.
Differential Revision: https://reviews.llvm.org/D59868
llvm-svn: 357130
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we know the 2 halves of an oversized zext-in-reg are the same,
don't create those halves independently.
I tried several different approaches to fold this, but it's difficult
to get right during legalization. In the default path, we are creating
a generic shuffle that looks like an unpack high, but it can get
transformed into a different mask (a blend), so it's not
straightforward to match that. If we try to fold after it actually
becomes an X86ISD::UNPCKH node, we can't be sure what the operand node
is - it might be a generic shuffle, or it could be some x86-specific op.
From the test output, we should be doing something like this for SSE4.1
as well, but I'd rather leave that as a follow-up since it involves
changing lowering actions.
Differential Revision: https://reviews.llvm.org/D59777
llvm-svn: 357129
|
| |
|
|
|
|
|
|
|
| |
This is not exactly NFC because it should make further combines
of MOVMSK easier to match, but there should be no outward differences
because we have isel patterns in place specifically to allow this. See:
// Also support integer VTs to avoid a int->fp bitcast in the DAG.
llvm-svn: 357128
|
| |
|
|
|
|
|
|
| |
PreprocessISelDAG to runOnMachineFunction. NFCI
This makes more sense as a place to initialize these. I don't think runOnMachineFunction was overriden when these cached values were originally created.
llvm-svn: 357123
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal
or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable.
(See https://reviews.llvm.org/rL236528 for reference.)
This applies that behaviour to vector types. If the vector type is
TypePromoteInteger, the element type is going to be TypePromoteInteger
as well, which will lead to have a single promoting load rather than N
individual promoting loads. For instance, if we have a v3i1, we would
now have a load of v4i1 instead of 3 loads of i1.
Patch by Guillaume Marques. Thanks!
Differential Revision: https://reviews.llvm.org/D56201
llvm-svn: 357120
|
| |
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D59855
modified: llvm/lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
llvm-svn: 357117
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
ARMBaseInstrInfo::getNumLDMAddresses is making bad assumptions about the
memory operands of load and store-multiple operations. This doesn't
really fix the problem properly, but it's enough to prevent crashing,
at least.
Fixes https://bugs.llvm.org/show_bug.cgi?id=41231 .
Differential Revision: https://reviews.llvm.org/D59834
llvm-svn: 357109
|
| |
|
|
| |
llvm-svn: 357108
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This reapplies r356149, using the correct overload of findUnusedReg
which passes the current iterator.
This worked most of the time, because the scavenger iterator was moved
at the end of the frame index loop in PEI. This would fail if the
spill was the first instruction. This was further hidden by the fact
that the scavenger wasn't passed in for normal frame index
elimination.
llvm-svn: 357098
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Haswell CPUs have special support for SHLD/SHRD with the same register for both sources. Such an instruction will go to the rotate/shift unit on port 0 or 6. This gives it 1 cycle latency and 0.5 cycle reciprocal throughput. When the register is not the same, it becomes a 3 cycle operation on port 1. Sandybridge and Ivybridge always have 1 cyc latency and 0.5 cycle reciprocal throughput for any SHLD.
When FastSHLDRotate feature flag is set, we try to use SHLD for rotate by immediate unless BMI2 is enabled. But MachineCopyPropagation can look through a copy and change one of the sources to be different. This will break the hardware optimization.
This patch adds psuedo instruction to hide the second source input until after register allocation and MachineCopyPropagation. I'm not sure if this is the best way to do this or if there's some other way we can make this work.
Fixes PR41055
Differential Revision: https://reviews.llvm.org/D59391
llvm-svn: 357096
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes an assembler bug that allowed SVE vector registers to contain a
type suffix when not expected. The SVE unpredicated movprfx instruction is the
only instruction affected.
The following are examples of what was previously valid:
movprfx z0.b, z0.b
movprfx z0.b, z0.s
movprfx z0, z0.s
These instructions are now erroneous.
Patch by Cullen Rhodes (c-rhodes)
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D59636
llvm-svn: 357094
|
| |
|
|
|
|
|
| |
Another test is needed for the case where the scavenge fail, but
there's another issue with that which needs an additional fix.
llvm-svn: 357093
|
| |
|
|
|
|
|
|
|
|
| |
Also includes one example of how this transform is unsound. This isn't
verifying the copies are used in the control flow intrinisic patterns.
Also add option to disable exec mask opt pass. Since this pass is
unsound, it may be useful to turn it off until it is fixed.
llvm-svn: 357091
|
| |
|
|
|
|
|
| |
Based on how these are inserted, I doubt this was causing a problem in
practice.
llvm-svn: 357090
|
| |
|
|
|
|
|
| |
Introduce new helper class to copy properties directly from the base
instruction.
llvm-svn: 357089
|
| |
|
|
|
|
|
| |
This shouldn't change anything since the no-ret atomics are selected
later.
llvm-svn: 357084
|