| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This batch of intrinsics fills in all the shift instructions that take
a variable shift distance in a register, instead of an immediate. Some
of these instructions take a single shift distance in a scalar
register and apply it to all lanes; others take a vector of per-lane
distances.
These instructions are all basically one family, varying in whether
they saturate out-of-range values, and whether they round when bits
are shifted off the bottom. I've implemented them at the IR level by a
much smaller family of IR intrinsics, which take flag parameters to
indicate saturating and/or rounding (along with the usual one to
specify signed/unsigned integers).
An oddity is that all of them are //left// shift instructions – but if
you pass a negative shift count, they'll shift right. So the vector
shift distances are always vectors of //signed// integers, regardless
of whether you're considering the other input vector to be of signed
or unsigned. Also, even the simplest `vshlq` instruction in this
family (neither saturating nor rounding) has to be implemented as an
IR intrinsic, because the ordinary LLVM IR `shl` operation would
consider an out-of-range shift count to be undefined behavior.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72329
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This batch of intrinsics covers two sets of immediate shift
instructions, which have in common that they only overwrite part of
their output register and so they need an extra input giving its
previous value.
The VSLI and VSRI instructions shift each lane of the input vector
left or right just as if they were normal immediate VSHL/VSHR, but
then they only overwrite the output bits that correspond to actual
shifted bits of the input. So VSLI will leave the low n bits of each
output lane unchanged, and VSRI the same with the top n bits.
The V[Q][R]SHR[U]N family are all narrowing shifts: they take an input
vector of 2n-bit integers, shift each lane right by a constant, and
then narrowing the shifted result to only n bits. So they only
overwrite half of the n-bit lanes in the output register, and the B/T
suffix indicates whether it's the bottom or top half of each 2n-bit
lane.
I've implemented the whole of the latter family using a single IR
intrinsic `vshrn`, which takes a lot of i32 parameters indicating
which instruction it expands to (by specifying signedness of the input
and output types, whether it saturates and/or rounds, etc).
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72328
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Running an end-to-end test last week I noticed that a lot of the ACLE
intrinsics that operate differently on vectors of signed and unsigned
integers were ending up generating the signed version of the
instruction unconditionally. This is because the IR intrinsics had no
way to distinguish signed from unsigned: the LLVM type system just
calls them both `v8i16` (or whatever), so you need either separate
intrinsics for signed and unsigned, or a flag parameter that tells
ISel which one to choose.
This patch fixes all the problems of that kind that I've noticed, by
adding an i32 flag parameter to many of the IR intrinsics which is set
to 1 for unsigned (matching the existing practice in cases where we
got it right), and conditioning all the isel patterns on that flag. So
the fundamental change is in `IntrinsicsARM.td`, changing the
low-level IR intrinsics API; there are knock-on changes in
`arm_mve.td` (adjusting code gen for the ACLE intrinsics to use the
modified API) and in `ARMInstrMVE.td` (adjusting isel to expect the
new unsigned flags). The rest of this patch is boringly updating tests.
Reviewers: dmgreen, miyuki, MarkMurrayARM
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72270
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The ACLE intrinsics with `gather_base` or `scatter_base` in the name
are wrappers on the MVE load/store instructions that take a vector of
base addresses and an immediate offset. The immediate offset can be up
to 127 times the alignment unit, and it can be positive or negative.
At the MC layer, we got that right. But in the Sema error checking for
the wrapping intrinsics, the offset was erroneously constrained to be
positive.
To fix this I've adjusted the `imm_mem7bit` class in the Tablegen that
defines the intrinsics. But that causes integer literals like
`0xfffffffffffffe04` to appear in the autogenerated calls to
`SemaBuiltinConstantArgRange`, which provokes a compiler warning
because that's out of the non-overflowing range of an `int64_t`. So
I've also tweaked `MveEmitter` to emit that as `-0x1fc` instead.
Updated the tests of the Sema checks themselves, and also adjusted a
random sample of the CodeGen tests to actually use negative offsets
and prove they get all the way through code generation without causing
a crash.
Reviewers: dmgreen, miyuki, MarkMurrayARM
Reviewed By: dmgreen
Subscribers: kristof.beyls, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72268
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Due to a copy-paste error in the isel patterns, the predicated version
of this intrinsic was expanding to the `VMAXNMT.F32` instruction
instead of `VMAXNMT.F16`. Similarly for vminnm.
Reviewers: dmgreen, miyuki, MarkMurrayARM
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72269
|
|
|
|
|
|
| |
numbers.
Fix broken tests. I can't yet explain how they worked locally pre-commit.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds intrinsics for the following MVE instructions:
* VABAV
* VMLADAV, VMLSDAV
* VMLALDAV, VMLSLDAV
* VRMLALDAVH, VRMLSLDAVH
Each of the above 4 groups has a corresponding new LLVM IR intrinsic,
since the instructions cannot be easily represented using
general-purpose IR operations.
Reviewers: simon_tatham, ostannard, dmgreen, MarkMurrayARM
Reviewed By: MarkMurrayARM
Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71062
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This fills in the remaining shift operations that take a single vector
input and an immediate shift count: the `vqshl`, `vqshlu`, `vrshr` and
`vshll[bt]` families.
`vshll[bt]` (which shifts each input lane left into a double-width
output lane) is the most interesting one. There are separate MC
instruction ids for shifting by exactly the input lane width and
shifting by less than that, because the instruction encoding is so
completely different for the lane-width special case. So I had to
write two sets of patterns to match based on the immediate shift
count, which involved adding a ComplexPattern matcher to avoid the
general-case pattern accidentally matching the special case too. For
that family I've made sure to add an llc codegen test for both
versions of each instruction.
I'm experimenting with a new strategy for parametrising the isel
patterns for all these instructions: adding extra fields to the
relevant `Instruction` subclass itself, which are ignored by the
Tablegen backends that generate the MC data, but can be retrieved from
each instance of that instruction subclass when it's passed as a
template parameter to the multiclass that generates its isel patterns.
A nice effect of that is that I can fill in those informational fields
using `let` blocks, rather than having to type them out once per
instruction at `defm` time.
(As a result, quite a lot of existing instruction `def`s are
reindented by this patch, so it's clearer to read with whitespace
changes ignored.)
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71458
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Better use of multiclass is used, and this helped find some existing
bugs in the predicated VMULL* intrinsics, which are now fixed.
The refactored VMULL[TB]Q_(INT|POLY)_M() intrinsics were discovered
to have an argument ("inactive") with incorrect type, and this required
a fix that is included in this whole patch. The argument "inactive"
should have been the same width (per vector element) as the return
type of the intrinsic, but was not in the case where the return type
was double the element width of the input types.
To assist in testing the multiclassing , and to thwart further gremlins,
the unit tests are improved in scope.
The *.ll tests are all generated by a small bit of throw-away scripting
from the corresponding *.c tests, and as such the diffs are large and
nasty. Look at the file rather than the diff.
Reviewers: dmgreen, miyuki, ostannard, simon_tatham
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71421
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of
the immediate.
There's a fiddly special case, though. ACLE specifies that the
immediate in `vshrq_n` can take values up to //and including// the bit
size of the vector lane. But LLVM IR thinks that shifting right by the
full size of the lane is UB, and feels free to replace the `lshr` with
an `undef` half way through the optimization pipeline. Hence, to keep
this legal in source code, I have to detect it at codegen time.
Logical (unsigned) right shifts by the element size are handled by
simply emitting the zero vector; arithmetic ones are converted into a
shift of one bit less, which will always give the same output.
In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within `arm_mve.td`.
Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
Reviewed By: dmgreen, MarkMurrayARM
Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71065
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch refactors instruction selection of the complex vector
addition, multiplication and multiply-add intrinsics, so that it is
now based on TableGen patterns rather than C++ code.
It also changes the first parameter (halving vs non-halving) of the
arm_mve_vcaddq IR intrinsic to match the corresponding instruction
encoding, hence it requires some changes in the tests.
The patch addresses David's comment in https://reviews.llvm.org/D71190
Reviewers: dmgreen, ostannard, simon_tatham, MarkMurrayARM
Reviewed By: dmgreen
Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71245
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
and two follow-on commits: one warning fix and one functionality.
As it's breaking at least the lto bot:
http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15132/steps/test-stage1-compiler/logs/stdio
This reverts commits:
8d70f3c933a5b81a87a5ab1af0e3e98ee2cd7c67
ff4dceef9201c5ae3924e92f6955977f243ac71d
d97b3e3e65cd77a81b39732af84a1a4229e95091
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VQDMULHQ, VQRDMULHQ intrinsics.
Summary: Add VQADDQ, VHADDQ, VRHADDQ, VQSUBQ, VHSUBQ, VQDMULHQ, VQRDMULHQ intrinsics and unit tests.
Reviewers: simon_tatham, ostannard, dmgreen, miyuki
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71198
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Add VMULL[BT]Q_(INT|POLY) intrinsics and unit tests.
Reviewers: simon_tatham, ostannard, dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71066
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of
the immediate.
There's a fiddly special case, though. ACLE specifies that the
immediate in `vshrq_n` can take values up to //and including// the bit
size of the vector lane. But LLVM IR thinks that shifting right by the
full size of the lane is UB, and feels free to replace the `lshr` with
an `undef` half way through the optimization pipeline. Hence, to keep
this legal in source code, I have to detect it at codegen time.
Logical (unsigned) right shifts by the element size are handled by
simply emitting the zero vector; arithmetic ones are converted into a
shift of one bit less, which will always give the same output.
In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within `arm_mve.td`.
Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71065
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds intrinsics for the following MVE instructions:
* VCADD, VHCADD
* VCMUL
* VCMLA
Each of the above 3 groups has a corresponding new LLVM IR intrinsic.
Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen
Reviewed By: MarkMurrayARM
Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71190
|
|
|
|
|
|
|
| |
With the extra optimisations we have done, these should now be fine to
enable by default. Which is what this patch does.
Differential Revision: https://reviews.llvm.org/D70968
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Add MVE VMULH/VRMULH intrinsics and unit tests.
Reviewers: simon_tatham, ostannard, dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70948
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This commit adds the `vpselq` intrinsics which take an MVE predicate
word and select lanes from two vectors; the `vctp` intrinsics which
create a tail predicate word suitable for processing the first m
elements of a vector (e.g. in the last iteration of a loop); and
`vpnot`, which simply complements a predicate word and is just
syntactic sugar for the `~` operator.
The `vctp` ACLE intrinsics are lowered to the IR intrinsics we've
already added (and which D70592 just reorganized). I've filled in the
missing isel rule for VCTP64, and added another set of rules to
generate the predicated forms.
I needed one small tweak in MveEmitter to allow the `unpromoted` type
modifier to apply to predicates as well as integers, so that `vpnot`
doesn't pointlessly convert its input integer to an `<n x i1>` before
complementing it.
Reviewers: ostannard, MarkMurrayARM, dmgreen
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70485
|
|
|
|
|
|
|
|
|
|
| |
Summary: Add VMINQ/VMAXQ/VMINNMQ/VMAXNMQ intrinsics and their predicated versions. Add unit tests.
Subscribers: kristof.beyls, hiraditya, dmgreen, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70829
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tests.
Summary: Add MVE VAND/VORR/VORN/VEOR/VBIC intrinsics. Add unit tests.
Reviewers: simon_tatham, ostannard, dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70547
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VMUL* instructions. Add unit tests.
Summary: Add MVE VMUL intrinsics. Remove annoying "t1" from VMUL* instructions. Add unit tests.
Reviewers: simon_tatham, ostannard, dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70546
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Add MVE VABD intrinsics. Add unit tests.
Reviewers: simon_tatham, ostannard, dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70545
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fills in the small family of MVE intrinsics that have nothing to
do with vectors: they implement bit-shift operations on 32- or 64-bit
values held in one or two general-purpose registers. Most of these
shift operations saturate if shifting left, and round to nearest if
shifting right, although LSLL and ASRL behave like ordinary shifts.
When these instructions take a variable shift count in a register,
they pay attention to its sign, so that (for example) LSLL or UQRSHLL
will shift left if given a positive number but right if given a
negative one. That makes even LSLL and ASRL different enough from
standard LLVM IR shift semantics that I couldn't see any better
alternative than to simply model the whole family as a set of
MVE-specific IR intrinsics.
(The //immediate// forms of LSLL and ASRL, on the other hand, do
behave exactly like a standard IR shift of a 64-bit value. In fact,
those forms don't have ACLE intrinsics defined at all, because you can
just write an ordinary C shift operation if you want one of those.)
The 64-bit shifts have to be instruction-selected in C++, because they
deliver two output values. But the 32-bit ones are simple enough that
I could write a DAG isel pattern directly into each Instruction
record.
Reviewers: ostannard, MarkMurrayARM, dmgreen
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70319
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the ACLE intrinsics for all the MVE load and store
instructions not already handled by D69791. These ones don't need new
IR intrinsics, because they can be implemented in terms of standard
LLVM IR constructions.
Some of the load and store instructions access less than 128 bits of
memory, sign/zero extending each value to a wider vector lane on load
or truncating it on store. These are represented in IR by a load of a
shorter vector followed by a zext/sext, and conversely, a trunc
followed by a short store. Existing ISel patterns already recognize
those combinations and turn them into the right MVE instructions.
The predicated forms of all these instructions are represented in the
same way, except that the ordinary load/store operation is replaced
with the existing intrinsics @llvm.masked.{load,store}. These are
currently only code-generated as predicated MVE load/store
instructions if you give LLVM the `-enable-arm-maskedldst` option; so
I've done that in the LLVM codegen test. When we make that the
default, that option can be removed.
In the Tablegen backend, I've had to add a handful of extra support
features:
* We need to be able to make clang::Address objects out of a
pointer and an alignment (previously we only needed these when the
user passed us an existing one).
* We can now specify vector types that aren't 128 bits wide (for use
in those intermediate values in IR), the parametrized type system
can make one starting from two existing vector types (using the lane
count of one and the element type of the other).
* I've added support for code generation of pointer casts, and for
specifying LLVM types as operands to IRBuilder operations (for zext
and sext, though I think they'll come in useful again).
* Now not all IR construction operations need to be specified as
Builder.CreateFoo; some don't involve a Builder at all, and one
passes it as a parameter to a tiny static helper function in
CGBuiltin.cpp.
Reviewers: ostannard, MarkMurrayARM, dmgreen
Subscribers: kristof.beyls, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70088
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds two new families of intrinsics, both of which are
memory accesses taking a vector of locations to load from / store to.
The vldrq_gather_base / vstrq_scatter_base intrinsics take a vector of
base addresses, and an immediate offset to be added consistently to
each one. vldrq_gather_offset / vstrq_scatter_offset take a scalar
base address, and a vector of offsets to add to it. The
'shifted_offset' variants also multiply each offset by the element
size type, so that the vector is effectively of array indices.
At the IR level, these operations are represented by a single set of
four IR intrinsics: {gather,scatter} × {base,offset}. The other
details (signed/unsigned, shift, and memory element size as opposed to
vector element size) are all specified by IR intrinsic polymorphism
and immediate operands, because that made the selection job easier
than making a huge family of similarly named intrinsics.
I considered using the standard IR representations such as
llvm.masked.gather, but they're not a good fit. In order to use
llvm.masked.gather to represent a gather_offset load with element size
smaller than a pointer, you'd have to expand the <8 x i16> vector of
offsets into an <8 x i16*> vector of pointers, which would be split up
during legalization, so you'd spend most of your time undoing the mess
it had made. Also, ISel support for llvm.masked.gather would be easy
enough in a trivial way (you can expand it into a gather-base load
with a zero immediate offset), but instruction-selecting lots of
fiddly idioms back into all the _other_ MVE load instructions would be
much more work. So I think dedicated IR intrinsics are the more
sensible approach, at least for the moment.
On the clang tablegen side, I've added two new features to the
Tablegen source accepted by MveEmitter: a 'CopyKind' type node for
defining a type that varies with the parameter type (it lets you ask
for an unsigned integer type of the same width as the parameter), and
an 'unsignedflag' value node for passing an immediate IR operand which
is 0 for a signed integer type or 1 for an unsigned one. That lets me
write each kind of intrinsic just once and get all its subtypes and
immediate arguments generated automatically.
Also I've tweaked the handling of pointer-typed values in the code
generation part of MveEmitter: they're generated as Address rather
than Value (i.e. including an alignment) so that they can be given to
the ordinary IR load and store operations, but I'd omitted the code to
convert them back to Value when they're going to be used as an
argument to an IR intrinsic.
On the MC side, I've enhanced MVEVectorVTInfo so that it can tell you
not only the full assembly-language suffix for a given vector type
(like 's32' or 'u16') but also the numeric-only one used by store
instructions (just '32' or '16').
Reviewers: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69791
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The MVE VADC instruction reads and writes the carry bit at bit 29 of
the FPSCR register. The corresponding ACLE intrinsic is specified to
work with an integer in which the carry bit is stored at bit 0. So if
a user writes a code sequence in C that passes the carry from one VADC
to the next, like this,
s0 = vadcq_u32(a0, b0, &carry);
s1 = vadcq_u32(a1, b1, &carry);
then clang will generate IR for each of those operations that shifts
the carry bit up into bit 29 before the VADC, and after it, shifts it
back down and masks off all but the low bit. But in this situation
what you really wanted was two consecutive VADC instructions, so that
the second one directly reads the value left in FPSCR by the first,
without wasting several instructions on pointlessly clearing the other
flag bits in between.
This commit explains to InstCombine that the other bits of the flags
operand don't matter, and adds a test that demonstrates that all the
code between the two VADC instructions can be optimized away as a
result.
Reviewers: dmgreen, miyuki, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67162
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The VST2 and VST4 instructions take two or four vector registers as
input, and store part of each register to memory in an interleaved
pattern. They come in variants indicating which part of each register
they store (VST20 and VST21; VST40 to VST43 inclusive); the intention
is that issuing each of those variants in turn has the combined effect
of loading or storing the whole set of registers to a memory block of
equal size. The corresponding VLD2 and VLD4 instructions load from
memory in the same interleaved format: each one overwrites only part
of its output register set, and again, the idea is that if you use
VLD4{0,1,2,3} or VLD2{0,1} together, you end up having written to the
whole of each register.
I've implemented the stores and loads quite differently. The loads
were easiest to implement as a single intrinsic that expands to all
four VLD4x instructions or both VLD2x, delivering four complete output
registers. (Implementing each individual load as a separate
instruction taking four input registers to partially overwrite is
possible in theory, but pointless, and when I tried it, I found it
would need extra work to get the register allocation not to be
horrible.) Since that intrinsic delivers multiple outputs, it has to
be instruction-selected in custom C++.
But the store instructions are easier to model individually, because
they don't overwrite any register at all and you can write a DAG Isel
pattern in Tablegen for each one.
Hence, my new intrinsic `int_arm_mve_vld4q` expands to four load
instructions, delivers four full output vectors, and is handled by C++
code, whereas `int_arm_mve_vst4q` expands to just one store
instruction, takes four input vectors and a constant indicating which
lanes to store, and is handled entirely in Tablegen. (And similarly
for vld2q/vst2q.) This is asymmetric, but it was the easiest way to do
each one.
Reviewers: dmgreen, miyuki, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68700
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds some initial example IR intrinsics for MVE instructions that
deliver multiple output values, and hence, have to be instruction-
selected by custom C++ code instead of Tablegen patterns.
I've added the writeback gather load instructions (taking a vector of
base addresses and a single common offset, returning a vector of
loaded values and an updated vector of base addresses); one example
from the long shift family (taking and returning a 64-bit value in two
GPRs); and the VADC instruction (which propagates a carry bit from
each vector-lane addition to the next, taking an input carry flag in
FPSCR and outputting the final one in FPSCR as well).
To support the VPT-predicated forms of these instructions, I've
written some helper functions to add the cluster of MVE predicate
operands to the end of a MachineInstr. `AddMVEPredicateToOps` is used
when the instruction actually is predicated (so it takes a predicate
mask argument), and `AddEmptyMVEPredicateToOps` is for when the
instruction is unpredicated (so it fills in $noreg for the mask). Each
one comes in a form suitable for `vpred_n`, and one for `vpred_r`
which takes the extra 'inactive' parameter.
For VADC, the representation of the carry flag in the IR intrinsic is
a word intended to be moved directly to and from `FPSCR_nzcvqc`, i.e.
with the carry flag in bit 29 of the word. (The user-facing ACLE
intrinsic will want it to be in bit 0, but I'll do that on the clang
side.)
Reviewers: dmgreen, miyuki, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68699
|
|
This commit, together with the next few, will add a representative
sample of the kind of IR intrinsics that we'll need in order to
implement the user-facing ACLE intrinsics for MVE. Supporting all of
them will take more work; the intention of this initial series of
commits is to implement an intrinsic or two from lots of different
categories, as examples and proofs of concept.
This initial commit introduces a small number of IR intrinsics for
instructions simple enough that they can use Tablegen ISel patterns:
the predicated versions of the VADD and VSUB instructions (both
integer and FP), VMIN and VMAX, and the float->half VCVT instruction
(predicated and unpredicated).
When using VPT-predicated instructions in automatic code generation,
it will be convenient to specify the predicate value as a vector of
the appropriate number of i1. To make it easy to specify all sizes of
an instruction in one go and give each one the matching predicate
vector type, I've added a system of Tablegen informational records
describing MVE's vector types: each one gives the underlying LLVM IR
ValueType (which may not be the same if the MVE vector is of
explicitly signed or unsigned integers) and an appropriate vNi1 to use
as the predicate vector.
(Also, those info records include the usual encoding for the types, so
that as we add associations between each instruction encoding and one
of the new `MVEVectorVTInfo` records, we can remove some of the
existing template parameters and replace them with references to the
vector type info's fields.)
The user-facing ACLE intrinsics will receive a predicate mask as a
16-bit integer, so I've also provided a pair of intrinsics i2v and
v2i, to convert between an integer and a vector of i1 by just changing
the register class.
Reviewers: dmgreen, miyuki, ostannard
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67158
|