| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
I got tired of looking at magic constants in tablegen files. This adds
condition codes like ARMCCeq and makes use of them.
I also removed the extra patterns for reverse condition codes from
D70296, they should now be covered by the parent commit.
Differential Revision: https://reviews.llvm.org/D70824
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add support for vcadd_* family of intrinsics. This set of intrinsics is
available in Armv8.3-A.
The fp16 versions require the FP16 extension, which has been available
(opt-in) since Armv8.2-A.
Reviewers: t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70862
|
|
|
|
|
|
|
|
|
|
| |
This replaces the A32 NEON vqadds, vqaddu, vqsubs and vqsubu intrinsics
with the target independent sadd_sat, uadd_sat, ssub_sat and usub_sat.
This helps generate vqadds from standard IR nodes, which might be
produced from the vectoriser. The old variants are removed in the
process.
Differential Revision: https://reviews.llvm.org/D69350
|
|
|
|
|
|
|
| |
Fill in the gaps for vrev32.16 f16 patterns, extending the existing i16
patterns.
Differential Revision: https://reviews.llvm.org/D69508
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes the VCEQ/VCNE/VCGE/VCEQZ/etc nodes, just using two called VCMP and
VCMPZ with an extra operand as the condition code. I believe this will make
some combines simpler, allowing us to just look at these codes and not the
operands. It also helps fill in a missing VCGTUZ MVE selection without adding
extra nodes for it.
Differential Revision: https://reviews.llvm.org/D65072
llvm-svn: 366934
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds the very basics for MVE vector predication, adding integer VCMP and
VSEL instruction support. This is done through predicate registers (MVT::v16i1,
MVT::v8i1, MVT::v4i1), but otherwise using same mechanics as NEON to custom
lower setcc's through ARMISD::VCXX nodes (VCEQ, VCGT, VCEQZ, etc).
An extra VCNE was added, as this can be handled sensibly by MVE's expanded
number of VCMP condition codes. (There are also VCLE and VCLT which are added
later).
VPSEL is also added here, simply selecting on the vselect.
Original code by David Sherwood.
Differential Revision: https://reviews.llvm.org/D65051
llvm-svn: 366885
|
|
|
|
|
|
| |
Rename NEONModImm to VMOVModImm as it is used in both NEON and MVE.
llvm-svn: 366790
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds basic lowering for MVE shifts. There are many shifts in MVE, but the
instructions handled here are:
VSHL (imm)
VSHRu (imm)
VSHRs (imm)
VSHL (vector)
VSHL (register)
MVE, like NEON before it, doesn't have shift right by a vector (or register).
We instead have to negate the amount and shift in the opposite direction. This
means we have to convert any SHR's into a form of SHL (that is still signed or
unsigned) with a negated condition and selecting from there. MVE still does
have shifting by an immediate for SHL, ASR and LSR.
This adds lowering for these and for register forms, which work well for shift
lefts but may require an extra fold of neg(vdup(x)) -> vdup(neg(x)) to potentially
work optimally for right shifts.
Differential Revision: https://reviews.llvm.org/D64212
llvm-svn: 366056
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adjusts the way that we lower NEON shifts to use a DAG target node, not
via a neon intrinsic. This is useful for handling MVE shifts operations in the
same the way. It also renames some of the immediate shift nodes for
consistency, and moves some of the processing of immediate shifts into
LowerShift allowing it to capture more cases.
Differential Revision: https://reviews.llvm.org/D64426
llvm-svn: 366051
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds some handling for VMOVimm, using the same method that NEON uses. We
create VMOVIMM/VMVNIMM/VMOVFPIMM nodes based on the immediate, and select them
using the now renamed ARMvmovImm/etc. There is also an extra 64bit immediate
mode that I have not yet added here.
Code by David Sherwood
Differential Revision: https://reviews.llvm.org/D63884
llvm-svn: 365178
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds necessary shuffle vector and buildvector support for ARM MVE.
It essentially adds support for VDUP, VREVs and some VMOVs, which are often
required by other code (like upcoming patches).
This mostly uses the same code from Neon that already generated
NEONvdup/NEONvduplane/NEONvrev's. These have been renamed to ARMvdup/etc and
moved to ARMInstrInfo as they are common to both architectures. Most of the
selection code seems to be applicable to both, but NEON does have some more
instructions making some parts specific.
Most code originally by David Sherwood.
Differential Revision: https://reviews.llvm.org/D63567
llvm-svn: 364626
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We are starting to add an entirely separate vector architecture to the ARM
backend. To do that we need at least some separation between the existing NEON
and the new MVE code. This patch just goes through the Neon patterns and
ensures that they are predicated on HasNEON, giving MVE a stable place to start
from.
No tests yet as this is largely an NFC, and we don't have the other target that
will treat any of these intructions as legal.
Differential Revision: https://reviews.llvm.org/D62945
llvm-svn: 362870
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change adds two FP16 extraction and two insertion patterns
(one per possible vector length).
Extractions are handled by copying a Q/D register into one of VFP2
class registers, where single FP32 sub-registers can be accessed. Then
the extraction of even lanes are simple sub-register extractions
(because we don't care about the top parts of registers for FP16
operations). Odd lanes need an additional VMOVX instruction.
Unfortunately, insertions cannot be handled in the same way, because:
* There is no instruction to insert FP16 into an even lane (VINS only
works with odd lanes)
* The patterns for odd lanes will have a form of a DAG (not a tree),
and will not be implementable in pure tablegen
Because of this insertions are handled in the same way as 16-bit
integer insertions (with conversions between FP registers and GPRs
using VMOVHR instructions).
Without these patterns the ARM backend would sometimes fail during
instruction selection.
This patch also adds patterns which combine:
* an FP16 element extraction and a store into a single VST1
instruction
* an FP16 load and insertion into a single VLD1 instruction
Differential Revision: https://reviews.llvm.org/D62651
llvm-svn: 362482
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now the NEON ones have a prefix "NEON_", and the VFP ones have a
prefix "VFP_". This is so that the regex in ARMScheduleA57.td can be
made to match both of _those_ classes of VMAXNM without also matching
the MVE ones that are going to be introduced soon. NFCI.
Patch by Simon Tatham.
Differential Revision: https://reviews.llvm.org/D60700
llvm-svn: 362097
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The MVE extension in Arm v8.1-M permits the use of some move, load and
store isntructions which access the FP registers, even if there's no
actual FP support in the processor (in particular, if you have the
integer-only version of MVE).
Therefore, we need separate subtarget features to condition those
instructions on, which are implied by both FP and MVE but are not part
of either.
Patch mostly by Simon Tatham.
Differential Revision: https://reviews.llvm.org/D60694
llvm-svn: 362088
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds some basic operations for fp16
vectors, such as bitcast from fp16 to i16,
required to perform extract_subvector (also added
here) and extract_element.
Reviewers: SjoerdMeijer, DavidSpickett, t.p.northover, ostannard
Reviewed By: ostannard
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60618
llvm-svn: 359433
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add missing <8xhalf> shufflevectors pattern, when using concat_vector dag node.
As well, allows <8xhalf> and <4xhalf> vldup1 operations.
These instructions are required for v8.2a fp16 lowering of vmul_n_f16, vmulq_n_f16 and vmulq_lane_f16 intrinsics.
Reviewers: olista01, pbarrio, LukeGeeson, efriedma
Reviewed By: efriedma
Subscribers: efriedma, javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60319
llvm-svn: 358081
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The indexed variant of vfmal.f16 and vfmsl.f16
instructions use the uppser bits of the indexed
operand to store the index (1 bit for the double
variant, 2 bits for the quad).
This limits the usable registers to d0 - d7 or
s0 - s15. This patch enforces this limitation.
Differential Revision: https://reviews.llvm.org/D59021
llvm-svn: 355707
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.
Reviewers: arsenm, aheejin, dschuff, javed.absar
Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D53112
llvm-svn: 345218
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a follow up of rL342874, which stopped fusing muls and adds into VMLAs
for performance reasons on the Cortex-M4 and Cortex-M33. This is a serie of 2
patches, that is trying to achieve the same for VFMA. The second column in the
table below shows what we were generating before rL342874, the third column
what changed with rL342874, and the last column what we want to achieve with
these 2 patches:
--------------------------------------------------------
| Opt | < rL342874 | >= rL342874 | |
|------------------------------------------------------|
|-O3 | vmla | vmul | vmul |
| | | vadd | vadd |
|------------------------------------------------------|
|-Ofast | vfma | vfma | vmul |
| | | | vadd |
|------------------------------------------------------|
|-Oz | vmla | vmla | vmla |
--------------------------------------------------------
This patch 1/2, is a cleanup of the spaghetti predicate logic on the different
VMLA and VFMA codegen rules, so that we can make the final functional change in
patch 2/2. This also fixes a typo in the regression test added in rL342874.
Differential revision: https://reviews.llvm.org/D53314
llvm-svn: 344671
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add +fp16fml feature for new FP16 instructions, which are a
mandatory part of FP16 from v8.4-A and an optional part of FP16
from v8.2-A. It doesn't seem to be possible to model this in
LLVM, but the relationship between the options is handled by
the related clang patch.
In keeping with what I think is the usual practice, the fp16fml
extension is accepted regardless of base architecture version.
Builds on/replaces Sjoerd Meijer's patch to add these instructions at
https://reviews.llvm.org/D49839.
Differential Revision: https://reviews.llvm.org/D50228
llvm-svn: 340013
|
|
|
|
| |
llvm-svn: 339546
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D50427
llvm-svn: 339241
|
|
|
|
|
|
|
|
| |
This adds codegen support for the vmov_n_f16 and vdup_n_f16 variants.
Differential Revision: https://reviews.llvm.org/D50329
llvm-svn: 339238
|
|
|
|
|
|
|
|
| |
This adds codegen support for the vmul_lane_f16 and vmul_n_f16 variants.
Differential Revision: https://reviews.llvm.org/D50326
llvm-svn: 339232
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D50236
llvm-svn: 339148
|
|
|
|
|
|
| |
This is addressing PR38404.
llvm-svn: 338830
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for the q versions of the dup
(load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for
example.
Currently, non-q versions of the dup intrinsics are implemented
in clang by generating IR that first loads the elements of the
structure into the first lane with the lane (to-single-lane)
intrinsics, and then propagating it other lanes. There are at
least two problems with this approach. First, there are no
double-spaced to-single-lane byte-element instructions. For
example, there is no such instruction as 'vld2.8 { d0[0], d2[0]
}, [r0]'. That means we cannot rely on the to-single-lane
intrinsics and instructions to implement the q versions of the
dup intrinsics. Note that to-all-lanes instructions do support
all sizes of data items, including bytes.
The second problem with the current approach is that we need a
separate vdup instruction to propagate the structure to each
lane. So for vld4q_dup_f16() we would need four vdup instructions
in addition to the initial vld instruction.
This patch introduces dup LLVM intrinsics and reworks handling of
the currently supported (non-q) NEON dup intrinsics to expand
them into those LLVM intrinsics, thus eliminating the need for
using to-single-lane intrinsics and instructions.
Additionally, this patch adds support for u64 and s64 dup NEON
intrinsics. These are marked as Arch64-only in the ARM NEON
Reference, but it seems there are no reasons to not support them
in AArch32 mode. Please correct, if that is wrong.
That's what we generate with this patch applied:
vld2q_dup_f16:
vld2.16 {d0[], d2[]}, [r0]
vld2.16 {d1[], d3[]}, [r0]
vld3q_dup_f16:
vld3.16 {d0[], d2[], d4[]}, [r0]
vld3.16 {d1[], d3[], d5[]}, [r0]
vld4q_dup_f16:
vld4.16 {d0[], d2[], d4[], d6[]}, [r0]
vld4.16 {d1[], d3[], d5[], d7[]}, [r0]
Differential Revision: https://reviews.llvm.org/D48439
llvm-svn: 335733
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D47831
llvm-svn: 334553
|
|
|
|
|
|
|
|
|
| |
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47447
llvm-svn: 334361
|
|
|
|
|
|
|
|
|
| |
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47120
llvm-svn: 333825
|
|
|
|
|
|
|
|
| |
The LLVM part was committed instead of the Clang part.
Differential Revision: https://reviews.llvm.org/D47121
llvm-svn: 333824
|
|
|
|
|
|
|
|
|
| |
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47121
llvm-svn: 333819
|
|
|
|
|
|
|
|
|
| |
This adds IR intrinsics for the ARM dot-product instructions introduced in
v8.2-A.
Differential revision: https://reviews.llvm.org/D46106
llvm-svn: 331032
|
|
|
|
|
|
|
|
|
| |
This adds code generation support for the FP16 vmaxnm/vminnm scalar
instructions.
Differential Revision: https://reviews.llvm.org/D44675
llvm-svn: 330034
|
|
|
|
|
|
|
|
|
|
| |
Follow up patch of r328313 to support the UseVMOVSR constraint. Removed
some unneeded instructions from the test and removed some stray
comments.
Differential Revision: https://reviews.llvm.org/D44941
llvm-svn: 328691
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the groundwork for adding the Armv8.2-A FP16 vector intrinsics, which
uses v4f16 and v8f16 vector operands and return values. All the moving parts
are tested with two intrinsics, a 1-operand v8f16 and a 2-operand v4f16
intrinsic. In a follow-up patch the rest of the intrinsics and tests will be
added.
Differential Revision: https://reviews.llvm.org/D44538
llvm-svn: 327839
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently the LLVM MC assembler is able to convert e.g.
vmov.i32 d0, #0xabababab
(which is technically invalid) into a valid instruction
vmov.i8 d0, #0xab
this patch adds support for vmov.i64 and for cases with the resulting
load types other than i8, e.g.:
vmov.i32 d0, #0xab00ab00 ->
vmov.i16 d0, #0xab00
Reviewers: olista01, rengolin
Reviewed By: rengolin
Subscribers: rengolin, javed.absar, kristof.beyls, rogfer01, llvm-commits
Differential Revision: https://reviews.llvm.org/D44467
llvm-svn: 327709
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
New instructions are added to AArch32 and AArch64 to aid
floating-point multiplication and addition of complex numbers, where
the complex numbers are packed in a vector register as a pair of
elements. The Imaginary part of the number is placed in the more
significant element, and the Real part of the number is placed in the
less significant element.
This patch adds assembler for the ARM target.
Differential Revision: https://reviews.llvm.org/D36789
llvm-svn: 314511
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vectors
This teach simplifyDemandedBits to handle constant splat vector shifts.
This required changing some uses of getZExtValue to getLimitedValue since we can't rely on legalization using getShiftAmountTy for the shift amount.
I believe there may have been a bug in the ((X << C1) >>u ShAmt) handling where we didn't check if the inner shift was too large. I've fixed that here.
I had to add new patterns to ARM because the zext/sext the patterns were trying to look for got turned into an any_extend with this patch. Happy to split that out too, but not sure how to test without this change.
Differential Revision: https://reviews.llvm.org/D37665
llvm-svn: 314139
|
|
|
|
|
|
|
|
|
|
|
| |
The indexed dot product instructions only accept the lower 16 D-registers as
the indexed register, but we were e.g. incorrectly accepting:
vudot.u8 d16,d16,d18[0]
Differential Revision: https://reviews.llvm.org/D37968
llvm-svn: 313531
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: In some cases, shufflevector instruction can be transformed involving insert_subvector instructions. The ARM backend was missing some insert_subvector patterns, causing a failure during instruction selection. AArch64 has similar patterns.
Reviewers: t.p.northover, olista01, javed.absar, rengolin
Reviewed By: javed.absar
Subscribers: aemerson, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D36796
llvm-svn: 311543
|
|
|
|
|
|
|
|
|
| |
Commit r310480 added the AArch64 ARMv8.2a dot product instructions;
this adds the AArch32 instructions.
Differential Revision: https://reviews.llvm.org/D36575
llvm-svn: 310701
|
|
|
|
|
|
|
|
|
|
| |
This patch adds missing scheds for Neon VLDx/VSTx instructions.
This will help one write schedulers easier/faster in the future for ARM sub-targets.
Existing models will not affected by this patch.
Reviewed by: Renato Golin, Diana Picus
Differential Revision: https://reviews.llvm.org/D33120
llvm-svn: 303717
|
|
|
|
|
|
|
|
|
|
| |
Update NEON int_arm_neon_vabs intrinsic to use the ISD::ABS opcode directly
Added constant folding tests.
Differential Revision: https://reviews.llvm.org/D32938
llvm-svn: 302417
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D32103
llvm-svn: 300749
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D30794
llvm-svn: 297768
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The instructions VLDM/VSTM can only access word-aligned memory
locations and produce alignment fault if the condition is not met.
The compiler currently generates VLDM/VSTM for v2f64 load/store
regardless the alignment of the memory access. Instead, if a v2f64
load/store is not word-aligned, the compiler should generate
VLD1/VST1. For each non double-word-aligned VLD1/VST1, a VREV
instruction should be generated when targeting Big Endian.
Differential Revision: https://reviews.llvm.org/D25281
llvm-svn: 283763
|
|
|
|
|
|
|
|
|
|
| |
VMOVs are not strictly speaking cheap, but they are as expensive as a vector
copy (VORR), so we should prefer rematerialization over splitting when it
applies.
rdar://problem/23754176
llvm-svn: 257545
|