| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In our CHERI fork we use BUNDLE instructions to ensure that a
three-instruction sequence to generate a program-counter-relative value is
emitted without reordering or insertions (since that would break the 32-bit
offset computation). This sequence is created in MipsExpandPseudo and we use
finalizeBundle() to create the BUNDLE instruction.
However, the delay slot filler currently breaks this pattern since the BUNDLE
will be removed and so all instructions are moved into the delay slot.
Since the delay slot only executes the first instruction, this results in
incorrect computations (and run-time crashes) if the branch is taken.
The original test cases uses CHERI instructions, so for the test case here
I simple filled a BUNDLE with a no-op DADDiu $sp_64, -16 and DADDiu $sp_64, 16.
Reviewers: atanasyan
Reviewed By: atanasyan
Subscribers: merge_guards_bot, sdardis, hiraditya, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70944
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
I was tracking down a code-generation bug in this pass and found that the
added output was useful. It is also helpful to find out why a delay slot
could not be filled even though there is clearly a valid instruction (which
appears to mostly be caused by CFI instructions).
Reviewers: atanasyan
Reviewed By: atanasyan
Subscribers: merge_guards_bot, sdardis, hiraditya, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70940
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, getIntImmCost returns TCC_Free for almost all intrinsics.
For most AArch64 specific intrinsics however, it looks like integer
constants cannot be folded into most of them (at least the ones I checked).
Unless we know that we can fold integer operands with the intrinsic, we
handle more cases correctly by returning the cost to materialize the
immediate than return TCC_Free.
Reviewers: SjoerdMeijer, dmgreen, t.p.northover, ributzka
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D70669
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Catch the (admittedly unusual) case where SIFoldOperands attempts to fold 2
constant operands into the same SALU operation, with neither operand able to be
encoded as an inline constant.
Change-Id: Ibc48d662c9ffd8bbacd154976b0b1c257ace0927
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70896
|
|
|
|
|
|
|
|
|
|
|
|
| |
For example:
x3 = rlwinm x3, 27, 5, 31
x3 = rlwinm x3, 19, 0, 12
can be combined to
x3 = rlwinm x3, 14, 0, 12
Reviewed by: steven.zhang, lkail
Differential Revision: https://reviews.llvm.org/D70374
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This is a follow-up of D70881. It models DAZ and FTZ for releated instructions.
Reviewers: craig.topper, RKSimon, andrew.w.kaylor
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70938
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Model MXCSR for all AVX512 instructions
Reviewers: craig.topper, RKSimon, andrew.w.kaylor
Subscribers: hiraditya, llvm-commits, LuoYuanke, LiuChen3
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70881
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Forcing Local Exec TLS requires the use of copy relocations. Copy
relocations need special handling in the runtime linker when being used
against TLS symbols, which is present in glibc, but not in FreeBSD nor
musl, and so cannot be relied upon. Moreover, copy relocations are a
hack that embed the size of an object in the ABI when it otherwise
wouldn't be, and break protected symbols (which are expected to be DSO
local), whilst also wasting space, thus they should be avoided whenever
possible. As discussed in D70398, RISC-V should move away from forcing
Local Exec, and instead use Initial Exec like other targets, with
possible linker relaxation to follow. The RISC-V GCC maintainers also
intend to adopt this more-conventional behaviour (see
https://github.com/riscv/riscv-elf-psabi-doc/issues/122).
Reviewers: asb, MaskRay
Reviewed By: MaskRay
Subscribers: emaste, krytarowski, hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, llvm-commits, bsdjhb
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70649
|
|
|
|
|
|
|
|
|
|
| |
As it can be seen from accompanying cleanup, it is not unheard of
to write `~Known.Zero` meaning "what maximal value can this KnownBits
produce". But i think `~Known.Zero` isn't *that* self-explanatory,
as compared to a method with a name.
Note that not all `~Known.Zero` places were cleaned up,
only those where this arguably improves things.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The ISel pattern for SIMD MLA is a bit too eager: it replaces the ADD with an
MLA even when the MUL cannot be eliminated, e.g. when it has another use. An
MLA is usually has a higher latency than an ADD (and there are fewer pipes
available that can execute it), so trading an MLA for an ADD is not great.
ISel is not taking the number of uses of the MUL result into account, nor any
other factors such as the length of the critical path or other resource pressure.
The MachineCombiner is able to make these judgments so this patch ports the ISel
pattern for MUL/ADD fusing to the MachineCombiner.
Similarly for MUL/SUB -> MLS, as well as the indexed variants.
The change has no impact on SPEC CPU© intrate nor fprate.
Reviewers: dmgreen, SjoerdMeijer, fhahn, Gerolf
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70673
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds intrinsics for SVE gather loads from memory addresses generated by a vector base plus immediate index:
* @llvm.aarch64.sve.ld1.gather.imm
This intrinsics maps 1-1 to the corresponding SVE instruction (example for half-words):
* ld1h { z0.d }, p0/z, [z0.d, #16]
Committed on behalf of Andrzej Warzynski (andwar)
Reviewers: sdesmalen, huntergr, kmclaughlin, eli.friedman, rengolin, rovka, dancgr, mgudim, efriedma
Reviewed By: sdesmalen
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70806
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds intrinsics for SVE gather loads for which the offsets are 32-bits wide and are:
* unscaled
* @llvm.aarch64.sve.ld1.gather.sxtw
* @llvm.aarch64.sve.ld1.gather.uxtw
* scaled (offsets become indices)
* @llvm.arch64.sve.ld1.gather.sxtw.index
* @llvm.arch64.sve.ld1.gather.uxtw.index
The offsets are either zero (uxtw) or sign (sxtw) extended to 64 bits.
These intrinsics map 1-1 to the corresponding SVE instructions (examples for half-words):
* unscaled
* ld1h { z0.s }, p0/z, [x0, z0.s, sxtw]
* ld1h { z0.s }, p0/z, [x0, z0.s, uxtw]
* scaled
* ld1h { z0.s }, p0/z, [x0, z0.s, sxtw #1]
* ld1h { z0.s }, p0/z, [x0, z0.s, uxtw #1]
Committed on behalf of Andrzej Warzynski (andwar)
Reviewers: sdesmalen, kmclaughlin, eli.friedman, rengolin, rovka, huntergr, dancgr, mgudim, efriedma
Reviewed By: sdesmalen
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70782
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adds the following intrinsics:
- faddp
- fmaxp, fminp, fmaxnmp & fminnmp
- fmlalb, fmlalt, fmlslb & fmlslt
- flogb
Reviewers: huntergr, sdesmalen, dancgr, efriedma
Reviewed By: sdesmalen
Subscribers: efriedma, tschuett, kristof.beyls, hiraditya, cameron.mcinally, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70253
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the following intrinsics for gather loads with 64-bit offsets:
* @llvm.aarch64.sve.ld1.gather (unscaled offset)
* @llvm.aarch64.sve.ld1.gather.index (scaled offset)
These intrinsics map 1-1 to the following AArch64 instructions respectively (examples for half-words):
* ld1h { z0.d }, p0/z, [x0, z0.d]
* ld1h { z0.d }, p0/z, [x0, z0.d, lsl #1]
Committing on behalf of Andrzej Warzynski (andwar)
Reviewers: sdesmalen, huntergr, rovka, mgudim, dancgr, rengolin, efriedma
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70542
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adds the following intrinsics:
- asr & asrd
- insr
- lsl & lsr
This patch also adds a new AArch64ISD node (INSR) to represent the int_aarch64_sve_insr intrinsic.
Reviewers: huntergr, sdesmalen, dancgr, mgudim, rengolin, efriedma
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70437
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Convert ARMCodeGenPrepare into a generic type promotion pass by:
- Removing the insertion of arm specific intrinsics to handle narrow
types as we weren't using this.
- Removing ARMSubtarget references.
- Now query a generic TLI object to know which types should be
promoted and what they should be promoted to.
- Move all codegen tests into Transforms folder and testing using opt
and not llc, which is how they should have been written in the
first place...
The pass searches up from icmp operands in an attempt to safely
promote types so we can avoid generating unnecessary unsigned extends
during DAG ISel.
Differential Revision: https://reviews.llvm.org/D69556
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
processor definition more clear
The old processor design assume that, all the old processor's feature must be
inherited into future processor. That is not true as instruction fusion or some
implementation defined features are not inheritable.
What this patch did:
* Rename the old "specific features" to "additional features" that keep the new added inheritable features.
* Use the "specific features" to keep those features only for specific processor.
* Add the "inheritable features" to keep all the features that inherited from early processor.
Differential Revision: https://reviews.llvm.org/D70768
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Model MXCSR for AVX instructions other than AVX512
Reviewers: craig.topper, RKSimon
Subscribers: hiraditya, llvm-commits, LuoYuanke, LiuChen3
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70875
|
|
|
|
|
|
|
|
|
|
|
| |
analyzePhysReg does not really fit into the iterator and moving it
makes it easier to change the base iterator.
Reviewers: evandro, t.p.northover, paquette, MatzeB, arsenm, qcolombet
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D70559
|
|
|
|
|
|
|
| |
Similar to the parent, this adds some constants to tablegen to replace
the existing magic values.
Differential Revision: https://reviews.llvm.org/D70825
|
|
|
|
|
|
|
|
|
|
| |
I got tired of looking at magic constants in tablegen files. This adds
condition codes like ARMCCeq and makes use of them.
I also removed the extra patterns for reverse condition codes from
D70296, they should now be covered by the parent commit.
Differential Revision: https://reviews.llvm.org/D70824
|
|
|
|
|
|
|
|
|
|
| |
The VCMP instructions in MVE can accept a register or ZR, but only as
the right hand operator. Most of the time this will already be correct
because the icmp will have been canonicalised that way already. There
are some cases in the lowering of float conditions that this will not
apply to though. This code should fix up those cases.
Differential Revision: https://reviews.llvm.org/D70822
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This commit adds the `vpselq` intrinsics which take an MVE predicate
word and select lanes from two vectors; the `vctp` intrinsics which
create a tail predicate word suitable for processing the first m
elements of a vector (e.g. in the last iteration of a loop); and
`vpnot`, which simply complements a predicate word and is just
syntactic sugar for the `~` operator.
The `vctp` ACLE intrinsics are lowered to the IR intrinsics we've
already added (and which D70592 just reorganized). I've filled in the
missing isel rule for VCTP64, and added another set of rules to
generate the predicated forms.
I needed one small tweak in MveEmitter to allow the `unpromoted` type
modifier to apply to predicates as well as integers, so that `vpnot`
doesn't pointlessly convert its input integer to an `<n x i1>` before
complementing it.
Reviewers: ostannard, MarkMurrayARM, dmgreen
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70485
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
D65884 added a set of Arm IR intrinsics for the MVE VCTP instruction,
to use in tail predication. But the 64-bit one doesn't work properly:
its predicate type is `<2 x i1>` / `v2i1`, which isn't a legal MVE
type (due to not having a full set of instructions that manipulate it
usefully). The test of `vctp64` in `basic-tail-pred.ll` goes through
`opt` fine, as the test expects, but if you then feed it to `llc` it
causes a type legality failure at isel time.
The usual workaround we've been using in the rest of the MVE
intrinsics family is to bodge `v2i1` into `v4i1`. So I've adjusted the
`vctp64` IR intrinsic to do that, and completely removed the code (and
test) that uses that intrinsic for 64-bit tail predication. That will
allow me to add isel rules (upcoming in D70485) that actually generate
the VCTP64 instruction.
Also renamed all four of these IR intrinsics so that they have `mve`
in the name, since its absence was confusing.
Reviewers: ostannard, MarkMurrayARM, dmgreen
Reviewed By: MarkMurrayARM
Subscribers: samparker, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70592
|
|
|
|
|
|
|
|
|
|
|
|
| |
When converting reg+reg shifts to reg+imm rotates, we neglect to consider the
CodeGenOnly versions of the 32-bit shift mnemonics. This means we produce a
rotate with missing operands which causes a crash.
Committing this fix without review since it is non-controversial that the list
of mnemonics to consider should include the 64-bit aliases for the exact
mnemonics.
Fixes PR44183.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add support for vcadd_* family of intrinsics. This set of intrinsics is
available in Armv8.3-A.
The fp16 versions require the FP16 extension, which has been available
(opt-in) since Armv8.2-A.
Reviewers: t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70862
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D70783
Change-Id: Ic26f915a4acb4c00ecefa9d09d7c24cec370ed06
|
|
|
|
|
|
|
|
|
|
| |
Summary: Add VMINQ/VMAXQ/VMINNMQ/VMAXNMQ intrinsics and their predicated versions. Add unit tests.
Subscribers: kristof.beyls, hiraditya, dmgreen, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70829
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These instructions do not work quite like I expected them to. They
perform the addition and then shift in a higher precision integer, so do
not match up with the patterns that we added.
For example with s8s, adding 100 and 100 should wrap leaving the shift
to work on a negative number. VHADD will instead do the arithmetic in
higher precision, giving 100 overall. The vhadd gives a "better" result,
but not one that matches up with the input.
I am just removing the patterns here. We might be able to re-add them in
the future by checking for wrap flags or changing bitwidths. But for the
moment just remove them to remove the problem cases.
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D70871
|
|
|
|
| |
comi/ucomi/cvtss2si/cvtsd2si/cvttss2si/cvttsd2si/cvtsi2ss/cvtsi2sd instructions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
adjustment when optimizing for size"
This caused asserts (and perhaps also miscompiles) while building for Windows
on AArch64. See the discussion on D68530 for details and reproducer.
Reverting until this can be investigated and fixed.
> For arm64, D18619 introduced the ability to combine bumping the stack pointer
> upfront in case it needs to be bumped for both the callee-save area as well as
> the local stack area.
>
> That diff already remarks that "This change can cause an increase in
> instructions", but argues that even when that happens, it should be still be a
> performance benefit because the number of micro-ops is reduced.
>
> We have observed that this code-size increase can be significant in practice.
> This diff disables combining stack bumping for methods that are marked as
> optimize-for-size.
>
> Example of a prologue with the behavior before this diff (combining stack bumping when possible):
> sub sp, sp, #0x40
> stp d9, d8, [sp, #0x10]
> stp x20, x19, [sp, #0x20]
> stp x29, x30, [sp, #0x30]
> add x29, sp, #0x30
> [... compute x8 somehow ...]
> stp x0, x8, [sp]
>
> And after this diff, if the method is marked as optimize-for-size:
> stp d9, d8, [sp, #-0x30]!
> stp x20, x19, [sp, #0x10]
> stp x29, x30, [sp, #0x20]
> add x29, sp, #0x20
> [... compute x8 somehow ...]
> stp x0, x8, [sp, #-0x10]!
>
> Note that without combining the stack bump there are two auto-decrements,
> nicely folded into the stp instructions, whereas otherwise there is a single
> sub sp, ... instruction, but not folded.
>
> Patch by Nikolai Tillmann!
>
> Differential Revision: https://reviews.llvm.org/D68530
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds LowerFormalArguments_AIX, support is added for lowering
int, float, and double formal arguments into general purpose and
floating point registers only.
The aix calling convention testcase have been redone to test for caller
and callee functionality in the same lit test.
Patch by Zarko Todorovski!
Differential Revision: https://reviews.llvm.org/D69578
|
|
|
|
| |
This reverts commit 2d739f98d8a53e38bf9faa88cdb6b0c2a363fb77.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In the cases where the CMOV (f16) SDNode is used with condition codes
LT, LE, VC or NE, it is successfully selected into a VSEL instruction.
In the remaining cases, however, instruction selection fails since VSEL
does not support other condition codes.
This patch handles such cases by using the single-precision version of
the VMOV instruction.
Reviewers: ostannard, dmgreen
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70667
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Pre gfx9 we need to scavenge a 64-bit SGPR to use as the carry out for an Add.
If only one SGPR was available this crashed when trying to scavenge another
32bit SGPR to materialize the offset.
Instead, reuse a 32-bit SGPR from the carry out as the offset register.
Also prefer to use vcc for the unused carry out when it is available.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70614
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In rG643ac6c0420b, the syntax `ldraa x1, [x0]!` was added as an alias
for `ldraa x1, [x0, #0]!`. That syntax is less obvious in meaning, and
also will not be accepted by assemblers that haven't been updated yet.
So it would be better not to emit it as the preferred disassembly for
that instruction.
This change lowers the EmitPriority of the new alias so that the more
explicit syntax `[x0, #0]!` is preferred by the disassembler. The new
syntax is still accepted by the assembler.
Reviewers: ab, ostannard
Reviewed By: ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70813
|
| |
|
|
|
|
| |
instructions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Emit the correct .toc psuedo op when we change to the TOC and emit
TC entries. Make sure TOC psuedos get the right symbols via overriding
getMCSymbolForTOCPseudoMO on AIX. Add a test for TOC assembly writing
and update tests to include TOC entries.
Also make sure external globals have a csect set and handle external function descriptor (originally authored by Jason Liu) so we can emit TOC entries for them.
Reviewers: DiggerLin, sfertile, Xiangling_L, jasonliu, hubert.reinterpretcast
Reviewed By: jasonliu
Subscribers: arphaman, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70461
|
|
|
|
|
|
|
|
| |
The Power 9 CPU has some features that are unlikely to be passed on to future
versions of the CPU. This patch separates this out so that future CPU does not
inherit them.
Differential Revision: https://reviews.llvm.org/D70466
|
|
|
|
|
|
|
|
|
|
| |
This is a continuation of D70262
The previous patch as listed above added the future CPU in clang. This patch
adds the future CPU in the PowerPC backend. At this point the patch simply
assumes that a future CPU will have the same characteristics as pwr9. Those
characteristics may change with later patches.
Differential Revision: https://reviews.llvm.org/D70333
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I'm not sure what the effect of this change will be on all of the affected
tests or a larger benchmark, but it fixes the horizontal add/sub problems
noted here:
https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc
The costs are based on reciprocal throughput numbers in Agner's tables for
PEXTR*; these appear to be very slow ops on Silvermont.
This is a small step towards the larger motivation discussed in PR43605:
https://bugs.llvm.org/show_bug.cgi?id=43605
Also, it seems likely that insert/extract is the source of perf regressions on
other CPUs (up to 30%) that were cited as part of the reason to revert D59710,
so maybe we'll extend the table-based approach to other subtargets.
Differential Revision: https://reviews.llvm.org/D70607
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tests.
Summary: Add MVE VAND/VORR/VORN/VEOR/VBIC intrinsics. Add unit tests.
Reviewers: simon_tatham, ostannard, dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70547
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VMUL* instructions. Add unit tests.
Summary: Add MVE VMUL intrinsics. Remove annoying "t1" from VMUL* instructions. Add unit tests.
Reviewers: simon_tatham, ostannard, dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70546
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Add MVE VABD intrinsics. Add unit tests.
Reviewers: simon_tatham, ostannard, dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70545
|
|
|
|
|
|
|
|
|
|
| |
This replaces the A32 NEON vqadds, vqaddu, vqsubs and vqsubu intrinsics
with the target independent sadd_sat, uadd_sat, ssub_sat and usub_sat.
This helps generate vqadds from standard IR nodes, which might be
produced from the vectoriser. The old variants are removed in the
process.
Differential Revision: https://reviews.llvm.org/D69350
|
|
|
|
| |
Very simple change, just adding the extra syntax variant.
|
|
|
|
|
|
|
| |
This fixes PR44129, which was broken in a7adc3185b (in 7.0.0
and newer).
Differential Revision: https://reviews.llvm.org/D70741
|