path: root/llvm/lib/Target/ARM/ARMInstrMVE.td
...
* [ARM] VQADD instructions (David Green, 2019-10-10; 1 file, -20/+34)
  This selects MVE VQADD from the vector llvm.sadd.sat or llvm.uadd.sat intrinsics.
  Differential Revision: https://reviews.llvm.org/D68566
  llvm-svn: 374336
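  As a rough illustration, IR along these lines should now select the MVE instruction directly rather than being expanded (function names here are illustrative, not from the patch):

      declare <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16>, <8 x i16>)

      ; A saturating vector add; expected to select vqadd.s16.
      define <8 x i16> @vqadd_s16(<8 x i16> %a, <8 x i16> %b) {
        %r = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> %a, <8 x i16> %b)
        ret <8 x i16> %r
      }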
* [ARM] Ensure we do not attempt to create lsll #0 (David Green, 2019-09-25; 1 file, -3/+3)
  During legalisation we can end up with some pretty strange nodes, like shifts of 0. We need to make sure we don't try to make long shifts of these, ending up with invalid assembly instructions. A long shift with a zero immediate actually encodes a shift by 32.
  Differential Revision: https://reviews.llvm.org/D67664
  llvm-svn: 372839
* [ARM][MVE] Remove old tail predicates (Sam Parker, 2019-09-23; 1 file, -0/+1)
  Remove any predicate that we replace with a vctp intrinsic, and try to remove its operands too. Also look into the exit block to see whether there are any duplicates of the predicates that we've replaced, and clone the vctp to be used there instead.
  Differential Revision: https://reviews.llvm.org/D67709
  llvm-svn: 372567
* [ARM] Add a SelectTAddrModeImm7 for MVE narrow loads and stores (David Green, 2019-09-17; 1 file, -10/+11)
  We were previously using SelectT2AddrModeImm7 for both normal and narrowing MVE loads/stores. As the narrowing instructions do not accept sp as a register, it makes little sense to fold a FrameIndex into the load, only to have to recover it later on. This adds a SelectTAddrModeImm7 which does not do that folding, and uses it for the narrowing load/store patterns.
  Differential Revision: https://reviews.llvm.org/D67489
  llvm-svn: 372134
* [ARM][MVE] Add invalidForTailPredication to TSFlags (Sam Parker, 2019-09-17; 1 file, -0/+7)
  Set this bit for the MVE reduction instructions to prevent a loop from becoming tail predicated in their presence.
  Differential Revision: https://reviews.llvm.org/D67444
  llvm-svn: 372076
* [ARM] Add patterns for BSWAP intrinsic on MVE (Oliver Cruickshank, 2019-09-16; 1 file, -0/+7)
  BSWAP can use the VREV instruction on MVE to produce better results than expanding.
  llvm-svn: 372002
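  For instance, byte-swapping each 16-bit lane is exactly a byte reversal within 16-bit containers, so a sketch like this (names illustrative) should select a single VREV rather than an expansion:

      declare <8 x i16> @llvm.bswap.v8i16(<8 x i16>)

      define <8 x i16> @bswap_v8i16(<8 x i16> %a) {
        ; Expected to become one vrev16.8 instruction.
        %r = call <8 x i16> @llvm.bswap.v8i16(<8 x i16> %a)
        ret <8 x i16> %r
      }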
* [ARM] Add patterns for bitreverse intrinsic on MVE (Oliver Cruickshank, 2019-09-16; 1 file, -0/+11)
  BITREVERSE can use VBRSR, which will reverse and right-shift. Shifting right by 0 just reverses the bits.
  llvm-svn: 372001
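  A minimal sketch of the input this matches (names illustrative); per the message above, VBRSR is fed an operand that makes the right shift zero, leaving a pure bit reversal:

      declare <4 x i32> @llvm.bitreverse.v4i32(<4 x i32>)

      define <4 x i32> @bitrev_v4i32(<4 x i32> %a) {
        ; Expected to select a single VBRSR.32.
        %r = call <4 x i32> @llvm.bitreverse.v4i32(<4 x i32> %a)
        ret <4 x i32> %r
      }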
* [ARM] Add patterns for CTLZ on MVE (Oliver Cruickshank, 2019-09-16; 1 file, -0/+9)
  The CTLZ intrinsic can use the VCLZ instruction on MVE, which produces better results than expanding.
  llvm-svn: 371999
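  A sketch of the input this covers (names illustrative); the second intrinsic operand states whether a zero input is undefined:

      declare <4 x i32> @llvm.ctlz.v4i32(<4 x i32>, i1)

      define <4 x i32> @ctlz_v4i32(<4 x i32> %a) {
        ; Expected to become a single vclz.i32.
        %r = call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> %a, i1 false)
        ret <4 x i32> %r
      }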
* [ARM] Fold VCMP into VPT (David Green, 2019-09-16; 1 file, -11/+11)
  MVE has VPT instructions, which perform the duties of both a VCMP and a VPST in a single instruction, performing the compare and starting the VPT block in one. This teaches the MVEVPTBlockPass to fold them, searching back through the basic block for a valid VCMP and creating the VPT from its operands. There are some changes to the VPT instructions to accommodate this, altering the order of the operands to match the VCMP better, and changing P0 register defs to be VPR defs, as is used in other places.
  Differential Revision: https://reviews.llvm.org/D66577
  llvm-svn: 371982
* [ARM] Masked loads and stores (David Green, 2019-09-15; 1 file, -0/+83)
  Masked loads and stores fit naturally with MVE, the instructions being easily predicated. This adds lowering for the simple cases of masked loads and stores. It does not yet deal with widening/narrowing or pre/post inc, and so is currently behind an option.
  The llvm masked load intrinsic will accept a "passthru" value, dictating the values used for the masked-off lanes. In MVE the instructions write 0 to the zero-predicated lanes, so we need to match a passthru that isn't 0 (or undef) with a select instruction to pull in the correct data after the load.
  Differential Revision: https://reviews.llvm.org/D67186
  llvm-svn: 371932
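  A sketch of the simplest case, where a zero passthru matches the hardware behaviour directly and no extra select is needed (typed-pointer syntax of the period; names illustrative):

      declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32, <4 x i1>, <4 x i32>)

      define <4 x i32> @masked_load(<4 x i32>* %p, <4 x i1> %mask) {
        ; Zero passthru: masked-off lanes become 0, exactly what the
        ; predicated MVE load writes, so this can map to vpst + vldrwt.u32.
        %r = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %p, i32 4, <4 x i1> %mask, <4 x i32> zeroinitializer)
        ret <4 x i32> %r
      }

  A non-zero, non-undef passthru would instead be matched together with a following select, as described above.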
* [ARM] Add earlyclobber for cross beat MVE instructions (David Green, 2019-09-13; 1 file, -40/+39)
  rL367544 added @earlyclobber for the MVE VREV64 instruction. This adds the same for a number of other 32-bit instructions that are similarly unpredictable if the destination equals the source (due to the cross-beat nature of the instructions). This includes:
    VCADD.f32, VCADD.i32
    VCMUL.f32
    VHCADD.s32
    VMULLT/B.s/u32
    VQDMLADH{X}.s32, VQRDMLADH{X}.s32
    VQDMLSDH{X}.s32, VQRDMLSDH{X}.s32
    VQDMULLT/B.s32 with Qm and Rm
  No tests here as this would require intrinsics (or very interesting codegen) to manifest. The tests will follow naturally as the intrinsics are added.
  Differential Revision: https://reviews.llvm.org/D67462
  llvm-svn: 371838
* [ARM] Add support for MVE vmaxv and vminv (Sam Tebbs, 2019-09-13; 1 file, -0/+29)
  This patch adds vecreduce_smax, vecreduce_umax, vecreduce_smin and vecreduce_umin, and selection for vmaxv and vminv.
  Differential Revision: https://reviews.llvm.org/D66413
  llvm-svn: 371827
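  At the time these reductions were spelled with the experimental prefix; a sketch (names illustrative):

      declare i16 @llvm.experimental.vector.reduce.smax.v8i16(<8 x i16>)

      define i16 @reduce_smax(<8 x i16> %a) {
        ; Expected to select the across-vector maximum, vmaxv.s16.
        %r = call i16 @llvm.experimental.vector.reduce.smax.v8i16(<8 x i16> %a)
        ret i16 %r
      }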
* [ARM] Fix loads and stores for predicate vectors (David Green, 2019-09-09; 1 file, -18/+0)
  These predicate vectors can usually be loaded and stored with a single instruction, a VSTR_P0. However this instruction will store the entire P0 predicate, 16 bits, zero-extended to 32 bits, with each lane of the v4i1/v8i1/v16i1 represented by 4/2/1 bits. As far as I understand, when llvm says "store this v4i1", it really does need to store 4 bits (or 8, that being the size of a byte, with the bottom 4 as the interesting bits). For example a bitcast from a v8i1 to an i8 is defined as a store followed by a load, which is how the code is expanded.

  So this instead lowers the v4i1/v8i1 load/store through some shuffles to get the bits into the correct positions. This, as you might imagine, is not as efficient as a single instruction. But I believe it is needed for correctness. v16i1 equally should not load/store 32 bits, only the 16 bits of data. Stack loads/stores still use VSTR_P0 (as can be seen by the test not changing). This is fine as they are self-consistent; it is only "externally observable loads/stores" (from our point of view) that need to be corrected.
  Differential revision: https://reviews.llvm.org/D67085
  llvm-svn: 371419
* [ARM] Remove some spurious MVE reduction instructions. (Simon Tatham, 2019-09-09; 1 file, -79/+80)
  The family of 'dual-accumulating' vector multiply-add instructions (VMLADAV, VMLALDAV and VRMLALDAVH) can all operate on both signed and unsigned integer types, and they all have an 'exchange' variant (with an X in the name) that modifies which pairs of vector lanes in the two inputs are multiplied together. But there's a clause in the spec that says that the X variants don't operate on unsigned integer types, only signed. You can have X, or unsigned, or neither, but not both.

  We didn't notice that clause when we implemented the MC support for these instructions, so LLVM believes that things like VMLADAVX.U8 do exist, contradicting the spec. Here I fix that by conditioning them out in Tablegen.

  In order to do that, I've reversed the nesting order of the Tablegen multiclasses for those instructions. Previously, the innermost multiclass generated the X and not-X variants, and the one outside that generated the A and not-A variants. Now X is done by the outer multiclass, which allows me to bypass that one when I only want the two not-X variants.

  Changing the multiclass nesting order also changes the names of the instruction ids unless I make a special effort not to. I decided that while I was changing them anyway I'd make them look nicer; so now the instructions have names like MVE_VMLADAVs32 or MVE_VMLADAVaxs32, instead of cumbersome _noacc_noexch suffixes.

  The corresponding multiply-subtract instructions are unaffected. Those don't accept unsigned types at all, either in the spec or in LLVM.
  Reviewers: ostannard, dmgreen
  Differential Revision: https://reviews.llvm.org/D67214
  llvm-svn: 371405
* [ARM][MVE] VCTP instruction selection (Sam Parker, 2019-09-09; 1 file, -0/+9)
  Add codegen support for vctp{8,16,32}.
  Differential Revision: https://reviews.llvm.org/D67344
  llvm-svn: 371395
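  A sketch of what the intrinsic looks like in IR (names illustrative): vctp.32 sets the low %n lanes of the predicate, here consumed by a select:

      declare <4 x i1> @llvm.arm.mve.vctp32(i32)

      define <4 x i32> @tail_mask(i32 %n, <4 x i32> %a) {
        ; Expected to select vctp.32 followed by a predicated vpsel.
        %m = call <4 x i1> @llvm.arm.mve.vctp32(i32 %n)
        %r = select <4 x i1> %m, <4 x i32> %a, <4 x i32> zeroinitializer
        ret <4 x i32> %r
      }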
* [ARM][MVE] Decoding of uqrshl and sqrshl accepts unpredictable encodings (Oliver Stannard, 2019-09-09; 1 file, -0/+2)
  Specify the Unpredictable bits, and return softfails when appropriate.
  Patch by Mark Murray!
  Differential revision: https://reviews.llvm.org/D66939
  llvm-svn: 371374
* [ARM] Add patterns for VSUB with q and r registers (Oliver Cruickshank, 2019-09-06; 1 file, -0/+9)
  Added patterns for VSUB to support q and r registers, which reduces pressure on the q registers.
  llvm-svn: 371231
* [ARM] Add patterns for VADD with q and r registers (Oliver Cruickshank, 2019-09-06; 1 file, -0/+9)
  Added support for VADD to use q and r registers, which reduces pressure on the q registers.
  llvm-svn: 371230
* [ARM] Add patterns for VMUL with q and r registers (Oliver Cruickshank, 2019-09-06; 1 file, -0/+9)
  Added support for VMUL to use an r register, which reduces pressure on the q registers (see the sketch below).
  llvm-svn: 371229
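  The three q/r commits above all match the same IR shape: a vector op whose second operand is a scalar splat. A hedged sketch (names illustrative):

      define <4 x i32> @vadd_qr(<4 x i32> %a, i32 %s) {
        ; The splat stays in a GPR instead of occupying a q register;
        ; expected to select vadd.i32 q0, q0, r0.
        %ins = insertelement <4 x i32> undef, i32 %s, i32 0
        %splat = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> zeroinitializer
        %r = add <4 x i32> %a, %splat
        ret <4 x i32> %r
      }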
* [ARM] Select vmla (Sam Tebbs, 2019-09-03; 1 file, -0/+15)
  This patch adds vmla selection.
  Differential revision: https://reviews.llvm.org/D66297
  llvm-svn: 370704
* [ARM] Use MQPR not QPR for MVE registers (David Green, 2019-09-02; 1 file, -90/+90)
  We should be using MQPR; if we don't, COPYs and PHIs can be created for QPR. These get folded into instructions, failing verification checks.
  Differential revision: https://reviews.llvm.org/D66214
  llvm-svn: 370676
* [ARM] Remove MVE masked loads/stores (David Green, 2019-09-01; 1 file, -82/+0)
  These were never enabled correctly and are causing other problems. Taking them out for the moment, whilst we work on the issues.
  This reverts r370329.
  llvm-svn: 370607
* [ARM] MVE Masked loads and stores (David Green, 2019-08-29; 1 file, -0/+82)
  Masked loads and stores fit naturally with MVE, the instructions being easily predicated. This adds lowering for the simple cases of masked loads and stores. It does not yet deal with widening/narrowing or pre/post inc.
  The llvm masked load intrinsic will accept a "passthru" value, dictating the values used for the masked-off lanes. In MVE the instructions write 0 to the zero-predicated lanes, so we need to match a passthru that isn't 0 (or undef) with a select instruction to pull in the correct data after the load.

  We also need to do something with unaligned loads/stores. Currently this uses a method similar to big endian, using a VLDRB.8 (and potentially a VREV in BE). This does mean that the predicate mask is converted from, for example, a v4i1 to a v16i1. The VLDR instructions are defined as using the first bit of the relevant mask lane, so this could potentially load different results if the predicate is a little odd. As the input is a v4i1, however, I believe this is OK and all the required bits should be set in the predicate, making the VLDRB.8 load the same data.
  Differential Revision: https://reviews.llvm.org/D66534
  llvm-svn: 370329
* [MVE] VMOVX patterns (David Green, 2019-08-28; 1 file, -2/+6)
  This adds fp16 VMOVX patterns, using the same patterns as rL362482 with some adjustments for MVE. It allows us to move fp16 registers without going into and out of gprs.
  VMOVX is able to move the top bits from an fp16 in an fp reg into the bottom bits of another register, zeroing the rest. This can be used for odd MVE register lanes. The top bits are not read by fp16 instructions, so no move is required there if we are dealing with even lanes.
  Differential revision: https://reviews.llvm.org/D66793
  llvm-svn: 370184
* [ARM] Formatting for ARMInstrMVE.td. NFC (David Green, 2019-08-21; 1 file, -89/+98)
  This is just some formatting cleanup, prior to the masked load and store patch in D66534.
  llvm-svn: 369545
* [ARM] Select vaddva (Sam Tebbs, 2019-08-20; 1 file, -0/+7)
  This patch adds vaddva selection.
  Differential revision: https://reviews.llvm.org/D66410
  llvm-svn: 369404
* [ARM] Add support for MVE vaddv (Sam Tebbs, 2019-08-19; 1 file, -0/+6)
  This patch adds vecreduce_add and the relevant instruction selection for vaddv.
  Differential revision: https://reviews.llvm.org/D66085
  llvm-svn: 369245
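  A sketch using the experimental reduction naming of the period (names illustrative):

      declare i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32>)

      define i32 @reduce_add(<4 x i32> %a) {
        ; Expected to select the across-vector add, vaddv.u32.
        %r = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> %a)
        ret i32 %r
      }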
* [ARM] Fix alignment checks for BE VLDRH (David Green, 2019-08-15; 1 file, -2/+2)
  We need to allow any alignment of at least 2, not just exactly 2, so that the big endian loads and stores can be selected successfully. I've also added extra BE testing for the load and store tests.
  Thanks to Oliver for the report.
  Differential Revision: https://reviews.llvm.org/D66222
  llvm-svn: 368996
* [ARM] MVE predicate store patterns (David Green, 2019-08-15; 1 file, -0/+7)
  Stack loads and stores were already working, but direct stores were not. This adds the patterns for them, same as predicate loads.
  Differential Revision: https://reviews.llvm.org/D66213
  llvm-svn: 368988
* [ARM] MVE trunc to i1 vectors (David Green, 2019-08-15; 1 file, -0/+7)
  This adds patterns for selecting trunc instructions from full vectors to i1 vectors.
  Differential Revision: https://reviews.llvm.org/D66201
  llvm-svn: 368981
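  A sketch of the shape this enables (names illustrative), with the truncated mask consumed by a select:

      define <4 x i32> @trunc_mask(<4 x i32> %m, <4 x i32> %t, <4 x i32> %f) {
        ; Only the low bit of each lane of %m matters for the predicate.
        %c = trunc <4 x i32> %m to <4 x i1>
        %r = select <4 x i1> %c, <4 x i32> %t, <4 x i32> %f
        ret <4 x i32> %r
      }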
* [ARM] Add support for MVE pre and post inc loads and stores (David Green, 2019-08-08; 1 file, -1/+71)
  This adds pre- and post-increment and decrement for MVE loads and stores. It uses the builtin pre/post load/store detection, unlike Neon. Loads are selected with the code in tryT2IndexedLoad; stores are selected with tablegen patterns. The immediates have a +/-7-bit range, multiplied by the size of the element.
  Differential Revision: https://reviews.llvm.org/D63840
  llvm-svn: 368305
* [ARM] MVE big endian loads/stores (David Green, 2019-08-08; 1 file, -7/+35)
  This adds some missing patterns for big endian loads/stores, allowing unaligned loads/stores to also be selected with an extra VREV, which produces better code than aligning through a stack. Also moves VLDR_P0 to not be LE only, and adjusts some of the tests to show all that working.
  Differential Revision: https://reviews.llvm.org/D65583
  llvm-svn: 368304
* [ARM] Select VFMA (Sam Tebbs, 2019-08-08; 1 file, -0/+7)
  llvm-svn: 368264
* [ARM] Tighten up VLDRH.32 with low alignments (David Green, 2019-08-08; 1 file, -13/+25)
  VLDRH needs to have an alignment of at least 2, including the widening/narrowing versions. This tightens up the ISel patterns for it and alters allowsMisalignedMemoryAccesses so that unaligned accesses are expanded through the stack. It also fixes some incorrect shift amounts, which seemed to be passing a multiple, not a shift.
  Differential Revision: https://reviews.llvm.org/D65580
  llvm-svn: 368256
* [ARM] Generate MVE VHADDs/VHSUBs (Oliver Cruickshank, 2019-08-07; 1 file, -0/+54)
  llvm-svn: 368146
* [ARM] MVE big endian bitcasts (David Green, 2019-08-04; 1 file, -0/+45)
  This adds big endian MVE patterns for bitcasts. In llvm these are defined as a store of the existing type followed by a load into the new one, which means that they have to become a VREV between the two types, working in the same way that NEON works in big-endian. This also adds some example tests for big endian, showing where code is and isn't different.

  The main difference, especially from a testing perspective, is that vectors are passed as v2f64, and so are VREVed into and out of call arguments, and the parameters are passed in a v2f64 format. The same happens for inline assembly where the register class is used, so it is VREVed to a v16i8.

  So some of this is probably not correct yet, but it is (mostly) self-consistent and seems to be consistent with how llvm treats vectors. The rest we can hopefully fix later. More details about big endian NEON can be found in https://llvm.org/docs/BigEndianNEON.html.
  Differential Revision: https://reviews.llvm.org/D65581
  llvm-svn: 367780
* [ARM] Fix for MVE VREV64 (David Green, 2019-08-01; 1 file, -5/+5)
  The VREV64 instruction is apparently unpredictable if Qd == Qm, due to the cross-beat nature of the instruction. This adds an earlyclobber to Qd, which seems to be the same way we deal with this on other instructions, like the write-back on loads and stores.
  Differential Revision: https://reviews.llvm.org/D65502
  llvm-svn: 367544
* [ARM] Generate MVE VFMAs (Oliver Cruickshank, 2019-07-31; 1 file, -0/+21)
  llvm-svn: 367408
* [NFC] Test Commit (Oliver Cruickshank, 2019-07-31; 1 file, -2/+2)
  llvm-svn: 367405
* [ARM] MVE VPNOT (David Green, 2019-07-28; 1 file, -3/+12)
  This adds the patterns required to transform xor P0, -1 to a VPNOT. The instruction operands have to change a little for this, adding an in and an out VCCR reg and using a custom DecodeMVEVPNOT for the decode.
  Differential Revision: https://reviews.llvm.org/D65133
  llvm-svn: 367192
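  In IR the "xor P0, -1" shape is an xor of a predicate vector with all-true; a sketch (names illustrative):

      define <4 x i32> @vpnot_sel(<4 x i32> %a, <4 x i32> %t, <4 x i32> %f) {
        ; icmp gives a predicate; xor with all-ones should become vpnot
        ; rather than a round trip through a GPR.
        %p = icmp eq <4 x i32> %a, zeroinitializer
        %np = xor <4 x i1> %p, <i1 true, i1 true, i1 true, i1 true>
        %r = select <4 x i1> %np, <4 x i32> %t, <4 x i32> %f
        ret <4 x i32> %r
      }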
* [ARM] Better patterns for fp <> predicate vectors (David Green, 2019-07-28; 1 file, -0/+26)
  These are some better patterns for converting between predicates and floating point vectors. Much like the extends, we select "1"/"-1" or "0" depending on the predicate value. Or we perform a compare against 0 to convert to a predicate.
  Differential Revision: https://reviews.llvm.org/D65103
  llvm-svn: 367191
* [ARM] Rewrite how VCMP are lowered, using a single node (David Green, 2019-07-24; 1 file, -62/+64)
  This removes the VCEQ/VCNE/VCGE/VCEQZ/etc nodes, instead using two nodes, VCMP and VCMPZ, with an extra operand for the condition code. I believe this will make some combines simpler, allowing us to just look at these codes and not the operands. It also helps fill in a missing VCGTUZ MVE selection without adding extra nodes for it.
  Differential Revision: https://reviews.llvm.org/D65072
  llvm-svn: 366934
* [ARM] More MVE compare vector splat combines for ANDs (David Green, 2019-07-24; 1 file, -0/+12)
  Adds some extra r-register compare combines, this time for ANDs.
  Differential Revision: https://reviews.llvm.org/D65062
  llvm-svn: 366928
* [ARM] MVE compare vector splat combine (David Green, 2019-07-24; 1 file, -0/+12)
  MVE VCMP instructions can use a general purpose register as the second operand. This adds the combines for it, selecting from a compare of a vdup.
  Differential Revision: https://reviews.llvm.org/D65061
  llvm-svn: 366924
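  A sketch of the combine's target shape (names illustrative): a compare against a splat should use the r-register VCMP form instead of materialising a vdup:

      define <4 x i32> @cmp_splat(<4 x i32> %a, i32 %s, <4 x i32> %t, <4 x i32> %f) {
        ; Expected: vcmp.s32 gt, q0, r0 followed by vpsel.
        %ins = insertelement <4 x i32> undef, i32 %s, i32 0
        %splat = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> zeroinitializer
        %c = icmp sgt <4 x i32> %a, %splat
        %r = select <4 x i1> %c, <4 x i32> %t, <4 x i32> %f
        ret <4 x i32> %r
      }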
* [ARM] Better OR's for MVE compares (David Green, 2019-07-24; 1 file, -8/+12)
  This adds a DeMorgan combine for ORs of compares, turning them into ANDs and helping prevent them from going into and out of gpr registers. It also fills in the VCLE and VCLT nodes that MVE can select, allowing it to invert more compares.
  Differential Revision: https://reviews.llvm.org/D65059
  llvm-svn: 366920
* [ARM] Better AND's for MVE compares (David Green, 2019-07-24; 1 file, -0/+24)
  Add a number of folds to convert and(vcmp, vcmp) into a single VPT block, where the second vcmp becomes predicated on the first. The VCMP; VPST; VCMP sequence will eventually be converted to VPT; VCMP in the VPTBlockPass.
  Differential Revision: https://reviews.llvm.org/D65058
  llvm-svn: 366910
* [ARM] MVE floating point compares and selects (David Green, 2019-07-24; 1 file, -0/+40)
  Much like the integers, this adds MVE floating point compares and selects. It requires a lot more buildvector/shuffle code because we may need to expand the compares without mve.fp, and requires support for and/or because of the way we lower llvm condition codes.
  Some original code by David Sherwood.
  Differential Revision: https://reviews.llvm.org/D65054
  llvm-svn: 366909
* [ARM] Basic And/Or/Xor handling for MVE predicates (David Green, 2019-07-24; 1 file, -0/+26)
  This adds some basic, "worst case" handling for MVE predicate Or/And/Xor. It does this by going into and out of GPRs, doing the operation on scalars.
  Code by David Sherwood.
  Differential Revision: https://reviews.llvm.org/D65053
  llvm-svn: 366907
* [ARM] MVE predicate register support (David Green, 2019-07-24; 1 file, -0/+51)
  This adds support code for building and shuffling i1 predicate registers. It generally uses two basic principles: either converting the predicate into a scalar (through a PREDICATE_CAST) and doing scalar operations on it there, or converting the register to a full vector register and back.
  Some of the code here is not super efficient, but it will hopefully cover most cases of moving i1 vectors around and can be improved in subsequent patches.
  Some code by David Sherwood.
  Differential Revision: https://reviews.llvm.org/D65052
  llvm-svn: 366890
* [ARM] MVE integer compares and selects (David Green, 2019-07-24; 1 file, -0/+43)
  This adds the very basics of MVE vector predication, adding integer VCMP and VSEL instruction support. This is done through predicate registers (MVT::v16i1, MVT::v8i1, MVT::v4i1), but otherwise uses the same mechanics as NEON to custom lower setccs through ARMISD::VCXX nodes (VCEQ, VCGT, VCEQZ, etc).

  An extra VCNE was added, as this can be handled sensibly by MVE's expanded number of VCMP condition codes. (There are also VCLE and VCLT, which are added later.)
  VPSEL is also added here, simply selecting on the vselect.
  Original code by David Sherwood.
  Differential Revision: https://reviews.llvm.org/D65051
  llvm-svn: 366885
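  The basic setcc + vselect shape this enables, as a sketch (names illustrative):

      define <8 x i16> @cmp_sel(<8 x i16> %a, <8 x i16> %b, <8 x i16> %t, <8 x i16> %f) {
        ; Expected lowering: vcmp.i16 eq, q0, q1 then vpsel.
        %c = icmp eq <8 x i16> %a, %b
        %r = select <8 x i1> %c, <8 x i16> %t, <8 x i16> %f
        ret <8 x i16> %r
      }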