path: root/llvm/test/CodeGen/Thumb2
Commit message (Author, Date; Files changed, Lines -/+)
...
* [ARM] VQSUB instruction (David Green, 2019-10-10; 1 file, -51/+6)
  Same as VQADD, VQSUB can be selected from llvm.ssub.sat intrinsics.
  Differential Revision: https://reviews.llvm.org/D68567
  llvm-svn: 374377
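  A minimal .ll sketch of the kind of input this now selects to a single
  vqsub (hypothetical, not copied from the changed test):

      define arm_aapcs_vfpcc <4 x i32> @vqsub_s32(<4 x i32> %a, <4 x i32> %b) {
        ; Lane-wise saturating signed subtract; on MVE this can now become
        ; a single vqsub.s32 rather than an expanded compare/select sequence.
        %r = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
        ret <4 x i32> %r
      }
      declare <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32>, <4 x i32>)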
* [ARM] VQADD instructions (David Green, 2019-10-10; 1 file, -54/+6)
  This selects MVE VQADD from the vector llvm.sadd.sat or llvm.uadd.sat
  intrinsics.
  Differential Revision: https://reviews.llvm.org/D68566
  llvm-svn: 374336
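  A hypothetical unsigned counterpart, sketching the uadd.sat path:

      define arm_aapcs_vfpcc <8 x i16> @vqadd_u16(<8 x i16> %a, <8 x i16> %b) {
        ; Lane-wise saturating unsigned add; expected to select vqadd.u16.
        %r = call <8 x i16> @llvm.uadd.sat.v8i16(<8 x i16> %a, <8 x i16> %b)
        ret <8 x i16> %r
      }
      declare <8 x i16> @llvm.uadd.sat.v8i16(<8 x i16>, <8 x i16>)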
* [ARM] Add saturating arithmetic tests for MVE. NFC (David Green, 2019-10-09; 1 file, -0/+501)
  llvm-svn: 374159
* [ARM] Generate vcmp instead of vcmpe (Kristof Beyls, 2019-10-08; 4 files, -380/+380)
  Based on the discussion in
  http://lists.llvm.org/pipermail/llvm-dev/2019-October/135574.html, the
  conclusion was reached that the ARM backend should produce vcmp instead
  of vcmpe instructions by default, i.e. not produce an Invalid Operation
  exception when either argument in a floating point compare is a quiet
  NaN.

  In the future, after constrained floating point intrinsics for floating
  point compare have been introduced, vcmpe instructions probably should
  be produced for those intrinsics - depending on the exact semantics
  they'll be defined to have.

  This patch logically consists of the following parts:
  - Revert http://llvm.org/viewvc/llvm-project?rev=294945&view=rev and
    http://llvm.org/viewvc/llvm-project?rev=294968&view=rev, which
    implemented fine-tuning for when to produce vcmpe (i.e. not do it for
    equality comparisons). The complexity introduced by those patches
    isn't needed anymore if we just always produce vcmp instead. Maybe
    these patches need to be reintroduced again once support is needed to
    map potential LLVM-IR constrained floating point compare intrinsics
    to the ARM instruction set.
  - Simply select vcmp, instead of vcmpe; see the simple changes in
    lib/Target/ARM/ARMInstrVFP.td.
  - Adapt lots of tests that tested for vcmpe (instead of vcmp). For all
    of these tests, the intent of what is tested isn't related to whether
    the compare should produce an Invalid Operation exception or not.

  Fixes PR43374.
  Differential Revision: https://reviews.llvm.org/D68463
  llvm-svn: 374025
* ARM-Darwin: keep the frame register reserved even if not updated. (Tim Northover, 2019-10-04; 1 file, -1/+1)
  Darwin platforms need the frame register to always point at a valid
  record, even if it's not updated in a leaf function. Backtraces are
  more important than one extra GPR.
  llvm-svn: 373738
* [ARM] Identity shuffles are legal (David Green, 2019-10-02; 1 file, -52/+4)
  Identity shuffles, of the form (0, 1, 2, 3, ...), are perfectly OK
  under MVE (they essentially just become bitcasts). We were not catching
  that in the existing set of what we considered legal though. On NEON
  they would be covered by vext's, but that is not generally available in
  MVE.

  This uses ShuffleVectorInst::isIdentityMask, which is a little odd to
  use here but does what we want and prevents us from just rewriting what
  is the same function.
  Differential Revision: https://reviews.llvm.org/D68241
  llvm-svn: 373446
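  A sketch of what an identity shuffle looks like in IR (hypothetical
  function name):

      define arm_aapcs_vfpcc <4 x i32> @identity(<4 x i32> %a) {
        ; Mask (0, 1, 2, 3) leaves every lane where it was, so no MVE
        ; instruction is required beyond (at most) a bitcast.
        %s = shufflevector <4 x i32> %a, <4 x i32> undef,
               <4 x i32> <i32 0, i32 1, i32 2, i32 3>
        ret <4 x i32> %s
      }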
* [ARM] Some MVE shuffle plus extend tests. NFC (David Green, 2019-10-01; 1 file, -0/+142)
  llvm-svn: 373368
* [NFC][ARM][MVE] More tests (Sam Parker, 2019-10-01; 1 file, -0/+602)
  Add some tail predication tests with fast math.
  llvm-svn: 373331
* [NFC][ARM][MVE] More tests (Sam Parker, 2019-09-30; 1 file, -0/+2014)
  Add some loop tests that cover different float operations and types.
  llvm-svn: 373192
* [ARM][MVE] Change VCTP operand (Sam Parker, 2019-09-30; 8 files, -46/+63)
  The VCTP instruction calculates the predicate mask based upon the
  number of elements that need to be processed. I had inserted the sub
  before the vctp intrinsic and supplied it as the operand, but this is
  incorrect: the phi should directly feed the vctp, as the sub is
  calculating the value for the next iteration.
  Differential Revision: https://reviews.llvm.org/D67921
  llvm-svn: 373188
* [NFC][ARM] Add some tail-predication tests (Sam Parker, 2019-09-27; 1 file, -0/+1757)
  Use different data types for some simple loops.
  llvm-svn: 373064
* [ARM] Ensure we do not attempt to create lsll #0 (David Green, 2019-09-25; 1 file, -0/+48)
  During legalisation we can end up with some pretty strange nodes, like
  shifts of 0. We need to make sure we don't try to make long shifts of
  these, ending up with invalid assembly instructions: a long shift with
  a zero immediate actually encodes a shift by 32.
  Differential Revision: https://reviews.llvm.org/D67664
  llvm-svn: 372839
* [ARM] Split large widening MVE loads (David Green, 2019-09-24; 1 file, -301/+46)
  Similar to rL372717, we can force the splitting of extends of vector
  loads in MVE, in order to use the better widening loads as opposed to
  going through expensive extends. This adds a combine to detect extends
  of loads early on and split the load in two, from where normal
  legalisation will kick in and we get a series of widening loads.
  Differential Revision: https://reviews.llvm.org/D67909
  llvm-svn: 372721
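  A hypothetical sketch (in the typed-pointer IR syntax of the period) of
  an extending load that the combine now splits:

      define arm_aapcs_vfpcc <8 x i32> @sext_load(<8 x i16>* %p) {
        ; The sext of an <8 x i16> load is split into two halves, each of
        ; which legalises to a vldrh.s32 widening load.
        %l = load <8 x i16>, <8 x i16>* %p, align 2
        %e = sext <8 x i16> %l to <8 x i32>
        ret <8 x i32> %e
      }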
* [ARM] MVE sext and widen/narrow tests from larger types. NFC (David Green, 2019-09-24; 2 files, -2/+859)
  llvm-svn: 372719
* [ARM] Split large truncating MVE stores (David Green, 2019-09-24; 2 files, -27/+9)
  MVE does not have a simple sign extend instruction that can move
  elements across lanes. We currently often end up moving each lane into
  and out of a GPR in order to get elements into the correct places. When
  we have a store of a trunc (or an extend of a load), we can instead
  just split the store/load in two, using the narrowing/widening
  load/store instructions from each half of the vector.

  This does that for stores. It happens very early, in a store combine,
  so as to easily detect the truncates. (It would be possible to do this
  later, but that would involve looking through a buildvector of extract
  elements. Not impossible, but this way seemed simpler.) By enabling
  store combines we also get a vmovdrr combine for free, helping some
  other tests.
  Differential Revision: https://reviews.llvm.org/D67828
  llvm-svn: 372717
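  The store-side shape, as a hypothetical sketch:

      define arm_aapcs_vfpcc void @trunc_store(<8 x i16>* %p, <8 x i32> %v) {
        ; The trunc+store is split in two, each half using a vstrh.32
        ; narrowing store instead of moving lanes through GPRs.
        %t = trunc <8 x i32> %v to <8 x i16>
        store <8 x i16> %t, <8 x i16>* %p, align 2
        ret void
      }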
* [ARM][MVE] Remove old tail predicates (Sam Parker, 2019-09-23; 4 files, -3/+612)
  Remove any predicate that we replace with a vctp intrinsic, and try to
  remove their operands too. Also look into the exit block to see if
  there are any duplicates of the predicates that we've replaced, and
  clone the vctp to be used there instead.
  Differential Revision: https://reviews.llvm.org/D67709
  llvm-svn: 372567
* [ARM][LowOverheadLoops] Use subs during revert. (Sam Parker, 2019-09-23; 4 files, -11/+10)
  Check whether there are any uses or defs between the LoopDec and
  LoopEnd. If there are not, then we can use a subs to set the cpsr and
  skip generating a cmp.
  Differential Revision: https://reviews.llvm.org/D67801
  llvm-svn: 372560
* [ARM][LowOverheadLoops] Use tBcc when reverting (Sam Parker, 2019-09-23; 4 files, -5/+5)
  Check the branch target ranges and use a tBcc instead of a t2Bcc when
  we can.
  Differential Revision: https://reviews.llvm.org/D67796
  llvm-svn: 372557
* [ARM] Fix CTTZ not generating correct instructions on MVE (Oliver Cruickshank, 2019-09-20; 1 file, -30/+12)
  The CTTZ intrinsic should have been set to Custom, not Expand.
  llvm-svn: 372401
* [ARM] MVE i1 splat (David Green, 2019-09-19; 1 file, -34/+3)
  We needn't BFI each lane individually into a predicate register when
  each lane is the same. A simple sign extend and a vmsr will do.
  Differential Revision: https://reviews.llvm.org/D67653
  llvm-svn: 372313
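  A sketch of the splatted-predicate pattern in question (hypothetical
  names):

      define arm_aapcs_vfpcc <4 x i32> @splat_pred(<4 x i32> %a, <4 x i32> %b, i1 %p) {
        ; Every predicate lane holds the same bit, so a sign extend of %p
        ; in a GPR plus one vmsr to the predicate register replaces four
        ; per-lane BFI insertions.
        %i = insertelement <4 x i1> undef, i1 %p, i32 0
        %m = shufflevector <4 x i1> %i, <4 x i1> undef, <4 x i32> zeroinitializer
        %s = select <4 x i1> %m, <4 x i32> %a, <4 x i32> %b
        ret <4 x i32> %s
      }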
* [ARM] Fix for buildbots (Sam Parker, 2019-09-19; 1 file, -26/+24)
  I had missed that massive.mir also needed updating.
  llvm-svn: 372303
* [ARM] Add a SelectTAddrModeImm7 for MVE narrow loads and stores (David Green, 2019-09-17; 1 file, -25/+22)
  We were previously using SelectT2AddrModeImm7 for both normal and
  narrowing MVE loads/stores. As the narrowing instructions do not accept
  sp as a register, it makes little sense to optimise a FrameIndex into
  the load, only to have to recover that later on. This adds a
  SelectTAddrModeImm7 which does not do that folding, and uses it for the
  narrowing load/store patterns.
  Differential Revision: https://reviews.llvm.org/D67489
  llvm-svn: 372134
* [ARM] Reserve an emergency spill slot for fp16 addressing modes that need it (David Green, 2019-09-17; 1 file, -0/+95)
  Similar to D67327, but this time for the FP16 VLDR and VSTR
  instructions that use the AddrMode5FP16 addressing mode. We need to
  reserve an emergency spill slot for instructions that will be out of
  range to use sp directly. AddrMode5FP16 is 8 bits with a scale of 2.
  Differential Revision: https://reviews.llvm.org/D67483
  llvm-svn: 372132
* [ARM] Fix for buildbots (Sam Parker, 2019-09-17; 1 file, -1/+1)
  Remove setPreservesCFG from ARMConstantIslandPass and add a couple of
  -verify-machine-dom-info instances into the existing codegen tests.
  llvm-svn: 372126
* [ARM] Fix for buildbots (Sam Parker, 2019-09-17; 4 files, -51/+97)
  Add -verify-machineinstrs and update the remaining low overhead loop
  tests.
  llvm-svn: 372121
* [ARM] Fix for MVE load/store stack accesses (David Green, 2019-09-17; 1 file, -0/+185)
  MVE loads and stores have a 7 bit immediate range, scaled by the length
  of the type. This needs to be taught to the stack estimation code to
  ensure that an emergency spill slot is reserved in case we run out of
  registers when materialising stack indices.

  Also, the narrowing loads/stores can be created with frame indices even
  though they do not accept SP as a register. In those cases we need to
  make sure we have an emergency register to use as the frame base, as SP
  can never be used.
  Differential Revision: https://reviews.llvm.org/D67327
  llvm-svn: 372114
* [ARM][LowOverheadLoops] Add LR def safety check (Sam Parker, 2019-09-17; 14 files, -261/+599)
  Converting the *LoopStart pseudo instructions into DLS/WLS results in
  LR being defined. These instructions were inserted on the assumption
  that LR would already contain the loop counter, because a mov is
  introduced during ISel as the consumers in the loop can only use LR.
  That assumption proved wrong!

  So perform a safety check, finding an appropriate place to insert the
  DLS/WLS instructions, or revert if this isn't possible.
  Differential Revision: https://reviews.llvm.org/D67539
  llvm-svn: 372111
* [ARM] LE support in ConstantIslands (Sam Parker, 2019-09-17; 4 files, -0/+744)
  The low-overhead branch extension provides a loop-end 'LE' instruction
  that performs no decrement or compare; it just jumps backwards. This
  patch modifies the constant islands pass to try to insert LE
  instructions in place of a Thumb2 conditional branch, instead of
  shrinking it. This only happens if a cmp can be converted to a
  cbz/cbnz and used to exit the loop.
  Differential Revision: https://reviews.llvm.org/D67404
  llvm-svn: 372085
* [ARM] A predicate cast of a predicate cast is a predicate cast (David Green, 2019-09-16; 3 files, -199/+201)
  This adds some very basic folding of PREDICATE_CASTs, removing cases
  when they are chained together. These would already be removed
  eventually, as they are lowered to copies; this just allows it to
  happen earlier, which can help other simplifications.
  Differential Revision: https://reviews.llvm.org/D67591
  llvm-svn: 372012
* [ARM] Add patterns for BSWAP intrinsic on MVE (Oliver Cruickshank, 2019-09-16; 1 file, -0/+37)
  BSWAP can use the VREV instruction on MVE to produce better results
  than expanding.
  llvm-svn: 372002
* [ARM] Add patterns for bitreverse intrinsic on MVE (Oliver Cruickshank, 2019-09-16; 1 file, -0/+52)
  BITREVERSE can use the VBRSR instruction, which will reverse and right
  shift; shifting right by 0 will just reverse the bits.
  llvm-svn: 372001
* [ARM] Lower CTTZ on MVE (Oliver Cruickshank, 2019-09-16; 1 file, -0/+178)
  Lower CTTZ on MVE using VBRSR and VCLS, which will reverse the bits and
  count the leading zeros, equivalent to a count of trailing zeros
  (CTTZ).
  llvm-svn: 372000
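  A sketch of the IR this lowers (hypothetical), relying on the identity
  cttz(x) = ctlz(bitreverse(x)):

      define arm_aapcs_vfpcc <4 x i32> @cttz(<4 x i32> %a) {
        ; Reverse the bits, then count from the top: trailing zeros of %a
        ; become leading zeros of the reversed value.
        %r = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> %a, i1 false)
        ret <4 x i32> %r
      }
      declare <4 x i32> @llvm.cttz.v4i32(<4 x i32>, i1)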
* [ARM] Add patterns for CTLZ on MVE (Oliver Cruickshank, 2019-09-16; 1 file, -0/+140)
  The CTLZ intrinsic can use the VCLS instruction on MVE, which produces
  better results than expanding.
  llvm-svn: 371999
* [ARM] Fold VCMP into VPT (David Green, 2019-09-16; 18 files, -446/+265)
  MVE has VPT instructions, which perform the duties of both a VCMP and a
  VPST in a single instruction, performing the compare and starting the
  VPT block in one. This teaches the MVEVPTBlockPass to fold them,
  searching back through the basic block for a valid VCMP and creating
  the VPT from its operands.

  There are some changes to the VPT instructions to accommodate this,
  altering the order of the operands to match the VCMP better, and
  changing P0 register defs to be VPR defs, as is used in other places.
  Differential Revision: https://reviews.llvm.org/D66577
  llvm-svn: 371982
* [ARM] Masked loads and stores (David Green, 2019-09-15; 3 files, -7037/+470)
  Masked loads and stores fit naturally with MVE, the instructions being
  easily predicated. This adds lowering for the simple cases of masked
  loads and stores. It does not yet deal with widening/narrowing or
  pre/post inc, and so is currently behind an option.

  The llvm masked load intrinsic will accept a "passthru" value,
  dictating the values used for the zero-masked lanes. In MVE the
  instructions write 0 to the zero-predicated lanes, so we need to match
  a passthru that isn't 0 (or undef) with a select instruction to pull in
  the correct data after the load.
  Differential Revision: https://reviews.llvm.org/D67186
  llvm-svn: 371932
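  A hypothetical sketch of a masked load with a non-zero passthru (using
  the typed-pointer intrinsic mangling of the period):

      define arm_aapcs_vfpcc <4 x i32> @masked_ld(<4 x i32>* %p, <4 x i1> %m, <4 x i32> %pt) {
        ; The MVE load writes 0 to disabled lanes, so the non-zero
        ; passthru is matched with a trailing select that merges %pt in.
        %l = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %p,
                   i32 4, <4 x i1> %m, <4 x i32> %pt)
        ret <4 x i32> %l
      }
      declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32, <4 x i1>, <4 x i32>)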
* [ARM] Simplify and update vmla test. NFC (David Green, 2019-09-15; 2 files, -39/+38)
  llvm-svn: 371930
* [ARM] Add support for MVE vmaxv and vminv (Sam Tebbs, 2019-09-13; 1 file, -0/+135)
  This patch adds vecreduce_smax, vecreduce_umax, vecreduce_smin and
  vecreduce_umin, and selection for vmaxv and vminv.
  Differential Revision: https://reviews.llvm.org/D66413
  llvm-svn: 371827
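  A sketch of one of the reductions (using the current intrinsic
  spelling, llvm.vector.reduce.*; at the time these lived under the
  llvm.experimental.vector.reduce.* namespace):

      define arm_aapcs_vfpcc i32 @reduce_smax(<4 x i32> %a) {
        ; Horizontal signed-max across all lanes, selectable as vmaxv.s32.
        %r = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> %a)
        ret i32 %r
      }
      declare i32 @llvm.vector.reduce.smax.v4i32(<4 x i32>)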
* [Alignment] Use llvm::Align in MachineFunction and TargetLowering - fixes mir parsing (Guillaume Chatelet, 2019-09-11; 26 files, -31/+31)
  Summary: This catches malformed mir files which specify alignment as
  log2 instead of pow2. See https://reviews.llvm.org/D65945 for
  reference.

  This patch is part of a series to introduce an Alignment type.
  See this thread for context:
  http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type:
  https://reviews.llvm.org/D64790

  Reviewers: courbet
  Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai,
  jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook,
  apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones,
  atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei,
  jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67433
  llvm-svn: 371608
* [ARM] Fix loads and stores for predicate vectors (David Green, 2019-09-09; 5 files, -757/+3026)
  These predicate vectors can usually be loaded and stored with a single
  instruction, a VSTR_P0. However this instruction will store the entire
  P0 predicate, 16 bits, zero-extended to 32 bits, with each lane of the
  v4i1/v8i1/v16i1 representing 4/2/1 bits.

  As far as I understand, when llvm says "store this v4i1", it really
  does need to store 4 bits (or 8, that being the size of a byte, with
  this bottom 4 as the interesting bits). For example, a bitcast from a
  v8i1 to an i8 is defined as a store followed by a load, which is how
  the code is expanded.

  So this instead lowers the v4i1/v8i1 load/store through some shuffles
  to get the bits into the correct positions. This, as you might imagine,
  is not as efficient as a single instruction. But I believe it is needed
  for correctness. v16i1 equally should not load/store 32 bits, only
  storing the 16 bits of data. Stack loads/stores are still using the
  VSTR_P0 (as can be seen by the test not changing). This is fine, as
  they are self-consistent; it is only "externally observable
  loads/stores" (from our point of view) that need to be corrected.
  Differential revision: https://reviews.llvm.org/D67085
  llvm-svn: 371419
* [ARM][MVE] VCTP instruction selection (Sam Parker, 2019-09-09; 1 file, -0/+54)
  Add codegen support for vctp{8,16,32}.
  Differential Revision: https://reviews.llvm.org/D67344
  llvm-svn: 371395
* [ARM] Add patterns for VSUB with q and r registers (Oliver Cruickshank, 2019-09-06; 1 file, -0/+77)
  Added patterns for VSUB to support q and r registers, which reduces
  pressure on the q registers.
  llvm-svn: 371231
* [ARM] Add patterns for VADD with q and r registers (Oliver Cruickshank, 2019-09-06; 1 file, -0/+74)
  Added support for VADD to use q and r registers, which reduces pressure
  on the q registers.
  llvm-svn: 371230
* [ARM] Add patterns for VMUL with q and r registers (Oliver Cruickshank, 2019-09-06; 1 file, -0/+74)
  Added support for VMUL to use an r register; this reduces pressure on
  the q registers.
  llvm-svn: 371229
* [ARM] Sink add/mul(shufflevector(insertelement())) for MVE instruction selection (Sam Tebbs, 2019-09-06; 1 file, -0/+122)
  This patch sinks add/mul(shufflevector(insertelement())) into the basic
  block in which they are used so that they can then be selected
  together. This is useful for various MVE instructions, such as vmla and
  others that take R registers. Loop tests have been added to the vmla
  test file to make sure vmlas are generated in loops.
  Differential revision: https://reviews.llvm.org/D66295
  llvm-svn: 371218
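  A hypothetical sketch of the pattern being sunk; once the splat sits
  next to its users, the mul+add pair can select vmla with the scalar
  kept in an r register:

      define arm_aapcs_vfpcc <4 x i32> @vmla(<4 x i32> %acc, <4 x i32> %x, i32 %y) {
        ; Splat of the scalar %y, then multiply-accumulate with it.
        %i = insertelement <4 x i32> undef, i32 %y, i32 0
        %sp = shufflevector <4 x i32> %i, <4 x i32> undef, <4 x i32> zeroinitializer
        %m = mul <4 x i32> %x, %sp
        %r = add <4 x i32> %m, %acc
        ret <4 x i32> %r
      }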
* [ARM] MVE Tail Predication (Sam Parker, 2019-09-06; 7 files, -0/+1505)
  The MVE and LOB extensions of Armv8.1-M can be combined to enable 'tail
  predication', which removes the need for a scalar remainder loop after
  vectorization. Lane predication is performed implicitly via a system
  register. The effects of predication are described in Section B5.6.3 of
  the Armv8.1-M Arch Reference Manual, the key points being:
  - For vector operations that perform reduction across the vector and
    produce a scalar result, whether the value is accumulated or not.
  - For non-load instructions, the predicate flags determine if the
    destination register byte is updated with the new value or if the
    previous value is preserved.
  - For vector store instructions, whether the store occurs or not.
  - For vector load instructions, whether the loaded value or zero is
    written to that element of the destination register.

  This patch implements a pass that takes a hardware loop, containing
  masked vector instructions, and converts it into something that
  resembles an MVE tail-predicated loop. Currently, if we had code
  generation, we'd generate a loop in which the VCTP would generate the
  predicate and VPST would then set up the value of VPR.P0. The loads and
  stores would be placed in VPT blocks, so this is not tail predication,
  but normal VPT predication with the predicate based upon an
  element-counting induction variable. Further work needs to be done to
  finally produce a true tail-predicated loop.

  Because only the loads and stores are predicated, at both the LLVM IR
  and MIR level we will restrict support to only lane-wise operations (no
  horizontal reductions). We will perform a final check on MIR during
  loop finalisation too. Another restriction, specific to MVE, is that
  all the vector instructions need to operate on the same number of
  elements. This is because predication is performed at the byte level
  and this is set on entry to the loop, or by the VCTP instead.
  Differential Revision: https://reviews.llvm.org/D65884
  llvm-svn: 371179
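  A rough, hypothetical fragment of what the predicated loop body looks
  like after the pass (using the current llvm.arm.mve.vctp32 spelling,
  which may differ from the intrinsic name used at the time):

      ; %rem is the number of elements still to be processed; the vctp
      ; produces a mask enabling only the first %rem lanes.
      %mask = call <4 x i1> @llvm.arm.mve.vctp32(i32 %rem)
      %v = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %p,
                 i32 4, <4 x i1> %mask, <4 x i32> undef)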
* [ARM] Fixup the creation of VPT blocks (David Green, 2019-09-05; 8 files, -36/+34)
  This attempts to just fix the creation of VPT blocks, fixing up the
  iterating, which instructions are considered in the bundle, and making
  sure that we do not overrun the end of the block.
  Differential Revision: https://reviews.llvm.org/D67219
  llvm-svn: 371064
* [ARM] Ignore Implicit CPSR regs when lowering from Machine to MC operands (David Green, 2019-09-03; 18 files, -668/+668)
  The code here seems to date back to r134705, when tablegen lowering was
  first being added. I don't believe that we need to include CPSR
  implicit operands on the MCInst. This now works more like other
  backends (like AArch64), where all implicit registers are skipped. This
  allows the AliasInst for CSELs to match correctly, as can be seen in
  the test changes.
  Differential revision: https://reviews.llvm.org/D66703
  llvm-svn: 370745
* [ARM] Invert CSEL predicates if the opposite is a simpler constant to materialise (David Green, 2019-09-03; 1 file, -4/+4)
  This moves ConstantMaterializationCost into ARMBaseInstrInfo so that it
  can also be used in ISel Lowering, adding codesize values to the
  computed costs so that we can compare either approximate instruction
  counts or codesize costs.

  It also adds a HasLowerConstantMaterializationCost, which compares the
  ConstantMaterializationCost of two values, returning true if the first
  is smaller in either instruction count or codesize, falling back to the
  other in the case that they are equal. This is used in constant CSEL
  lowering to invert the predicate if the opposite is easier to
  materialise.
  Differential revision: https://reviews.llvm.org/D66701
  llvm-svn: 370741
* [ARM] Generate 8.1-M CSINC, CSNEG and CSINV instructions. (David Green, 2019-09-03; 18 files, -2649/+1989)
  Arm 8.1-M adds a number of related CSEL instructions, including CSINC,
  CSNEG and CSINV. These choose between two values given the content in
  CPSR and a condition, performing an increment, negation or inverse of
  the false value.

  This adds some selection for them, either from constant values or
  patterns. It does not include CSEL directly, which currently does not
  always make code better. It is still useful, but we will have to check
  more carefully where it should and shouldn't be used.

  Code by Ranjeet Singh and Simon Tatham, with some modifications from
  me.
  Differential revision: https://reviews.llvm.org/D66483
  llvm-svn: 370739
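  A hypothetical sketch of a select that can now become a single csinc:

      define i32 @csinc(i32 %a, i32 %b, i32 %c) {
        ; Choose between %a and %b+1 on the compare result; 8.1-M can do
        ; this in one csinc instead of a branch plus an add.
        %cmp = icmp eq i32 %c, 0
        %inc = add i32 %b, 1
        %r = select i1 %cmp, i32 %a, i32 %inc
        ret i32 %r
      }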
* [ARM] Add csel tests. NFC (David Green, 2019-09-03; 1 file, -0/+390)
  llvm-svn: 370738