path: root/llvm/test/CodeGen/Thumb2
...
* [ARM] Fix MVE ldst offset ranges (David Green, 2019-09-03; 2 files, -64/+32)

  We were using isShiftedInt<7, Shift>(RHSC) to detect the ranges of
  offsets to fold into MVE loads/stores. The instructions actually take a
  7-bit unsigned integer which is either added or subtracted, so the check
  should be more like isShiftedUInt<7, Shift>(abs(RHSC)). Instead I've
  changed this to use the isScaledConstantInRange method, the same as in
  SelectT2AddrModeImm7Offset used by pre/post inc, which already seemed to
  get this correct.

  Differential revision: https://reviews.llvm.org/D66997
  llvm-svn: 370731
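  To illustrate the fixed range (a hypothetical example, not one of the
  committed tests), a v4i32 load at byte offset -508, which is the scaled
  7-bit limit (127 * 4) on the subtracting side, can fold into the vldrw
  addressing mode on an MVE target, while -512 is out of range:

```llvm
; Hypothetical sketch: with the corrected check, offsets of the form
; +/-(imm7 * 4) fold into the vldrw addressing mode for a 32-bit element
; access, so -508 is the negative limit and -512 would need a separate add.
define <4 x i32> @load_at_negative_limit(i8* %base) {
entry:
  %addr = getelementptr inbounds i8, i8* %base, i32 -508
  %cast = bitcast i8* %addr to <4 x i32>*
  %val = load <4 x i32>, <4 x i32>* %cast, align 4
  ret <4 x i32> %val
}
```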
* [ARM] More MVE load/store tests for offsets around the negative limit. NFC (David Green, 2019-09-03; 3 files, -0/+1264)

  llvm-svn: 370726
* [ARM] Select vmla (Sam Tebbs, 2019-09-03; 1 file, -0/+80)

  This patch adds vmla selection.

  Differential revision: https://reviews.llvm.org/D66297
  llvm-svn: 370704
* [ARM] MVE predicate bitcast test and VPSEL adjustment. NFC (David Green, 2019-09-02; 2 files, -20/+193)

  llvm-svn: 370678
* [ARM] Use MQPR not QPR for MVE registers (David Green, 2019-09-02; 1 file, -0/+113)

  We should be using MQPR, and if we don't we can get COPYs and PHIs
  created for QPR. These get folded into instructions, failing
  verification checks.

  Differential revision: https://reviews.llvm.org/D66214
  llvm-svn: 370676
* [DAGCombiner] improve throughput of shift+logic+shift (Sanjay Patel, 2019-09-01; 1 file, -4/+4)

  The motivating case for this is a long way from here:
  https://bugs.llvm.org/show_bug.cgi?id=43146
  ...but I think this is where we have to start.

  We need to canonicalize/optimize sequences of shift and logic to ease
  pattern matching for things like bswap and to improve perf in general.
  But without the artificial limit of '!LegalTypes' (early combining),
  there are a lot of test diffs, and not all are good.

  In the minimal tests added for this proposal, x86 should have better
  throughput in all cases. AArch64 is neutral for scalar tests because it
  can fold shifts into bitwise logic ops.

  There are 3 shift opcodes and 3 logic opcodes for a total of 9 possible
  patterns:
  https://rise4fun.com/Alive/VlI
  https://rise4fun.com/Alive/n1m
  https://rise4fun.com/Alive/1Vn

  Differential Revision: https://reviews.llvm.org/D67021
  llvm-svn: 370617
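  As a sketch of one of those nine patterns (hypothetical IR, not one of
  the committed tests), the combine rewrites a shift of a logic op of a
  shift so that the two shifts of %x fuse and the xor moves off the
  critical path:

```llvm
; Sketch: the combine turns
;   shl (xor (shl x, 3), y), 2
; into
;   xor (shl x, 5), (shl y, 2)
; which is algebraically equal and shortens the dependency chain.
define i32 @shift_logic_shift(i32 %x, i32 %y) {
  %s1 = shl i32 %x, 3
  %l  = xor i32 %s1, %y
  %s2 = shl i32 %l, 2
  ret i32 %s2
}
```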
* [ARM] Remove MVE masked loads/stores (David Green, 2019-09-01; 3 files, -440/+5297)

  These were never enabled correctly and are causing other problems.
  Taking them out for the moment, whilst we work on the issues.

  This reverts r370329.

  llvm-svn: 370607
* [Thumb2] tighten CHECK lines in test; NFC (Sanjay Patel, 2019-08-30; 1 file, -2/+4)

  The sequence between the function call and the asm start may change
  without affecting what this test is looking for, but we should have a
  better idea about what that sequence looks like.

  llvm-svn: 370518
* [ARM] MVE Masked loads and stores (David Green, 2019-08-29; 3 files, -5297/+440)

  Masked loads and stores fit naturally with MVE, the instructions being
  easily predicated. This adds lowering for the simple cases of masked
  loads and stores. It does not yet deal with widening/narrowing or
  pre/post inc.

  The llvm masked load intrinsic will accept a "passthru" value, dictating
  the values used for the zero masked lanes. In MVE the instructions write
  0 to the zero predicated lanes, so we need to match a passthru that
  isn't 0 (or undef) with a select instruction to pull in the correct data
  after the load.

  We also need to do something with unaligned loads/stores. Currently this
  uses a similar method to the one used for big endian, using a VLDRB.8
  (and potentially a VREV in BE). This does mean that the predicate mask
  is converted from, for example, a v4i1 to a v16i1. The VLDR instructions
  are defined as using the first bit of the relevant mask lane, so this
  could potentially load different results if the predicate is a little
  odd. As the input is a v4i1, however, I believe this is OK and all the
  bits required should be set in the predicate, making the VLDRB.8 load
  the same data.

  Differential Revision: https://reviews.llvm.org/D66534
  llvm-svn: 370329
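  A minimal sketch of the two passthru cases described above (hypothetical
  functions in the typed-pointer IR of that era, not taken from the
  patch):

```llvm
declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32, <4 x i1>, <4 x i32>)

; Zero passthru matches the MVE semantics directly: the predicated load
; already writes 0 to the masked-off lanes, so no extra select is needed.
define <4 x i32> @masked_load_zero(<4 x i32>* %p, <4 x i1> %mask) {
  %l = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %p, i32 4, <4 x i1> %mask, <4 x i32> zeroinitializer)
  ret <4 x i32> %l
}

; A non-zero passthru has to be merged back in after the load, which is
; what the select-matching described above handles.
define <4 x i32> @masked_load_passthru(<4 x i32>* %p, <4 x i1> %mask, <4 x i32> %pass) {
  %l = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %p, i32 4, <4 x i1> %mask, <4 x i32> %pass)
  ret <4 x i32> %l
}
```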
* [ARM] Masked load and store and predicate tests. NFC (David Green, 2019-08-29; 13 files, -50/+7606)

  llvm-svn: 370325
* [MVE] VMOVX patterns (David Green, 2019-08-28; 13 files, -4296/+2115)

  This adds fp16 VMOVX patterns, using the same patterns as rL362482 with
  some adjustments for MVE. It allows us to move fp16 registers without
  going into and out of gprs.

  VMOVX is able to move the top bits from a fp16 in a fp reg into the
  bottom bits of another register, zeroing the rest. This can be used for
  odd MVE register lanes. The top bits are not read by fp16 instructions,
  so no move is required there if we are dealing with even lanes.

  Differential revision: https://reviews.llvm.org/D66793
  llvm-svn: 370184
* Reapply: [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32 (Sam Tebbs, 2019-08-22; 4 files, -58/+42)

  The CodeGen/Thumb2/mve-vaddv.ll test needed to be amended to reflect the
  changes from the above patch.

  This reverts commit cd53ff6, reapplying 7c6b229.

  llvm-svn: 369638
* Revert r369626 "[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32" (Hans Wennborg, 2019-08-22; 3 files, -35/+47)

  It broke the bots, see e.g.
  http://lab.llvm.org:8011/builders/clang-cuda-build/builds/36275/

  > This patch fixes shifts by a 128/256 bit shift amount. It also fixes
  > codegen for shifts of 32 by delegating to LLVM's default optimisation
  > instead of emitting a long shift.
  >
  > Tests that used to generate long shifts of 32 are updated to check for
  > the more optimised codegen.
  >
  > Differential revision: https://reviews.llvm.org/D66519
  >
  > llvm-svn: 369626

  llvm-svn: 369636
* [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32 (Sam Tebbs, 2019-08-22; 3 files, -47/+35)

  This patch fixes shifts by a 128/256 bit shift amount. It also fixes
  codegen for shifts of 32 by delegating to LLVM's default optimisation
  instead of emitting a long shift.

  Tests that used to generate long shifts of 32 are updated to check for
  the more optimised codegen.

  Differential revision: https://reviews.llvm.org/D66519
  llvm-svn: 369626
* [ARM] Select vaddva (Sam Tebbs, 2019-08-20; 1 file, -2/+77)

  This patch adds vaddva selection.

  Differential revision: https://reviews.llvm.org/D66410
  llvm-svn: 369404
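  A sketch of the accumulating pattern (hypothetical IR, assuming the
  experimental vector-reduce intrinsic mangling of that era): a
  vecreduce_add whose result feeds an add with a scalar, which can select
  to a single vaddva:

```llvm
declare i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32>)

; The reduce-then-add-accumulator pair maps onto "vaddva.s32 r0, q0".
define i32 @vaddva(i32 %acc, <4 x i32> %v) {
  %sum = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> %v)
  %r = add i32 %acc, %sum
  ret i32 %r
}
```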
* [ARM] Add support for MVE vaddv (Sam Tebbs, 2019-08-19; 1 file, -0/+34)

  This patch adds vecreduce_add and the relevant instruction selection for
  vaddv.

  Differential revision: https://reviews.llvm.org/D66085
  llvm-svn: 369245
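  For reference, the basic reduction looks like this in IR (a hypothetical
  example, assuming the intrinsic mangling of that era):

```llvm
declare i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32>)

; A horizontal add across the lanes; on an MVE target this can select to
; a single "vaddv.s32 r0, q0".
define i32 @vaddv(<4 x i32> %v) {
  %sum = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> %v)
  ret i32 %sum
}
```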
* [ARM] Correct register for narrowing and widening MVE loads and stores. (David Green, 2019-08-16; 1 file, -0/+281)

  The widening and narrowing MVE instructions like VLDRH.32 are only
  permitted to use low tGPR registers. This means that if they are used
  for a stack slot, where the register used is only decided during frame
  setup, we need to be able to correctly pick a thumb1 register over a
  normal GPR.

  This attempts to add the required logic into eliminateFrameIndex and
  rewriteT2FrameIndex, only picking the FrameReg if it is a valid register
  for the operand's register class, and picking a valid scratch register
  for the register class.

  Differential Revision: https://reviews.llvm.org/D66285
  llvm-svn: 369108
* [ARM][LowOverheadLoops] Fix generated code for "revert". (Eli Friedman, 2019-08-15; 3 files, -4/+4)

  Two issues:

  1. t2CMPri shouldn't use CPSR if it isn't predicated. This doesn't
     really have any visible effect at the moment, but it might matter
     in the future.
  2. The t2CMPri generated for t2WhileLoopStart might need to use a
     register that isn't LR.

  My team found this because we have a patch to track register liveness
  late in the pass pipeline. I'll look into upstreaming it to help catch
  issues like this earlier.

  Differential Revision: https://reviews.llvm.org/D66243
  llvm-svn: 369069
* [ARM] Fix alignment checks for BE VLDRH (David Green, 2019-08-15; 3 files, -225/+666)

  We need to allow any alignment of at least 2, not just exactly 2, so
  that the big endian loads and stores can be selected successfully. I've
  also added extra BE testing for the load and store tests.

  Thanks to Oliver for the report.

  Differential Revision: https://reviews.llvm.org/D66222
  llvm-svn: 368996
* [ARM] MVE predicate store patterns (David Green, 2019-08-15; 1 file, -0/+205)

  Stack loads and stores were already working, but direct stores were not.
  This adds the patterns for them, same as predicate loads.

  Differential Revision: https://reviews.llvm.org/D66213
  llvm-svn: 368988
* [ARM] MVE trunc to i1 vectors (David Green, 2019-08-15; 1 file, -0/+61)

  This adds patterns for selecting trunc instructions from full vectors to
  i1 vectors.

  Differential Revision: https://reviews.llvm.org/D66201
  llvm-svn: 368981
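  A sketch of the kind of IR this covers (hypothetical, not one of the
  committed tests): the truncate produces a predicate vector that is then
  consumed, here by a select:

```llvm
; Truncating a full-width vector to an i1 vector keeps only the low bit
; of each lane; the new patterns select this directly rather than going
; through memory.
define <4 x i32> @trunc_to_pred(<4 x i32> %x, <4 x i32> %a, <4 x i32> %b) {
  %p = trunc <4 x i32> %x to <4 x i1>
  %s = select <4 x i1> %p, <4 x i32> %a, <4 x i32> %b
  ret <4 x i32> %s
}
```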
* [ARM] MVE spill vector test. NFC (David Green, 2019-08-11; 1 file, -0/+163)

  llvm-svn: 368531
* [ARM] Add support for MVE pre and post inc loads and stores (David Green, 2019-08-08; 3 files, -256/+130)

  This adds pre- and post-increment and decrement addressing for MVE loads
  and stores. It uses the built-in pre/post load/store detection, unlike
  Neon. Loads are selected with the code in tryT2IndexedLoad; stores are
  selected with tablegen patterns. The immediates have a +/- 7-bit range,
  multiplied by the size of the element.

  Differential Revision: https://reviews.llvm.org/D63840
  llvm-svn: 368305
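  A sketch of a post-increment pattern (hypothetical IR): the load and the
  pointer bump can combine into a single post-indexed MVE load along the
  lines of "vldrw.u32 q0, [r0], #16":

```llvm
; The load of %p and the advance of %p by one vector (16 bytes, within
; the scaled +/- 7-bit offset range) fold into one post-indexed access.
define <4 x i32>* @post_inc_load(<4 x i32>* %p, <4 x i32>* %out) {
  %v = load <4 x i32>, <4 x i32>* %p, align 4
  %next = getelementptr inbounds <4 x i32>, <4 x i32>* %p, i32 1
  store <4 x i32> %v, <4 x i32>* %out, align 4
  ret <4 x i32>* %next
}
```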
* [ARM] MVE big endian loads/stores (David Green, 2019-08-08; 4 files, -179/+349)

  This adds some missing patterns for big endian loads/stores, allowing
  unaligned loads/stores to also be selected with an extra VREV, which
  produces better code than aligning through a stack. It also moves
  VLDR_P0 to not be LE only, and adjusts some of the tests to show all
  that working.

  Differential Revision: https://reviews.llvm.org/D65583
  llvm-svn: 368304
* [ARM] Select VFMA (Sam Tebbs, 2019-08-08; 1 file, -0/+24)

  llvm-svn: 368264
* [ARM] Tighten up VLDRH.32 with low alignments (David Green, 2019-08-08; 4 files, -47/+132)

  VLDRH needs to have an alignment of at least 2, including the
  widening/narrowing versions. This tightens up the ISel patterns for it
  and alters allowsMisalignedMemoryAccesses so that unaligned accesses are
  expanded through the stack. It also fixed some incorrect shift amounts,
  which seemed to be passing a multiple rather than a shift amount.

  Differential Revision: https://reviews.llvm.org/D65580
  llvm-svn: 368256
* [ARM] Rejig MVE load store tests. NFC (David Green, 2019-08-08; 5 files, -1053/+1529)

  This adjusts the load/store tests for better testing of alignments. It
  also adds some extra alignment 1 tests, useful for future commits.

  llvm-svn: 368255
* [ARM] Expand CTPOP intrinsic for MVE (Oliver Cruickshank, 2019-08-07; 1 file, -0/+151)

  llvm-svn: 368180
* [ARM] Generate MVE VHADDs/VHSUBs (Oliver Cruickshank, 2019-08-07; 1 file, -0/+281)

  llvm-svn: 368146
* [ARM][LowOverheadLoops] Revert after read/write (Sam Parker, 2019-08-07; 2 files, -0/+256)

  Currently we check whether LR is stored/loaded to/from in between the
  loop decrement and loop end pseudo instructions. There are two problems
  here:

  - It relies on all load/store instructions being labelled as such in
    tablegen.
  - Actually any use of loop decrement is troublesome because the value
    doesn't exist!

  So we need to check for any read/write of LR that occurs between the two
  instructions and revert if we find anything.

  Differential Revision: https://reviews.llvm.org/D65792
  llvm-svn: 368130
* [ARM] MVE big endian bitcasts (David Green, 2019-08-04; 1 file, -0/+330)

  This adds big endian MVE patterns for bitcasts. They are defined in llvm
  as being the same as a store of the existing type and a load as the new
  type. This means that they have to become a VREV between the two types,
  working in the same way that NEON works in big-endian.

  This also adds some example tests for bigendian, showing where code is
  and isn't different. The main difference, especially from a testing
  perspective, is that vectors are passed as v2f64, and so are VREV'd into
  and out of call arguments, and the parameters are passed in a v2f64
  format. The same happens for inline assembly where the register class is
  used, so it is VREV'd to a v16i8.

  So some of this is probably not correct yet, but it is (mostly)
  self-consistent and seems to be consistent with how llvm treats vectors.
  The rest we can hopefully fix later. More details about big endian neon
  can be found in https://llvm.org/docs/BigEndianNEON.html.

  Differential Revision: https://reviews.llvm.org/D65581
  llvm-svn: 367780
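  A sketch of the kind of bitcast this covers (hypothetical IR): in
  big-endian mode a bitcast that changes the element size is not free, and
  is lowered with a VREV-style lane reversal between the two types:

```llvm
; Under -mtriple=thumbebv8.1m.main (big-endian), the in-register lane
; order differs between i32 and i16 elements, so this bitcast becomes a
; VREV rather than a no-op.
define arm_aapcs_vfpcc <8 x i16> @bitcast_be(<4 x i32> %x) {
  %b = bitcast <4 x i32> %x to <8 x i16>
  ret <8 x i16> %b
}
```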
* [ARM] Fix for MVE VREV64 (David Green, 2019-08-01; 1 file, -5/+10)

  The VREV64 instruction is apparently unpredictable if Qd == Qm, due to
  the cross-beat nature of the instruction. This adds an earlyclobber to
  Qd, which seems to be the same way we deal with this on other
  instructions like the write-back on loads and stores.

  Differential Revision: https://reviews.llvm.org/D65502
  llvm-svn: 367544
* [ARM] Generate MVE VFMAs (Oliver Cruickshank, 2019-07-31; 1 file, -0/+370)

  llvm-svn: 367408
* [ARM][LowOverheadLoops] Revert non-header LE target (Sam Parker, 2019-07-30; 1 file, -0/+255)

  Revert the hardware loop upon finding a LoopEnd that doesn't target the
  loop header, instead of asserting a failure.

  Differential Revision: https://reviews.llvm.org/D65268
  llvm-svn: 367296
* [ARM] MVE VPNOT (David Green, 2019-07-28; 4 files, -212/+53)

  This adds the patterns required to transform xor P0, -1 to a VPNOT. The
  instruction operands have to change a little for this, adding an in and
  an out VCCR reg and using a custom DecodeMVEVPNOT for the decode.

  Differential Revision: https://reviews.llvm.org/D65133
  llvm-svn: 367192
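  In IR terms (a hypothetical sketch), the predicate inversion is an xor
  of an i1 vector with all-ones, which these patterns can now select to a
  single VPNOT:

```llvm
; The xor of the compare result with <i1 true, ...> inverts the predicate
; in place, instead of moving it through a GPR and back.
define <4 x i32> @vpnot(<4 x i32> %a, <4 x i32> %x, <4 x i32> %y) {
  %p = icmp eq <4 x i32> %a, zeroinitializer
  %n = xor <4 x i1> %p, <i1 true, i1 true, i1 true, i1 true>
  %s = select <4 x i1> %n, <4 x i32> %x, <4 x i32> %y
  ret <4 x i32> %s
}
```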
* [ARM] Better patterns for fp <> predicate vectors (David Green, 2019-07-28; 1 file, -64/+67)

  These are some better patterns for converting between predicates and
  floating points. Much like the extends, we select "1"/"-1" or "0"
  depending on the predicate value. Or we perform a compare against 0 to
  convert to a predicate.

  Differential Revision: https://reviews.llvm.org/D65103
  llvm-svn: 367191
* Regenerate UXTB tests (Simon Pilgrim, 2019-07-27; 1 file, -51/+112)

  llvm-svn: 367179
* [ARM][LowOverheadLoops] Add CPSR defs (Sam Parker, 2019-07-26; 12 files, -496/+417)

  Both WhileLoopStart and LoopEnd may get turned into a cmp and br pair,
  so add an implicit def to these pseudo instructions in case WLS and LE
  aren't generated.

  Differential Revision: https://reviews.llvm.org/D65275
  llvm-svn: 367089
* [ARM] Rewrite how VCMP are lowered, using a single node (David Green, 2019-07-24; 4 files, -12/+6)

  This removes the VCEQ/VCNE/VCGE/VCEQZ/etc nodes, using just two, called
  VCMP and VCMPZ, with an extra operand as the condition code. I believe
  this will make some combines simpler, allowing us to just look at these
  codes and not the operands. It also helps fill in a missing VCGTUZ MVE
  selection without adding extra nodes for it.

  Differential Revision: https://reviews.llvm.org/D65072
  llvm-svn: 366934
* [ARM] Disable MVE fptosi and friends (David Green, 2019-07-24; 1 file, -50/+161)

  This prevents us from trying to convert an i1 predicate vector to a
  float, or vice-versa. Better patterns are possible, which will follow in
  a subsequent commit. For now we just expand them.

  Differential Revision: https://reviews.llvm.org/D65066
  llvm-svn: 366931
* [ARM] More MVE compare vector splat combines for ANDs (David Green, 2019-07-24; 2 files, -9/+285)

  Adds some extra r register compare combines, this time for ANDs.

  Differential Revision: https://reviews.llvm.org/D65062
  llvm-svn: 366928
* [ARM] MVE compare vector splat combine (David Green, 2019-07-24; 2 files, -0/+3958)

  MVE VCMP instructions can use a general purpose register as the second
  operand. This adds the combines for it, selecting from a compare of a
  vdup.

  Differential Revision: https://reviews.llvm.org/D65061
  llvm-svn: 366924
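  A sketch of the splat-compare pattern (hypothetical IR): the
  insertelement/shufflevector pair is the canonical splat, and the combine
  lets the scalar stay in a GPR, selecting something along the lines of
  "vcmp.i32 eq, q0, r0":

```llvm
; The compare against a splatted scalar no longer needs a vdup into a
; vector register first.
define <4 x i32> @cmp_splat(<4 x i32> %v, i32 %x, <4 x i32> %a, <4 x i32> %b) {
  %ins = insertelement <4 x i32> undef, i32 %x, i32 0
  %sp = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> zeroinitializer
  %c = icmp eq <4 x i32> %v, %sp
  %s = select <4 x i1> %c, <4 x i32> %a, <4 x i32> %b
  ret <4 x i32> %s
}
```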
* [ARM] Better OR's for MVE compares (David Green, 2019-07-24; 3 files, -164/+149)

  This adds a DeMorgan combine for OR's of compares to turn them into
  AND's, helping prevent them from going into and out of gpr registers. It
  also fills in the VCLE and VCLT nodes that MVE can select, allowing it
  to invert more compares.

  Differential Revision: https://reviews.llvm.org/D65059
  llvm-svn: 366920
* [ARM] Better AND's for MVE compares (David Green, 2019-07-24; 1 file, -96/+39)

  Add a number of folds to convert and(vcmp, vcmp) into a single VPT
  block, where the second vcmp becomes predicated on the first.

  The VCMP; VPST; VCMP will eventually be converted to VPT; VCMP in the
  VPTBlockPass.

  Differential Revision: https://reviews.llvm.org/D65058
  llvm-svn: 366910
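  A sketch of the and-of-compares shape this folds (hypothetical IR):
  after the combine, the second compare is predicated on the first, so
  both end up inside one VPT block:

```llvm
; The and of two compares feeding a select; without the fold each compare
; would produce its own predicate and the and would go through a GPR.
define <4 x i32> @and_of_cmps(<4 x i32> %a, <4 x i32> %b, <4 x i32> %x, <4 x i32> %y) {
  %c1 = icmp sgt <4 x i32> %a, zeroinitializer
  %c2 = icmp sgt <4 x i32> %b, zeroinitializer
  %m = and <4 x i1> %c1, %c2
  %s = select <4 x i1> %m, <4 x i32> %x, <4 x i32> %y
  ret <4 x i32> %s
}
```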
* [ARM] MVE floating point compares and selects (David Green, 2019-07-24; 3 files, -0/+6690)

  Much like integers, this adds MVE floating point compares and selects.
  It requires a lot more buildvector/shuffle code because we may need to
  expand the compares without mve.fp, and requires support for and/or
  because of the way we lower llvm condition codes.

  Some original code by David Sherwood.

  Differential Revision: https://reviews.llvm.org/D65054
  llvm-svn: 366909
* [ARM] Basic And/Or/Xor handling for MVE predicates (David Green, 2019-07-24; 4 files, -0/+2064)

  This adds some basic, "worst case" handling for MVE predicate Or/And/
  Xor. It does this by going into and out of GPRs, doing the operation on
  scalars.

  Code by David Sherwood.

  Differential Revision: https://reviews.llvm.org/D65053
  llvm-svn: 366907
* [ARM] MVE predicate register support (David Green, 2019-07-24; 6 files, -0/+1398)

  This adds support code for building and shuffling i1 predicate
  registers. It generally uses two basic principles: either converting the
  predicate into a scalar (through a PREDICATE_CAST) and doing scalar
  operations on it there, or converting the register to a full vector
  register and back.

  Some of the code here is not super efficient but will hopefully cover
  most cases of moving i1 vectors around and can be improved in subsequent
  patches.

  Some code by David Sherwood.

  Differential Revision: https://reviews.llvm.org/D65052
  llvm-svn: 366890
* [ARM] MVE integer compares and selects (David Green, 2019-07-24; 3 files, -0/+926)

  This adds the very basics for MVE vector predication, adding integer
  VCMP and VSEL instruction support. This is done through predicate
  registers (MVT::v16i1, MVT::v8i1, MVT::v4i1), but otherwise uses the
  same mechanics as NEON to custom lower setcc's through ARMISD::VCXX
  nodes (VCEQ, VCGT, VCEQZ, etc).

  An extra VCNE was added, as this can be handled sensibly by MVE's
  expanded number of VCMP condition codes. (There are also VCLE and VCLT,
  which are added later.)

  VPSEL is also added here, simply selecting on the vselect.

  Original code by David Sherwood.

  Differential Revision: https://reviews.llvm.org/D65051
  llvm-svn: 366885
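  A sketch of the basic compare-and-select this enables (hypothetical IR):
  the icmp produces a predicate register and the select becomes a VPSEL on
  it:

```llvm
; Lowers roughly to "vcmp.s32 gt, q0, q1" followed by "vpsel q0, q2, q3"
; on an MVE target (operand details are illustrative).
define <4 x i32> @vcmp_vpsel(<4 x i32> %a, <4 x i32> %b, <4 x i32> %x, <4 x i32> %y) {
  %c = icmp sgt <4 x i32> %a, %b
  %s = select <4 x i1> %c, <4 x i32> %x, <4 x i32> %y
  ret <4 x i32> %s
}
```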
* [ARM][LowOverheadLoops] Fix branch target codegen (Sam Parker, 2019-07-23; 1 file, -0/+513)

  While lowering test.set.loop.iterations, it wasn't checked how the
  brcond was using the result, and so the wls could branch to the loop
  preheader instead of not entering it. The same was true for
  loop.decrement.reg.

  So brcond and br_cc are now lowered manually when using the hwloop
  intrinsics. During this we now check whether the result has been negated
  and whether we're using SETEQ or SETNE and 0 or 1. We can then figure
  out which basic block the WLS and LE should be targeting.

  Differential Revision: https://reviews.llvm.org/D64616
  llvm-svn: 366809
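  For context, a sketch of the hardware-loop intrinsics involved
  (hypothetical IR; the intrinsic manglings are assumptions based on the
  tests of that era). The i1 result of test.set.loop.iterations feeds a
  brcond that must target the loop body (via WLS) or the exit correctly:

```llvm
declare i1 @llvm.test.set.loop.iterations.i32(i32)
declare i32 @llvm.loop.decrement.reg.i32.i32.i32(i32, i32)

define void @hwloop(i32 %n, i32* %p) {
entry:
  ; Returns true if the loop should be entered; becomes a WLS branch.
  %guard = call i1 @llvm.test.set.loop.iterations.i32(i32 %n)
  br i1 %guard, label %body, label %exit
body:
  %iv = phi i32 [ %n, %entry ], [ %rem, %body ]
  store volatile i32 %iv, i32* %p
  ; Decrements the counter; the backedge becomes an LE instruction.
  %rem = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 %iv, i32 1)
  %cmp = icmp ne i32 %rem, 0
  br i1 %cmp, label %body, label %exit
exit:
  ret void
}
```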
* [ARM][LowOverheadLoops] Revert remaining pseudos (Sam Parker, 2019-07-22; 1 file, -0/+170)

  ARMLowOverheadLoops would assert a failure if it did not find all the
  pseudo instructions that comprise the hardware loop. Instead of doing
  this, iterate through all the instructions of the function and revert
  any remaining pseudo instructions that haven't been converted.

  Differential Revision: https://reviews.llvm.org/D65080
  llvm-svn: 366691