summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/ARM
Commit message (Collapse)AuthorAgeFilesLines
...
* [ARM] sext of a load is freeDavid Green2019-08-121-0/+21
| | | | | | | | | This teaches the cost model that the sext or zext of a load is going to be free. Differential Revision: https://reviews.llvm.org/D66006 llvm-svn: 368593
* [ARM] MVE shuffle broadcast costsDavid Green2019-08-121-0/+17
| | | | | | | | | | | A VDUP will perform a vector broadcast in a single instruction. Update the cost model for MVE accordingly. Code originally by David Sherwood. Differential Revision: https://reviews.llvm.org/D63448 llvm-svn: 368589
* [ARM] Put some of the TTI costmodel behind hasNeon calls.David Green2019-08-121-74/+75
| | | | | | | | | This puts some of the calls in ARMTargetTransformInfo.cpp behind hasNeon() checks, now that we have MVE, and updates all the tests accordingly. Differential Revision: https://reviews.llvm.org/D63447 llvm-svn: 368587
* [MVE] Don't try to unroll vectorised MVE loopsDavid Green2019-08-111-0/+5
| | | | | | | | | | | | | | | Due to the nature of the beat system in the MVE architecture, along with tail predication and low-overhead loops, unrolling has less benefit compared to normal loops. You can not, for example, hide the latency of a load with other instructions as you can for scalar code. Preventing unrolling also makes the code easier to read and reason about. So if a loop contains vector code, don't enable the runtime unrolling. At least for the time being. Differential Revision: https://reviews.llvm.org/D65803 llvm-svn: 368530
* [ARM] Permit auto-vectorization using MVEDavid Green2019-08-111-2/+6
| | | | | | | | | | | | | With enough codegen complete, we can now correctly report the number and size of vector registers for MVE, allowing auto vectorisation. This also allows FP auto-vectorization for MVE without -Ofast/-ffast-math, due to support for IEEE FP arithmetic and parity between scalar and vector FP behaviour. Patch by David Sherwood. Differential Revision: https://reviews.llvm.org/D63728 llvm-svn: 368529
* [globalisel] Add G_SEXT_INREGDaniel Sanders2019-08-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Targets often have instructions that can sign-extend certain cases faster than the equivalent shift-left/arithmetic-shift-right. Such cases can be identified by matching a shift-left/shift-right pair but there are some issues with this in the context of combines. For example, suppose you can sign-extend 8-bit up to 32-bit with a target extend instruction. %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity) %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_ASHR %2:_(s32), i32 1 would reasonably combine to: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 25 which no longer matches the special case. If your shifts and extend are equal cost, this would break even as a pair of shifts but if your shift is more expensive than the extend then it's cheaper as: %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8 %3:_(s32) = G_ASHR %2:_(s32), i32 1 It's possible to match the shift-pair in ISel and emit an extend and ashr. However, this is far from the only way to break this shift pair and make it hard to match the extends. Another example is that with the right known-zeros, this: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_MUL %2:_(s32), i32 2 can become: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 23 All upstream targets have been configured to lower it to the current G_SHL,G_ASHR pair but will likely want to make it legal in some cases to handle their faster cases. To follow-up: Provide a way to legalize based on the constant. At the moment, I'm thinking that the best way to achieve this is to provide the MI in LegalityQuery but that opens the door to breaking core principles of the legalizer (legality is not context sensitive). That said, it's worth noting that looking at other instructions and acting on that information doesn't violate this principle in itself. It's only a violation if, at the end of legalization, a pass that checks legality without being able to see the context would say an instruction might not be legal. That's a fairly subtle distinction so to give a concrete example, saying %2 in: %1 = G_CONSTANT 16 %2 = G_SEXT_INREG %0, %1 is legal is in violation of that principle if the legality of %2 depends on %1 being constant and/or being 16. However, legalizing to either: %2 = G_SEXT_INREG %0, 16 or: %1 = G_CONSTANT 16 %2:_(s32) = G_SHL %0, %1 %3:_(s32) = G_ASHR %2, %1 depending on whether %1 is constant and 16 does not violate that principle since both outputs are genuinely legal. Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61289 llvm-svn: 368487
* GlobalISel: pack various parameters for lowerCall into a struct.Tim Northover2019-08-092-21/+14
| | | | | | | | | I've now needed to add an extra parameter to this call twice recently. Not only is the signature getting extremely unwieldy, but just updating all of the callsites and implementations is a pain. Putting the parameters in a struct sidesteps both issues. llvm-svn: 368408
* [ARM][ParallelDSP] Replace SExt usesSam Parker2019-08-091-3/+5
| | | | | | | | | | | As loads are combined and widened, we replaced their sext users operands whereas we should have been replacing the uses of the sext. I've added a load of tests, with only a few of them originally causing assertion failures, the rest improve pattern coverage. Differential Revision: https://reviews.llvm.org/D65740 llvm-svn: 368404
* [ARM] Add support for MVE pre and post inc loads and storesDavid Green2019-08-083-19/+273
| | | | | | | | | | | | This adds pre- and post- increment and decrements for MVE loads and stores. It uses the builtin pre and post load/store detection, unlike Neon. Loads are selected with the code in tryT2IndexedLoad, stores are selected with tablegen patterns. The immediates have a +/-7bit range, multiplied by the size of the element. Differential Revision: https://reviews.llvm.org/D63840 llvm-svn: 368305
* [ARM] MVE big endian loads/storesDavid Green2019-08-082-43/+47
| | | | | | | | | | | This adds some missing patterns for big endian loads/stores, allowing unaligned loads/stores to also be selected with an extra VREV, which produces better code than aligning through a stack. Also moves VLDR_P0 to not be LE only, and adjusts some of the tests to show all that working. Differential Revision: https://reviews.llvm.org/D65583 llvm-svn: 368304
* [ARM] Select VFMASam Tebbs2019-08-081-0/+7
| | | | llvm-svn: 368264
* [ARM] Tighten up VLDRH.32 with low alignmentsDavid Green2019-08-082-16/+35
| | | | | | | | | | | | VLDRH needs to have an alignment of at least 2, including the widening/narrowing versions. This tightens up the ISel patterns for it and alters allowsMisalignedMemoryAccesses so that unaligned accesses are expanded through the stack. It also fixed some incorrect shift amounts, which seemed to be passing a multiple not a shift. Differential Revision: https://reviews.llvm.org/D65580 llvm-svn: 368256
* [ARM] Expand CTPOP intrinsic for MVEOliver Cruickshank2019-08-071-0/+1
| | | | llvm-svn: 368180
* [ARM] Generate MVE VHADDs/VHSUBsOliver Cruickshank2019-08-071-0/+54
| | | | llvm-svn: 368146
* [ARM][LowOverheadLoops] Revert after read/writeSam Parker2019-08-071-13/+18
| | | | | | | | | | | | | | | | | Currently we check whether LR is stored/loaded to/from inbetween the loop decrement and loop end pseudo instructions. There's two problems here: - It relies on all load/store instructions being labelled as such in tablegen. - Actually any use of loop decrement is troublesome because the value doesn't exist! So we need to check for any read/write of LR that occurs between the two instructions and revert if we find anything. Differential Revision: https://reviews.llvm.org/D65792 llvm-svn: 368130
* CodeGen: Migration to using RegisterMatt Arsenault2019-08-061-22/+22
| | | | llvm-svn: 367974
* [GlobalISel][CallLowering] Rename isArgumentHandler() -> ↵Amara Emerson2019-08-051-1/+1
| | | | | | | | | isIncomingArgumentHandler() Previous name and comment incorrectly implied it was just for formal arg handlers, which is not true. llvm-svn: 367945
* AMDGPU: Correct behavior of f16 buffer loadsMatt Arsenault2019-08-051-2/+3
| | | | | | | Don't assume format loads for f16. Also fixes support for targets without i16. llvm-svn: 367879
* [LLVM][Alignment] Introduce Alignment TypeGuillaume Chatelet2019-08-051-8/+8
| | | | | | | | | | | | | | | | | | | Summary: This is patch is part of a serie to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jfb, jakehehrlich Reviewed By: jfb Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65514 llvm-svn: 367828
* Reland: Fix and test inter-procedural register allocation for ARMOliver Stannard2019-08-053-2/+10
| | | | | | | | | | | | | | | | | | | | Add an explicit construction of the ArrayRef, gcc 5 and earlier don't seem to select the ArrayRef constructor which takes a C array when the construction is implicit. Original commit message: - Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves with a null RegScavenger. Simply not updating the register scavenger is fine because IPRA only cares about the SavedRegs vector, the acutal code of the function has already been generated at this point. - Add a new hook to TargetRegisterInfo to get the set of registers which can be clobbered inside a call, even if the compiler can see both sides, by linker-generated code. Differential revision: https://reviews.llvm.org/D64908 llvm-svn: 367819
* [ARM] MVE big endian bitcastsDavid Green2019-08-041-0/+45
| | | | | | | | | | | | | | | | | | | | | | This adds big endian MVE patterns for bitcasts. They are defined in llvm as being the same as a store of the existing type and the load into the new. This means that they have to become a VREV between the two types, working in the same way that NEON works in big-endian. This also adds some example tests for bigendian, showing where code is and isn't different. The main difference, especially from a testing perspective is that vectors are passed as v2f64, and so are VREV into and out of call arguments, and the parameters are passed in a v2f64 format. Same happens for inline assembly where the register class is used, so it is VREV to a v16i8. So some of this is probably not correct yet, but it is (mostly) self-consistent and seems to be consistent with how llvm treats vectors. The rest we can hopefully fix later. More details about big endian neon can be found in https://llvm.org/docs/BigEndianNEON.html. Differential Revision: https://reviews.llvm.org/D65581 llvm-svn: 367780
* [Thumb] Fix invalid symbol redefinition due to duplicated jumptable (PR42760)Nikita Popov2019-08-031-1/+2
| | | | | | | | | | | | | | | Fix for https://bugs.llvm.org/show_bug.cgi?id=42760. A tBR_JTr instruction is duplicated by tail duplication, which results in the same jumptable with the same label being emitted twice. Fix this by marking tBR_JTr as not duplicable. The corresponding ARM/Thumb instructions are already marked as not duplicable. Additionally also mark tTBB_JT and tTBH_JT to be consistent with Thumb2, even though this shouldn't be strictly necessary. Differential Revision: https://reviews.llvm.org/D65606 llvm-svn: 367753
* Emit diagnostic if an inline asm constraint requires an immediateBill Wendling2019-08-031-5/+6
| | | | | | | | | | | | | | | | | | Summary: An inline asm call can result in an immediate after inlining. Therefore emit a diagnostic here if constraint requires an immediate but one isn't supplied. Reviewers: joerg, mgorny, efriedma, rsmith Reviewed By: joerg Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60942 llvm-svn: 367750
* Revert Fix and test inter-procedural register allocation for ARMDouglas Yung2019-08-023-10/+2
| | | | | | | | This reverts r367669 (git commit f6b00c279a5587a25876752a6ecd8da0bed959dc) This was breaking a build bot http://lab.llvm.org:8011/builders/netbsd-amd64/builds/21233 llvm-svn: 367731
* GlobalISel: support swiftself attributeTim Northover2019-08-021-0/+1
| | | | llvm-svn: 367683
* [IPRA][ARM] Disable no-CSR optimisation for ARMOliver Stannard2019-08-021-0/+5
| | | | | | | | | | | This optimisation isn't generally profitable for ARM, because we can save/restore many registers in the prologue and epilogue using the PUSH and POP instructions, but mostly use individual LDR/STR instructions for other spills. Differential revision: https://reviews.llvm.org/D64910 llvm-svn: 367670
* Fix and test inter-procedural register allocation for ARMOliver Stannard2019-08-023-2/+10
| | | | | | | | | | | | | | - Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves with a null RegScavenger. Simply not updating the register scavenger is fine because IPRA only cares about the SavedRegs vector, the acutal code of the function has already been generated at this point. - Add a new hook to TargetRegisterInfo to get the set of registers which can be clobbered inside a call, even if the compiler can see both sides, by linker-generated code. Differential revision: https://reviews.llvm.org/D64908 llvm-svn: 367669
* [NFC][ARM[ParallelDSP] Rename/remove/change typesSam Parker2019-08-021-13/+8
| | | | | | | Remove forward declaration, fold a couple of typedefs and change one to be more useful. llvm-svn: 367665
* [NFC][ARM][ParallelDSP] Remove ValueListSam Parker2019-08-021-10/+8
| | | | | | We only care about the first element in the list. llvm-svn: 367660
* Finish moving TargetRegisterInfo::isVirtualRegister() and friends to ↵Daniel Sanders2019-08-0112-59/+56
| | | | | | llvm::Register as started by r367614. NFC llvm-svn: 367633
* [ARM] Fix for MVE VREV64David Green2019-08-011-5/+5
| | | | | | | | | | | The VREV64 instruction is apparently unpredictable if Qd == Qm, due to the cross-beat nature of the instruction. This adds an earlyclobber to Qd, which seems to be the same way we deal with this on other instructions like the write-back on loads and stores. Differential Revision: https://reviews.llvm.org/D65502 llvm-svn: 367544
* [NFC][ARM][ParallelDSP] Getters and renamingSam Parker2019-08-011-16/+22
| | | | | | | Add a couple of getters for Reduction and do some renaming of variables around CreateSMLAD for clarity. llvm-svn: 367522
* [ARM] Lower "(x<<c) > 0x80000000U" to "lsls" on Thumb1.Eli Friedman2019-07-315-0/+34
| | | | | | | | | This is extremely specific, but saves three instructions when it's legal. I don't think the code can be usefully generalized. Differential Revision: https://reviews.llvm.org/D65351 llvm-svn: 367492
* [ARM] Transform compare of masked value to shift on Thumb1.Eli Friedman2019-07-311-0/+37
| | | | | | | | | | | | Thumb1 has very limited immediate modes, so turning an "and" into a shift can save multiple instructions. It's possible to simplify the generated code for test2 and test3 in cmp-and-fold.ll a little more, but I'll implement that as a followup. Differential Revision: https://reviews.llvm.org/D65175 llvm-svn: 367491
* [GISel] Address review feedback on passing MD_callees to lowerCall.Mark Lacey2019-07-311-1/+1
| | | | | | | Preserve the nullptr default for KnownCallees that appears in the base class. llvm-svn: 367477
* [GISel] Pass MD_callees metadata down in call lowering.Mark Lacey2019-07-312-2/+4
| | | | | | | | | | | | | | | | | | | | Summary: This will make it possible to improve IPRA by taking into account register usage in indirect calls. NFC yet; this is just laying the groundwork to start building up patches to take advantage of the information for improved register allocation. Reviewers: aditya_nandakumar, volkan, qcolombet, arsenm, rovka, aemerson, paquette Subscribers: sdardis, wdng, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65488 llvm-svn: 367476
* [ARM] Reject CSEL instructions with invalid operandsMikhail Maltsev2019-07-311-1/+1
| | | | | | | | | | | | | | | | | | | | Summary: According to the Armv8.1-M manual CSEL, CSINC, CSINV and CSNEG are "constrained unpredictable" when SP is used as the source register Rn. The assembler should diagnose this case. Reviewers: momchil.velikov, dmgreen, ostannard, simon_tatham, t.p.northover Reviewed By: ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65505 llvm-svn: 367433
* [ARM] Generate MVE VFMAsOliver Cruickshank2019-07-311-0/+21
| | | | llvm-svn: 367408
* [NFC] Test CommitOliver Cruickshank2019-07-311-2/+2
| | | | llvm-svn: 367405
* [NFC][ARMCGP] Use switch in isSupportedValueSam Parker2019-07-311-47/+40
| | | | | | | | Use a switch instead of many isa<> while checking for supported values. Also be explicit about which cast instructions are supported; This allows the removal of SIToFP from GenerateSignBits. llvm-svn: 367402
* [ARM][ParallelDSP] Convert to function passSam Parker2019-07-311-73/+45
| | | | | | | | Run across a whole function, visiting each basic block one at a time. Differential Revision: https://reviews.llvm.org/D65324 llvm-svn: 367389
* [ARM][LowOverheadLoops] Enable by defaultSam Parker2019-07-301-1/+1
| | | | | | | | | The code is now in a good enough state to pass the bunch of tests that I have run (after fixing the bugs), so let's enable it by default. Differential Revision: https://reviews.llvm.org/D65277 llvm-svn: 367297
* [ARM][LowOverheadLoops] Revert non-header LE targetSam Parker2019-07-301-3/+9
| | | | | | | | | Revert the hardware loop upon finding a LoopEnd that doesn't target the loop header, instead of asserting a failure. Differential Revision: https://reviews.llvm.org/D65268 llvm-svn: 367296
* [NFC][ARM[ParallelDSP] Cleanup of BinOpChainSam Parker2019-07-291-81/+58
| | | | | | | | | | | | - Remove some unused typedefs. - Rename BinOpChain struct to MulCandidate. - Remove the size method of MulCandidate. - Store only the first input of the ValueList provided to MulCandidate, as it's the only value we care about. This means we don't have to perform any ugly (and unnecessary) iterations of the list later on. llvm-svn: 367208
* [NFC][ARM][ParallelDSP] Remove AreSymmetricalSam Parker2019-07-291-43/+0
| | | | | | | We explicitly search for a parallel mac and we only care about its inputs, checking for symmetry doesn't add anything here. llvm-svn: 367205
* [NFC][ARM][ParallelDSP] Remove PopulateLoadsSam Parker2019-07-291-9/+0
| | | | | | | | We no longer have to check what loads are used, all this is performed at the start of the transform, so it's not doing anything now. llvm-svn: 367204
* [ARM] MVE VPNOTDavid Green2019-07-282-3/+22
| | | | | | | | | | This adds the patterns required to transform xor P0, -1 to a VPNOT. The instruction operands have to change a little for this, adding an in and an out VCCR reg and using a custom DecodeMVEVPNOT for the decode. Differential Revision: https://reviews.llvm.org/D65133 llvm-svn: 367192
* [ARM] Better patterns for fp <> predicate vectorsDavid Green2019-07-282-4/+26
| | | | | | | | | | These are some better patterns for converting between predicates and floating points. Much like the extends, we select "1"/"-1" or "0" depending on the predicate value. Or we perform a compare against 0 to convert to a predicate. Differential Revision: https://reviews.llvm.org/D65103 llvm-svn: 367191
* [ARM][ParallelDSP] Combine structsSam Parker2019-07-261-19/+15
| | | | | | | Combine OpChain and BinOpChain structs as OpChain is a base class to BinOpChain that is never used. llvm-svn: 367114
* [NFC][ARM][ParallelDSP] Cleanup isNarrowSequenceSam Parker2019-07-261-26/+5
| | | | | | Remove unused logic. llvm-svn: 367099
OpenPOWER on IntegriCloud