summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/ARM
Commit message (Collapse)AuthorAgeFilesLines
* [ARM] Relax constraints on operands of VQxDMLxDH instructionsMikhail Maltsev2019-07-082-17/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: According to a recently updated Armv8-M spec (https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf) the 32-bit width versions of the following instructions: * VQDMLADH * VQDMLADHX * VQRDMLADH * VQRDMLADHX * VQDMLSDH * VQDMLSDHX * VQRDMLSDH * VQRDMLSDHX are no longer unpredictable when their output register is the same as one of the input registers. This patch updates the assembler parser and the corresponding tests and also removes @earlyclobber from the instruction constraints. Reviewers: simon_tatham, ostannard, dmgreen, SjoerdMeijer, samparker Reviewed By: simon_tatham Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64250 llvm-svn: 365306
* [ARM] Fix null pointer dereference in ↵Fangrui Song2019-07-081-6/+8
| | | | | | | | CodeGen/ARM/Windows/stack-protector-msvc.ll.test after D64292/r365283 CLI.CS may not be set. llvm-svn: 365299
* [ARM] Add support for MSVC stack cookie checkingMartin Storsjo2019-07-072-0/+34
| | | | | | | | Heavily based on the same for AArch64, from SVN r346469. Differential Revision: https://reviews.llvm.org/D64292 llvm-svn: 365283
* [ARM] MVE patterns for VMVN, VORR and VBICDavid Green2019-07-051-0/+23
| | | | | | | | This add simple Q register forms of bitwise not instructions. Differential Revision: https://reviews.llvm.org/D63983 llvm-svn: 365214
* [ARM] MVE VMOV immediate handlingDavid Green2019-07-055-30/+57
| | | | | | | | | | | | | This adds some handling for VMOVimm, using the same method that NEON uses. We create VMOVIMM/VMVNIMM/VMOVFPIMM nodes based on the immediate, and select them using the now renamed ARMvmovImm/etc. There is also an extra 64bit immediate mode that I have not yet added here. Code by David Sherwood Differential Revision: https://reviews.llvm.org/D63884 llvm-svn: 365178
* [ARM] MVE fp to int conversionsDavid Green2019-07-052-0/+26
| | | | | | | | This adds the patterns needed for fptosi and sitofp. Differential Revision: https://reviews.llvm.org/D63729 llvm-svn: 365176
* [ARM] Favour PL/MI over GE/LT when possibleDavid Green2019-07-041-0/+19
| | | | | | | | | | | | | | | The arm condition codes for GE is N==V (and for LT is N!=V). If the source of flags cannot set V (overflow), such as a cmp against #0, then we can use the simpler PL and MI conditions that only check N. As these PL/MI conditions are simpler than GE/LT, other passes like the peephole optimiser can have a better time optimising away the redundant CMPs. The exception is the VSEL instruction, which cannot take the PL code, so there the transform favours GE. Differential Revision: https://reviews.llvm.org/D64160 llvm-svn: 365117
* [ARM] MVE bitwise instruction patternsDavid Green2019-07-042-1/+25
| | | | | | | | | | | | This adds patterns for the simpler VAND, VORR and VEOR bitwise vector instructions. It also adjusts the top16Zero PatLeaf to not match on vector instructions, which can otherwise cause problems. Code written by David Sherwood. Differential Revision: https://reviews.llvm.org/D63867 llvm-svn: 365113
* [ARM] Fix for NDEBUG buildsSam Parker2019-07-031-4/+3
| | | | | | | | Fix unused variable warning as well as a nonsense assert. Differential Revision: https://reviews.llvm.org/D63816 llvm-svn: 365046
* [ARM] Thumb2: favor R4-R7 over R12/LR in allocation order when opt for minsizeOliver Stannard2019-07-033-6/+55
| | | | | | | | | | | | | | | | | | | | For Thumb2, we prefer low regs (costPerUse = 0) to allow narrow encoding. However, current allocation order is like: R0-R3, R12, LR, R4-R11 As a result, a lot of instructs that use R12/LR will be wide instrs. This patch changes the allocation order to: R0-R7, R12, LR, R8-R11 for thumb2 and -Osize. In most cases, there is no extra push/pop instrs as they will be folded into existing ones. There might be slight performance impact due to more stack usage, so we only enable it when opt for min size. https://reviews.llvm.org/D30324 llvm-svn: 365014
* [Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457)Roman Lebedev2019-07-032-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]]. In middle-end, we'd want to prefer the form with two adds - D63992, but as this diff shows, not every target will prefer that pattern. Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars, but only X86 prefer that same pattern for vectors. Here i'm adding a new TLI hook, always defaulting to the inc-of-add, but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars. Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel Reviewed By: efriedma Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64090 llvm-svn: 365010
* [ARM] Fix unwind info for Thumb1 functions that save high registers.Eli Friedman2019-07-023-8/+20
| | | | | | | | | | | | | | | | | | | | | There were two issues here: one, some of the relevant instructions were missing the expected "FrameSetup" flag, and two, ARMAsmPrinter::EmitUnwindingInstruction wasn't expecting "mov" instructions in the prologue. I'm sticking the additional state into ARMFunctionInfo so it's obvious it only applies to the current function. I considered a few alternative approaches where we would compute the correct unwind information as part of the prologue/epilogue lowering, but it seems like a lot of work to introduce pseudo-instructions, and the current code seems to be reliable enough. Fixes https://bugs.llvm.org/show_bug.cgi?id=42408. Differential Revision: https://reviews.llvm.org/D63964 llvm-svn: 364970
* [ARM] MVE: allow soft-float ABI to pass vector types.Simon Tatham2019-07-023-2/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | Passing a vector type over the soft-float ABI involves it being split into four GPRs, so the first thing that has to happen at the start of the function is to recombine those into a vector register. The ABI types all vectors as v2f64, so we need to support BUILD_VECTOR for that type, which I do in this patch by allowing it to be expanded in terms of INSERT_VECTOR_ELT, and writing an ISel pattern for that in turn. Similarly, I provide a rule for EXTRACT_VECTOR_ELT so that a returned vector can be marshalled back into GPRs. While I'm here, I've also added ISD::UNDEF to the list of operations we turn back on in `setAllExpand`, because I noticed that otherwise it gets expanded into a BUILD_VECTOR with explicit zero inputs, leading to pointless machine instructions to zero out a vector register that's about to have every lane overwritten of in any case. Reviewers: dmgreen, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63937 llvm-svn: 364910
* [ARM] Stop using scalar FP instructions in integer-only MVE mode.Simon Tatham2019-07-023-18/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If you compile with `-mattr=+mve` (enabling integer MVE instructions but not floating-point ones), then the scalar FP //registers// exist and it's legal to move things in and out of them, load and store them, but it's not legal to do arithmetic on them. In D60708, the calls to `addRegisterClass` in ARMISelLowering that enable use of the scalar FP registers became conditionalised on `Subtarget->hasFPRegs()` instead of `Subtarget->hasVFP2Base()`, so that loads, stores and moves of those registers would work. But I didn't realise that that would also enable all the operations on those types by default. Now, if the target doesn't have basic VFP, we follow up those `addRegisterClass` calls by turning back off all the nontrivial operations you can perform on f32 and f64. That causes several knock-on failures, which are fixed by allowing the `VMOVDcc` and `VMOVScc` instructions to be selected even if all you have is `HasFPRegs`, and adjusting several checks for 'is this a double in a single-precision-only world?' to the more general 'is this any FP type we can't do arithmetic on?'. Between those, the whole of the `float-ops.ll` and `fp16-instructions.ll` tests can now run in MVE-without-FP mode and generate correct-looking code. One odd side effect is that I had to relax the check lines in that test so that they permit test functions like `add_f` to be generated as tailcalls to software FP library functions, instead of ordinary calls. Doing that is entirely legal, but the mystery is why this is the first RUN line that's needed the relaxation: on the usual kind of non-FP target, no tailcalls ever seem to be generated. Going by the llc messages, I think `SoftenFloatResult` must be perturbing the code generation in some way, but that's as much as I can guess. Reviewers: dmgreen, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63938 llvm-svn: 364909
* [ARM] Fix MVE_VQxDMLxDH instruction classMikhail Maltsev2019-07-011-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: According to the ARMARM, the VQDMLADH, VQRDMLADH, VQDMLSDH and VQRDMLSDH instructions handle their results as follows: "The base variant writes the results into the lower element of each pair of elements in the destination register, whereas the exchange variant writes to the upper element in each pair". I.e., the initial content of the output register affects the result, as usual, we model this with an additional input. Also, for 32-bit variants Qd is not allowed to be the same register as Qm and Qn, we use @earlyclobber to indicate this. This patch also changes vpred_r to vpred_n because the instructions don't have an explicit 'inactive' operand. Reviewers: dmgreen, ostannard, simon_tatham Reviewed By: simon_tatham Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64007 llvm-svn: 364796
* [ARM] MVE: support QQPRRegClass and QQQQPRRegClassMikhail Maltsev2019-07-011-2/+3
| | | | | | | | | | | | | | | | | | | Summary: QQPRRegClass and QQQQPRRegClass are used by the interleaving/deinterleaving loads/stores to represent sequences of consecutive SIMD registers. Reviewers: ostannard, simon_tatham, dmgreen Reviewed By: simon_tatham Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64009 llvm-svn: 364794
* [ARM] WLS/LE Code GenerationSam Parker2019-07-017-28/+162
| | | | | | | | | | | | | | | | | Backend changes to enable WLS/LE low-overhead loops for armv8.1-m: 1) Use TTI to communicate to the HardwareLoop pass that we should try to generate intrinsics that guard the loop entry, as well as setting the loop trip count. 2) Lower the BRCOND that uses said intrinsic to an Arm specific node: ARMWLS. 3) ISelDAGToDAG the node to a new pseudo instruction: t2WhileLoopStart. 4) Add support in ArmLowOverheadLoops to handle the new pseudo instruction. Differential Revision: https://reviews.llvm.org/D63816 llvm-svn: 364733
* [ARM] Add support for the MVE long shift instructionsSam Tebbs2019-06-284-7/+85
| | | | | | | | | | | | MVE adds the lsll, lsrl and asrl instructions, which perform a shift on a 64 bit value separated into two 32 bit registers. The Expand64BitShift function is modified to accept ISD::SHL, ISD::SRL and ISD::SRA and convert it into the appropriate opcode in ARMISD. An SHL is converted into an lsll, an SRL is converted into an lsrl for the immediate form and a negation and lsll for the register form, and SRA is converted into an asrl. test/CodeGen/ARM/shift_parts.ll is added to test the logic of emitting these instructions. Differential Revision: https://reviews.llvm.org/D63430 llvm-svn: 364654
* [ARM] Add MVE mul patternsDavid Green2019-06-281-0/+16
| | | | | | | | | This simply adds integer and floating point VMUL patterns for MVE, same as we have add and sub. Differential Revision: https://reviews.llvm.org/D63866 llvm-svn: 364643
* [ARM] Mark math routines as non-legal for MVEDavid Green2019-06-281-0/+9
| | | | | | | | | This adds handling and tests for a number of floating point math routines, which have no MVE instructions. Differential Revision: https://reviews.llvm.org/D63725 llvm-svn: 364641
* [ARM] MVE patterns for VABS and VNEGDavid Green2019-06-281-0/+14
| | | | | | | | This simply adds the required patterns for fp neg and abs. Differential Revision: https://reviews.llvm.org/D63861 llvm-svn: 364640
* [ARM] Widening loads and narrowing storesDavid Green2019-06-283-4/+58
| | | | | | | | | | | | MVE has instructions to widen as it loads, and narrow as it stores. This adds the required patterns and legalisation to make them work including specifying that they are legal, patterns to select them and test changes. Patch by David Sherwood. Differential Revision: https://reviews.llvm.org/D63839 llvm-svn: 364636
* [ARM] Fix integer UB in MVE load/store immediate handling.Simon Tatham2019-06-282-6/+9
| | | | llvm-svn: 364635
* [ARM] MVE loads and storesDavid Green2019-06-282-11/+52
| | | | | | | | | | | | | | | This fills in the gaps for basic MVE loads and stores, allowing unaligned access and adding far too many tests. These will become important as narrowing/expanding and pre/post inc are added. Big endian might still not be handled very well, because we have not yet added bitcasts (and I'm not sure how we want it to work yet). I've included the alignment code anyway which maps with our current patterns. We plan to return to that later. Code written by Simon Tatham, with additional tests from Me and Mikhail Maltsev. Differential Revision: https://reviews.llvm.org/D63838 llvm-svn: 364633
* [ARM] Mark div and rem as expand for MVEDavid Green2019-06-281-0/+12
| | | | | | | | | We don't have vector operations for these, so they need to be expanded for both integer and float. Differential Revision: https://reviews.llvm.org/D63595 llvm-svn: 364631
* [ARM] Select MVE fp add and subDavid Green2019-06-281-0/+14
| | | | | | | | | | | The same as integer arithmetic, we can add simple floating point MVE addition and subtraction patterns. Initial code by David Sherwood Differential Revision: https://reviews.llvm.org/D63257 llvm-svn: 364629
* [ARM] Select MVE add and subDavid Green2019-06-281-0/+18
| | | | | | | | | | | This adds the first few patterns for MVE code generation, adding simple integer add and sub patterns. Initial code by David Sherwood Differential Revision: https://reviews.llvm.org/D63255 llvm-svn: 364627
* [ARM] MVE vector shufflesDavid Green2019-06-285-183/+355
| | | | | | | | | | | | | | | | | | This patch adds necessary shuffle vector and buildvector support for ARM MVE. It essentially adds support for VDUP, VREVs and some VMOVs, which are often required by other code (like upcoming patches). This mostly uses the same code from Neon that already generated NEONvdup/NEONvduplane/NEONvrev's. These have been renamed to ARMvdup/etc and moved to ARMInstrInfo as they are common to both architectures. Most of the selection code seems to be applicable to both, but NEON does have some more instructions making some parts specific. Most code originally by David Sherwood. Differential Revision: https://reviews.llvm.org/D63567 llvm-svn: 364626
* [ARM] Fix formatting issue in ARMISelLowering.cppSam Tebbs2019-06-271-1/+2
| | | | | | | Fix a formatting error in ARMISelLowering.cpp::Expand64BitShift. My test commit after receiving write access. llvm-svn: 364560
* [ARM] Fix bogus assertions in copyPhysReg v8.1-M cases.Simon Tatham2019-06-271-4/+4
| | | | | | | | | | | | | | | | The code to generate register move instructions in and out of VPR and FPSCR_NZCV had assertions checking that the other register involved was a GPR _pair_, instead of a single GPR as it should have been. Reviewers: miyuki, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63865 llvm-svn: 364534
* [ARM] Fix handling of zero offsets in LOB instructions.Simon Tatham2019-06-272-16/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The BF and WLS/WLSTP instructions have various branch-offset fields occupying different positions and lengths in the instruction encoding, and all of them were decoded at disassembly time by the function DecodeBFLabelOffset() which returned SoftFail if the offset was zero. In fact, it's perfectly fine and not even a SoftFail for most of those offset fields to be zero. The only one that can't be zero is the 4-bit field labelled `boff` in the architecture spec, occupying bits {26-23} of the BF instruction family. If that one is zero, the encoding overlaps other instructions (WLS, DLS, LETP, VCTP), so it ought to be a full Fail. Fixed by adding an extra template parameter to DecodeBFLabelOffset which controls whether a zero offset is accepted or rejected. Adjusted existing tests (only in error messages for bad disassemblies); added extra tests to demonstrate zero offsets being accepted in all the right places, and a few demonstrating rejection of zero `boff`. Reviewers: DavidSpickett, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63864 llvm-svn: 364533
* [ARM] Make coprocessor number restrictions consistent.Simon Tatham2019-06-273-10/+25
| | | | | | | | | | | | | | | | | | | | | | Different versions of the Arm architecture disallow the use of generic coprocessor instructions like MCR and CDP on different sets of coprocessors. This commit centralises the check of the coprocessor number so that it's consistent between assembly and disassembly, and also updates it for the new restrictions in Arm v8.1-M. New tests added that check all the coprocessor numbers; old tests updated, where they used a number that's now become illegal in the context in question. Reviewers: DavidSpickett, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63863 llvm-svn: 364532
* [ARM] Tighten restrictions on use of SP in v8.1-M CSEL.Simon Tatham2019-06-273-8/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | In the `CSEL Rd,Rm,Rn` instruction family (also including CSINC, CSINV and CSNEG), the architecture lists it as CONSTRAINED UNPREDICTABLE (i.e. SoftFail) to use SP in the Rd or Rm slot, but outright illegal to use it in the Rn slot, not least because some encodings of that form are used by MVE instructions such as UQRSHLL. MC was treating all three slots the same, as SoftFail. So the only reason UQRSHLL was disassembled correctly at all was because the MVE decode table is separate from the Thumb2 one and takes priority; if you turned off MVE, then encodings such as `[0x5f,0xea,0x0d,0x83]` would disassemble as spurious CSELs. Fixed by inventing another version of the `GPRwithZR` register class, which disallows SP completely instead of just SoftFailing it. Reviewers: DavidSpickett, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63862 llvm-svn: 364531
* [GlobalISel] Accept multiple vregs for lowerCall's argsDiana Picus2019-06-271-8/+3
| | | | | | | | | | | | | | | | | | | | | | | | Change the interface of CallLowering::lowerCall to accept several virtual registers for each argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660 and lowerFormalArguments in D63549. With this change, we no longer pack the virtual registers generated for aggregates into one big lump before delegating to the target. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. NFCI for AMDGPU, Mips and X86. Differential Revision: https://reviews.llvm.org/D63551 llvm-svn: 364512
* [GlobalISel] Accept multiple vregs for lowerCall's resultDiana Picus2019-06-272-23/+9
| | | | | | | | | | | | | | | | | | | | | | | | Change the interface of CallLowering::lowerCall to accept several virtual registers for the call result, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660 and lowerFormalArguments in D63549. With this change, we no longer pack the virtual registers generated for aggregates into one big lump before delegating to the target. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. NFCI for AMDGPU, Mips and X86. Differential Revision: https://reviews.llvm.org/D63550 llvm-svn: 364511
* [GlobalISel] Accept multiple vregs in lowerFormalArgsDiana Picus2019-06-272-17/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the interface of CallLowering::lowerFormalArguments to accept several virtual registers for each formal argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660. lowerCall will be refactored in the same way in follow-up patches. With this change, we forward the virtual registers generated for aggregates to CallLowering. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. We also copy the pack/unpackRegs helpers to CallLowering to facilitate this. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was put into a s64 instead of a p0. Added a test-case which illustrates the problem more clearly (it crashes without this patch) and fixed the existing test-case to expect p0. AMDGPU has been updated to unpack into the virtual registers for kernels. I think the other code paths fall back for aggregates, so this should be NFC. Mips doesn't support aggregates yet, so it's also NFC. x86 seems to have code for dealing with aggregates, but I couldn't find the tests for it, so I just added a fallback to DAGISel if we get more than one virtual register for an argument. Differential Revision: https://reviews.llvm.org/D63549 llvm-svn: 364510
* [GlobalISel] Allow multiple VRegs in ArgInfo. NFCDiana Picus2019-06-271-7/+15
| | | | | | | | | | | Allow CallLowering::ArgInfo to contain more than one virtual register. This is useful when passes split aggregates into several virtual registers, but need to also provide information about the original type to the call lowering. Used in follow-up patches. Differential Revision: https://reviews.llvm.org/D63548 llvm-svn: 364509
* [ARM] Don't reserve R12 on Thumb1 as an emergency spill slot.Eli Friedman2019-06-267-120/+217
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implementation of ThumbRegisterInfo::saveScavengerRegister is bad for two reasons: one, it's buggy, and two, it blocks using R12 for other optimizations. So this patch gets rid of it, and adds the necessary support for using an ordinary emergency spill slot on Thumb1. (Specifically, I think saveScavengerRegister was broken by r305625, and nobody noticed for two years because the codepath is almost never used. The new code will also probably not be used much, but it now has better tests, and if we fail to emit a necessary emergency spill slot we get a reasonable error message instead of a miscompile.) A rough outline of the changes in the patch: 1. Gets rid of ThumbRegisterInfo::saveScavengerRegister. 2. Modifies ARMFrameLowering::determineCalleeSaves to allocate an emergency spill slot for Thumb1. 3. Implements useFPForScavengingIndex, so the emergency spill slot isn't placed at a negative offset from FP on Thumb1. 4. Modifies the heuristics for allocating an emergency spill slot to support Thumb1. This includes fixing ExtraCSSpill so we don't try to use "lr" as a substitute for allocating an emergency spill slot. 5. Allocates a base pointer in more cases, so the emergency spill slot is always accessible. 6. Modifies ARMFrameLowering::ResolveFrameIndexReference to compute the right offset in the new cases where we're forcing a base pointer. 7. Ensures we never generate a load or store with an offset outside of its frame object. This makes the heuristics more straightforward. 8. Changes Thumb1 prologue and epilogue emission so it never uses register scavenging. Some of the changes to the emergency spill slot heuristics in determineCalleeSaves affect ARM/Thumb2; hopefully, they should allow the compiler to avoid allocating an emergency spill slot in cases where it isn't necessary. The rest of the changes should only affect Thumb1. Differential Revision: https://reviews.llvm.org/D63677 llvm-svn: 364490
* [ARM] Handle fixup_arm_pcrel_9 correctly on big-endian targetsMikhail Maltsev2019-06-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | Summary: The getFixupKindContainerSizeBytes function returns the size of the instruction containing a given fixup. Currently fixup_arm_pcrel_9 is not handled in this function, this causes an assertion failure in the debug build and incorrect codegen in the release build. This patch fixes the problem. Reviewers: ostannard, simon_tatham Reviewed By: ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, pbarrio, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63778 llvm-svn: 364404
* [ARM] Fix -Wimplicit-fallthrough after D60709/r364331Fangrui Song2019-06-261-4/+3
| | | | llvm-svn: 364376
* [ARM] Support inline assembler constraints for MVE.Simon Tatham2019-06-251-1/+22
| | | | | | | | | | | | | | | | | | | | | "To" selects an odd-numbered GPR, and "Te" an even one. There are some 8.1-M instructions that have one too few bits in their register fields and require registers of particular parity, without necessarily using a consecutive even/odd pair. Also, the constraint letter "t" should select an MVE q-register, when MVE is present. This didn't need any source changes, but some extra tests have been added. Reviewers: dmgreen, samparker, SjoerdMeijer Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D60709 llvm-svn: 364331
* [ARM] Code-generation infrastructure for MVE.Simon Tatham2019-06-257-17/+306
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This provides the low-level support to start using MVE vector types in LLVM IR, loading and storing them, passing them to __asm__ statements containing hand-written MVE vector instructions, and *if* you have the hard-float ABI turned on, using them as function parameters. (In the soft-float ABI, vector types are passed in integer registers, and combining all those 32-bit integers into a q-reg requires support for selection DAG nodes like insert_vector_elt and build_vector which aren't implemented yet for MVE. In fact I've also had to add `arm_aapcs_vfpcc` to a couple of existing tests to avoid that problem.) Specifically, this commit adds support for: * spills, reloads and register moves for MVE vector registers * ditto for the VPT predication mask that lives in VPR.P0 * make all the MVE vector types legal in ISel, and provide selection DAG patterns for BITCAST, LOAD and STORE * make loads and stores of scalar FP types conditional on `hasFPRegs()` rather than `hasVFP2Base()`. As a result a few existing tests needed their llc command lines updating to use `-mattr=-fpregs` as their method of turning off all hardware FP support. Reviewers: dmgreen, samparker, SjoerdMeijer Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60708 llvm-svn: 364329
* [ARM] Fix for DLS/LE CodeGenSam Parker2019-06-251-8/+9
| | | | | | | | | The expensive buildbots highlighted the mir tests were broken, which I've now updated and added --verify-machineinstrs to them. This also uncovered a couple of bugs in the backend pass, so these have also been fixed. llvm-svn: 364323
* [ARM] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D60692Fangrui Song2019-06-251-0/+1
| | | | llvm-svn: 364312
* [ARM] Fix buildbot failure due to -Werror.Simon Tatham2019-06-251-1/+0
| | | | | | | | Including both 'case ARM_AM::uxtw' and 'default' in the getShiftOp switch caused a buildbot to fail with error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default] llvm-svn: 364300
* [ARM] MVE VPT BlocksSjoerd Meijer2019-06-251-4/+12
| | | | | | | | | | | | A minor iteration on the MVE VPT Block pass to enable more efficient VPT Block code generation: consecutive VPT predicated statements, predicated on the same condition, will be placed within the same VPT Block. This essentially is also an exercise to write some more tests for the next step, which should be more generic also merging instructions when they are not consecutive. Differential Revision: https://reviews.llvm.org/D63711 llvm-svn: 364298
* [ARM] Explicit lowering of half <-> double conversions.Simon Tatham2019-06-252-15/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an FP_EXTEND or FP_ROUND isel dag node converts directly between f16 and f32 when the target CPU has no instruction to do it in one go, it has to be done in two steps instead, going via f32. Previously, this was done implicitly, because all such CPUs had the storage-only implementation of f16 (i.e. the only thing you can do with one at all is to convert it to/from f32). So isel would legalize the f16 into an f32 as soon as it saw it, by inserting an fp16_to_fp node (or vice versa), and then the fp_extend would already be f32->f64 rather than f16->f64. But that technique can't support a target CPU which has full f16 support but _not_ f64, such as some variants of Arm v8.1-M. So now we provide custom lowering for FP_EXTEND and FP_ROUND, which checks support for f16 and f64 and decides on the best thing to do given the combination of flags it gets back. Reviewers: dmgreen, samparker, SjoerdMeijer Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60692 llvm-svn: 364294
* [ARM] Add remaining miscellaneous MVE instructions.Simon Tatham2019-06-254-22/+164
| | | | | | | | | | | | | | | | | | This final batch includes the tail-predicated versions of the low-overhead loop instructions (LETP); the VPSEL instruction to select between two vector registers based on the predicate mask without having to open a VPT block; and VPNOT which complements the predicate mask in place. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62681 llvm-svn: 364292
* [ARM] Add MVE vector load/store instructions.Simon Tatham2019-06-2513-60/+1049
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds the rest of the vector memory access instructions. It includes contiguous loads/stores, with an ordinary addressing mode such as [r0,#offset] (plus writeback variants); gather loads and scatter stores with a scalar base address register and a vector of offsets from it (written [r0,q1] or similar); and gather/scatters with a vector of base addresses (written [q0,#offset], again with writeback). Additionally, some of the loads can widen each loaded value into a larger vector lane, and the corresponding stores narrow them again. To implement these, we also have to add the addressing modes they need. Also, in AsmParser, the `isMem` query function now has subqueries `isGPRMem` and `isMVEMem`, according to which kind of base register is used by a given memory access operand. I've also had to add an extra check in `checkTargetMatchPredicate` in the AsmParser, without which our last-minute check of `rGPR` register operands against SP and PC was failing an assertion because Tablegen had inserted an immediate 0 in place of one of a pair of tied register operands. (This matches the way the corresponding check for `MCK_rGPR` in `validateTargetOperandClass` is guarded.) Apparently the MVE load instructions were the first to have ever triggered this assertion, but I think only because they were the first to have a combination of the usual Arm pre/post writeback system and the `rGPR` class in particular. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62680 llvm-svn: 364291
* [ARM] DLS/LE low-overhead loop code generationSam Parker2019-06-256-1/+349
| | | | | | | | | | | | | | | | | Introduce three pseudo instructions to be used during DAG ISel to represent v8.1-m low-overhead loops. One maps to set_loop_iterations while loop_decrement_reg is lowered to two, so that we can separate the decrement and branching operations. The pseudo instructions are expanded pre-emission, where we can still decide whether we actually want to generate a low-overhead loop, in a new pass: ARMLowOverheadLoops. The pass currently bails, reverting to an sub, icmp and br, in the cases where a call or stack spill/restore happens between the decrement and branching instructions, or if the loop is too large. Differential Revision: https://reviews.llvm.org/D63476 llvm-svn: 364288
OpenPOWER on IntegriCloud