summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/ARM
Commit message (Collapse)AuthorAgeFilesLines
* [ARM] Add MVE vector bit-operations (register inputs).Simon Tatham2019-06-198-25/+477
| | | | | | | | | | | | | | | | | | | | | | | | This includes all the obvious bitwise operations (AND, OR, BIC, ORN, MVN) in register-to-register forms, and the immediate forms of AND/OR/BIC/ORN; byte-order reverse instructions; and the VMOVs that access a single lane of a vector. Some of those VMOVs (specifically, the ones that access a 32-bit lane) share an encoding with existing instructions that were disassembled as accessing half of a d-register (e.g. `vmov.32 r0, d1[0]`), but in 8.1-M they're now written as accessing a quarter of a q-register (e.g. `vmov.32 r0, q0[2]`). The older syntax is still accepted by the assembler. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62673 llvm-svn: 363838
* [NFC] move some hardware loop checking code to a common place for other using.Chen Zheng2019-06-192-2/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D63478 llvm-svn: 363758
* [ARM] Comply with rules on ARMv8-A thumb mode partial deprecation of IT.Huihui Zhang2019-06-182-11/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When identifing instructions that can be folded into a MOVCC instruction, checking for a predicate operand is not enough, also need to check for thumb2 function, with restrict-IT, is the machine instruction eligible for ARMv8 IT or not. Notes in ARMv8-A Architecture Reference Manual, section "Partial deprecation of IT" https://usermanual.wiki/Pdf/ARM20Architecture20Reference20ManualARMv8.1667877052.pdf "ARMv8-A deprecates some uses of the T32 IT instruction. All uses of IT that apply to instructions other than a single subsequent 16-bit instruction from a restricted set are deprecated, as are explicit references to the PC within that single 16-bit instruction. This permits the non-deprecated forms of IT and subsequent instructions to be treated as a single 32-bit conditional instruction." Reviewers: efriedma, lebedev.ri, t.p.northover, jmolloy, aemerson, compnerd, stoklund, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63474 llvm-svn: 363739
* [ARM] Add MVE vector shift instructions.Simon Tatham2019-06-184-4/+655
| | | | | | | | | | | | | | | | | | | This includes saturating and non-saturating shifts, both with immediate shift count and with the shift counts given by another vector register; VSHLC (in which the bits shifted out of each active vector lane are shifted in to the next active lane); and also VMOVL, which is enough like an immediate shift that it didn't fit too badly in this category. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62672 llvm-svn: 363696
* [ARM] Add MVE integer vector min/max instructions.Simon Tatham2019-06-182-1/+28
| | | | | | | | | | | | | | | | | | | | | Summary: These form a small family of their own, to go with the floating-point VMINNM/VMAXNM instructions added in a previous commit. They introduce the first of many special cases in the mnemonic recognition code, because VMIN with the E suffix used by the VPT predication system needs to avoid being interpreted as the nonexistent instruction 'VMI' with an ordinary 'NE' condition suffix. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62671 llvm-svn: 363695
* [ARM] Rename MVE instructions in Tablegen for consistency.Simon Tatham2019-06-185-243/+256
| | | | | | | | | | | | | | | | | | | Summary: Their names began with a mishmash of `MVE_`, `t2` and no prefix at all. Now they all start with `MVE_`, which seems like a reasonable choice on the grounds that (a) NEON is the thing they're most at risk of being confused with, and (b) MVE implies Thumb-2, so a prefix indicating MVE is strictly more specific than one indicating Thumb-2. Reviewers: ostannard, SjoerdMeijer, dmgreen Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63492 llvm-svn: 363690
* [ARM] Some Thumb2ITBlock clean ups. NFCSjoerd Meijer2019-06-182-48/+41
| | | | | | | | | Some more refactoring, like registering the IT Block pass, less cryptic variable names, and some simplification of loops. Differential Revision: https://reviews.llvm.org/D63419 llvm-svn: 363666
* [CodeGen] Check for HardwareLoop Latch ExitBlockSam Parker2019-06-171-4/+0
| | | | | | | | | | | | The HardwareLoops pass finds exit blocks with a scevable exit count. If the target specifies to update the loop counter in a register, through a phi, we need to ensure that the exit block is a latch so that we can insert the phi with the correct value for the incoming edge. Differential Revision: https://reviews.llvm.org/D63336 llvm-svn: 363556
* [ARM] Fix another -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds ↵Fangrui Song2019-06-171-1/+1
| | | | | | after D63265 llvm-svn: 363535
* [ARM] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D63265Fangrui Song2019-06-171-1/+1
| | | | llvm-svn: 363534
* [ARM] Remove ARMComputeBlockSizeSam Parker2019-06-171-80/+0
| | | | | | Forgot to remove file! llvm-svn: 363532
* [ARM] Add ARMBasicBlockInfo.cppSam Parker2019-06-171-0/+146
| | | | | | Forgot to add file! llvm-svn: 363531
* [ARM] Extract some code from ARMConstantIslandPassSam Parker2019-06-174-112/+101
| | | | | | | | | Create the ARMBasicBlockUtils class for tracking and querying basic blocks sizes so we can use them when generating low-overhead loops. Differential Revision: https://reviews.llvm.org/D63265 llvm-svn: 363530
* [ARM] Add MVE horizontal accumulation instructionsMikhail Maltsev2019-06-142-0/+318
| | | | | | | | | This is the family of vector instructions that combine all the lanes in their input vector(s), and output a value in one or two GPRs. Differential Revision: https://reviews.llvm.org/D62670 llvm-svn: 363403
* [ARM] MVE VPT Block PassSjoerd Meijer2019-06-145-0/+155
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initial commit of a new pass to create vector predication blocks, called VPT blocks, that are supported by the Armv8.1-M MVE architecture. This is a first naive implementation. I.e., for 2 consecutive predicated instructions I1 and I2, for example, it will generate 2 VPT blocks: VPST I1 VPST I2 A more optimal implementation would obviously put instructions in the same VPT block when they are predicated on the same condition and when it is allowed to do this: VPTT I1 I2 We will address this optimisation with follow up patches when the groundwork is in. Creating VPT Blocks is very similar to IT Blocks, which is the reason I added this to Thumb2ITBlocks.cpp. This allows reuse of the def use analysis that we need for the more optimal implementation. VPT blocks cannot be nested in IT blocks, and vice versa, and so these 2 passes cannot interact with each other. Instructions allowed in VPT blocks must be MVE instructions that are marked as VPT compatible. Differential Revision: https://reviews.llvm.org/D63247 llvm-svn: 363370
* [ARM] Set up infrastructure for MVE vector instructions.Simon Tatham2019-06-1312-94/+1306
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit prepares the way to start adding the main collection of MVE instructions, which operate on the 128-bit vector registers. The most obvious thing that's needed, and the simplest, is to add the MQPR register class, which is like the existing QPR except that it has fewer registers in it. The more complicated part: MVE defines a system of vector predication, in which instructions operating on 128-bit vector registers can be constrained to operate on only a subset of the lanes, using a system of prefix instructions similar to the existing Thumb IT, in that you have one prefix instruction which designates up to 4 following instructions as subject to predication, and within that sequence, the predicate can be inverted by means of T/E suffixes ('Then' / 'Else'). To support instructions of this type, we've added two new Tablegen classes `vpred_n` and `vpred_r` for standard clusters of MC operands to add to a predicated instruction. Both include a flag indicating how the instruction is predicated at all (options are T, E and 'not predicated'), and an input register field for the register controlling the set of active lanes. They differ from each other in that `vpred_r` also includes an input operand for the previous value of the output register, for instructions that leave inactive lanes unchanged. `vpred_n` lacks that extra operand; it will be used for instructions that don't preserve inactive lanes in their output register (either because inactive lanes are zeroed, as the MVE load instructions do, or because the output register isn't a vector at all). This commit also adds the family of prefix instructions themselves (VPT / VPST), and all the machinery needed to work with them in assembly and disassembly (e.g. generating the 't' and 'e' mnemonic suffixes on disassembled instructions within a predicated block) I've added a couple of demo instructions that derive from the new Tablegen base classes and use those two operand clusters. The bulk of the vector instructions will come in followup commits small enough to be manageable. (One exception is that I've added the full version of `isMnemonicVPTPredicable` in the AsmParser, because it seemed pointless to carefully split it up.) Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62669 llvm-svn: 363258
* [ARM] Refactor handling of IT mask operands.Simon Tatham2019-06-136-64/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During assembly, the mask operand to an IT instruction (storing the sequence of T/E for 'Then' and 'Else') is parsed out of the mnemonic into a representation that encodes 'Then' and 'Else' in the same way regardless of the condition code. At some point during encoding it has to be converted into the instruction encoding used in the architecture, in which the mask encodes a sequence of replacement low-order bits for the condition code, so that which bit value means 'then' and which 'else' depends on whether the original condition code had its low bit set. Previously, that transformation was done by processInstruction(), half way through assembly. So an MCOperand storing an IT mask would sometimes store it in one format, and sometimes in the other, depending on where in the assembly pipeline you were. You can see this in diagnostics from `llvm-mc -debug -triple=thumbv8a -show-inst`, for example: if you give it an instruction such as `itete eq`, you'd see an `<MCOperand Imm:5>` in a diagnostic become `<MCOperand Imm:11>` in the final output. Having the same data structure store values with time-dependent semantics is confusing already, and it will get more confusing when we introduce the MVE VPT instruction which reuses the Then/Else bitmask idea in a different context. So I'm refactoring: now, all `ARMOperand` and `MCOperand` representations of an IT mask work exactly the same way, namely, 0 means 'Then' and 1 means 'Else', regardless of what original predicate is being referred to. The architectural encoding of IT that depends on the original condition is now constructed at the point when we turn the `MCOperand` into the final instruction bit pattern, and decoded similarly in the disassembler. The previous condition-independent parse-time format used 0 for Else and 1 for Then. I've taken the opportunity to flip the sense of it while I'm changing all of this anyway, because it seems to me more natural to use 0 for 'leave the starting condition unchanged' and 1 for 'invert it', as if those bits were an XOR mask. Reviewers: ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63219 llvm-svn: 363244
* [NFC] Simplify Call querySam Parker2019-06-131-1/+1
| | | | | | Use getIntrinsicID() directly from IntrinsicInst. llvm-svn: 363235
* [ARM][TTI] Scan for existing loop intrinsicsSam Parker2019-06-131-5/+33
| | | | | | | | | TTI should report that it's not profitable to generate a hardware loop if it, or one of its child loops, has already been converted. Differential Revision: https://reviews.llvm.org/D63212 llvm-svn: 363234
* [TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests ↵Simon Pilgrim2019-06-122-4/+8
| | | | | | | | | | | | | | (PR42123) As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179
* [ARM] Fix compiler warningMikael Holmen2019-06-121-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Without this fix clang 3.6 complains with: ../lib/Target/ARM/ARMAsmPrinter.cpp:1473:18: error: variable 'BranchTarget' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] } else if (MI->getOperand(1).isSymbol()) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../lib/Target/ARM/ARMAsmPrinter.cpp:1479:22: note: uninitialized use occurs here MCInst.addExpr(BranchTarget); ^~~~~~~~~~~~ ../lib/Target/ARM/ARMAsmPrinter.cpp:1473:14: note: remove the 'if' if its condition is always true } else if (MI->getOperand(1).isSymbol()) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../lib/Target/ARM/ARMAsmPrinter.cpp:1465:33: note: initialize the variable 'BranchTarget' to silence this warning const MCExpr *BranchTarget; ^ = nullptr 1 error generated. Discussed here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190610/661417.html llvm-svn: 363166
* [ARM] Implement TTI::isHardwareLoopProfitableSam Parker2019-06-122-0/+200
| | | | | | | | | | | | | | | | | | | | Implement the backend target hook to drive the HardwareLoops pass. The low-overhead branch extension for Arm M-class cores is flexible enough that we don't have to ensure correctness at this point, except checking that the loop counter variable can be stored in LR - a 32-bit register. For it to be profitable, we want to avoid loops that contain function calls, or any other instruction that alters the PC. This implementation uses TargetLoweringInfo, to query type and operation actions, looks at intrinsic calls and also performs some manual checks for remainder/division and FP operations. I think this should be a good base to start and extra details can be filled out later. Differential Revision: https://reviews.llvm.org/D62907 llvm-svn: 363149
* [ARM] First MVE instructions: scalar shifts.Mikhail Maltsev2019-06-117-0/+294
| | | | | | | | | | | | | | | | | This introduces a new decoding table for MVE instructions, and starts by adding the family of scalar shift instructions that are part of the MVE architecture extension: saturating shifts within a single GPR, and long shifts across a pair of GPRs (both saturating and normal). Some of these shift instructions have only 3-bit register fields in the encoding, with the low bit fixed. So they can only address an odd or even numbered GPR (depending on the operand), and therefore I add two new register classes, GPREven and GPROdd. Differential Revision: https://reviews.llvm.org/D62668 Change-Id: Iad95d5f83d26aef70c674027a184a6b1e0098d33 llvm-svn: 363051
* [ARM] Fix unused-variable warning in rL363039.Simon Tatham2019-06-111-0/+1
| | | | | | | | The variable `OffsetMask` is currently only used in an assertion, so if assertions are compiled out and -Werror is enabled, it becomes a build failure. llvm-svn: 363043
* [ARM] Add the non-MVE instructions in Arm v8.1-M.Simon Tatham2019-06-1123-90/+1587
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for the new family of conditional selection / increment / negation instructions; the low-overhead branch instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole list of registers at once; the new VMRS/VMSR and VLDR/VSTR instructions to get data in and out of 8.1-M system registers, particularly including the new VPR register used by MVE vector predication. To support this, we also add a register name 'zr' (used by the CSEL family to force one of the inputs to the constant 0), and operand types for lists of registers that are also allowed to include APSR or VPR (used by CLRM). The VLDR/VSTR instructions also need a new addressing mode. The low-overhead branch instructions exist in their own separate architecture extension, which we treat as enabled by default, but you can say -mattr=-lob or equivalent to turn it off. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Reviewed By: samparker Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62667 llvm-svn: 363039
* Revert CMake: Make most target symbols hidden by defaultTom Stellard2019-06-116-6/+6
| | | | | | | | | | | | | | | This reverts r362990 (git commit 374571301dc8e9bc9fdd1d70f86015de198673bd) This was causing linker warnings on Darwin: ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)' from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol 'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&), std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings. llvm-svn: 363028
* CMake: Make most target symbols hidden by defaultTom Stellard2019-06-106-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l 36221 nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439 llvm-svn: 362990
* Revert rL362953 and its followup rL362955.Simon Tatham2019-06-1020-1588/+90
| | | | | | | | These caused a build failure because I managed not to notice they depended on a later unpushed commit in my current stack. Sorry about that. llvm-svn: 362956
* [ARM] Add the non-MVE instructions in Arm v8.1-M.Simon Tatham2019-06-101-17/+7
| | | | | | | | This should have been part of r362953, but I had a finger-trouble incident and committed the old rather than new version of the patch. Sorry. llvm-svn: 362955
* [ARM] Add the non-MVE instructions in Arm v8.1-M.Simon Tatham2019-06-1020-90/+1598
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for the new family of conditional selection / increment / negation instructions; the low-overhead branch instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole list of registers at once; the new VMRS/VMSR and VLDR/VSTR instructions to get data in and out of 8.1-M system registers, particularly including the new VPR register used by MVE vector predication. To support this, we also add a register name 'zr' (used by the CSEL family to force one of the inputs to the constant 0), and operand types for lists of registers that are also allowed to include APSR or VPR (used by CLRM). The VLDR/VSTR instructions also need some new addressing modes. The low-overhead branch instructions exist in their own separate architecture extension, which we treat as enabled by default, but you can say -mattr=-lob or equivalent to turn it off. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Reviewed By: samparker Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62667 llvm-svn: 362953
* [ARM] Disallow PC, and optionally SP, in VMOVRH and VMOVHR.Simon Tatham2019-06-101-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | Arm v8.1-M supports the VMOV instructions that move a half-precision value to and from a GPR, but not if the GPR is SP or PC. To fix this, I've changed those instructions to use the rGPR register class instead of GPR. rGPR always excludes PC, and it excludes SP except in the presence of the HasV8Ops target feature (i.e. Arm v8-A). So the effect is that VMOV.F16 to and from PC is now illegal everywhere, but VMOV.F16 to and from SP is illegal only on non-v8-A cores (which I believe is all as it should be). Reviewers: dmgreen, samparker, SjoerdMeijer, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60704 llvm-svn: 362942
* [ARM] Enable Unroll UpperBoundDavid Green2019-06-101-0/+1
| | | | | | | | | | | This option allows loops with small max trip counts to be fully unrolled. This can help with code like the remainder loops from manually unrolled loops like those that appear in the cmsis dsp library. We would apparently previously runtime unroll them with the default unroll count (4). Differential Revision: https://reviews.llvm.org/D63064 llvm-svn: 362928
* [ARM] Adjust isLegalT1AddressImmediate for non-legal typesDavid Green2019-06-081-2/+2
| | | | | | | | | | | Types such as float and i64's do not have legal loads in Thumb1, but will still be loaded with a LDR (or potentially multiple LDR's). As such we can treat the cost of addressing mode calculations the same as an i32 and get some optimisation benefits. Differential Revision: https://reviews.llvm.org/D62968 llvm-svn: 362874
* [ARM] Add MVE addressing to isLegalT2AddressImmediateDavid Green2019-06-081-1/+20
| | | | | | | | | | Now with MVE being added, we can add the vector addressing mode costs for it. These are generally imm7 multiplied by the size of the type being loaded / stored. Differential Revision: https://reviews.llvm.org/D62967 llvm-svn: 362873
* [ARM] Add fp16 addressing to isLegalT2AddressImmediateDavid Green2019-06-081-0/+3
| | | | | | | | | | The fp16 version of VLDR takes a imm8 multiplied by 2. This updates the costs to account for those, and adds extra testing. It is dependant upon hasFPRegs16 as this is what the load/store instructions require. Differential Revision: https://reviews.llvm.org/D62966 llvm-svn: 362872
* [ARM] Add HasNEON for all Neon patterns in ARMInstrNEON.td. NFCIDavid Green2019-06-081-78/+177
| | | | | | | | | | | | | | | We are starting to add an entirely separate vector architecture to the ARM backend. To do that we need at least some separation between the existing NEON and the new MVE code. This patch just goes through the Neon patterns and ensures that they are predicated on HasNEON, giving MVE a stable place to start from. No tests yet as this is largely an NFC, and we don't have the other target that will treat any of these intructions as legal. Differential Revision: https://reviews.llvm.org/D62945 llvm-svn: 362870
* [ARM] Add FP16 vector insert/extract patternsMikhail Maltsev2019-06-041-0/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change adds two FP16 extraction and two insertion patterns (one per possible vector length). Extractions are handled by copying a Q/D register into one of VFP2 class registers, where single FP32 sub-registers can be accessed. Then the extraction of even lanes are simple sub-register extractions (because we don't care about the top parts of registers for FP16 operations). Odd lanes need an additional VMOVX instruction. Unfortunately, insertions cannot be handled in the same way, because: * There is no instruction to insert FP16 into an even lane (VINS only works with odd lanes) * The patterns for odd lanes will have a form of a DAG (not a tree), and will not be implementable in pure tablegen Because of this insertions are handled in the same way as 16-bit integer insertions (with conversions between FP registers and GPRs using VMOVHR instructions). Without these patterns the ARM backend would sometimes fail during instruction selection. This patch also adds patterns which combine: * an FP16 element extraction and a store into a single VST1 instruction * an FP16 load and insertion into a single VLD1 instruction Differential Revision: https://reviews.llvm.org/D62651 llvm-svn: 362482
* [ARM] Turn some undefined encoding bits into 0s.Simon Tatham2019-06-041-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | The family of 32-bit Thumb instruction encodings that include t2ORR, t2AND and t2EOR are all listed in the ArmARM as having (0) in bit 15. The Tablegen descriptions of those instructions listed them as ?. This change tightens that up by making them into 0 + Unpredictable. In the specific case of t2ORR, we tighten it up still further by making the zero bit mandatory. This change comes from Arm v8.1-M, in which encodings with that bit equal to 1 will now be used for different instructions. Reviewers: dmgreen, samparker, SjoerdMeijer, efriedma Reviewed By: dmgreen, efriedma Subscribers: efriedma, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60705 llvm-svn: 362470
* [ARM][FIX] Ran out of registers due tail recursionDiogo N. Sampaio2019-06-032-49/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: - pr42062 When compiling for MinSize, ARMTargetLowering::LowerCall decides to indirect multiple calls to a same function. However, it disconsiders the limitation that thumb1 indirect calls require the callee to be in a register from r0 to r3 (llvm limiation). If all those registers are used by arguments, the compiler dies with "error: run out of registers during register allocation". This patch tells the function IsEligibleForTailCallOptimization if we intend to perform indirect calls, as to avoid tail call optimization. Reviewers: dmgreen, efriedma Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62683 llvm-svn: 362366
* [NFC][ARM][ParallelDSP] Refactor narrow sequenceSam Parker2019-05-301-48/+19
| | | | | | | Most of the code used for finding a 'narrow' sequence is not used, so I've removed it and simplified the calls from the smlad matcher. llvm-svn: 362104
* [ARM] Change the MC names for VMAXNM/VMINNMSjoerd Meijer2019-05-303-35/+36
| | | | | | | | | | | | | Now the NEON ones have a prefix "NEON_", and the VFP ones have a prefix "VFP_". This is so that the regex in ARMScheduleA57.td can be made to match both of _those_ classes of VMAXNM without also matching the MVE ones that are going to be introduced soon. NFCI. Patch by Simon Tatham. Differential Revision: https://reviews.llvm.org/D60700 llvm-svn: 362097
* [ARM] LowerVECTOR_SHUFFLE - fix uninitialized variable warnings. NFCI.Simon Pilgrim2019-05-301-4/+4
| | | | llvm-svn: 362094
* [ARM] add target arch definitions for 8.1-M and MVESjoerd Meijer2019-05-304-2/+52
| | | | | | | | | | | | | | | | | This adds: - LLVM subtarget features to make all the new instructions conditional on, - CPU and FPU names for use on clang's command line, with default FPUs set so that "armv8.1-m.main+fp" and "armv8.1-m.main+fp.dp" will select the right FPU features, - architecture extension names "mve" and "mve.fp", - ABI build attribute support for v8.1-M (a new value for Tag_CPU_arch) and MVE (a new actual tag). Patch mostly by Simon Tatham. Differential Revision: https://reviews.llvm.org/D60698 llvm-svn: 362090
* [ARM] Introduce separate features for FP registersSjoerd Meijer2019-05-305-18/+69
| | | | | | | | | | | | | | | | | The MVE extension in Arm v8.1-M permits the use of some move, load and store isntructions which access the FP registers, even if there's no actual FP support in the processor (in particular, if you have the integer-only version of MVE). Therefore, we need separate subtarget features to condition those instructions on, which are implied by both FP and MVE but are not part of either. Patch mostly by Simon Tatham. Differential Revision: https://reviews.llvm.org/D60694 llvm-svn: 362088
* [ARM] Add an MVE execution domainSjoerd Meijer2019-05-302-6/+8
| | | | | | | | | | | | | | | | | | | | | | MVE architecturally specifies a 'beat' system in which a vector instruction executed now will complete its actual operation over the next four cycles, so it can overlap with the execution of the previous and next MVE instruction. This makes it generally an advantage to avoid moving values back and forth between MVE registers and anywhere else, if there's any sensible way to do the same processing in whatever register type the values already occupied. That's just what the 'execution domain' system is supposed to achieve. So here we add a new execution domain which will contain all the MVE vector instructions when they are added. Patch by: Simon Tatham Differential Revision: https://reviews.llvm.org/D60703 llvm-svn: 362068
* [ARM] Split predicates out into their own .td fileSjoerd Meijer2019-05-293-184/+189
| | | | | | | | | | | | | | | | The new ARMPredicates.td is included from ARM.td, early enough that the predicate definitions are already in scope when ARMSchedule.td is included. This will make it possible to refer to them in UnsupportedFeatures fields of scheduling models. NFC: the chunk of Tablegen being moved here is copied and pasted verbatim. Patch by: Simon Tatham Differential Revision: https://reviews.llvm.org/D60693 llvm-svn: 361958
* [ARM] Replace fp-only-sp and d16 with fp64 and d32.Simon Tatham2019-05-2816-175/+203
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Those two subtarget features were awkward because their semantics are reversed: each one indicates the _lack_ of support for something in the architecture, rather than the presence. As a consequence, you don't get the behavior you want if you combine two sets of feature bits. Each SubtargetFeature for an FP architecture version now comes in four versions, one for each combination of those options. So you can still say (for example) '+vfp2' in a feature string and it will mean what it's always meant, but there's a new string '+vfp2d16sp' meaning the version without those extra options. A lot of this change is just mechanically replacing positive checks for the old features with negative checks for the new ones. But one more interesting change is that I've rearranged getFPUFeatures() so that the main FPU feature is appended to the output list *before* rather than after the features derived from the Restriction field, so that -fp64 and -d32 can override defaults added by the main feature. Reviewers: dmgreen, samparker, SjoerdMeijer Subscribers: srhines, javed.absar, eraman, kristof.beyls, hiraditya, zzheng, Petar.Avramovic, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D60691 llvm-svn: 361845
* [ARM GlobalISel] Cleanup CallLowering a bitDiana Picus2019-05-272-22/+13
| | | | | | | We never actually use the Offsets produced by ComputeValueVTs, so remove them until we need them. llvm-svn: 361755
* [AMDGPU] Divergence driven ISel. Assign register class for cross block ↵Alexander Timofeev2019-05-262-2/+5
| | | | | | | | | | | | | | | | | | values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 This commit was reverted because of the build failure. The reason was mlformed patch. Build failure fixed. llvm-svn: 361741
* [ARM] Select fp16 fmaDavid Green2019-05-261-0/+3
| | | | | | | | This adds a pattern for fma, similar to the float and double patterns. Differential Revision: https://reviews.llvm.org/D62330 llvm-svn: 361719
OpenPOWER on IntegriCloud