path: root/llvm/lib/Target/ARM/ARMISelLowering.cpp
Commit message (Author, Date, Files changed, Lines -/+)
* [ARM] Only produce qadd8b under hasV6Ops (David Green, 2020-05-19, 1 file, -1/+1)
When compiling for an armv5te CPU from clang, the +dsp attribute is set. This meant we could try to generate qadd8 instructions where we would end up having no pattern. I've changed the condition here to be hasV6Ops && hasDSP, which is what other parts of ARMISelLowering seem to use for similar instructions. Fixed PR45677.

Differential Revision: https://reviews.llvm.org/D78877

(cherry picked from commit 8807139026b64ac40163bb255dad38a1d8054f08)

* Don't generate libcalls for wide shift on Windows ARM (PR42711) (Hans Wennborg, 2020-02-25, 1 file, -1/+1)
The previous patch (cff90f07cb5cc3c3bc58277926103af31caef308) didn't cover ARM.

(cherry picked from commit decd021facba804b57e8d80b6159c987d3261ab8)

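As an illustration (a sketch, not taken from the patch): a 64-bit shift by a variable amount is wider than a native register on 32-bit ARM, so the legalizer must either expand it inline or emit a runtime-library call; this change keeps Windows/ARM on the inline expansion.

    /* 64-bit shift on a 32-bit target; previously a candidate for a libcall */
    unsigned long long shl64(unsigned long long x, int n) {
        return x << n;
    }
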
* [FPEnv][ARM] Don't call mutateStrictFPToFP when lowering (John Brawn, 2020-02-18, 1 file, -2/+10)
mutateStrictFPToFP can delete the node and replace it with another with the same value which can later cause problems, and returning the result of mutateStrictFPToFP doesn't work because SelectionDAGLegalize expects that the returned value has the same number of results as the original. Instead handle things by doing the mutation manually.

Differential Revision: https://reviews.llvm.org/D74726

(cherry picked from commit 594a89f7270da74c89f2321432bc6a7135773fa5)

* [ARM] Fix infinite loop when lowering STRICT_FP_EXTEND (John Brawn, 2020-02-18, 1 file, -0/+9)
If the target has FP64 but not FP16 then we have custom lowering for FP_EXTEND and STRICT_FP_EXTEND with type f64. However if the extend is from f32 to f64 the current implementation will cause an infinite loop for STRICT_FP_EXTEND due to emitting a merge_values of the original node which after replacement becomes a merge_values of itself. Fix this by not doing anything for f32 to f64 extend when we have FP64, though for STRICT_FP_EXTEND we have to do the strict-to-nonstrict mutation as that doesn't happen automatically for opcodes with custom lowering.

Differential Revision: https://reviews.llvm.org/D74559

(cherry picked from commit 0ec57972967dfb43fc022c2e3788be041d1db730)

* [FPEnv][ARM] Add lowering of STRICT_FSETCC and STRICT_FSETCCS (John Brawn, 2020-02-18, 1 file, -3/+63)
These can be lowered to code sequences using CMPFP and CMPFPE which then get selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain operand isn't handled correctly, but resolving that looks like it would involve changes around FPSCR-handling instructions and how the FPSCR is modelled.

The fp-intrinsics test was already testing some of this but as the entire test was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the cases where we aren't generating the right instruction sequences as FIXME.

Differential Revision: https://reviews.llvm.org/D73194

(cherry picked from commit b37d59353f699e99f139a9227a6a69964ef4b132)

* Revert "[ARM] Improve codegen of volatile load/store of i64"Victor Campos2020-02-081-55/+2
| | | | This reverts commit 60e0120c913dd1d4bfe33769e1f000a076249a42.
* [TargetLowering][ARM][Mips][WebAssembly] Remove the ordered FP compare from RuntimeLibcalls.def and all associated usages (Craig Topper, 2020-01-10, 1 file, -4/+0)
Summary:
This always just used the same libcall as unordered, but the comparison predicate was different. This change appears to have been made when targets were given the ability to override the predicates. Before that they were hardcoded into the type legalizer. At that time we never inverted predicates and we handled ugt/ult/uge/ule compares by emitting an unordered check ORed with ogt/olt/oge/ole checks. So only ordered needed an inverted predicate. Later ugt/ult/uge/ule were optimized to only call a single libcall and invert the compare.

This patch removes the ordered entries and just uses the inverting logic that is now present. This removes some odd things in both the Mips and WebAssembly code.

Reviewers: efriedma, ABataev, uweigand, cameron.mcinally, kpn

Reviewed By: efriedma

Subscribers: dschuff, sdardis, sbc100, arichardson, jgravelle-google, kristof.beyls, hiraditya, aheejin, sunfish, atanasyan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72536

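A sketch of the inverting logic described above (illustrative only, not tied to any particular libcall name): an unordered-or-less-equal (ule) compare can be implemented with a single ordered greater-than compare whose result is inverted, because NaN operands make the ordered compare false.

    int ule_via_inverted_ogt(double a, double b) {
        /* ule(a, b) == !ogt(a, b): if either operand is NaN, (a > b) is false,
         * so the inverted result is true, matching "unordered or <=". */
        return !(a > b);
    }
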
* CodeGen: Use LLT instead of EVT in getRegisterByName (Matt Arsenault, 2020-01-09, 1 file, -1/+1)
Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.

* [ARM] Improve codegen of volatile load/store of i64 (Victor Campos, 2020-01-07, 1 file, -2/+55)
Summary:
Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers

Reviewed By: efriedma, nickdesaulniers

Subscribers: nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072

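A sketch of the kind of access this affects (names are illustrative):

    volatile long long counter;

    long long read_counter(void) {
        return counter;   /* one LDRD instead of two 32-bit LDRs */
    }

    void write_counter(long long v) {
        counter = v;      /* one STRD instead of two 32-bit STRs */
    }
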
* [NFC] Fix trivial typos in comments (James Henderson, 2020-01-06, 1 file, -1/+1)
Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D72143

Patch by Kazuaki Ishizaki.

* [ARM] Use isFMAFasterThanFMulAndFAdd for scalars as well as MVE vectors (David Green, 2020-01-05, 1 file, -4/+7)
This adds extra scalar handling to isFMAFasterThanFMulAndFAdd, allowing the target independent code to handle more folds in more situations (for example if the fast math flags are present, but the global AllowFPOpFusion option isn't).

It also splits apart the HasSlowFPVMLx into HasSlowFPVFMx, to allow VFMA and VMLA to be controlled separately if needed.

Differential Revision: https://reviews.llvm.org/D72139

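For example (a sketch; the compiler flags are assumptions, e.g. -O2 -ffast-math on a VFPv4-class target), a separate multiply and add can now be fused by the target-independent combiner even when the global fusion option is off:

    float muladd(float a, float b, float c) {
        return a * b + c;   /* candidate for a fused VFMA.F32 */
    }
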
* Move tail call disabling code to target independent code (Reid Kleckner, 2020-01-03, 1 file, -5/+2)
When the "disable-tail-calls" attribute was added, checks were added for it in various backends. Now this code has proliferated, and it is something the target is responsible for checking. Move that responsibility back to the ISels (fast, global, and SD).

There's no major functionality change, except for targets that never implemented this check.

This LLVM attribute was originally added in d9699bc7bdf0362173fcd256690f61a4d47429c2 (2015).

Reviewers: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D72118

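One common way this function attribute is produced (a sketch; the example function names are illustrative) is clang's disable_tail_calls attribute, which sets "disable-tail-calls"="true" on the emitted IR function:

    int helper(int x);

    __attribute__((disable_tail_calls))
    int wrapper(int x) {
        return helper(x);   /* would otherwise be a tail-call candidate */
    }
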
* [DAGCombine] Initialize the default operation action for SIGN_EXTEND_INREG for vector type as 'expand' instead of 'legal' (QingShan Zhang, 2020-01-03, 1 file, -0/+7)
Previously we didn't set the default operation action for SIGN_EXTEND_INREG for vector types, so it was 0 by default, i.e. Legal. However, most targets don't have native instructions to support this opcode. It should be set to Expand by default, as we did for ANY_EXTEND_VECTOR_INREG.

Differential Revision: https://reviews.llvm.org/D70000

* [ARM] Sink splat to ICmp (David Green, 2019-12-30, 1 file, -0/+1)
This adds ICmp to the list of instructions that we sink a splat to in a loop, allowing the register forms of instructions to be selected more often. It does not add FCmp yet as the results look a little odd, trying to keep the register in a float reg and having to move it back to a GPR.

Differential Revision: https://reviews.llvm.org/D70997

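A sketch of the loop shape this targets (illustrative): the compare against a loop-invariant scalar becomes a vector compare against a splat, and sinking the splat into the loop lets the register form of the MVE compare be selected.

    void zero_above(int *a, int t, int n) {
        for (int i = 0; i < n; i++)
            if (a[i] > t)      /* compare against splat of t */
                a[i] = 0;
    }
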
* [ARM] [Windows] Use COFF stubs for calls to extern_weak functions (Martin Storsjö, 2019-12-23, 1 file, -4/+6)
As the extern_weak target might be missing, resolving to the absolute address zero, we can't use the normal direct PC-relative branch instructions (as that would result in relocations out of range).

Instead check the shouldAssumeDSOLocal method and load the address from a COFF stub.

This matches what was done for X86 in 6bf108d77a3c.

Differential Revision: https://reviews.llvm.org/D71720

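A sketch of the case in question (illustrative): a call to an extern_weak symbol, whose address may legitimately be zero, so a plain PC-relative branch cannot be used.

    extern void optional_hook(void) __attribute__((weak));

    void run_hooks(void) {
        if (optional_hook)      /* the definition may be missing entirely */
            optional_hook();    /* call goes through a COFF stub */
    }
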
* Revert "[ARM] Improve codegen of volatile load/store of i64"Victor Campos2019-12-201-55/+2
| | | | This reverts commit bbcf1c3496ce2bd1ed87e8fb15ad896e279633ce.
* [ARM] Improve codegen of volatile load/store of i64 (Victor Campos, 2019-12-19, 1 file, -2/+55)
Summary:
Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

Reviewers: dmgreen, efriedma, john.brawn

Reviewed By: efriedma

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072

* [ARM] Add custom strict fp conversion lowering when non-strict is custom (John Brawn, 2019-12-13, 1 file, -33/+64)
We have custom lowering for operations converting to/from floating-point types when we don't have hardware support for those types, and this doesn't interact well with the target-independent legalization of the strict versions of these operations. Fix this by adding similar custom lowering of the strict versions.

This fixes the last of the assertion failures in the CodeGen/ARM/fp-intrinsics test, with the remaining failures due to poor instruction selection.

Differential Revision: https://reviews.llvm.org/D71127

* [NFC] Use EVT instead of bool for getSetCCInverse() (Alex Richardson, 2019-12-13, 1 file, -5/+5)
Summary:
The use of a boolean isInteger flag (generally initialized using VT.isInteger()) caused errors in our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). In our backend, pointers use a separate ValueType (iFATPTR) and therefore .isInteger() returns false. This meant that getSetCCInverse() was using the floating-point variant and generated incorrect code for us: `(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.

Committing this change will significantly reduce our merge conflicts for each upstream merge.

Reviewers: spatel, bogner

Reviewed By: bogner

Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70917

* [ARM][MVE] Sink vector shift operand (Sam Parker, 2019-12-12, 1 file, -3/+28)
Recommit e0b966643fc2. sub instructions were being generated for the negated value, and for some reason they were the register only ones. I think the problem was because I was grabbing the 'zero' from vmovimm, which is a target constant. Now I'm just generating a new Constant zero and so rsb instructions are now generated.

Original commit message:

The shift amount operand can be provided in a general purpose register so sink it. Flip the vdup and negate so the existing patterns can be used for matching.

Differential Revision: https://reviews.llvm.org/D70841

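A sketch of the pattern being improved (illustrative): a vectorized loop where every lane is shifted by the same amount held in a general purpose register, so the splat of the amount can be sunk into the loop.

    void shift_right_all(unsigned *a, int amount, int n) {
        for (int i = 0; i < n; i++)
            a[i] >>= amount;   /* per-lane shift by a loop-invariant GPR value */
    }
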
* Revert "[ARM][MVE] Sink vector shift operand"Sam Parker2019-12-121-27/+3
| | | | | | This reverts commit e0b966643fc2030442ffbae9b677247be697673b. Instruction selection is failing with expensive checks.
* [ARM][MVE] Sink vector shift operand (Sam Parker, 2019-12-12, 1 file, -3/+27)
The shift amount operand can be provided in a general purpose register so sink it. Flip the vdup and negate so the existing patterns can be used for matching.

Differential Revision: https://reviews.llvm.org/D70841

* [IR] Split out target specific intrinsic enums into separate headers (Reid Kleckner, 2019-12-11, 1 file, -0/+1)
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of object file size.
- Incremental step towards decoupling target intrinsics.

The enums are still compact, so adding and removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work.

Part of PR34259

Reviewers: efriedma, echristo, MaskRay

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D71320

* [ARM] Attempt to use whole register vmovs for MVE shuffles. (David Green, 2019-12-08, 1 file, -0/+90)
MVE doesn't have the range of shuffle instructions available in Neon. We also cannot use the trick of cutting a difficult vector shuffle in half to simplify things. Instead we need to be more careful about how we lower shuffles.

This patch adds an extra combine that attempts to find "whole lane" vmovs when lowering shuffles of smaller types. This helps us make some shuffles a lot simpler, generating single lane movs for the parts that can make use of it, falling back to the original shuffle for the rest.

Differential Revision: https://reviews.llvm.org/D69509

* [ARM] Disable VLD4 under MVE (David Green, 2019-12-08, 1 file, -1/+6)
Alas, using half the available vector registers in a single instruction is just too much for the register allocator to handle. The mve-vldst4.ll test here fails when these instructions are enabled at present. This patch disables the generation of VLD4 and VST4 by adding a mve-max-interleave-factor option, which we currently default to 2.

Differential Revision: https://reviews.llvm.org/D71109

* [ARM] Add some VCMP folding and canonicalisation (David Green, 2019-12-02, 1 file, -8/+43)
The VCMP instructions in MVE can accept a register or ZR, but only as the right-hand operand. Most of the time this will already be correct because the icmp will have been canonicalised that way already. There are some cases in the lowering of float conditions that this will not apply to though. This code should fix up those cases.

Differential Revision: https://reviews.llvm.org/D70822

* Revert "[ARM] Allocatable Global Register Variables for ARM"Carey Williams2019-11-291-9/+3
| | | | This reverts commit 2d739f98d8a53e38bf9faa88cdb6b0c2a363fb77.
* [ARM] Replace arm_neon_vqadds with sadd_sat (David Green, 2019-11-27, 1 file, -0/+3)
This replaces the A32 NEON vqadds, vqaddu, vqsubs and vqsubu intrinsics with the target independent sadd_sat, uadd_sat, ssub_sat and usub_sat. This helps generate vqadds from standard IR nodes, which might be produced from the vectoriser. The old variants are removed in the process.

Differential Revision: https://reviews.llvm.org/D69350

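For instance, the ACLE saturating-add intrinsics now reach the backend as the generic saturating-add nodes rather than target-specific intrinsics (a sketch using the NEON header; the same lowering also serves IR the vectoriser produces):

    #include <arm_neon.h>

    int32x4_t saturating_add(int32x4_t a, int32x4_t b) {
        return vqaddq_s32(a, b);   /* now lowered via llvm.sadd.sat */
    }
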
* [Codegen][ARM] Add addressing modes from masked loads and stores (David Green, 2019-11-26, 1 file, -18/+56)
MVE has a basic symmetry between its normal load/store operations and the masked variants. This means that masked loads and stores can use pre-inc and post-inc addressing modes, just like the standard loads and stores already do.

To enable that, this patch adds all the relevant infrastructure for treating masked loads/stores addressing modes in the same way as normal loads/stores. This involves:
- Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra Offset operand that is added after the PtrBase.
- Extending the IndexedModeActions from 8 bits to 16 bits to store the legality of masked operations as well as normal ones. This array is fairly small, so doubling the size still won't make it very large. Offset masked loads can then be controlled with setIndexedMaskedLoadAction, similar to standard loads.
- The same methods that combine to indexed loads, such as CombineToPostIndexedLoadStore, are adjusted to handle masked loads in the same way.
- The ARM backend is then adjusted to make use of these indexed masked loads/stores.
- The X86 backend is adjusted to hopefully be no functional changes.

Differential Revision: https://reviews.llvm.org/D70176

* [ARM] MVE interleaving load and stores. (David Green, 2019-11-19, 1 file, -31/+90)
Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering for MVE. This works the same way as Neon, recognising the load/shuffles combination and converting them into intrinsics in a pre-isel pass, which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad and lowerInterleavedStore.

The main difference to Neon is that we do not have a VLD3 instruction. Otherwise most of the code works very similarly, with just some minor differences in the form of the intrinsics to work around. VLD3 is disabled by making isLegalInterleavedAccessType return false for those cases.

We may need some other future adjustments, such as VLD4 taking up half the available registers and so perhaps needing a higher cost. This patch should get the basics in though.

Differential Revision: https://reviews.llvm.org/D69392

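A sketch of an interleaved access that can now be lowered to VLD2/VST2 under MVE (illustrative):

    void deinterleave(const short *in, short *even, short *odd, int n) {
        for (int i = 0; i < n; i++) {
            even[i] = in[2 * i];       /* stride-2 loads ...            */
            odd[i]  = in[2 * i + 1];   /* ... recognised as a vld2 pair */
        }
    }
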
* DAG: Add function context to isFMAFasterThanFMulAndFAdd (Matt Arsenault, 2019-11-19, 1 file, -1/+2)
AMDGPU needs to know the FP mode for the function to answer this correctly when this is removed from the subtarget.

AArch64 had to make this more complicated by using this from an IR hook, so add an IR typed overload.

* [SVE][CodeGen] Scalable vector MVT size queries (Graham Hunter, 2019-11-18, 1 file, -1/+1)
* Implements scalable size queries for MVTs, split out from D53137.
* Contains a fix for FindMemType to avoid using scalable vector type to contain non-scalable types.
* Explicit casts for several places where implicit integer sign changes or promotion from 32 to 64 bits caused problems.
* CodeGenDAGPatterns will treat scalable and non-scalable vector types as different.

Reviewers: greened, cameron.mcinally, sdesmalen, rovka

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D66871

* [ARM] Allocatable Global Register Variables for ARM (Anna Welker, 2019-11-18, 1 file, -3/+9)
Provides support for using r6-r11 as globally scoped register variables. This requires a -ffixed-rN flag in order to reserve rN against general allocation.

If for a given GRV declaration the corresponding flag is not found, or the register in question is the target's FP, we fail with a diagnostic.

Differential Revision: https://reviews.llvm.org/D68862

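A sketch of a globally scoped register variable as supported here (the register choice and names are illustrative; the matching -ffixed-r8 flag must be passed so r8 is reserved):

    register unsigned int current_task __asm__("r8");

    unsigned int get_current_task(void) {
        return current_task;   /* read directly from r8, never reallocated */
    }
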
* [ARM,MVE] Use VMOV.{S8,S16} for sign-extended extractelement. (Simon Tatham, 2019-11-13, 1 file, -3/+4)
MVE includes instructions that extract an 8- or 16-bit lane from a vector and sign-extend it into the output 32-bit GPR. `ARMInstrMVE.td` already included isel patterns to select those instructions in response to the `ARMISD::VGETLANEs` selection-DAG node type. But `ARMISD::VGETLANEs` was never actually generated, because the code that creates it was conditioned on NEON only.

It's an easy fix to enable the same code for integer MVE, and now IR that sign-extends the result of an extractelement (whether explicitly or as part of the function call ABI) will use `vmov.s8` instead of `vmov.u8` followed by `sxtb`.

Reviewers: SjoerdMeijer, dmgreen, ostannard

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70132

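A sketch of IR that now selects vmov.s8 directly, written with clang's vector extension (the type name is illustrative):

    typedef signed char v16i8 __attribute__((vector_size(16)));

    int get_lane3(v16i8 v) {
        return v[3];   /* extractelement + sign-extend to i32 */
    }
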
* [AArch64][X86] Don't assume __powidf2 is available on Windows. (Eli Friedman, 2019-11-08, 1 file, -56/+5)
We had some code for this for 32-bit ARM, but this doesn't really need to be in target-specific code; generalize it.

(I think this started showing up recently because we added an optimization that converts pow to powi.)

Differential Revision: https://reviews.llvm.org/D69013

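A sketch of the trigger mentioned above (illustrative): a pow call with a small integer exponent can be converted to llvm.powi, and when the __powidf2 libcall cannot be assumed to exist the node has to be expanded inline instead.

    #include <math.h>

    double cube(double x) {
        return pow(x, 3.0);   /* may become llvm.powi.f64(x, 3) */
    }
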
* [ARM] Use isFMAFasterThanFMulAndFAdd for MVE (David Green, 2019-11-04, 1 file, -0/+30)
The Arm backend will usually return false for isFMAFasterThanFMulAndFAdd, where both the fused VFMA.f32 and a non-fused VMLA.f32 are usually available for scalar code. For MVE we don't have the non-fused version though. It makes more sense for isFMAFasterThanFMulAndFAdd to return true, allowing us to simplify some of the existing ISel patterns.

The tests here show that none of the existing tests failed, and so we are still selecting VFMA and VFMS. The one test that changed shows we can now select from fast math flags, as opposed to just relying on the isFMADLegalForFAddFSub option.

Differential Revision: https://reviews.llvm.org/D69115

* Add Windows Control Flow Guard checks (/guard:cf). (Andrew Paverd, 2019-10-28, 1 file, -0/+3)
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on indirect function calls, using either the check mechanism (X86, ARM, AArch64) or the dispatch mechanism (X86-64). The check mechanism requires a new calling convention for the supported targets. The dispatch mechanism adds the target as an operand bundle, which is processed by SelectionDAG. Another pass (CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.

Reviewers: thakis, rnk, theraven, pcc

Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65761

* [ARM][AArch64] Implement __cls, __clsl and __clsll intrinsics from ACLE (vhscampos, 2019-10-28, 1 file, -0/+43)
Summary:
Writing support for three ACLE functions:
  unsigned int __cls(uint32_t x)
  unsigned int __clsl(unsigned long x)
  unsigned int __clsll(uint64_t x)

CLS stands for "Count number of leading sign bits".

In AArch64, these two intrinsics can be translated into the 'cls' instruction directly. In AArch32, on the other hand, this functionality is achieved by implementing it in terms of clz (count number of leading zeros).

Reviewers: compnerd

Reviewed By: compnerd

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D69250

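A usage sketch (assuming clang's arm_acle.h exposes the new functions, per this patch; the wrapper name is illustrative):

    #include <arm_acle.h>
    #include <stdint.h>

    unsigned leading_sign_bits(int32_t x) {
        return __cls((uint32_t)x);   /* CLS on AArch64, CLZ-based expansion on AArch32 */
    }
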
* [ARM] Begin adding IR intrinsics for MVE instructions. (Simon Tatham, 2019-10-24, 1 file, -0/+4)
This commit, together with the next few, will add a representative sample of the kind of IR intrinsics that we'll need in order to implement the user-facing ACLE intrinsics for MVE. Supporting all of them will take more work; the intention of this initial series of commits is to implement an intrinsic or two from lots of different categories, as examples and proofs of concept.

This initial commit introduces a small number of IR intrinsics for instructions simple enough that they can use Tablegen ISel patterns: the predicated versions of the VADD and VSUB instructions (both integer and FP), VMIN and VMAX, and the float->half VCVT instruction (predicated and unpredicated).

When using VPT-predicated instructions in automatic code generation, it will be convenient to specify the predicate value as a vector of the appropriate number of i1. To make it easy to specify all sizes of an instruction in one go and give each one the matching predicate vector type, I've added a system of Tablegen informational records describing MVE's vector types: each one gives the underlying LLVM IR ValueType (which may not be the same if the MVE vector is of explicitly signed or unsigned integers) and an appropriate vNi1 to use as the predicate vector.

(Also, those info records include the usual encoding for the types, so that as we add associations between each instruction encoding and one of the new `MVEVectorVTInfo` records, we can remove some of the existing template parameters and replace them with references to the vector type info's fields.)

The user-facing ACLE intrinsics will receive a predicate mask as a 16-bit integer, so I've also provided a pair of intrinsics i2v and v2i, to convert between an integer and a vector of i1 by just changing the register class.

Reviewers: dmgreen, miyuki, ostannard

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67158

* [ARM] Add qadd lowering from a sadd_sat (David Green, 2019-10-21, 1 file, -4/+5)
This lowers a sadd_sat to a qadd by treating it as legal. Also adds qsub at the same time.

The qadd instruction sets the q flag, but we already have many cases where we do not model this in llvm.

Differential Revision: https://reviews.llvm.org/D68976

llvm-svn: 375411

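A sketch of scalar source that the middle end may canonicalize to llvm.sadd.sat.i32, which can now be selected as a single QADD (the exact pattern matching lives in the optimizer, so treat this as illustrative):

    #include <limits.h>

    int saturating_add(int a, int b) {
        long long wide = (long long)a + b;   /* compute in a wider type */
        if (wide > INT_MAX) return INT_MAX;  /* clamp on overflow       */
        if (wide < INT_MIN) return INT_MIN;  /* clamp on underflow      */
        return (int)wide;
    }
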
* [Alignment][NFC] TargetCallingConv::setOrigAlign and TargetLowering::getABIAlignmentForCallingConv (Guillaume Chatelet, 2019-10-21, 1 file, -6/+5)
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69243

llvm-svn: 375407

* [ARM] Lower sadd_sat to qadd8 and qadd16 (David Green, 2019-10-21, 1 file, -1/+58)
Lower the target independent signed saturating intrinsics to qadd8 and qadd16. This custom lowers them from a sadd_sat, catching the node early before it is promoted. It also adds a QADD8b and QADD16b node to mean the bottom "lane" of a qadd8/qadd16, so that we can call demand bits on it to show that it does not use the upper bits.

Also handles QSUB8 and QSUB16.

Differential Revision: https://reviews.llvm.org/D68974

llvm-svn: 375402

* [DAGCombine][ARM] Enable extending masked loads (Sam Parker, 2019-10-17, 1 file, -3/+14)
Add generic DAG combine for extending masked loads.

Allow us to generate sext/zext masked loads which can access v4i8, v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively.

Differential Revision: https://reviews.llvm.org/D68337

llvm-svn: 375085

* [ARM] Selection for MVE VMOVN (David Green, 2019-10-14, 1 file, -0/+35)
This adds both VMOVNt and VMOVNb instruction selection from the appropriate shuffles. We detect shuffle masks of the form:
  0, N, 2, N+2, 4, N+4, ...
or
  0, N+1, 2, N+3, 4, N+5, ...
ISel will also try the opposite patterns, with inputs reversed. These are selected to VMOVNt and VMOVNb respectively.

Differential Revision: https://reviews.llvm.org/D68283

llvm-svn: 374781

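A sketch of the first mask form for v8i16 (so N == 8), written with clang's builtin shuffle (names are illustrative):

    typedef short v8i16 __attribute__((vector_size(16)));

    v8i16 vmovn_candidate(v8i16 a, v8i16 b) {
        /* mask 0, N, 2, N+2, 4, N+4, 6, N+6 with N == 8 */
        return __builtin_shufflevector(a, b, 0, 8, 2, 10, 4, 12, 6, 14);
    }
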
* [ARM] VQSUB instruction (David Green, 2019-10-10, 1 file, -0/+2)
Same as VQADD, VQSUB can be selected from llvm.ssub.sat intrinsics.

Differential Revision: https://reviews.llvm.org/D68567

llvm-svn: 374377

* [ARM] VQADD instructions (David Green, 2019-10-10, 1 file, -0/+2)
This selects MVE VQADD from the vector llvm.sadd.sat or llvm.uadd.sat intrinsics.

Differential Revision: https://reviews.llvm.org/D68566

llvm-svn: 374336

* [ISEL][ARM][AARCH64] Tracking simple parameter forwarding registers (Nikola Prica, 2019-10-08, 1 file, -1/+8)
Support for tracking registers that forward function parameters into the following function frame. For now we only support cases where a parameter is forwarded through a single register.

Reviewers: aprantl, vsk, t.p.northover

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D66953

llvm-svn: 374033

* [ARM] Generate vcmp instead of vcmpe (Kristof Beyls, 2019-10-08, 1 file, -36/+15)
Based on the discussion in http://lists.llvm.org/pipermail/llvm-dev/2019-October/135574.html, the conclusion was reached that the ARM backend should produce vcmp instead of vcmpe instructions by default, i.e. not be producing an Invalid Operation exception when either argument in a floating point compare is a quiet NaN.

In the future, after constrained floating point intrinsics for floating point compare have been introduced, vcmpe instructions probably should be produced for those intrinsics - depending on the exact semantics they'll be defined to have.

This patch logically consists of the following parts:
- Revert http://llvm.org/viewvc/llvm-project?rev=294945&view=rev and http://llvm.org/viewvc/llvm-project?rev=294968&view=rev, which implemented fine-tuning for when to produce vcmpe (i.e. not do it for equality comparisons). The complexity introduced by those patches isn't needed anymore if we just always produce vcmp instead. Maybe these patches need to be reintroduced again once support is needed to map potential LLVM-IR constrained floating point compare intrinsics to the ARM instruction set.
- Simply select vcmp, instead of vcmpe, see simple changes in lib/Target/ARM/ARMInstrVFP.td
- Adapt lots of tests that tested for vcmpe (instead of vcmp). For all of these tests, the intent of what is tested for isn't related to whether the vcmp should produce an Invalid Operation exception or not.

Fixes PR43374.

Differential Revision: https://reviews.llvm.org/D68463

llvm-svn: 374025

* [ARM] Make helpers static. NFC. (Benjamin Kramer, 2019-10-02, 1 file, -3/+5)
llvm-svn: 373503

* [ARM] Identity shuffles are legal (David Green, 2019-10-02, 1 file, -0/+1)
Identity shuffles, of the form (0, 1, 2, 3, ...), are perfectly OK under MVE (they essentially just become bitcasts). We were not catching that in the existing set of what we considered legal though. On NEON, they would be covered by vext's, but that is not generally available in MVE.

This uses ShuffleVectorInst::isIdentityMask which is a little odd to use here but does what we want and prevents us from just rewriting what is the same function.

Differential Revision: https://reviews.llvm.org/D68241

llvm-svn: 373446