path: root/llvm/lib/Target/AArch64
* [AArch64][WinCFI] Do not pair callee-save instructions in LoadStoreOptimizer
  Sander de Smalen · 2019-08-07 · 1 file · -0/+12

  Prevent the LoadStoreOptimizer from pairing any load/store instructions with
  instructions from the prologue/epilogue if the CFI information has encoded the
  operations as separate instructions. Pairing them would otherwise lead to a
  mismatch between the actual prologue size and the size recorded in the
  Windows CFI.

  Reviewers: efriedma, mstorsjo, ssijaric
  Reviewed By: efriedma
  Differential Revision: https://reviews.llvm.org/D65817
  llvm-svn: 368164

* [GISel]: Add GISelKnownBits analysis
  Aditya Nandakumar · 2019-08-06 · 1 file · -4/+13

  This adds a KnownBits analysis pass for GISel. It was done as a pass (rather
  than as static functions) so that other features, such as caching queries
  (within a pass and across passes), can be added in the future. This patch only
  adds the basic pass boilerplate and implements a lazy, non-caching known-bits
  implementation (ported from SelectionDAG). The AArch64PreLegalizerCombiner
  pass has also been hooked up to use this; there should be no compile-time
  regression, as the analysis is lazy.

  Differential Revision: https://reviews.llvm.org/D65698
  llvm-svn: 368065

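  As a rough illustration of the intended use, here is a hedged sketch of how a
  combine might query the analysis. It is based on the interface described
  above; header paths and signatures may differ in later revisions:

  ```
  // Sketch: querying the lazy known-bits analysis from a GISel combine.
  #include "llvm/CodeGen/GlobalISel/GISelKnownBits.h"
  #include "llvm/Support/KnownBits.h"
  using namespace llvm;

  static bool signBitIsZero(MachineFunction &MF, Register Reg) {
    GISelKnownBits KB(MF);                  // lazy: computed on demand
    KnownBits Known = KB.getKnownBits(Reg); // logic ported from SelectionDAG
    return Known.isNonNegative();           // sign bit known to be zero
  }
  ```
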
* [globalisel] Allow SrcOp to convert an APInt and render it as an immediate operand (MO.isImm() == true)
  Daniel Sanders · 2019-08-06 · 2 files · -2/+2

  Summary: This is tested by D61289 but has been pulled into a separate patch at
  a reviewer's request.

  Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm, rovka
  Reviewed By: arsenm
  Subscribers: javed.absar, hiraditya, wdng, kristof.beyls, Petar.Avramovic, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61321
  llvm-svn: 368063

* [AArch64] NFC: Generalize emitFrameOffset to support more than byte offsets.
  Sander de Smalen · 2019-08-06 · 1 file · -65/+83

  Refactor emitFrameOffset to accept a StackOffset struct as its offset
  argument. This method currently only supports byte offsets (MVT::i8) but will
  be extended in a later patch to support scalable offsets (MVT::nxv1i8) as
  well.

  Reviewers: thegameg, rovka, t.p.northover, efriedma, greened
  Reviewed By: efriedma
  Differential Revision: https://reviews.llvm.org/D61436
  llvm-svn: 368049

* AArch64: bail instead of asserting on unexpected type in G_CONSTANT 0.
  Tim Northover · 2019-08-06 · 1 file · -2/+2

  llvm-svn: 368031

* [AArch64] NFC: Add generic StackOffset to describe scalable offsets.
  Sander de Smalen · 2019-08-06 · 7 files · -74/+199

  To support spilling/filling of scalable vectors we need a more generic
  representation of a stack offset than simply 'int'. For this we introduce the
  StackOffset struct, which comprises multiple offsets sized by their respective
  MVTs. Byte offsets will thus be a simple tuple such as { offset, MVT::i8 }.
  Adding two byte offsets will result in a byte offset
  { offsetA + offsetB, MVT::i8 }. When two offsets have different types, we can
  canonicalise them to use the same MVT, as long as their runtime sizes are
  guaranteed to have the same size-ratio as they would have at compile time.

  When we have both scalable- and fixed-size objects on the stack, we can create
  an offset that is:

    ({ offset_fixed, MVT::i8 } + { offset_scalable, MVT::nxv1i8 })

  The struct also contains a getForFrameOffset() method that is specific to
  AArch64 and decomposes the frame offset to be used directly in instructions
  that operate on the stack or index into the stack.

  Note: This patch adds StackOffset as an AArch64-only concept, but we would
  like to make this a generic concept/struct that is supported by all interfaces
  that take or return stack offsets (currently as 'int'). Since that would be a
  bigger change that is currently pending on D32530 landing, we thought it makes
  sense to first show/prove the concept in the AArch64 target before proposing
  to roll this out further.

  Reviewers: thegameg, rovka, t.p.northover, efriedma, greened
  Reviewed By: rovka, greened
  Differential Revision: https://reviews.llvm.org/D61435
  llvm-svn: 368024

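  To make the mixed fixed/scalable representation concrete, here is a
  simplified, hypothetical sketch of the idea. It is not the class added by the
  patch, which keys each offset by its MVT rather than storing two named fields:

  ```
  // Illustrative only: a fixed byte part plus a scalable part whose final
  // value is a runtime multiple of the hardware vector length.
  #include <cstdint>

  struct SketchStackOffset {
    int64_t Bytes = 0;         // plays the role of { offset, MVT::i8 }
    int64_t ScalableBytes = 0; // plays the role of { offset, MVT::nxv1i8 }

    SketchStackOffset operator+(const SketchStackOffset &RHS) const {
      // The two parts accumulate independently; they can only be collapsed
      // into a single number once the vector length is known.
      return {Bytes + RHS.Bytes, ScalableBytes + RHS.ScalableBytes};
    }
  };
  ```
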
* AArch64: use xzr/wzr for constant 0 in GlobalISel.
  Tim Northover · 2019-08-06 · 1 file · -0/+25

  COPYs from xzr and wzr can often be folded away entirely during register
  allocation, unlike a movz.

  llvm-svn: 368003

* [GlobalISel][CallLowering] Rename isArgumentHandler() -> isIncomingArgumentHandler()
  Amara Emerson · 2019-08-05 · 1 file · -1/+1

  The previous name and comment incorrectly implied that it was only for formal
  argument handlers.

  llvm-svn: 367945

* [AArch64][GlobalISel] Inline tiny memcpy et al at -O0.
  Amara Emerson · 2019-08-05 · 1 file · -2/+5

  FastISel has done this since the initial arm64 port was upstreamed, so there
  seem to be no issues with doing this at -O0 for very small memcpys. Gives a
  0.2% geomean code size improvement on CTMark.

  Differential Revision: https://reviews.llvm.org/D65758
  llvm-svn: 367919

* [AArch64] Expand bcmp() for small block lengths
  Evandro Menezes · 2019-08-05 · 3 files · -0/+20

  Patch D56593 by @courbet results in calls to `bcmp()` in some cases, should
  the target support it, unless `TTI::MemCmpExpansionOptions()` is overridden by
  the target. In a proprietary benchmark we see a performance drop of about 12%
  on PNG compression before this patch, though it passes all tests.

  This patch mirrors X86 for AArch64 and initializes
  `TTI::MemCmpExpansionOptions()` to then expand calls to `bcmp()` when
  appropriate. No tuning of the parameters was performed but, at this point,
  it's enough to recover the performance drop above.

  This problem also exists on ARM. Once a consensus is reached for AArch64, we
  can work to fix ARM as well.

  Authors:
  - Evandro Menezes (@evandro) <e.menezes@samsung.com>
  - Brian Rzycki (@brzycki) <b.rzycki@samsung.com>

  Differential revision: https://reviews.llvm.org/D64805
  llvm-svn: 367898

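  The TTI hook involved looks roughly like the sketch below, modeled on the X86
  version this patch mirrors; treat the exact field set and the tuning values as
  assumptions rather than the committed code:

  ```
  // Hedged sketch: enabling inline memcmp()/bcmp() expansion for a target.
  TTI::MemCmpExpansionOptions
  AArch64TTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const {
    TTI::MemCmpExpansionOptions Options;
    Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);
    Options.NumLoadsPerBlock = Options.MaxNumLoads;
    Options.AllowOverlappingLoads = true; // hypothetical tuning choice
    // Prefer the widest loads first when expanding the comparison inline.
    Options.LoadSizes = {8, 4, 2, 1};
    return Options;
  }
  ```
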
* [AArch64] Set preferred function alignment to 16 bytes on Neoverse N1
  Pablo Barrio · 2019-08-05 · 1 file · -0/+2

  Summary: The Arm Neoverse N1 Software Optimization Guide [1], Section "4.8
  Branch instruction alignment", states: "Consider aligning subroutine entry
  points and branch targets to 32B boundaries, within the bounds of the
  code-density requirements of the program."

  This patch sets the preferred function alignment on Neoverse N1 to 2^4 = 16B.
  This was already the case in some of the latest Cortex-A CPUs. Benchmarking in
  previous Cortex-A CPUs suggested that 16B alignment is already better than the
  default. See commit d04ee305.

  The reason we don't set it to 32B right now (as the optimisation guide
  suggests) is that this will impact code size and perhaps instruction cache
  performance. Therefore we need benchmark numbers first.

  I have also added testing for A75 and A76, which we were missing.

  [1] https://developer.arm.com/docs/swog309707/latest

  Reviewers: fhahn, greened, samparker, dmgreen
  Reviewed By: dmgreen
  Subscribers: dmgreen, javed.absar, kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65654
  llvm-svn: 367894

* [AArch64] Implement initial SVE calling convention support
  Cullen Rhodes · 2019-08-05 · 3 files · -1/+83

  Summary: This patch adds initial support for the SVE calling convention such
  that SVE types can be passed as arguments and return values to/from a
  subroutine.

  The SVE AAPCS states [1]:

  z0-z7 are used to pass scalable vector arguments to a subroutine, and to
  return scalable vector results from a function. If a subroutine takes
  arguments in scalable vector or predicate registers, or if it is a function
  that returns results in such registers, it must ensure that the entire
  contents of z8-z23 are preserved across the call. In other cases it need only
  preserve the low 64 bits of z8-z15, as described in §5.1.2.

  p0-p3 are used to pass scalable predicate arguments to a subroutine and to
  return scalable predicate results from a function. If a subroutine takes
  arguments in scalable vector or predicate registers, or if it is a function
  that returns results in these registers, it must ensure that p4-p15 are
  preserved across the call. In other cases it need not preserve any scalable
  predicate register contents.

  SVE predicate and data registers are passed indirectly (i.e. spilled to the
  stack and the address passed) if they exceed the registers used for argument
  passing defined by the PCS referenced above. Until SVE stack support is merged
  we can't spill SVE registers to the stack, so currently an llvm_unreachable is
  used where we will eventually handle this.

  [1] https://static.docs.arm.com/100986/0000/100986_0000.pdf

  Reviewed By: ostannard
  Differential Revision: https://reviews.llvm.org/D65448
  llvm-svn: 367859

* [AArch64] Skip isZIPMask check for masks with an odd number of elements.
  Florian Hahn · 2019-08-05 · 1 file · -0/+2

  We process 2 elements at a time and expect the number of elements to be even.
  Similar to D60690.

  Reviewers: dmgreen, samparker, t.p.northover
  Reviewed By: dmgreen
  Differential Revision: https://reviews.llvm.org/D65400
  llvm-svn: 367831

* [LLVM][Alignment] Introduce Alignment Type
  Guillaume Chatelet · 2019-08-05 · 1 file · -6/+6

  Summary: This patch is part of a series to introduce an Alignment type. See
  this thread for context:
  http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type: https://reviews.llvm.org/D64790

  Reviewers: courbet, jfb, jakehehrlich
  Reviewed By: jfb
  Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle,
  javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar,
  johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng,
  edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX,
  jocewei, jsji, s.egerton, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65514
  llvm-svn: 367828

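  For reference, a small usage sketch of the type the series introduces:
  llvm::Align wraps a power-of-two byte amount, removing the ambiguity between
  raw byte values and log2 encodings that the old unsigned fields allowed:

  ```
  #include "llvm/Support/Alignment.h"
  #include <cstdint>
  using namespace llvm;

  void alignmentExample(uint64_t Offset) {
    Align A(16);                    // asserts that 16 is a power of two
    uint64_t Bytes = A.value();     // 16: always bytes, never a log2 value
    bool Ok = isAligned(A, Offset); // true iff Offset is a multiple of 16
    (void)Bytes;
    (void)Ok;
  }
  ```
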
* Emit diagnostic if an inline asm constraint requires an immediate
  Bill Wendling · 2019-08-03 · 1 file · -2/+10

  Summary: An inline asm call can result in an immediate after inlining.
  Therefore emit a diagnostic here if a constraint requires an immediate but one
  isn't supplied.

  Reviewers: joerg, mgorny, efriedma, rsmith
  Reviewed By: joerg
  Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD,
  zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX,
  jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar,
  fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60942
  llvm-svn: 367750

* Re-commit "[GlobalISel] Add legalization support for non-power-2 loads and ↵Amara Emerson2019-08-021-8/+5
| | | | | | | | | | stores"" This is an old commit that exposed a bug in the GISel importer, which caused non-truncating stores to be selected for truncating store patterns. Now that's been fixed in r367737 this can go back in. llvm-svn: 367739
* [AArch64][GlobalISel] Eliminate redundant G_ZEXT when the source is implicitly zext-loaded.
  Amara Emerson · 2019-08-02 · 1 file · -0/+17

  These cases can come up when the extending loads combiner doesn't combine a
  zext(load) to a zextload op, due to some other operation being in between,
  which then gets simplified at a later stage.

  Differential Revision: https://reviews.llvm.org/D65360
  llvm-svn: 367723

* [AArch64][GlobalISel] Support the neg_addsub_shifted_imm32 pattern
  Jessica Paquette · 2019-08-02 · 2 files · -12/+65

  Add an equivalent ComplexRendererFns function for SelectNegArithImmed. This
  allows us to select immediate adds of -1 by turning them into subtracts.

  Update select-binop.mir to show that the pattern works.

  Differential Revision: https://reviews.llvm.org/D65460
  llvm-svn: 367700

* GlobalISel: support swiftself attribute
  Tim Northover · 2019-08-02 · 1 file · -0/+1

  llvm-svn: 367683

* Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC
  Daniel Sanders · 2019-08-01 · 9 files · -53/+48

  llvm-svn: 367633

* [AArch64] Do not allocate unnecessary emergency slot.
  Sander de Smalen · 2019-08-01 · 1 file · -2/+2

  Fix an issue where the compiler still allocates an emergency spill slot even
  though it already decided to spill an extra callee-save register to use as a
  scratch register.

  Reviewers: gberry, thegameg, mstorsjo, t.p.northover
  Reviewed By: thegameg
  Differential Revision: https://reviews.llvm.org/D65504
  llvm-svn: 367540

* [GISel] Address review feedback on passing MD_callees to lowerCall.
  Mark Lacey · 2019-07-31 · 1 file · -2/+2

  Preserve the nullptr default for KnownCallees that appears in the base class.

  llvm-svn: 367477

* [GISel] Pass MD_callees metadata down in call lowering.
  Mark Lacey · 2019-07-31 · 2 files · -5/+8

  Summary: This will make it possible to improve IPRA by taking into account
  register usage in indirect calls. NFC yet; this is just laying the groundwork
  to start building up patches to take advantage of the information for improved
  register allocation.

  Reviewers: aditya_nandakumar, volkan, qcolombet, arsenm, rovka, aemerson, paquette
  Subscribers: sdardis, wdng, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65488
  llvm-svn: 367476

* AArch64: Add a tagged-globals backend feature.
  Peter Collingbourne · 2019-07-31 · 7 files · -1/+42

  This feature instructs the backend to allow locally defined global variable
  addresses to contain a pointer tag in bits 56-63 that will be ignored by the
  hardware (i.e. TBI), but may be used by an instrumentation pass such as
  HWASAN. It works by adding a MOVK instruction to the regular ADRP/ADD sequence
  that sets bits 48-63 to the corresponding bits of the global, with the linker
  bounds check disabled on the ADRP instruction to prevent the tag from causing
  a link failure.

  This implementation of the feature omits the MOVK when loading from or storing
  to a global, which is sufficient for TBI. If the same approach is extended to
  MTE, assuming that 0 is not configured as a catch-all tag, we will most likely
  also need the MOVK in this case in order to avoid a tag mismatch.

  Differential Revision: https://reviews.llvm.org/D65364
  llvm-svn: 367475

* SelectionDAG, MI, AArch64: Widen target flags fields/arguments from unsigned char to unsigned.
  Peter Collingbourne · 2019-07-31 · 6 files · -13/+12

  This makes the field wider than MachineOperand::SubReg_TargetFlags so that we
  don't end up silently truncating any higher bits. We should still catch any
  bits truncated from the MachineOperand field as a consequence of the assertion
  in MachineOperand::setTargetFlags().

  Differential Revision: https://reviews.llvm.org/D65465
  llvm-svn: 367474

* [AArch64] Add support for Transactional Memory Extension (TME)
  Momchil Velikov · 2019-07-31 · 4 files · -12/+78

  Re-commit r366322 after some fixes.

  TME is a future architecture technology, documented in
  https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools
  https://developer.arm.com/docs/ddi0601/a

  More about the future architectures:
  https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture

  This patch adds support for the TME instructions TSTART, TTEST, TCOMMIT, and
  TCANCEL and the target feature/arch extension "tme". It also implements TME
  builtin functions, defined in ACLE Q2 2019
  (https://developer.arm.com/docs/101028/latest).

  Differential Revision: https://reviews.llvm.org/D64416
  Patch by Javed Absar and Momchil Velikov
  llvm-svn: 367428

* [AArch64][SVE2] Load/store instruction fixes
  Cullen Rhodes · 2019-07-31 · 2 files · -37/+30

  Summary:
  * Loads and stores in SVE2 are gather/scatter, not contiguous; fixed by
    renaming multiclasses to reflect this, with comments updated as well.
  * Remove aliases from load/store multiclasses that reflect the behaviour of
    the original form.
  * Fix bug in scatter store implementation: the vector list should be used as
    input, not output.

  Reviewed By: sdesmalen
  Differential Revision: https://reviews.llvm.org/D65392
  llvm-svn: 367398

* [AArch64][SVE2] Minor refactoring and cleanup
  Cullen Rhodes · 2019-07-31 · 2 files · -26/+26

  Summary:
  * Clarify comment with SVE2 for predicated shifts and move next to other
    shift instructions.
  * Clarify comments for various instructions.
  * Move FCVTX instruction next to other fp conversions.
  * Move FLOGB next to other fp instructions and fix description.
  * Remove "cons" from non-constructive multiclass for bitwise shift-right and
    accumulate instructions.

  Reviewed By: sdesmalen
  Differential Revision: https://reviews.llvm.org/D65390
  llvm-svn: 367396

* [AArch64][SVE2] Use destination register as source register
  Cullen Rhodes · 2019-07-31 · 2 files · -86/+207

  Summary: This patch fixes a bug in the following instructions, which should
  have been implemented as destructive. A destructive instruction is an
  instruction where one of the source registers also acts as the destination
  register. Therefore, the contents of the source register, when the instruction
  begins execution, are replaced by the result of the instruction when the
  instruction completes execution [1]:

  * SRI/SLI
  * EORBT/EORTB
  * TBX
  * Narrowing top instructions
  * FP convert precision instructions

  These changes are non-functional from the assembler/disassembler point of view
  but are necessary for correct codegen.

  [1] https://static.docs.arm.com/ddi0584/ae/DDI0584A_e_SVE_supp_armv8A.pdf

  Reviewed By: sdesmalen
  Differential Revision: https://reviews.llvm.org/D65389
  llvm-svn: 367394

* [AArch64][GlobalISel] Implement narrowing of G_SEXT.
  Amara Emerson · 2019-07-26 · 1 file · -20/+26

  We need this to narrow a sext to s128.

  Differential Revision: https://reviews.llvm.org/D65357
  llvm-svn: 367164

* [AArch64][GlobalISel] Select @llvm.aarch64.stlxr for 32-bit pointers
  Jessica Paquette · 2019-07-26 · 1 file · -3/+21

  Add partial instruction selection for intrinsics like this:

  ```
  declare i32 @llvm.aarch64.stlxr(i64, i32*)
  ```

  (This only handles the case where a G_ZEXT is feeding the intrinsic.)

  Also make sure that the added store instruction actually has the memory op
  from the original G_STORE.

  Update select-stlxr-intrin.mir and arm64-ldxr-stxr.ll.

  Differential Revision: https://reviews.llvm.org/D65355
  llvm-svn: 367163

* [AArch64][SVE2] Rename bitperm feature to sve2-bitperm
  Cullen Rhodes · 2019-07-26 · 3 files · -3/+3

  Summary: The bitperm feature flag is now prefixed with SVE2, as it is for all
  other SVE2 extensions.

  Patch by Maciej Gabka.

  Reviewers: sdesmalen, rovka, chill, SjoerdMeijer, rengolin
  Reviewed By: SjoerdMeijer, rengolin
  Differential Revision: https://reviews.llvm.org/D65327
  llvm-svn: 367124

* [AArch64] Define ETE and TRBE system registers
  Momchil Velikov · 2019-07-26 · 4 files · -1/+45

  Embedded Trace Extension and Trace Buffer Extension are optional future
  architecture extensions (cf.
  https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools).
  Their system registers are documented here:
  https://developer.arm.com/docs/ddi0601/a

  ETE shares register names with ETM. One exception is the ETE TRCEXTINSELR0
  register, which has the same encoding as the ETM TRCEXTINSELR register (but
  different semantics). This patch treats them as aliases: the assembler will
  accept both names, emitting identical encoding, and the disassembler will keep
  disassembling to TRCEXTINSELR.

  Differential Revision: https://reviews.llvm.org/D63707
  llvm-svn: 367093

* [AArch64][GlobalISel] Simplify zext/sext selection, use MachineIRBuilder. NFC.
  Amara Emerson · 2019-07-26 · 1 file · -32/+28

  llvm-svn: 367075

* [AArch64][GlobalISel] Fix G_SELECT legalization fallback after r366943.
  Amara Emerson · 2019-07-25 · 1 file · -1/+1

  Changes the order of legalization of G_ICMP as suggested by Petar in D65079.

  llvm-svn: 367060

* [AArch64][SVE] Allow explicit size specifier for predicate operand
  Momchil Velikov · 2019-07-25 · 1 file · -8/+15

  ... for the vector forms of `{SQ,UQ,}{INC,DEC}P` instructions. Also continue
  supporting the existing behaviour of not requiring an explicit size specifier.
  The preferred disassembly is *with* the specifier.

  This is implemented by redefining instruction forms to require vector
  predicates with explicit size and adding aliases, which allow a predicate with
  no size.

  Differential Revision: https://reviews.llvm.org/D65145
  llvm-svn: 367019

* [ARM][AArch64] Support for Cortex-A65 & A65AE, Neoverse E1 & N1
  Pablo Barrio · 2019-07-25 · 3 files · -2/+53

  Summary: Add support for Cortex-A65, Cortex-A65AE, Neoverse E1 and Neoverse
  N1. Neoverse E1 and Cortex-A65(&AE) only implement the AArch64 state of the
  Arm architecture. Neoverse N1 implements both AArch32 and AArch64.

  Cortex-A65: https://developer.arm.com/ip-products/processors/cortex-a/cortex-a65
  Cortex-A65AE: https://developer.arm.com/ip-products/processors/cortex-a/cortex-a65ae
  Neoverse E1: https://developer.arm.com/ip-products/processors/neoverse/neoverse-e1
  Neoverse N1: https://developer.arm.com/ip-products/processors/neoverse/neoverse-n1

  Patch by Diogo Sampaio and Pablo Barrio

  Reviewers: samparker, LukeCheeseman, sbaranga, ostannard
  Reviewed By: ostannard
  Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64406
  llvm-svn: 367007

* [AArch64][GlobalISel] Select immediate modes for ADD when selecting G_GEP
  Jessica Paquette · 2019-07-24 · 1 file · -2/+35

  Before, we weren't able to select things like this for G_GEP:

    add x0, x8, #8

  and instead we'd materialize the 8. This teaches GISel to do that. It gives
  some considerable code size savings on 252.eon: about 4%!

  Differential Revision: https://reviews.llvm.org/D65248
  llvm-svn: 366959

* [AArch64][GlobalISel] Don't try to use GISel if subtarget doesn't have neon or fp.
  Amara Emerson · 2019-07-24 · 1 file · -0/+6

  Throughout the LegalizerInfo we currently make the assumption that the target
  has the NEON and FP target features available. Fixing it will require a
  refactor of the whole thing, so until then make sure we fall back.

  Works around PR42734.

  Differential Revision: https://reviews.llvm.org/D65244
  llvm-svn: 366957

* [Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 foldRoman Lebedev2019-07-242-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This was originally reported in D62818. https://rise4fun.com/Alive/oPH InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression will be hoisted out of a loop if `Y` is invariant and `X` is not. But as it is seen from the diffs here, if it didn't get hoisted, the produced assembly is almost universally worse. Much like with my recent "hoist add/sub by/from const" patches, we should get almost universal win if we hoist constant, there is almost always an "and/test by imm" instruction, but "shift of imm" not so much, so we may avoid having to materialize the immediate, and thus need one less register. And since we now shift not by constant, but by something else, the live-range of that something else may reduce. Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit` instruction pattern. And to not get into endless combine loop. Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm Reviewed By: spatel Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62871 llvm-svn: 366955
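  The equivalence behind one direction of the fold is easy to sanity-check
  exhaustively at a small bit width. A standalone check (assuming Y stays below
  the bit width, as the combine requires):

  ```
  // Exhaustive 8-bit check of:
  //   (X & (C l>> Y)) == 0   <=>   ((X << Y) & C) == 0
  #include <cassert>

  int main() {
    for (unsigned C = 0; C < 256; ++C)
      for (unsigned X = 0; X < 256; ++X)
        for (unsigned Y = 0; Y < 8; ++Y) {
          bool Lhs = (X & (C >> Y)) == 0;
          bool Rhs = (((X << Y) & 0xFFu) & C) == 0; // truncate like hardware
          assert(Lhs == Rhs && "fold is not an equivalence?");
        }
    return 0;
  }
  ```
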
* [AArch64][GlobalISel] Fold G_MUL into XRO load addressing mode when possible
  Jessica Paquette · 2019-07-24 · 1 file · -9/+40

  If we have a G_MUL, and either the LHS or the RHS of that mul is the legal
  shift value for a load addressing mode, we can fold it into the load.

  This gives some code size savings on some SPEC tests. The best are around 2%
  on 300.twolf and 3% on 254.gap.

  Differential Revision: https://reviews.llvm.org/D65173
  llvm-svn: 366954

* [GlobalISel] Support for inlining memcpy, memset and memmove calls.
  Amara Emerson · 2019-07-24 · 3 files · -3/+86

  This introduces a new family of combiner helper routines that re-use the
  target-specific cost model from SelectionDAG and generate inline
  implementations of the memcpy family of intrinsics.

  The combines are only enabled at optimization levels higher than -O0, and give
  very substantial performance improvements.

  Differential Revision: https://reviews.llvm.org/D65167
  llvm-svn: 366951

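  A hedged sketch of how a target's pre-legalizer combine might dispatch to
  these helpers; the helper name, the -O0 size cap, and the exact signatures
  are assumptions based on the patch description, not verbatim sources:

  ```
  // Sketch: routing memory intrinsics to the new combiner helper.
  static bool combineMemIntrinsic(MachineInstr &MI, CombinerHelper &Helper,
                                  bool EnableOpt) {
    switch (MI.getIntrinsicID()) {
    case Intrinsic::memcpy:
    case Intrinsic::memmove:
    case Intrinsic::memset: {
      // At -O0 inline only very small copies; 0 means "let the heuristics
      // derived from the SelectionDAG cost model decide".
      unsigned MaxLen = EnableOpt ? 0 : 32; // assumed cap, see patch
      return Helper.tryCombineMemCpyFamily(MI, MaxLen);
    }
    default:
      return false;
    }
  }
  ```
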
* [AArch64][GlobalISel] Make vector dup optimization look at last elt of ZeroVec
  Jessica Paquette · 2019-07-24 · 1 file · -1/+1

  Fix an off-by-one error which made us not look at the last element of the zero
  vector. This caused a miscompile in 188.ammp.

  Differential Revision: https://reviews.llvm.org/D65168
  llvm-svn: 366930

* [AArch64] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after r366857
  Fangrui Song · 2019-07-24 · 1 file · -0/+1

  llvm-svn: 366866

* [AArch64][GlobalISel] Add support for s128 loads, stores, extracts, truncs.
  Amara Emerson · 2019-07-23 · 3 files · -13/+92

  We need to be able to load and store s128 for memcpy inlining, where we want
  to generate Q register mem ops. Making these legal also requires that we add
  some support in other instructions. Regbankselect should also know about
  these, since they have no GPR register class that can hold them, so they need
  special handling to live on the FPR bank.

  Differential Revision: https://reviews.llvm.org/D65166
  llvm-svn: 366857

* [GlobalISel][AArch64] Save a copy on G_SELECT by fixing condition to GPR
  Jessica Paquette · 2019-07-23 · 1 file · -5/+3

  The condition can never be fed by FPRs, so it should always be on a GPR.

  Differential Revision: https://reviews.llvm.org/D65157
  llvm-svn: 366854

* [GlobalISel][AArch64] Teach GISel to handle shifts in load addressing modes
  Jessica Paquette · 2019-07-23 · 1 file · -7/+124

  When we select the XRO variants of loads, we can pull in very specific shifts
  (of the size of an element). E.g.

  ```
  ldr x1, [x2, x3, lsl #3]
  ```

  This teaches GISel to handle these when they're coming from shifts
  specifically.

  This adds a new addressing mode function, `selectAddrModeShiftedExtendXReg`,
  which recognizes this pattern. This also packs this up with
  `selectAddrModeRegisterOffset` into `selectAddrModeXRO`. This is intended to
  be equivalent to `selectAddrModeXRO` in AArch64ISelDAGToDAG.

  Also update load-addressing-modes to show that all of the cases here work.

  Differential Revision: https://reviews.llvm.org/D65119
  llvm-svn: 366819

* [GlobalISel][AArch64] Contract trivial same-size cross-bank copies into G_STOREs
  Jessica Paquette · 2019-07-20 · 1 file · -0/+49

  Sometimes, you can end up with cross-bank copies between same-sized GPRs and
  FPRs, which feed into G_STOREs. When these copies feed only into stores, they
  aren't necessary; we can just store using the original register bank.

  This provides some minor code size savings for some floating point SPEC
  benchmarks (around 0.2% for 453.povray and 450.soplex).

  This issue doesn't seem to show up due to regbankselect or anything similar.
  So, this patch introduces an early select function,
  `contractCrossBankCopyIntoStore`, which performs the contraction when
  possible. The selector then continues normally and selects the correct store
  opcode, eliminating needless copies along the way.

  Differential Revision: https://reviews.llvm.org/D65024
  llvm-svn: 366625

* [GlobalISel] Translate calls to memcpy et al to G_INTRINSIC_W_SIDE_EFFECTS and legalize later.
  Amara Emerson · 2019-07-19 · 2 files · -0/+23

  I plan on adding memcpy optimizations in the GlobalISel pipeline, but we can't
  do that unless we delay lowering to actual function calls. This patch changes
  the translator to generate G_INTRINSIC_W_SIDE_EFFECTS for these functions, and
  then has each target use the new custom legalizer hook for intrinsics to
  specify that it wants them expanded into libcalls.

  Differential Revision: https://reviews.llvm.org/D64895
  llvm-svn: 366516

* [GlobalISel][AArch64] Add support for base register + offset register loads
  Jessica Paquette · 2019-07-18 · 1 file · -0/+93

  Add support for folding G_GEPs into loads of the form

  ```
  ldr reg, [base, off]
  ```

  when possible. This can save an add before the load. Currently, this is only
  supported for loads of 64 bits into 64-bit registers.

  Add a new addressing mode function, `selectAddrModeRegisterOffset`, which
  performs this folding when it is profitable.

  Also add a test for addressing modes for G_LOAD.

  Differential Revision: https://reviews.llvm.org/D64944
  llvm-svn: 366503