summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AArch64
Commit message (Collapse)AuthorAgeFilesLines
...
* CodeGen: Introduce a class for registersMatt Arsenault2019-06-247-19/+19
| | | | | | | | | Avoids using a plain unsigned for registers throughoug codegen. Doesn't attempt to change every register use, just something a little more than the set needed to build after changing the return type of MachineOperand::getReg(). llvm-svn: 364191
* AArch64: Add support for reading pc using llvm.read_register.Peter Collingbourne2019-06-221-0/+8
| | | | | | | | | | | | | | | This is useful for allowing code to efficiently take an address that can be later mapped onto debug info. Currently the hwasan pass achieves this by taking the address of the current function: http://llvm-cs.pcc.me.uk/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp#921 but this costs two instructions (plus a GOT entry in PIC code) per function with stack variables. This will allow the cost to be reduced to a single instruction. Differential Revision: https://reviews.llvm.org/D63471 llvm-svn: 364126
* AArch64: Prefer FP-relative debug locations in HWASANified functions.Peter Collingbourne2019-06-223-11/+18
| | | | | | | | | | | | | | | | | | | | To help produce better diagnostics for stack use-after-return, we'd like to be able to determine the addresses of each HWASANified function's local variables given a small amount of information recorded on entry to the function. Currently we require all HWASANified functions to use frame pointers and record (PC, FP) on function entry. This works better than recording SP because FP cannot change during the function, unlike SP which can change e.g. due to dynamic alloca. However, most variables currently end up using SP-relative locations in their debug info. This prevents us from recomputing the address of most variables because the distance between SP and FP isn't recorded in the debug info. To address this, make the AArch64 backend prefer FP-relative debug locations when producing debug info for HWASANified functions. Differential Revision: https://reviews.llvm.org/D63300 llvm-svn: 364117
* [COFF, ARM64] Fix encoding of debugtrap for WindowsTom Tan2019-06-214-0/+17
| | | | | | | | | | | | On Windows ARM64, intrinsic __debugbreak is compiled into brk #0xF000 which is mapped to llvm.debugtrap in Clang. Instruction brk #F000 is the defined break point instruction on ARM64 which is recognized by Windows debugger and exception handling code, so llvm.debugtrap should map to it instead of redirecting to llvm.trap (brk #1) as the default implementation. Differential Revision: https://reviews.llvm.org/D63635 llvm-svn: 364115
* [AArch64][GlobalISel] Implement selection support for the new G_JUMP_TABLE ↵Amara Emerson2019-06-212-0/+52
| | | | | | | | | | and G_BRJT ops. With this we can now fully code generate jump tables, which is important for code size. Differential Revision: https://reviews.llvm.org/D63223 llvm-svn: 364086
* [AArch64][GlobalISel] Make s8 and s16 G_CONSTANTs legal.Amara Emerson2019-06-212-4/+7
| | | | | | | | | | | | | | | | | | | | | We sometimes get poor code size because constants of types < 32b are legalized as 32 bit G_CONSTANTs with a truncate to fit. This works but means that the localizer can no longer sink them (although it's possible to extend it to do so). On AArch64 however s8 and s16 constants can be selected in the same way as s32 constants, with a mov pseudo into a W register. If we make s8 and s16 constants legal then we can avoid unnecessary truncates, they can be CSE'd, and the localizer can sink them as normal. There is a caveat: if the user of a smaller constant has to widen the sources, we end up with an anyext of the smaller typed G_CONSTANT. This can cause regressions because of the additional extend and missed pattern matching. To remedy this, there's a new artifact combiner to generate the wider G_CONSTANT if it's legal for the target. Differential Revision: https://reviews.llvm.org/D63587 llvm-svn: 364075
* hwasan: Shrink outlined checks by 1 instruction.Peter Collingbourne2019-06-191-12/+7
| | | | | | | | | Turns out that we can save an instruction by folding the right shift into the compare. Differential Revision: https://reviews.llvm.org/D63568 llvm-svn: 363874
* [GlobalISel][Localizer] Rewrite localizer to run in 2 phases, inter & intra ↵Amara Emerson2019-06-171-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | block. Inter-block localization is the same as what currently happens, except now it only runs on the entry block because that's where the problematic constants with long live ranges come from. The second phase is a new intra-block localization phase which attempts to re-sink the already localized instructions further right before one of the multiple uses. One additional change is to also localize G_GLOBAL_VALUE as they're constants too. However, on some targets like arm64 it takes multiple instructions to materialize the value, so some additional heuristics with a TTI hook have been introduced attempt to prevent code size regressions when localizing these. Overall, these changes improve CTMark code size on arm64 by 1.2%. Full code size results: Program baseline new diff ------------------------------------------------------------------------------ test-suite...-typeset/consumer-typeset.test 1249984 1217216 -2.6% test-suite...:: CTMark/ClamAV/clamscan.test 1264928 1232152 -2.6% test-suite :: CTMark/SPASS/SPASS.test 1394092 1361316 -2.4% test-suite...Mark/mafft/pairlocalalign.test 731320 714928 -2.2% test-suite :: CTMark/lencod/lencod.test 1340592 1324200 -1.2% test-suite :: CTMark/kimwitu++/kc.test 3853512 3820420 -0.9% test-suite :: CTMark/Bullet/bullet.test 3406036 3389652 -0.5% test-suite...ark/tramp3d-v4/tramp3d-v4.test 8017000 8016992 -0.0% test-suite...TMark/7zip/7zip-benchmark.test 2856588 2856588 0.0% test-suite...:: CTMark/sqlite3/sqlite3.test 765704 765704 0.0% Geomean difference -1.2% Differential Revision: https://reviews.llvm.org/D63303 llvm-svn: 363632
* [GlobalISel][AArch64] Fold G_SUB into G_ICMP when it's safe to do soJessica Paquette2019-06-171-16/+144
| | | | | | | | | | | | | | | | | | | | | | | | | Basically porting over the behaviour in AArch64ISelLowering to GISel. See emitComparison for reference. When we have something like this: ``` lhs = G_SUB 0, y ... G_ICMP lhs, rhs ``` We can fold away the G_SUB and produce a cmn instead, given that we produce the same value in NZCV. Add a test showing that the transformation works, and also showing that we don't perform the transformation when it's unsafe. Also factor out the CSet emission into emitCSetForICMP. Differential Revision: https://reviews.llvm.org/D63163 llvm-svn: 363596
* [TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests ↵Simon Pilgrim2019-06-122-9/+12
| | | | | | | | | | | | | | (PR42123) As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179
* [AArch64] Merge globals when optimising for sizeSjoerd Meijer2019-06-121-1/+14
| | | | | | | | | | | | | Extern global merging is good for code-size. There's definitely potential for performance too, but there's one regression in a benchmark that needs investigating, so that's why we enable it only when we optimise for size for now. Patch by Ramakota Reddy and Sjoerd Meijer. Differential Revision: https://reviews.llvm.org/D61947 llvm-svn: 363130
* Revert CMake: Make most target symbols hidden by defaultTom Stellard2019-06-116-6/+6
| | | | | | | | | | | | | | | This reverts r362990 (git commit 374571301dc8e9bc9fdd1d70f86015de198673bd) This was causing linker warnings on Darwin: ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)' from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol 'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&), std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings. llvm-svn: 363028
* CMake: Make most target symbols hidden by defaultTom Stellard2019-06-106-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l 36221 nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439 llvm-svn: 362990
* [AArch64][GlobalISel] Select immediate forms of cmp instructions.Amara Emerson2019-06-091-5/+17
| | | | | | | | A simple re-use of the immediate operand matcher and renderer functions. rdar://43795178 llvm-svn: 362896
* [SystemZ, RegAlloc] Favor 3-address instructions during instruction selection.Jonas Paulsson2019-06-082-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch aims to reduce spilling and register moves by using the 3-address versions of instructions per default instead of the 2-address equivalent ones. It seems that both spilling and register moves are improved noticeably generally. Regalloc hints are passed to increase conversions to 2-address instructions which are done in SystemZShortenInst.cpp (after regalloc). Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are the same), foldMemoryOperandImpl() can no longer trivially fold a spilled source register since the reg/reg instruction is now 3-address. In order to remedy this, new 3-address pseudo memory instructions are used to perform the folding only when the dst and lhs virtual registers are known to be allocated to the same physreg. In order to not let MachineCopyPropagation run and change registers on these transformed instructions (making it 3-address), a new target pass called SystemZPostRewrite.cpp is run just after VirtRegRewriter, that immediately lowers the pseudo to a target instruction. If it would have been possibe to insert a COPY instruction and change a register operand (convert to 2-address) in foldMemoryOperandImpl() while trusting that the caller (e.g. InlineSpiller) would update/repair the involved LiveIntervals, the solution involving pseudo instructions would not have been needed. This is perhaps a potential improvement (see Phabricator post). Common code changes: * A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a target pass immediately before MachineCopyPropagation. * VirtRegMap is passed as an argument to foldMemoryOperand(). Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D60888 llvm-svn: 362868
* [AArch64][AsmParser] error on unexpected SVE predicate type suffixCullen Rhodes2019-06-071-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch fixes a bug in the assembler that permitted a type suffix on predicate registers when not expected. For instance, the following was previously valid: faddv h0, p0.q, z1.h This bug was present in all SVE instructions containing predicates with no type suffix and no predication form qualifier, i.e. /z or /m. The latter instructions are already caught with an appropiate error message by the assembler, e.g.: .text <stdin>:1:13: error: not expecting size suffix cmpne p1.s, p0.b/z, z2.s, 0 ^ A similar issue for SVE vector registers was fixed in: https://reviews.llvm.org/D59636 Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62942 llvm-svn: 362780
* [AArch64][AsmParser] Provide better diagnostics for SVE predicatesCullen Rhodes2019-06-071-1/+5
| | | | | | | | | | Patch by Sander de Smalen (sdesmalen) Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62941 llvm-svn: 362779
* AArch64] Handle ISD::LRINT and ISD::LLRINT for float16Adhemerval Zanella2019-06-061-0/+8
| | | | | | | | | | | This patch is a follow up for D62018 to add lrint/llrint support for float16. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62863 llvm-svn: 362700
* [AArch64] Handle ISD::LROUND and ISD::LLROUND for float16Adhemerval Zanella2019-06-061-0/+8
| | | | | | | | | | | This patch is a follow up for D61391 to add lround/llround support for float16. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62861 llvm-svn: 362698
* [AArch64][GlobalISel] Add manual selection support for G_ZEXTLOADs to s64.Amara Emerson2019-06-061-0/+23
| | | | | | | | | | | We already get support for G_ZEXTLOAD to s32 from the importer, but it can't deal with the SUBREG_TO_REG in the pattern. Tweaking the existing manual selection code for G_LOAD to handle an additional SUBREG_TO_REG when dealing with G_ZEXTLOAD isn't much work. Also add tests to check the imported pattern selections to s32 work. llvm-svn: 362681
* [AArch64][GlobalISel] Add the new changes to fix PR42129 that were supposed ↵Amara Emerson2019-06-061-0/+5
| | | | | | | | to go into r362666. The changes weren't staged so ended up just re-commiting the unmodified reverted change. llvm-svn: 362677
* Revert "Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when ↵Amara Emerson2019-06-051-8/+96
| | | | | | | | | | | | G_SELECT is fp"" When looking through copies, make sure to not try to find the vreg def of a physreg. Normally getVRegDef will return nullptr in this case, but if there happens to be multiple defs then it will assert. This fixes PR42129. llvm-svn: 362666
* Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT ↵Petr Hosek2019-06-051-96/+8
| | | | | | | | is fp" This reverts commit r362435 as this triggers ICE, see PR42129 for details. llvm-svn: 362662
* [AArch64][GlobalISel] Make extloads to i64 legal.Amara Emerson2019-06-041-0/+3
| | | | | | | | Although we had the support in the prelegalizer combiner to generate the G_SEXTLOAD or G_ZEXTLOAD ops, the legalizer definitions for arm64 had them as lowering back to separate ops. llvm-svn: 362553
* [AArch64][ELF] Add support for PLT decoding with BTI instructions presentPeter Smith2019-06-041-1/+9
| | | | | | | | | | | | | | | | | | | | | | Arm Architecture v8.5a introduces Branch Target Identification (BTI). When enabled all indirect branches must target a bti instruction of the appropriate form. As PLT sequences may sometimes be the target of an indirect branch and PLT[0] always is, a static linker may need to generate PLT sequences that contain "bti c" as the first instruction. In effect: bti c adrp x16, page offset to .got.plt ... Instead of: adrp x16, page offset to .got.plt ... At present the PLT decoding assumes the adrp will always be the first instruction. This patch adds support for a single "bti c" to prefix it. A test binary has been uploaded with such a PLT sequence. A forthcoming LLD patch will make heavy use of the PLT decoding code. Differential Revision: https://reviews.llvm.org/D62598 llvm-svn: 362523
* [AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fpJessica Paquette2019-06-031-8/+96
| | | | | | | | | | | | | | | | Instead of emitting all of the test stuff for a compare when it's only used by a select, instead, just emit the compare + select. The select will use the value of NZCV correctly, so we don't need to emit all of the test instructions etc. For now, only support fp selects which use G_FCMP. Also only support condition codes which will only require one select to represent. Also add a test. Differential Revision: https://reviews.llvm.org/D62695 llvm-svn: 362446
* [AArch64] Check for simple type in FPToUIntSam Parker2019-06-031-0/+3
| | | | | | | | | | | DAGCombiner was hitting a SimpleType assertion when trying to combine a v3f32 before type legalization. bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41916 Differential Revision: https://reviews.llvm.org/D62734 llvm-svn: 362365
* [COFF, ARM64] Add CodeView register mappingTom Tan2019-05-311-5/+172
| | | | | | | | | | | | | | | | CodeView has its own register map which is defined in cvconst.h. Missing this mapping before saving register to CodeView causes debugger to show incorrect value for all register based variables, like variables in register and local variables addressed by register (stack pointer + offset). This change added mapping between LLVM register and CodeView register so the correct register number will be stored to CodeView/PDB, it aso fixed the mapping from CodeView register number to register name based on current CPUType but print PDB to yaml still assumes X86 CPU and needs to be fixed. Differential Revision: https://reviews.llvm.org/D62608 llvm-svn: 362280
* [AArch64][SVE2] Asm: support WHILE instructionsCullen Rhodes2019-05-312-0/+41
| | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: * WHILEGE, WHILEGT, WHILEHS, WHILEHI, WHILEWR, WHILERW The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62601 llvm-svn: 362215
* [AArch64][SVE2] Asm: support TBL/TBX instructionsCullen Rhodes2019-05-312-7/+44
| | | | | | | | | | | | | | | | | Summary: A three sources variant of the TBL instruction is added to the existing SVE instruction in SVE2. This is implemented with minor changes to the existing TableGen class. TBX is a new instruction with its own definition. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62600 llvm-svn: 362214
* [AArch64][SVE2] Asm: support SVE2 store instructionsCullen Rhodes2019-05-312-0/+47
| | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: * STNT1B, STNT1H, STNT1S, STNT1D The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62599 llvm-svn: 362213
* [AArch64][SVE2] Asm: support SVE2 vector splice (constructive)Cullen Rhodes2019-05-302-0/+27
| | | | | | | | | | | | Summary: The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62530 llvm-svn: 362073
* [AArch64][SVE2] Asm: support SVE2 load instructionsCullen Rhodes2019-05-302-0/+55
| | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: * LDNT1SB, LDNT1B, LDNT1SH, LDNT1H, LDNT1SW, LDNT1W, LDNT1D The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62528 llvm-svn: 362072
* [AArch64][SVE2] Asm: support FCVTX/FLOGB instructionsCullen Rhodes2019-05-302-0/+12
| | | | | | | | | | | | | | | | | Summary: Patch completes SVE2 support for: SVE Floating Point Unary Operations - Predicated Group The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62526 llvm-svn: 362071
* [AArch64][SVE2] Asm: add ext (immediate offset, constructive) instructionCullen Rhodes2019-05-302-0/+18
| | | | | | | | | | | | Summary: The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62518 llvm-svn: 362070
* [AArch64][SVE2] Asm: support SVE Bitwise Logical - Unpredicated GroupCullen Rhodes2019-05-292-0/+81
| | | | | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: * EOR3, BSL, BCAX, BSL1N, BSL2N, NBSL, XAR Aliases for types .B/.H/.S for EOR3 and BCAX have been added, the preferred disassembly is .D. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62387 llvm-svn: 361936
* [AArch64][SVE2] Asm: support Floating Point Widening Multiply-AddCullen Rhodes2019-05-292-0/+68
| | | | | | | | | | | | | | | Summary: Patch adds support for the indexed and unpredicated vectors forms of the FMLALB, FMLALT, FMLSLB and FMLSLT instructions. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62386 llvm-svn: 361935
* [AArch64][SVE2] Asm: support SVE2 Floating Point Pairwise GroupCullen Rhodes2019-05-292-0/+40
| | | | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: SVE2 floating-point pairwise operations: * FADDP, FMAXNMP, FMINNMP, FMAXP, FMINP The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62383 llvm-svn: 361933
* [AArch64][GlobalISel] Select FCMPSri/FCMPDri when comparing against 0.0Jessica Paquette2019-05-281-13/+27
| | | | | | | | | | | Add support for selecting FCMPSri and FCMPDri when comparing against 0.0, and factor out opcode selection for G_FCMP into its own function. Add a test to show that we don't do this with other immediates. Differential Revision: https://reviews.llvm.org/D62539 llvm-svn: 361888
* [AArch64] Handle ISD::LRINT and ISD::LLRINTAdhemerval Zanella2019-05-282-0/+15
| | | | | | | | | | | This patch optimizes ISD::LRINT and ISD::LLRINT to frintx plus fcvtzs. It currently only handles the scalar version. Reviewed By: SjoerdMeijer, mstorsjo Differential Revision: https://reviews.llvm.org/D62018 llvm-svn: 361877
* [AArch64] Delete unused VariantKind in AArch64MCExprFangrui Song2019-05-282-4/+1
| | | | llvm-svn: 361844
* [NFC] Test commit, delete trailing whitespaceGraham Hunter2019-05-281-1/+1
| | | | llvm-svn: 361813
* [AArch64][SVE2] Asm: support SVE2 Floating Point Convert GroupCullen Rhodes2019-05-282-0/+42
| | | | | | | | | | | | | | | | | Summary: Patch adds support for the following intructions: SVE2 floating-point convert precision: * FCVTXNT, FCVTNT, FCVTLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62382 llvm-svn: 361801
* [AArch64][SVE2] Asm: support SVE2 Crypto Extensions GroupCullen Rhodes2019-05-282-0/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: SVE2 crypto constructive binary operations: * SM4EKEY, RAX1 SVE2 crypto destructive binary operations: * AESE, AESD, SM4E SVE2 crypto unary operations: * AESMC, AESIMC AESE, AESD, AESMC and AESIMC are enabled with +sve2-aes. SM4E and SM4EKEY are enabled with +sve2-sm4. RAX1 is enabled with +sve2-sha3. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62307 llvm-svn: 361797
* [AArch64][SVE2] Asm: support SVE2 Histogram Computation GroupsCullen Rhodes2019-05-282-0/+53
| | | | | | | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: SVE2 histogram generation (segment): * HISTSEG SVE2 histogram generation (vector): * HISTCNT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62306 llvm-svn: 361796
* [AArch64][SVE2] Asm: support SVE2 Misc GroupCullen Rhodes2019-05-282-0/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: SVE2 bitwise exclusive-or interleaved: * EORBT, EORTB SVE2 bitwise permute: * BEXT, BDEP, BGRP SVE2 bitwise shift left long: * SSHLLB, SSHLLT, USHLLB, USHLLT SVE2 integer add/subtract interleaved long: * SADDLBT, SSUBLBT, SSUBLTB BDEP, BEXT and BGRP are enabled with SVE2 feature +bitperm, all other instructions in this group are enabled with +sve2. Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62304 llvm-svn: 361795
* Include what you use in AArch64AsmBackend.cppDmitri Gribenko2019-05-271-1/+4
| | | | | | | | | | AArch64AsmBackend.cpp was not using any APIs from AArch64.h, and was only including it for transitive dependencies. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary AArch64 target library and the MCTargetDesc library). llvm-svn: 361774
* [GlobalISel][AArch64] Make FP constraint checks consider possible use/def banksJessica Paquette2019-05-242-7/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In a few places in getInstrMapping, we check if use/def instructions for the instruction we're mapping have floating point constraints. We can improve this check and reduce the number of copies in GISel-compiled code if we make a couple observations: - For a def instruction, it only matters if the def instruction must always output a value stored on a FPR - For a use instruction, it only matters if the use instruction must always only take in values stored in FPRs This adds two new functions: - onlyUsesFP - onlyDefinesFP Then we can use those when we're checking the uses/defs instead. Without this patch, the load, unmerge, store, and select in the added test would have unnecessary copies. Differential Revision: https://reviews.llvm.org/D62426 llvm-svn: 361679
* [GlobalISel][AArch64] NFC: Factor out HasFPConstraints into a proper functionJessica Paquette2019-05-242-41/+32
| | | | | | | | | Factor it out into a function, and replace places where we had the same check with the new function. Differential Revision: https://reviews.llvm.org/D62421 llvm-svn: 361677
* [GlobalISel][AArch64] Improve register bank mappings for G_SELECTJessica Paquette2019-05-241-6/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | The fcsel and csel instructions differ in only the register banks they work on. So, they're entirely interchangeable otherwise. With this in mind, this does two things: - Teach AArch64RegisterBankInfo to consider the inputs to G_SELECT as well as the outputs. - Teach it to choose the best register bank mapping based off the constraints of the inputs and outputs. The "best" in this case means the one that requires the smallest number of copies to properly emit a fcsel/csel. For example, if the inputs are all already going to be on FPRs, we should emit a fcsel, even if the output is a GPR. This costs one copy to produce the result, but saves us from copying the inputs into GPRs. Also update the regbank-select.mir to check that we end up with the right select instruction. Differential Revision: https://reviews.llvm.org/D62267 llvm-svn: 361665
OpenPOWER on IntegriCloud