summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AArch64
Commit message (Collapse)AuthorAgeFilesLines
* [AArch64] Implement aarch64_vector_pcs codegen support.Sander de Smalen2018-09-123-41/+92
| | | | | | | | | | | | | | This patch adds codegen support for the saving/restoring V8-V23 for functions specified with the aarch64_vector_pcs calling convention attribute, as added in patch D51477. Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D51479 llvm-svn: 342049
* [AArch64] NFC: Refactoring to prepare for vector PCS.Sander de Smalen2018-09-121-39/+74
| | | | | | | | | | | | | | This patch refactors several parts of AArch64FrameLowering so that it can be easily extended to support saving/restoring of FPR128 (Q) registers. Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D51478 llvm-svn: 342038
* [AArch64] Add parsing of aarch64_vector_pcs attribute.Sander de Smalen2018-09-122-0/+8
| | | | | | | | | | | | | | | | | | | This patch adds parsing support for the 'aarch64_vector_pcs' calling convention attribute to calls and function declarations. More information describing the vector ABI and procedure call standard can be found here: https://developer.arm.com/products/software-development-tools/\ hpc/arm-compiler-for-hpc/vector-function-abi Reviewers: t.p.northover, rnk, rengolin, javed.absar, thegameg, SjoerdMeijer Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D51477 llvm-svn: 342030
* NFC: use bit_cast more in AArch64AddressingModesJF Bastien2018-09-111-24/+11
| | | | | | | | The was previously committed as r341749 then reverted as r341750 because bit_cast needed to do its own thing to check is_trivially_copyable on GCC 4.x. This is now done and std;:array should now get accepted. llvm-svn: 341897
* [Target] Untangle disassemblersBenjamin Kramer2018-09-104-4/+5
| | | | | | | Disassemblers cannot depend on main target headers. The same is true for MCTargetDesc, but there's a lot more cleanup needed for that. llvm-svn: 341822
* Revert "NFC: use bit_cast more in AArch64AddressingModes"JF Bastien2018-09-081-11/+24
| | | | | | | It seems some bots think std::array is either not trivially-copyable, or isn't the right size. llvm-svn: 341750
* NFC: use bit_cast more in AArch64AddressingModesJF Bastien2018-09-081-24/+11
| | | | llvm-svn: 341749
* ADT: add <bit> header, implement C++20 bit_cast, useJF Bastien2018-09-081-12/+9
| | | | | | | | | | | | | | Summary: I saw a few places that were punning through a union of FP and integer, and that made me sad. Luckily, C++20 adds bit_cast for exactly that purpose. Implement our own version in ADT (without constexpr, leaving us a bit sad), and use it in the few places my grep-fu found silly union punning. This was originally committed as r341728 and reverted in r341730. Reviewers: javed.absar, steven_wu, srhines Subscribers: dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51693 llvm-svn: 341741
* Revert "ADT: add <bit> header, implement C++20 bit_cast, use"JF Bastien2018-09-071-9/+12
| | | | | | Bots sad. Looks like missing std::is_trivially_copyable. llvm-svn: 341730
* ADT: add <bit> header, implement C++20 bit_cast, useJF Bastien2018-09-071-12/+9
| | | | | | | | | | | | Summary: I saw a few places that were punning through a union of FP and integer, and that made me sad. Luckily, C++20 adds bit_cast for exactly that purpose. Implement our own version in ADT (without constexpr, leaving us a bit sad), and use it in the few places my grep-fu found silly union punning. Reviewers: javed.absar Subscribers: dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51693 llvm-svn: 341728
* [AArch64] Support reserving x1-7 registers.Nick Desaulniers2018-09-079-52/+79
| | | | | | | | | | | | | | | Summary: Reserving registers x1-7 is used to support CONFIG_ARM64_LSE_ATOMICS in Linux kernel. This change adds support for reserving registers x1 through x7. Reviewers: javed.absar, phosek, srhines, nickdesaulniers, efriedma Reviewed By: nickdesaulniers, efriedma Subscribers: niravd, jfb, manojgupta, nickdesaulniers, jyknight, efriedma, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D48580 llvm-svn: 341706
* ARM64: improve non-zero memset isel by ~2xJF Bastien2018-09-061-17/+20
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: I added a few ARM64 memset codegen tests in r341406 and r341493, and annotated where the generated code was bad. This patch fixes the majority of the issues by requesting that a 2xi64 vector be used for memset of 32 bytes and above. The patch leaves the former request for f128 unchanged, despite f128 materialization being suboptimal: doing otherwise runs into other asserts in isel and makes this patch too broad. This patch hides the issue that was present in bzero_40_stack and bzero_72_stack because the code now generates in a better order which doesn't have the store offset issue. I'm not aware of that issue appearing elsewhere at the moment. <rdar://problem/44157755> Reviewers: t.p.northover, MatzeB, javed.absar Subscribers: eraman, kristof.beyls, chrib, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51706 llvm-svn: 341558
* NFC: improve ARM64 isFPImmLegal debug printJF Bastien2018-09-051-19/+8
| | | | | | | Forking this change from D51706. This just made it easier to understand llc output with -debug. llvm-svn: 341504
* [MinGW] Move code for indicating "potentially not DSO local" into ↵Martin Storsjo2018-09-041-12/+9
| | | | | | | | | | | | | | | shouldAssumeDSOLocal. NFC. On Windows, if shouldAssumeDSOLocal returns false, it's either a dllimport reference, or a reference that we should treat as non-local and create a stub for. Clean up AArch64Subtarget::ClassifyGlobalReference a little while touching the flag handling relating to dllimport. Differential Revision: https://reviews.llvm.org/D51590 llvm-svn: 341402
* [MinGW] [AArch64] Add stubs for potential automatic dllimported variablesMartin Storsjo2018-09-045-8/+35
| | | | | | | | | | | The runtime pseudo relocations can't handle the AArch64 format PC relative addressing in adrp+add/ldr pairs. By using stubs, the potentially dllimported addresses can be touched up by the runtime pseudo relocation framework. Differential Revision: https://reviews.llvm.org/D51452 llvm-svn: 341401
* [AArch64] Simplify code in LowerGlobalAddress. NFCI.Martin Storsjo2018-09-031-7/+4
| | | | | | | | | When initial support for dllimport was added for aarch64 in SVN r316555, ClassifyGlobalReference didn't set the MO_DLLIMPORT flag - that was only completed in SVN r323810. Reuse the return value from ClassifyGlobalReference for this purpose as well. llvm-svn: 341310
* [AArch64] Hook up the missed machine operand flag name for MO_DLLIMPORTMartin Storsjo2018-08-311-1/+2
| | | | llvm-svn: 341178
* [CodeGen] emit inline asm clobber list warnings for reserved (cont)Ties Stuij2018-08-302-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is a continuation of https://reviews.llvm.org/D49727 Below the original text, current changes in the comments: Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results. For example: extern int bar(int[]); int foo(int i) { int a[i]; // VLA asm volatile( "mov r7, #1" : : : "r7" ); return 1 + bar(a); } Compiled for thumb, this gives: $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb ... foo: .fnstart @ %bb.0: @ %entry .save {r4, r5, r6, r7, lr} push {r4, r5, r6, r7, lr} .setfp r7, sp, #12 add r7, sp, #12 .pad #4 sub sp, #4 movs r1, #7 add.w r0, r1, r0, lsl #2 bic r0, r0, #7 sub.w r0, sp, r0 mov sp, r0 @APP mov.w r7, #1 @NO_APP bl bar adds r0, #1 sub.w r4, r7, #12 mov sp, r4 pop {r4, r5, r6, r7, pc} ... r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block. This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now. The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter. If we find a reserved register, we print a warning: repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm] "mov r7, #1" ^ Reviewers: efriedma, olista01, javed.absar Reviewed By: efriedma Subscribers: eraman, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D51165 llvm-svn: 341062
* [AArch64] Optimise load(adr address) to ldr addressDavid Green2018-08-302-9/+32
| | | | | | | | | Providing that the load is known to be 4 byte aligned, we can optimise a ldr(adr address) to just ldr address. Differential Revision: https://reviews.llvm.org/D51030 llvm-svn: 341058
* [AArch64] Reject inline asm with FP registers when FP is disabled.Eli Friedman2018-08-241-0/+9
| | | | | | | | Otherwise, we would crash trying to deal with an illegal input. Differential Revision: https://reviews.llvm.org/D51202 llvm-svn: 340637
* Find PLT entries for x86, x86_64, and AArch64.Joel Galenson2018-08-241-0/+26
| | | | | | | | | | This adds a new method to ELFObjectFileBase that returns the symbols and addresses of PLT entries. This design was suggested by pcc and eugenis in https://reviews.llvm.org/D49383. Differential Revision: https://reviews.llvm.org/D50203 llvm-svn: 340610
* [AArch64] Add Tiny Code Model for AArch64David Green2018-08-2213-47/+138
| | | | | | | | | | | | | | This adds the plumbing for the Tiny code model for the AArch64 backend. This, instead of loading addresses through the normal ADRP;ADD pair used in the Small model, uses a single ADR. The 21 bit range of an ADR means that the code and its statically defined symbols need to be within 1MB of each other. This makes it mostly interesting for embedded applications where we want to fit as much as we can in as small a space as possible. Differential Revision: https://reviews.llvm.org/D49673 llvm-svn: 340397
* [aarch64][mc] Don't lookup symbols when there is no symbol lookup callbackDaniel Sanders2018-08-211-0/+2
| | | | | | | | | | | | | | Summary: When run under llvm-mc-disassemble-fuzzer, there is no symbol lookup callback so tryAddingSymbolicOperand() must fail gracefully instead of crashing Reviewers: aemerson, javed.absar Reviewed By: aemerson Subscribers: lhames, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D51005 llvm-svn: 340287
* [AArch64][SVE] Asm: Add SVE System registersSander de Smalen2018-08-201-0/+15
| | | | | | | | | | | | | | | | | | This patch adds system registers for controlling aspects of SVE: - ZCR_EL1 (r/w) visible at EL1 and EL0. - ZCR_EL2 (r/w) visible at EL2 and Non-secure EL1 and EL0. - ZCR_EL3 (r/w) visible at all exception levels. and a system register identifying SVE: - ID_AA64ZFR0_EL1 (r) SVE Feature identifier. Reviewers: SjoerdMeijer, samparker, pbarrio, fhahn, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D50885 llvm-svn: 340158
* [AArch64] - Generate pointer authentication instructionsLuke Cheeseman2018-08-171-0/+59
| | | | | | | | | | | | | | | | - Generate pointer authentication instructions - The functions instrumented depend on function attribtues: all (all functions instrumentent) non-leaf (only those that spill LR) none - Function epilogues sign the LR before spilling to the stack and authenticate the LR once restored - If the target is v8.3a or greater than can use the combined authenticate and return instruction Differential revision: https://reviews.llvm.org/D49793 llvm-svn: 340018
* [ARM/AArch64] Support FP16 +fp16fml instructionsBernard Ogden2018-08-174-0/+47
| | | | | | | | | | | | | | | | | | Add +fp16fml feature for new FP16 instructions, which are a mandatory part of FP16 from v8.4-A and an optional part of FP16 from v8.2-A. It doesn't seem to be possible to model this in LLVM, but the relationship between the options is handled by the related clang patch. In keeping with what I think is the usual practice, the fp16fml extension is accepted regardless of base architecture version. Builds on/replaces Sjoerd Meijer's patch to add these instructions at https://reviews.llvm.org/D49839. Differential Revision: https://reviews.llvm.org/D50228 llvm-svn: 340013
* [MI] Change the array of `MachineMemOperand` pointers to beChandler Carruth2018-08-162-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a generically extensible collection of extra info attached to a `MachineInstr`. The primary change here is cleaning up the APIs used for setting and manipulating the `MachineMemOperand` pointer arrays so chat we can change how they are allocated. Then we introduce an extra info object that using the trailing object pattern to attach some number of MMOs but also other extra info. The design of this is specifically so that this extra info has a fixed necessary cost (the header tracking what extra info is included) and everything else can be tail allocated. This pattern works especially well with a `BumpPtrAllocator` which we use here. I've also added the basic scaffolding for putting interesting pointers into this, namely pre- and post-instruction symbols. These aren't used anywhere yet, they're just there to ensure I've actually gotten the data structure types correct. I'll flesh out support for these in a subsequent patch (MIR dumping, parsing, the works). Finally, I've included an optimization where we store any single pointer inline in the `MachineInstr` to avoid the allocation overhead. This is expected to be the overwhelmingly most common case and so should avoid any memory usage growth due to slightly less clever / dense allocation when dealing with >1 MMO. This did require several ergonomic improvements to the `PointerSumType` to reasonably support the various usage models. This also has a side effect of freeing up 8 bits within the `MachineInstr` which could be repurposed for something else. The suggested direction here came largely from Hal Finkel. I hope it was worth it. ;] It does hopefully clear a path for subsequent extensions w/o nearly as much leg work. Lots of thanks to Reid and Justin for careful reviews and ideas about how to do all of this. Differential Revision: https://reviews.llvm.org/D50701 llvm-svn: 339940
* [SDAG] Remove the reliance on MI's allocation strategy forChandler Carruth2018-08-142-30/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array. Prior to this change, the memory management was deeply confusing here -- The way the MI was built relied on the `SelectionDAG` allocating memory for these arrays of pointers using the `MachineFunction`'s allocator so that the raw pointer to the array could be blindly copied into an eventual `MachineInstr`. This creates a hard coupling between how `MachineInstr`s allocate their array of `MachineMemOperand` pointers and how the `MachineSDNode` does. This change is motivated in large part by a change I am making to how `MachineFunction` allocates these pointers, but it seems like a layering improvement as well. This would run the risk of increasing allocations overall, but I've implemented an optimization that should avoid that by storing a single `MachineMemOperand` pointer directly instead of allocating anything. This is expected to be a net win because the vast majority of uses of these only need a single pointer. As a side-effect, this makes the API for updating a `MachineSDNode` and a `MachineInstr` reasonably different which seems nice to avoid unexpected coupling of these two layers. We can map between them, but we shouldn't be *surprised* at where that occurs. =] Differential Revision: https://reviews.llvm.org/D50680 llvm-svn: 339740
* [ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly.Eli Friedman2018-08-142-2/+5
| | | | | | | | | | | | | | | | | | | | | Intentionally excluding nodes from the DAGCombine worklist is likely to lead to weird optimizations and infinite loops, so it's generally a bad idea. To avoid the infinite loops, fix DAGCombine to use the isDesirableToCommuteWithShift target hook before performing the transforms in question, and implement the target hook in the ARM backend disable the transforms in question. Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a reduced testcase for that bug. But we should have sufficient test coverage for PerformSHLSimplify given that we're not playing weird tricks with the worklist. I can try to bugpoint it if necessary, though.) Differential Revision: https://reviews.llvm.org/D50667 llvm-svn: 339734
* [AArch64] Fix assertion failure on widened f16 BUILD_VECTORBryan Chan2018-08-061-0/+9
| | | | | | | | | | | | | | | | Summary: Ensure that NormalizedBuildVector returns a BUILD_VECTOR with operands of the same type. This fixes an assertion failure in VerifySDNode. Reviewers: SjoerdMeijer, t.p.northover, javed.absar Reviewed By: SjoerdMeijer Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D50202 llvm-svn: 339013
* [GlobalISel] Rewrite CallLowering::lowerReturn to accept multiple VRegs per ↵Alexander Ivchenko2018-08-022-19/+32
| | | | | | | | | | Value This is logical continuation of https://reviews.llvm.org/D46018 (r332449) Differential Revision: https://reviews.llvm.org/D49660 llvm-svn: 338685
* [AArch64] Add support for got relocated LDR'sDavid Green2018-08-021-0/+2
| | | | | | | | | | As a part of adding the tiny codemodel, we need to support ldr's with :got: relocations on them. This seems to be mostly already done, just needs the relocation type support. Differential Revision: https://reviews.llvm.org/D50137 llvm-svn: 338673
* [AArch64] DWARF: do not generate AT_location for thread localLei Liu2018-08-011-0/+3
| | | | | | | | | | | | | | | AArch64 ELF ABI does not define a static relocation type for TLS offset within a module, which makes it impossible for compiler to generate a valid DW_AT_location content for thread local variables. Currently LLVM generates an invalid R_AARCH64_ABS64 relocation at the DW_AT_location field for a TLS variable. That causes trouble for linker because thread local variable does not have an absolute address at link time. AArch64 GCC solves the problem by not generating DW_AT_location for thread local variables. We should do the same in LLVM. Differential Revision: https://reviews.llvm.org/D43860 llvm-svn: 338655
* [AArch64] Fix FCCMP with FP16 operandsBryan Chan2018-08-011-1/+3
| | | | | | | | | | | | | | Summary: This patch adds support for FCCMP instruction with FP16 operands, avoiding an assertion during instruction selection. Reviewers: olista01, SjoerdMeijer, t.p.northover, javed.absar Reviewed By: SjoerdMeijer Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D50115 llvm-svn: 338554
* [AArch64] Disallow the MachO specific .loh directive for windowsMartin Storsjo2018-08-011-6/+6
| | | | | | | | Also add a test for it being unsupported for linux. Differential Revision: https://reviews.llvm.org/D49929 llvm-svn: 338493
* [AArch64] Support the .inst directive for MachO and COFF targetsMartin Storsjo2018-07-312-7/+18
| | | | | | | | | | Contrary to ELF, we don't add any markers that distinguish data generated with .long from normal instructions, so the .inst directive only adds compatibility with assembly that uses it. Differential Revision: https://reviews.llvm.org/D49935 llvm-svn: 338355
* [AArch64][GlobalISel] Add isel support for G_BLOCK_ADDR.Amara Emerson2018-07-311-31/+65
| | | | | | | | | | | Also refactors some existing code to materialize addresses for the large code model so it can be shared between G_GLOBAL_VALUE and G_BLOCK_ADDR. This implements PR36390. Differential Revision: https://reviews.llvm.org/D49903 llvm-svn: 338337
* [AArch64][GlobalISel] Make G_BLOCK_ADDR legal.Amara Emerson2018-07-311-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D49902 llvm-svn: 338336
* [DAGCombiner][TargetLowering] Pass a SmallVector instead of a std::vector to ↵Craig Topper2018-07-302-2/+2
| | | | | | | | BuildSDIV/BuildUDIV/etc. The vector contains the SDNodes that these functions create. The number of nodes is always a small number so we should use SmallVector to avoid a heap allocation. llvm-svn: 338329
* [DAGCombiner][PowerPC][AArch64] Pass Created vector by reference to ↵Craig Topper2018-07-302-9/+6
| | | | | | BuildSDIVPow2. llvm-svn: 338303
* Remove trailing spaceFangrui Song2018-07-303-5/+5
| | | | | | sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h} llvm-svn: 338293
* [MachineOutliner][AArch64] Add support for saving LR to a registerJessica Paquette2018-07-302-84/+163
| | | | | | | | | | | | | | | | | | | | | | This teaches the outliner to save LR to a register rather than the stack when possible. This allows us to avoid bumping the stack in outlined functions in some cases. By doing this, in a later patch, we can teach the outliner to do something like this: f1: ... bl OUTLINED_FUNCTION ... f2: ... move LR's contents to a register bl OUTLINED_FUNCTION move the register's contents back instead of falling back to saving LR in both cases. llvm-svn: 338278
* [AArch64][SVE] Asm: Enable instructions to be prefixed.Sander de Smalen2018-07-302-48/+125
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables instructions that are destructive on their destination- and first source operand, to be prefixed with a MOVPRFX instruction. This patch also adds a variety of tests: - positive tests for all instructions and forms that accept a movprfx for either or both predicated and unpredicated forms. - negative tests for all instructions and forms that do not accept an unpredicated or predicated movprfx. - negative tests for the diagnostics that get emitted when a MOVPRFX instruction is used incorrectly. This is patch [2/2] in a series to add MOVPRFX instructions: - Patch [1/2]: https://reviews.llvm.org/D49592 - Patch [2/2]: https://reviews.llvm.org/D49593 Reviewers: rengolin, SjoerdMeijer, samparker, fhahn, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D49593 llvm-svn: 338261
* [AArch64][SVE] Asm: Add MOVPRFX instructions.Sander de Smalen2018-07-306-30/+273
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds predicated and unpredicated MOVPRFX instructions, which can be prepended to SVE instructions that are destructive on their first source operand, to make them a constructive operation, e.g. add z1.s, p0/m, z1.s, z2.s <=> z1 = z1 + z2 can be made constructive: movprfx z0, z1 add z0.s, p0/m, z0.s, z2.s <=> z0 = z1 + z2 The predicated MOVPRFX instruction can additionally be used to zero inactive elements, e.g. movprfx z0.s, p0/z, z1.s add z0.s, p0/m, z0.s, z2.s Not all instructions can be prefixed with the MOVPRFX instruction which is why this patch also adds a mechanism to validate prefixed instructions. The exact rules when a MOVPRFX applies is detailed in the SVE supplement of the Architectural Reference Manual. This is patch [1/2] in a series to add MOVPRFX instructions: - Patch [1/2]: https://reviews.llvm.org/D49592 - Patch [2/2]: https://reviews.llvm.org/D49593 Reviewers: rengolin, SjoerdMeijer, samparker, fhahn, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D49592 llvm-svn: 338258
* [AArch64][SVE] Asm: Support for WHILE(LE|LO|LS|LT) instructions.Sander de Smalen2018-07-292-0/+45
| | | | | | | | | | | | | | | | | | | | The WHILE instructions generate a predicate that is true while the comparison of the first scalar operand (incremented for each predicate element) with the second scalar operand is true and false thereafter. WHILELE While incrementing signed scalar less than or equal to scalar WHILELO While incrementing unsigned scalar lower than scalar WHILELS While incrementing unsigned scalar lower than or same as scalar WHILELT While incrementing signed scalar less than scalar e.g. whilele p0.s, x0, x1 generates predicate p0 (for 32bit elements) by incrementing (signed) x0 and comparing that vector to splat(x1). llvm-svn: 338211
* [AArch64][SVE] Asm: Instructions to perform serialized operations.Sander de Smalen2018-07-292-0/+63
| | | | | | | | | | | | The instructions added in this patch permit active elements within a vector to be processed sequentially without unpacking the vector. PFIRST Set the first active element to true. PNEXT Find next active element in predicate. CTERMEQ Compare and terminate loop when equal. CTERMNE Compare and terminate loop when not equal. llvm-svn: 338210
* [AArch64][SVE] Asm: Support for PFALSE and PTEST instructions.Sander de Smalen2018-07-282-0/+45
| | | | | | | | This patch adds PFALSE (unconditionally sets all elements of the predicate to false) and PTEST (set the status flags for the predicate). llvm-svn: 338198
* [AArch64][SVE] Asm: Data-dependent loop predicate partitioning instructions.Sander de Smalen2018-07-282-0/+100
| | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for instructions that partition a predicate based on data-dependent termination conditions in a loop. BRKA Break after the first true condition BRKAS Break after the first true condition, setting condition flags BRKB Break before the first true condition BRKBS Break before the first true condition, setting condition flags BRKPA Break after the first true condition, propagating from the previous partition BRKPAS Break after the first true condition, propagating from the previous partition, setting condition flags BRKPB Break before the first true condition, propagating from the previous partition BRKPBS Break before the first true condition, propagating from the previous partition, setting condition flags BRKN Propagate break to next partition BKRNS Propagate break to next partition, setting condition flags llvm-svn: 338196
* DAG: Add calling convention argument to calling convention funcsMatt Arsenault2018-07-281-1/+1
| | | | | | | | This seems like a pretty glaring omission, and AMDGPU wants to treat kernels differently from other calling conventions. llvm-svn: 338194
* Recommit "Enable MachineOutliner by default under -Oz for AArch64"Jessica Paquette2018-07-273-0/+9
| | | | | | | | | | | | | | | | | | Fixed the ASAN failure from before in r338148, so recommiting. This patch enables the MachineOutliner by default in AArch64 under -Oz. The MachineOutliner offers around a 4.5% improvement on the current -Oz code size improvements. We have done work into improving the debuggability of outlined code, so that users of -Oz won't be surprised by the optimization. We have also been executing the LLVM test suite and common external tests such as the SPEC suites continuously with no issue. The outliner has a low compile-time overhead of roughly 1%. At this point, the outliner would be a really good addition to the -Oz pass pipeline! llvm-svn: 338160
OpenPOWER on IntegriCloud