summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/ARM
Commit message (Collapse)AuthorAgeFilesLines
* [ARM] Adjust the feature set for ExynosEvandro Menezes2018-08-301-0/+4
| | | | | | Enable `FeatureUseAA` for all Exynos processors. llvm-svn: 341101
* Make TargetInstrInfo::isCopyInstr return true for regular COPY-instructionsAlexander Ivchenko2018-08-302-6/+9
| | | | | | | | | | | ..Move all target-dependent checks into new isCopyInstrImpl method. This change allows us to treat MoveReg-type instructions and generic COPY instruction in the same way Differential Revision: https://reviews.llvm.org/D49913 llvm-svn: 341072
* [CodeGen] emit inline asm clobber list warnings for reserved (cont)Ties Stuij2018-08-302-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is a continuation of https://reviews.llvm.org/D49727 Below the original text, current changes in the comments: Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results. For example: extern int bar(int[]); int foo(int i) { int a[i]; // VLA asm volatile( "mov r7, #1" : : : "r7" ); return 1 + bar(a); } Compiled for thumb, this gives: $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb ... foo: .fnstart @ %bb.0: @ %entry .save {r4, r5, r6, r7, lr} push {r4, r5, r6, r7, lr} .setfp r7, sp, #12 add r7, sp, #12 .pad #4 sub sp, #4 movs r1, #7 add.w r0, r1, r0, lsl #2 bic r0, r0, #7 sub.w r0, sp, r0 mov sp, r0 @APP mov.w r7, #1 @NO_APP bl bar adds r0, #1 sub.w r4, r7, #12 mov sp, r4 pop {r4, r5, r6, r7, pc} ... r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block. This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now. The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter. If we find a reserved register, we print a warning: repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm] "mov r7, #1" ^ Reviewers: efriedma, olista01, javed.absar Reviewed By: efriedma Subscribers: eraman, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D51165 llvm-svn: 341062
* Fix "Q" and "R" inline assembly template modifiers for big-endian ArmFlorian Hahn2018-08-301-2/+14
| | | | | | | | | | | | | | Consider the endianness of the target when printing register names. This is in line with the documentation at http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers Patch by Jackson Woodruff <jackson.woodruff@arm.com> Reviewers: t.p.northover, echristo, javed.absar, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D49778 llvm-svn: 341052
* [ARM] Lower llvm.ctlz.i32 to a libcall when clz is not available.Eli Friedman2018-08-221-1/+3
| | | | | | | | | | The inline sequence is very long (about 70 bytes on Thumb1), so it's not really a good idea to inline it, especially when optimizing for size. Differential Revision: https://reviews.llvm.org/D47917 llvm-svn: 340458
* [ARM] Avoid injecting constant islands in movw+movt pairs on WindowsMartin Storsjo2018-08-221-0/+16
| | | | | | | | | | | | | | | | On Windows, movw+movt pairs with relocations are handled with a single relocation that covers them both. Therefore we can't inject anything between these instructions, otherwise the relocation (which in LLVM only is treated as the movw instruction's relocation, while the movt instruction's relocation is dropped) will end up bogus. These instructions are bundled up until right before the constant islands pass, making this effectively the only place that can split them apart. Differential Revision: https://reviews.llvm.org/D51032 llvm-svn: 340451
* [ARM] Move machine operand target flags to ARMBaseInstrInfoMartin Storsjo2018-08-224-35/+35
| | | | | | | | This makes sure the flags are available for use for thumb MIR as well. A test that requires this will be added in the next commit. llvm-svn: 340450
* [ARM] Handle all-ones mask explicitly in targetShrinkDemandedConstant.Eli Friedman2018-08-221-4/+11
| | | | | | | | | | | This avoids a potential infinite loop setting and unsetting bits in the mask. Reduced from a failure on the polly-aosp bot. Differential Revision: https://reviews.llvm.org/D51066 llvm-svn: 340446
* [ARM] Rotated operand patterns for *xtb16Sam Parker2018-08-222-0/+16
| | | | | | | | | Add intrinsic isel patterns for sxtb16, sxtab16, uxtb16 and uxtab16 so that they can perform a ror. Differential Revision: https://reviews.llvm.org/D51034 llvm-svn: 340405
* [AArch64] Add Tiny Code Model for AArch64David Green2018-08-222-0/+4
| | | | | | | | | | | | | | This adds the plumbing for the Tiny code model for the AArch64 backend. This, instead of loading addresses through the normal ADRP;ADD pair used in the Small model, uses a single ADR. The 21 bit range of an ADR means that the code and its statically defined symbols need to be within 1MB of each other. This makes it mostly interesting for embedded applications where we want to fit as much as we can in as small a space as possible. Differential Revision: https://reviews.llvm.org/D49673 llvm-svn: 340397
* [ARM/AArch64] Support FP16 +fp16fml instructionsBernard Ogden2018-08-177-3/+95
| | | | | | | | | | | | | | | | | | Add +fp16fml feature for new FP16 instructions, which are a mandatory part of FP16 from v8.4-A and an optional part of FP16 from v8.2-A. It doesn't seem to be possible to model this in LLVM, but the relationship between the options is handled by the related clang patch. In keeping with what I think is the usual practice, the fp16fml extension is accepted regardless of base architecture version. Builds on/replaces Sjoerd Meijer's patch to add these instructions at https://reviews.llvm.org/D49839. Differential Revision: https://reviews.llvm.org/D50228 llvm-svn: 340013
* [ARM][NFC] ARMCodeGenPrepare: some refactoring and algorithm descriptionSjoerd Meijer2018-08-171-33/+85
| | | | | | Differential Revision: https://reviews.llvm.org/D50846 llvm-svn: 339997
* [MI] Change the array of `MachineMemOperand` pointers to beChandler Carruth2018-08-165-27/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a generically extensible collection of extra info attached to a `MachineInstr`. The primary change here is cleaning up the APIs used for setting and manipulating the `MachineMemOperand` pointer arrays so chat we can change how they are allocated. Then we introduce an extra info object that using the trailing object pattern to attach some number of MMOs but also other extra info. The design of this is specifically so that this extra info has a fixed necessary cost (the header tracking what extra info is included) and everything else can be tail allocated. This pattern works especially well with a `BumpPtrAllocator` which we use here. I've also added the basic scaffolding for putting interesting pointers into this, namely pre- and post-instruction symbols. These aren't used anywhere yet, they're just there to ensure I've actually gotten the data structure types correct. I'll flesh out support for these in a subsequent patch (MIR dumping, parsing, the works). Finally, I've included an optimization where we store any single pointer inline in the `MachineInstr` to avoid the allocation overhead. This is expected to be the overwhelmingly most common case and so should avoid any memory usage growth due to slightly less clever / dense allocation when dealing with >1 MMO. This did require several ergonomic improvements to the `PointerSumType` to reasonably support the various usage models. This also has a side effect of freeing up 8 bits within the `MachineInstr` which could be repurposed for something else. The suggested direction here came largely from Hal Finkel. I hope it was worth it. ;] It does hopefully clear a path for subsequent extensions w/o nearly as much leg work. Lots of thanks to Reid and Justin for careful reviews and ideas about how to do all of this. Differential Revision: https://reviews.llvm.org/D50701 llvm-svn: 339940
* [ARM] Ignore GEPs in ARMCodeGenPrepareSam Parker2018-08-161-0/+5
| | | | | | | | | | | While searching through the use-def tree, ignore GetElementPtrInst instructions because they don't need promoting and neither do their indices. Otherwise, the wide indices prevent the transformation from happening. Differential Revision: https://reviews.llvm.org/D50762 llvm-svn: 339871
* [ARM] Allow zext in ARMCodeGenPrepareSam Parker2018-08-161-3/+8
| | | | | | | | Treat zext instructions as roots, like we do for truncs. Differential Revision: https://reviews.llvm.org/D50759 llvm-svn: 339868
* [ARM] Allow signed icmps in ARMCodeGenPrepareSam Parker2018-08-161-30/+45
| | | | | | | | | | | | | | | | | Originally committed in r339755 which was reverted in r339806 due to an asan issue. The issue was caused by my assumption that operands to a CallInst mapped to the FunctionType Params. CallInsts are now handled by iterating over their ArgOperands instead of Operands. Original Message: Treat signed icmps as 'sinks', allowing them to be in the use-def tree, enabling more promotions to be performed. As a sink, any promoted incoming values need to be truncated before being used by the signed icmp. Differential Revision: https://reviews.llvm.org/D50067 llvm-svn: 339858
* Revert "[ARM] Allow signed icmps in ARMCodeGenPrepare"Vitaly Buka2018-08-151-44/+22
| | | | | | | | use-after-poison in check-llvm under asan This reverts commit r339755. llvm-svn: 339806
* [ARM] TypeSize lower bound for ARMCodeGenPrepareSam Parker2018-08-151-1/+1
| | | | | | | | | We only try to promote types with are smaller than 16-bits, but we also need to check that the type is not less than 8-bits. Differential Revision: https://reviews.llvm.org/D50769 llvm-svn: 339770
* [ARM] Allow signed icmps in ARMCodeGenPrepareSam Parker2018-08-151-22/+44
| | | | | | | | | | | Treat signed icmps as 'sinks', allowing them to be in the use-def tree, enabling more promotions to be performed. As a sink, any promoted incoming values need to be truncated before being used by the signed icmp. Differential Revision: https://reviews.llvm.org/D50067 llvm-svn: 339755
* [ARM] Allow pointer values in ARMCodeGenPrepareSam Parker2018-08-151-18/+30
| | | | | | | | | | | | Add pointers to the list of allowed types, but don't try to promote them. Also fixed a bug with the promotion of undef values, so a new value is now created instead of mutating in place. We also now only promote if there's an instruction in the use-def chains other than the icmp, sinks and sources. Differential Revision: https://reviews.llvm.org/D50054 llvm-svn: 339754
* [SDAG] Remove the reliance on MI's allocation strategy forChandler Carruth2018-08-142-35/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array. Prior to this change, the memory management was deeply confusing here -- The way the MI was built relied on the `SelectionDAG` allocating memory for these arrays of pointers using the `MachineFunction`'s allocator so that the raw pointer to the array could be blindly copied into an eventual `MachineInstr`. This creates a hard coupling between how `MachineInstr`s allocate their array of `MachineMemOperand` pointers and how the `MachineSDNode` does. This change is motivated in large part by a change I am making to how `MachineFunction` allocates these pointers, but it seems like a layering improvement as well. This would run the risk of increasing allocations overall, but I've implemented an optimization that should avoid that by storing a single `MachineMemOperand` pointer directly instead of allocating anything. This is expected to be a net win because the vast majority of uses of these only need a single pointer. As a side-effect, this makes the API for updating a `MachineSDNode` and a `MachineInstr` reasonably different which seems nice to avoid unexpected coupling of these two layers. We can map between them, but we shouldn't be *surprised* at where that occurs. =] Differential Revision: https://reviews.llvm.org/D50680 llvm-svn: 339740
* [ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly.Eli Friedman2018-08-142-3/+23
| | | | | | | | | | | | | | | | | | | | | Intentionally excluding nodes from the DAGCombine worklist is likely to lead to weird optimizations and infinite loops, so it's generally a bad idea. To avoid the infinite loops, fix DAGCombine to use the isDesirableToCommuteWithShift target hook before performing the transforms in question, and implement the target hook in the ARM backend disable the transforms in question. Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a reduced testcase for that bug. But we should have sufficient test coverage for PerformSHLSimplify given that we're not playing weird tricks with the worklist. I can try to bugpoint it if necessary, though.) Differential Revision: https://reviews.llvm.org/D50667 llvm-svn: 339734
* [ARM] ParallelDSP: add option to enable/disable the passSjoerd Meijer2018-08-141-0/+6
| | | | | | Differential Revision: https://reviews.llvm.org/D50511 llvm-svn: 339645
* [ARM] Added FP16 VREV Vector Instrinsic CodeGen supportLuke Geeson2018-08-131-0/+2
| | | | llvm-svn: 339546
* Fix unused lambda capture warning from r339472.Eli Friedman2018-08-101-1/+1
| | | | llvm-svn: 339479
* [ARM] Adjust AND immediates to make them cheaper to select.Eli Friedman2018-08-103-0/+85
| | | | | | | | | | | | | | | | | | | | | | | LLVM normally prefers to minimize the number of bits set in an AND immediate, but that doesn't always match the available ARM instructions. In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer a two-instruction sequence movs+ands or movs+bics. Some potential improvements outlined in ARMTargetLowering::targetShrinkDemandedConstant, but seems to work pretty well already. The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX instruction due to a larger-than-expected mask. (It's orthogonal, in some sense, but as far as I can tell it's either impossible or nearly impossible to reproduce the bug without this change.) According to my testing, this seems to consistently improve codesize by a small amount by forming bic more often for ISD::AND with an immediate. Differential Revision: https://reviews.llvm.org/D50030 llvm-svn: 339472
* [ARM] Disallow zexts in ARMCodeGenPrepareSam Parker2018-08-101-165/+109
| | | | | | | | | | | | | | | | | | | Enabling ARMCodeGenPrepare by default caused a whole load of failures. This is due to zexts and truncs not being handled properly. ZExts are messy so it's just easier to disable for now and truncs are allowed only as 'sinks'. I still need to figure out why allowing them as 'sources' causes so many failures. The other main changes are that we are explicit in the types that we converting to, it's now always 'TypeSize'. Type support is also now performed while checking for valid opcodes as it unnecessarily complicated having the checks are different stages. I've moved the tests around too, so we have the zext and truncs in their own file as well as the overflowing opcode tests. Differential Revision: https://reviews.llvm.org/D50518 llvm-svn: 339432
* [ARM] Adjust the feature set for ExynosEvandro Menezes2018-08-091-0/+36
| | | | | | | | | Enable `FeatureZCZeroing`, `FeatureHasSlowFPVMLx`, `FeatureExpandMLx`, `FeatureProfUnpredicate`, `FeatureSlowVDUP32`, `FeatureSlowVGETLNi32`, `FeatureSplatVFPToNeon`, `FeatureHasRetAddrStack`, `FeatureSlowFPBrcc` for all Exynos processors. llvm-svn: 339356
* [ARM] Replace processor check with featureEvandro Menezes2018-08-093-1/+15
| | | | | | | Add new feature, `FeatureUseWideStrideVFP`, that replaces the need for a processor check. Otherwise, NFC. llvm-svn: 339354
* [ARM] FP16: codegen support for VTRNSjoerd Meijer2018-08-091-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D50454 llvm-svn: 339340
* [ARM] Avoid spilling lr with Thumb1 tail calls.Eli Friedman2018-08-081-2/+7
| | | | | | | | | | | | | | | Normally, if any registers are spilled, we prefer to spill lr on Thumb1 so we can fold the "bx lr" into the "pop". However, if there are tail calls involved, restoring lr is expensive, so skip the optimization in that case. The spill of r7 in the new test also isn't necessary, but that's mostly orthogonal to this patch. (It's the same code in ARMFrameLowering, but it's not related to tail calls.) Differential Revision: https://reviews.llvm.org/D49459 llvm-svn: 339283
* [ARM] FP16: codegen support for VEXTSjoerd Meijer2018-08-081-6/+8
| | | | | | Differential Revision: https://reviews.llvm.org/D50427 llvm-svn: 339241
* [ARM] FP16: vector vmov and vdup supportSjoerd Meijer2018-08-081-0/+13
| | | | | | | | This adds codegen support for the vmov_n_f16 and vdup_n_f16 variants. Differential Revision: https://reviews.llvm.org/D50329 llvm-svn: 339238
* [ARM] FP16: vector VMUL variantsSjoerd Meijer2018-08-081-2/+14
| | | | | | | | This adds codegen support for the vmul_lane_f16 and vmul_n_f16 variants. Differential Revision: https://reviews.llvm.org/D50326 llvm-svn: 339232
* [ARM] FP16: support vector INT_TO_FP and FP_TO_INTSjoerd Meijer2018-08-081-7/+35
| | | | | | | | This adds codegen support for the different vcvt_f16 variants. Differential Revision: https://reviews.llvm.org/D50393 llvm-svn: 339227
* [ARM] FP16: support the vector vmin and vmax variantsSjoerd Meijer2018-08-081-0/+12
| | | | | | Differential Revision: https://reviews.llvm.org/D50238 llvm-svn: 339221
* [ARM] FP16: codegen support for VACGTSjoerd Meijer2018-08-071-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D50236 llvm-svn: 339148
* ARM-MachO: don't add Thumb bit for addend to non-external relocation.Tim Northover2018-08-061-0/+1
| | | | | | | | | ld64 supplies its own Thumb bit for Thumb functions, and intentionally zeroes out that part of any addend in an object file. But it only does that for symbols marked N_EXT -- i.e. external symbols. So LLVM should avoid setting that extra bit in other cases. llvm-svn: 339007
* [ARM] FP16: support vector zip and unzipSjoerd Meijer2018-08-031-0/+4
| | | | | | | | This is addressing PR38404. Differential Revision: https://reviews.llvm.org/D50186 llvm-svn: 338835
* [ARM] FP16: support VFMASjoerd Meijer2018-08-031-0/+6
| | | | | | This is addressing PR38404. llvm-svn: 338830
* [GlobalISel] Rewrite CallLowering::lowerReturn to accept multiple VRegs per ↵Alexander Ivchenko2018-08-022-16/+26
| | | | | | | | | | Value This is logical continuation of https://reviews.llvm.org/D46018 (r332449) Differential Revision: https://reviews.llvm.org/D49660 llvm-svn: 338685
* Revert r338354 "[ARM] Revert r337821"Reid Kleckner2018-07-311-1/+1
| | | | | | | | | | | | | | | | | Disable ARMCodeGenPrepare by default again. It is causing verifier failues in V8 that look like: Duplicate integer as switch case switch i32 %trunc, label %if.end13 [ i32 0, label %cleanup36 i32 0, label %if.then8 ], !dbg !4981 i32 0 fatal error: error in backend: Broken function found, compilation aborted! I will continue reducing the test case and send it along. llvm-svn: 338452
* [ARM] Allow automatically deducing the thumb instruction size for .instMartin Storsjo2018-07-311-3/+14
| | | | | | | | This matches GAS, that allows unsuffixed .inst for thumb. Differential Revision: https://reviews.llvm.org/D49937 llvm-svn: 338357
* [ARM] Support the .inst directive for MachO and COFF targetsMartin Storsjo2018-07-312-7/+43
| | | | | | | | | | Contrary to ELF, we don't add any markers that distinguish data generated with .short/.long from normal instructions, so the .inst directive only adds compatibility with assembly that uses it. Differential Revision: https://reviews.llvm.org/D49936 llvm-svn: 338356
* [ARM] Revert r337821Sam Parker2018-07-311-1/+1
| | | | | | | Re-enabling ARMCodeGenPrepare by default after failing to reproduce the bootstrap issues that I was concerned it was causing. llvm-svn: 338354
* Remove trailing spaceFangrui Song2018-07-3019-46/+46
| | | | | | sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h} llvm-svn: 338293
* Fix uninitialized read in ARM's PrintAsmOperandThomas Preud'homme2018-07-301-2/+3
| | | | | | | | | | | | | | | | | Summary: Fix read of uninitialized RC variable in ARM's PrintAsmOperand when hasRegClassConstraint returns false. This was causing inline-asm-operand-implicit-cast test to fail in r338206. Reviewers: t.p.northover, weimingz, javed.absar, chill Reviewed By: chill Subscribers: chill, eraman, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D49984 llvm-svn: 338268
* [ARM] Fix over-alignment in arguments that are HA of 128-bit vectorsPetr Pavlu2018-07-301-5/+6
| | | | | | | | | | | | | | | | | | | | Code in `CC_ARM_AAPCS_Custom_Aggregate()` is responsible for handling homogeneous aggregates for `CC_ARM_AAPCS_VFP`. When an aggregate ends up fully on stack, the function tries to pack all resulting items of the aggregate as tightly as possible according to AAPCS. Once the first item was laid out, the alignment used for consecutive items was the size of one item. This logic went wrong for 128-bit vectors because their alignment is normally only 64 bits, and so could result in inserting unexpected padding between the first and second element. The patch fixes the problem by updating the alignment with the item size only if this results in reducing it. Differential Revision: https://reviews.llvm.org/D49720 llvm-svn: 338233
* DAG: Add calling convention argument to calling convention funcsMatt Arsenault2018-07-281-1/+1
| | | | | | | | This seems like a pretty glaring omission, and AMDGPU wants to treat kernels differently from other calling conventions. llvm-svn: 338194
* [ARM] Add new target feature to fuse literal generationEvandro Menezes2018-07-273-19/+55
| | | | | | | | | | This feature enables the fusion of such operations on Cortex A57 and Cortex A72, as recommended in their Software Optimisation Guides, sections 4.14 and 4.11, respectively. Differential revision: https://reviews.llvm.org/D49563 llvm-svn: 338147
OpenPOWER on IntegriCloud