summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/ARM
Commit message (Collapse)AuthorAgeFilesLines
...
* Revert "ADT: add <bit> header, implement C++20 bit_cast, use"JF Bastien2018-09-071-8/+13
| | | | | | Bots sad. Looks like missing std::is_trivially_copyable. llvm-svn: 341730
* ADT: add <bit> header, implement C++20 bit_cast, useJF Bastien2018-09-071-13/+8
| | | | | | | | | | | | Summary: I saw a few places that were punning through a union of FP and integer, and that made me sad. Luckily, C++20 adds bit_cast for exactly that purpose. Implement our own version in ADT (without constexpr, leaving us a bit sad), and use it in the few places my grep-fu found silly union punning. Reviewers: javed.absar Subscribers: dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51693 llvm-svn: 341728
* ARM: fix Thumb2 CodeGen for ldrex with folded frame-index.Tim Northover2018-09-075-3/+12
| | | | | | | | | | | Because t2LDREX (& t2STREX) were marked as AddrModeNone, but did allow a FrameIndex operand, rewriteT2FrameIndex asserted. This gives them a proper addressing-mode and tells the rewriter about it so that encodable offsets are exploited and others are rejected. Should fix PR38828. llvm-svn: 341642
* The initial .text section generated in object files was missing theEric Christopher2018-09-062-1/+31
| | | | | | | | | | | | | | | | | | | | SHF_ARM_PURECODE flag when being built with the -mexecute-only flag. All code sections of an ELF must have the flag set for the final .text section to be execute-only, otherwise the flag gets removed. A HasData flag is added to MCSection to aid in the determination that the section is empty. A virtual setTargetSectionFlags is added to MCELFObjectTargetWriter to allow subclasses to set target specific section flags to be added to sections which we then use in the ARM backend to set SHF_ARM_PURECODE. Patch by Ivan Lozano! Reviewed By: echristo Differential Revision: https://reviews.llvm.org/D48792 llvm-svn: 341593
* Remove FrameAccess struct from hasLoadFromStackSlotSander de Smalen2018-09-051-4/+8
| | | | | | | | | | | | | | This removes the FrameAccess struct that was added to the interface in D51537, since the PseudoValue from the MachineMemoryOperand can be safely casted to a FixedStackPseudoSourceValue. Reviewers: MatzeB, thegameg, javed.absar Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D51617 llvm-svn: 341454
* [MinGW] Move code for indicating "potentially not DSO local" into ↵Martin Storsjo2018-09-041-3/+2
| | | | | | | | | | | | | | | shouldAssumeDSOLocal. NFC. On Windows, if shouldAssumeDSOLocal returns false, it's either a dllimport reference, or a reference that we should treat as non-local and create a stub for. Clean up AArch64Subtarget::ClassifyGlobalReference a little while touching the flag handling relating to dllimport. Differential Revision: https://reviews.llvm.org/D51590 llvm-svn: 341402
* Add header guards to some headers that are missing themArgyrios Kyrtzidis2018-09-031-0/+5
| | | | | | | Also adjust some of dsymutil's headers to put the header guards at the top, otherwise the compiler will not recognize them as header guards. llvm-svn: 341323
* Extend hasStoreToStackSlot with list of FI accesses.Sander de Smalen2018-09-031-4/+12
| | | | | | | | | | | | | | | | | | For instructions that spill/fill to and from multiple frame-indices in a single instruction, hasStoreToStackSlot and hasLoadFromStackSlot should return an array of accesses, rather than just the first encounter of such an access. This better describes FI accesses for AArch64 (paired) LDP/STP instructions. Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D51537 llvm-svn: 341301
* [MinGW] [ARM] Add stubs for potential automatic dllimported variablesMartin Storsjo2018-08-314-6/+33
| | | | | | | | | | | The runtime pseudo relocations can't handle the ARM format embedded addresses in movw/movt pairs. By using stubs, the potentially dllimported addresses can be touched up by the runtime pseudo relocation framework. Differential Revision: https://reviews.llvm.org/D51450 llvm-svn: 341176
* [ARM] Enable GEP offset splitting for 32-bit ARM.Eli Friedman2018-08-301-0/+2
| | | | | | | | | | | It has essentially the same benefit it has on 64-bit ARM: it substantially reduces the number of constants used by large GEP operations. Seems to be generally helpful across a few different codebases I've tried. Differential Revision: https://reviews.llvm.org/D51462 llvm-svn: 341136
* [ARM] Adjust the feature set for ExynosEvandro Menezes2018-08-301-0/+4
| | | | | | Enable `FeatureUseAA` for all Exynos processors. llvm-svn: 341101
* Make TargetInstrInfo::isCopyInstr return true for regular COPY-instructionsAlexander Ivchenko2018-08-302-6/+9
| | | | | | | | | | | ..Move all target-dependent checks into new isCopyInstrImpl method. This change allows us to treat MoveReg-type instructions and generic COPY instruction in the same way Differential Revision: https://reviews.llvm.org/D49913 llvm-svn: 341072
* [CodeGen] emit inline asm clobber list warnings for reserved (cont)Ties Stuij2018-08-302-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is a continuation of https://reviews.llvm.org/D49727 Below the original text, current changes in the comments: Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results. For example: extern int bar(int[]); int foo(int i) { int a[i]; // VLA asm volatile( "mov r7, #1" : : : "r7" ); return 1 + bar(a); } Compiled for thumb, this gives: $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb ... foo: .fnstart @ %bb.0: @ %entry .save {r4, r5, r6, r7, lr} push {r4, r5, r6, r7, lr} .setfp r7, sp, #12 add r7, sp, #12 .pad #4 sub sp, #4 movs r1, #7 add.w r0, r1, r0, lsl #2 bic r0, r0, #7 sub.w r0, sp, r0 mov sp, r0 @APP mov.w r7, #1 @NO_APP bl bar adds r0, #1 sub.w r4, r7, #12 mov sp, r4 pop {r4, r5, r6, r7, pc} ... r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block. This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now. The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter. If we find a reserved register, we print a warning: repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm] "mov r7, #1" ^ Reviewers: efriedma, olista01, javed.absar Reviewed By: efriedma Subscribers: eraman, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D51165 llvm-svn: 341062
* Fix "Q" and "R" inline assembly template modifiers for big-endian ArmFlorian Hahn2018-08-301-2/+14
| | | | | | | | | | | | | | Consider the endianness of the target when printing register names. This is in line with the documentation at http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers Patch by Jackson Woodruff <jackson.woodruff@arm.com> Reviewers: t.p.northover, echristo, javed.absar, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D49778 llvm-svn: 341052
* [ARM] Lower llvm.ctlz.i32 to a libcall when clz is not available.Eli Friedman2018-08-221-1/+3
| | | | | | | | | | The inline sequence is very long (about 70 bytes on Thumb1), so it's not really a good idea to inline it, especially when optimizing for size. Differential Revision: https://reviews.llvm.org/D47917 llvm-svn: 340458
* [ARM] Avoid injecting constant islands in movw+movt pairs on WindowsMartin Storsjo2018-08-221-0/+16
| | | | | | | | | | | | | | | | On Windows, movw+movt pairs with relocations are handled with a single relocation that covers them both. Therefore we can't inject anything between these instructions, otherwise the relocation (which in LLVM only is treated as the movw instruction's relocation, while the movt instruction's relocation is dropped) will end up bogus. These instructions are bundled up until right before the constant islands pass, making this effectively the only place that can split them apart. Differential Revision: https://reviews.llvm.org/D51032 llvm-svn: 340451
* [ARM] Move machine operand target flags to ARMBaseInstrInfoMartin Storsjo2018-08-224-35/+35
| | | | | | | | This makes sure the flags are available for use for thumb MIR as well. A test that requires this will be added in the next commit. llvm-svn: 340450
* [ARM] Handle all-ones mask explicitly in targetShrinkDemandedConstant.Eli Friedman2018-08-221-4/+11
| | | | | | | | | | | This avoids a potential infinite loop setting and unsetting bits in the mask. Reduced from a failure on the polly-aosp bot. Differential Revision: https://reviews.llvm.org/D51066 llvm-svn: 340446
* [ARM] Rotated operand patterns for *xtb16Sam Parker2018-08-222-0/+16
| | | | | | | | | Add intrinsic isel patterns for sxtb16, sxtab16, uxtb16 and uxtab16 so that they can perform a ror. Differential Revision: https://reviews.llvm.org/D51034 llvm-svn: 340405
* [AArch64] Add Tiny Code Model for AArch64David Green2018-08-222-0/+4
| | | | | | | | | | | | | | This adds the plumbing for the Tiny code model for the AArch64 backend. This, instead of loading addresses through the normal ADRP;ADD pair used in the Small model, uses a single ADR. The 21 bit range of an ADR means that the code and its statically defined symbols need to be within 1MB of each other. This makes it mostly interesting for embedded applications where we want to fit as much as we can in as small a space as possible. Differential Revision: https://reviews.llvm.org/D49673 llvm-svn: 340397
* [ARM/AArch64] Support FP16 +fp16fml instructionsBernard Ogden2018-08-177-3/+95
| | | | | | | | | | | | | | | | | | Add +fp16fml feature for new FP16 instructions, which are a mandatory part of FP16 from v8.4-A and an optional part of FP16 from v8.2-A. It doesn't seem to be possible to model this in LLVM, but the relationship between the options is handled by the related clang patch. In keeping with what I think is the usual practice, the fp16fml extension is accepted regardless of base architecture version. Builds on/replaces Sjoerd Meijer's patch to add these instructions at https://reviews.llvm.org/D49839. Differential Revision: https://reviews.llvm.org/D50228 llvm-svn: 340013
* [ARM][NFC] ARMCodeGenPrepare: some refactoring and algorithm descriptionSjoerd Meijer2018-08-171-33/+85
| | | | | | Differential Revision: https://reviews.llvm.org/D50846 llvm-svn: 339997
* [MI] Change the array of `MachineMemOperand` pointers to beChandler Carruth2018-08-165-27/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a generically extensible collection of extra info attached to a `MachineInstr`. The primary change here is cleaning up the APIs used for setting and manipulating the `MachineMemOperand` pointer arrays so chat we can change how they are allocated. Then we introduce an extra info object that using the trailing object pattern to attach some number of MMOs but also other extra info. The design of this is specifically so that this extra info has a fixed necessary cost (the header tracking what extra info is included) and everything else can be tail allocated. This pattern works especially well with a `BumpPtrAllocator` which we use here. I've also added the basic scaffolding for putting interesting pointers into this, namely pre- and post-instruction symbols. These aren't used anywhere yet, they're just there to ensure I've actually gotten the data structure types correct. I'll flesh out support for these in a subsequent patch (MIR dumping, parsing, the works). Finally, I've included an optimization where we store any single pointer inline in the `MachineInstr` to avoid the allocation overhead. This is expected to be the overwhelmingly most common case and so should avoid any memory usage growth due to slightly less clever / dense allocation when dealing with >1 MMO. This did require several ergonomic improvements to the `PointerSumType` to reasonably support the various usage models. This also has a side effect of freeing up 8 bits within the `MachineInstr` which could be repurposed for something else. The suggested direction here came largely from Hal Finkel. I hope it was worth it. ;] It does hopefully clear a path for subsequent extensions w/o nearly as much leg work. Lots of thanks to Reid and Justin for careful reviews and ideas about how to do all of this. Differential Revision: https://reviews.llvm.org/D50701 llvm-svn: 339940
* [ARM] Ignore GEPs in ARMCodeGenPrepareSam Parker2018-08-161-0/+5
| | | | | | | | | | | While searching through the use-def tree, ignore GetElementPtrInst instructions because they don't need promoting and neither do their indices. Otherwise, the wide indices prevent the transformation from happening. Differential Revision: https://reviews.llvm.org/D50762 llvm-svn: 339871
* [ARM] Allow zext in ARMCodeGenPrepareSam Parker2018-08-161-3/+8
| | | | | | | | Treat zext instructions as roots, like we do for truncs. Differential Revision: https://reviews.llvm.org/D50759 llvm-svn: 339868
* [ARM] Allow signed icmps in ARMCodeGenPrepareSam Parker2018-08-161-30/+45
| | | | | | | | | | | | | | | | | Originally committed in r339755 which was reverted in r339806 due to an asan issue. The issue was caused by my assumption that operands to a CallInst mapped to the FunctionType Params. CallInsts are now handled by iterating over their ArgOperands instead of Operands. Original Message: Treat signed icmps as 'sinks', allowing them to be in the use-def tree, enabling more promotions to be performed. As a sink, any promoted incoming values need to be truncated before being used by the signed icmp. Differential Revision: https://reviews.llvm.org/D50067 llvm-svn: 339858
* Revert "[ARM] Allow signed icmps in ARMCodeGenPrepare"Vitaly Buka2018-08-151-44/+22
| | | | | | | | use-after-poison in check-llvm under asan This reverts commit r339755. llvm-svn: 339806
* [ARM] TypeSize lower bound for ARMCodeGenPrepareSam Parker2018-08-151-1/+1
| | | | | | | | | We only try to promote types with are smaller than 16-bits, but we also need to check that the type is not less than 8-bits. Differential Revision: https://reviews.llvm.org/D50769 llvm-svn: 339770
* [ARM] Allow signed icmps in ARMCodeGenPrepareSam Parker2018-08-151-22/+44
| | | | | | | | | | | Treat signed icmps as 'sinks', allowing them to be in the use-def tree, enabling more promotions to be performed. As a sink, any promoted incoming values need to be truncated before being used by the signed icmp. Differential Revision: https://reviews.llvm.org/D50067 llvm-svn: 339755
* [ARM] Allow pointer values in ARMCodeGenPrepareSam Parker2018-08-151-18/+30
| | | | | | | | | | | | Add pointers to the list of allowed types, but don't try to promote them. Also fixed a bug with the promotion of undef values, so a new value is now created instead of mutating in place. We also now only promote if there's an instruction in the use-def chains other than the icmp, sinks and sources. Differential Revision: https://reviews.llvm.org/D50054 llvm-svn: 339754
* [SDAG] Remove the reliance on MI's allocation strategy forChandler Carruth2018-08-142-35/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array. Prior to this change, the memory management was deeply confusing here -- The way the MI was built relied on the `SelectionDAG` allocating memory for these arrays of pointers using the `MachineFunction`'s allocator so that the raw pointer to the array could be blindly copied into an eventual `MachineInstr`. This creates a hard coupling between how `MachineInstr`s allocate their array of `MachineMemOperand` pointers and how the `MachineSDNode` does. This change is motivated in large part by a change I am making to how `MachineFunction` allocates these pointers, but it seems like a layering improvement as well. This would run the risk of increasing allocations overall, but I've implemented an optimization that should avoid that by storing a single `MachineMemOperand` pointer directly instead of allocating anything. This is expected to be a net win because the vast majority of uses of these only need a single pointer. As a side-effect, this makes the API for updating a `MachineSDNode` and a `MachineInstr` reasonably different which seems nice to avoid unexpected coupling of these two layers. We can map between them, but we shouldn't be *surprised* at where that occurs. =] Differential Revision: https://reviews.llvm.org/D50680 llvm-svn: 339740
* [ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly.Eli Friedman2018-08-142-3/+23
| | | | | | | | | | | | | | | | | | | | | Intentionally excluding nodes from the DAGCombine worklist is likely to lead to weird optimizations and infinite loops, so it's generally a bad idea. To avoid the infinite loops, fix DAGCombine to use the isDesirableToCommuteWithShift target hook before performing the transforms in question, and implement the target hook in the ARM backend disable the transforms in question. Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a reduced testcase for that bug. But we should have sufficient test coverage for PerformSHLSimplify given that we're not playing weird tricks with the worklist. I can try to bugpoint it if necessary, though.) Differential Revision: https://reviews.llvm.org/D50667 llvm-svn: 339734
* [ARM] ParallelDSP: add option to enable/disable the passSjoerd Meijer2018-08-141-0/+6
| | | | | | Differential Revision: https://reviews.llvm.org/D50511 llvm-svn: 339645
* [ARM] Added FP16 VREV Vector Instrinsic CodeGen supportLuke Geeson2018-08-131-0/+2
| | | | llvm-svn: 339546
* Fix unused lambda capture warning from r339472.Eli Friedman2018-08-101-1/+1
| | | | llvm-svn: 339479
* [ARM] Adjust AND immediates to make them cheaper to select.Eli Friedman2018-08-103-0/+85
| | | | | | | | | | | | | | | | | | | | | | | LLVM normally prefers to minimize the number of bits set in an AND immediate, but that doesn't always match the available ARM instructions. In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer a two-instruction sequence movs+ands or movs+bics. Some potential improvements outlined in ARMTargetLowering::targetShrinkDemandedConstant, but seems to work pretty well already. The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX instruction due to a larger-than-expected mask. (It's orthogonal, in some sense, but as far as I can tell it's either impossible or nearly impossible to reproduce the bug without this change.) According to my testing, this seems to consistently improve codesize by a small amount by forming bic more often for ISD::AND with an immediate. Differential Revision: https://reviews.llvm.org/D50030 llvm-svn: 339472
* [ARM] Disallow zexts in ARMCodeGenPrepareSam Parker2018-08-101-165/+109
| | | | | | | | | | | | | | | | | | | Enabling ARMCodeGenPrepare by default caused a whole load of failures. This is due to zexts and truncs not being handled properly. ZExts are messy so it's just easier to disable for now and truncs are allowed only as 'sinks'. I still need to figure out why allowing them as 'sources' causes so many failures. The other main changes are that we are explicit in the types that we converting to, it's now always 'TypeSize'. Type support is also now performed while checking for valid opcodes as it unnecessarily complicated having the checks are different stages. I've moved the tests around too, so we have the zext and truncs in their own file as well as the overflowing opcode tests. Differential Revision: https://reviews.llvm.org/D50518 llvm-svn: 339432
* [ARM] Adjust the feature set for ExynosEvandro Menezes2018-08-091-0/+36
| | | | | | | | | Enable `FeatureZCZeroing`, `FeatureHasSlowFPVMLx`, `FeatureExpandMLx`, `FeatureProfUnpredicate`, `FeatureSlowVDUP32`, `FeatureSlowVGETLNi32`, `FeatureSplatVFPToNeon`, `FeatureHasRetAddrStack`, `FeatureSlowFPBrcc` for all Exynos processors. llvm-svn: 339356
* [ARM] Replace processor check with featureEvandro Menezes2018-08-093-1/+15
| | | | | | | Add new feature, `FeatureUseWideStrideVFP`, that replaces the need for a processor check. Otherwise, NFC. llvm-svn: 339354
* [ARM] FP16: codegen support for VTRNSjoerd Meijer2018-08-091-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D50454 llvm-svn: 339340
* [ARM] Avoid spilling lr with Thumb1 tail calls.Eli Friedman2018-08-081-2/+7
| | | | | | | | | | | | | | | Normally, if any registers are spilled, we prefer to spill lr on Thumb1 so we can fold the "bx lr" into the "pop". However, if there are tail calls involved, restoring lr is expensive, so skip the optimization in that case. The spill of r7 in the new test also isn't necessary, but that's mostly orthogonal to this patch. (It's the same code in ARMFrameLowering, but it's not related to tail calls.) Differential Revision: https://reviews.llvm.org/D49459 llvm-svn: 339283
* [ARM] FP16: codegen support for VEXTSjoerd Meijer2018-08-081-6/+8
| | | | | | Differential Revision: https://reviews.llvm.org/D50427 llvm-svn: 339241
* [ARM] FP16: vector vmov and vdup supportSjoerd Meijer2018-08-081-0/+13
| | | | | | | | This adds codegen support for the vmov_n_f16 and vdup_n_f16 variants. Differential Revision: https://reviews.llvm.org/D50329 llvm-svn: 339238
* [ARM] FP16: vector VMUL variantsSjoerd Meijer2018-08-081-2/+14
| | | | | | | | This adds codegen support for the vmul_lane_f16 and vmul_n_f16 variants. Differential Revision: https://reviews.llvm.org/D50326 llvm-svn: 339232
* [ARM] FP16: support vector INT_TO_FP and FP_TO_INTSjoerd Meijer2018-08-081-7/+35
| | | | | | | | This adds codegen support for the different vcvt_f16 variants. Differential Revision: https://reviews.llvm.org/D50393 llvm-svn: 339227
* [ARM] FP16: support the vector vmin and vmax variantsSjoerd Meijer2018-08-081-0/+12
| | | | | | Differential Revision: https://reviews.llvm.org/D50238 llvm-svn: 339221
* [ARM] FP16: codegen support for VACGTSjoerd Meijer2018-08-071-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D50236 llvm-svn: 339148
* ARM-MachO: don't add Thumb bit for addend to non-external relocation.Tim Northover2018-08-061-0/+1
| | | | | | | | | ld64 supplies its own Thumb bit for Thumb functions, and intentionally zeroes out that part of any addend in an object file. But it only does that for symbols marked N_EXT -- i.e. external symbols. So LLVM should avoid setting that extra bit in other cases. llvm-svn: 339007
* [ARM] FP16: support vector zip and unzipSjoerd Meijer2018-08-031-0/+4
| | | | | | | | This is addressing PR38404. Differential Revision: https://reviews.llvm.org/D50186 llvm-svn: 338835
* [ARM] FP16: support VFMASjoerd Meijer2018-08-031-0/+6
| | | | | | This is addressing PR38404. llvm-svn: 338830
OpenPOWER on IntegriCloud