path: root/llvm/lib/Target/ARM
Commit message | Author | Age | Files | Lines
...
* [Thumb1] Any imm8 should have cost of 1 (Zhaoshi Zheng, 2018-09-24, 1 file, -2/+2)
  A simple MOVS rd, imm8 can materialize [-128, 127] in signed i8 type or
  [0, 255] in unsigned i8 type on Thumb1.
  Differential Revision: https://reviews.llvm.org/D52257
  llvm-svn: 342898

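  For illustration, a minimal C++ sketch of the cost rule this commit
  describes; the function name and the non-imm8 fallback cost are assumptions,
  not taken from the patch:

    #include <cstdint>

    // Any constant that a single "MOVS rd, imm8" can materialize costs 1:
    // the union of [-128, 127] (signed i8) and [0, 255] (unsigned i8).
    int thumb1ImmCost(int64_t Imm) {
      bool FitsImm8 = Imm >= -128 && Imm <= 255;
      return FitsImm8 ? 1 : 2; // "2" is only a placeholder for "more expensive"
    }
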
* [Arm][AsmParser] Restrict register list size for VSTM/VLDM (Luke Cheeseman, 2018-09-24, 1 file, -0/+9)
  - The assembler accepts VSTM/VLDM with register lists (specifically double
    register lists) with more than 16 registers specified
  - The Arm architecture reference manual says this instruction must not
    contain more than 16 registers when the registers are doubleword registers
  - This addresses one of the concerns in https://bugs.llvm.org/show_bug.cgi?id=38389
  Differential Revision: https://reviews.llvm.org/D52082
  llvm-svn: 342891

* [ARM] Do not fuse VADD and VMUL on the Cortex-M4 and Cortex-M33 (Sjoerd Meijer, 2018-09-24, 2 files, -3/+5)
  A sequence of VMUL and VADD instructions always gives the same or better
  performance than a fused VMLA instruction on the Cortex-M4 and Cortex-M33.
  Executing the VMUL and VADD back-to-back requires the same cycles, but
  having separate instructions allows scheduling to avoid the hazard between
  these 2 instructions.
  Differential Revision: https://reviews.llvm.org/D52289
  llvm-svn: 342874

* Revert r341932 "[ARM] Enable ARMCodeGenPrepare by default" (Hans Wennborg, 2018-09-24, 1 file, -1/+1)
  This caused miscompilation of WebRTC for Android: PR39060.
  > We've had the pass enabled downstream for a couple of weeks and it
  > seems to be okay, so enable it by default.
  >
  > Differential Revision: https://reviews.llvm.org/D51920
  llvm-svn: 342873

* [ARM][ARMLoadStoreOptimizer] (Luke Cheeseman, 2018-09-24, 1 file, -0/+14)
  - The load store optimizer is currently merging multiple loads/stores into
    VLDM/VSTM with more than 16 doubleword registers
  - This is an UNPREDICTABLE instruction and shouldn't be done
  - It looks like the Limit for how many registers included in a merge got
    dropped at some point so I am reintroducing it in this patch
  - This fixes https://bugs.llvm.org/show_bug.cgi?id=38389
  Differential Revision: https://reviews.llvm.org/D52085
  llvm-svn: 342872

* [ARM] bottom-top mul support ARMParallelDSP (Sam Parker, 2018-09-24, 1 file, -27/+154)
  Originally committed in rL342210 but was reverted in rL342260 because it was
  causing issues in vectorized code, because I had forgotten to ensure that
  we're operating on scalar values.
  Original commit message:
  On failing to find sequences that can be converted into dual macs, try to
  find sequential 16-bit loads that are used by muls which we can then use
  smultb, smulbt, smultt with a wide load.
  Differential Revision: https://reviews.llvm.org/D51983
  llvm-svn: 342870

* Fix for bug 34002 - label generated before IT block is finalized. (Maya Madhavan, 2018-09-20, 1 file, -1/+6)
  Differential Revision: https://reviews.llvm.org/D52258
  llvm-svn: 342615

* [ARM] Adjust the feature set for Exynos (Evandro Menezes, 2018-09-19, 1 file, -0/+6)
  Fine tune the cost model for all Exynos processors.
  llvm-svn: 342585

* [ARM] Refactor Exynos feature set (NFC) (Evandro Menezes, 2018-09-19, 3 files, -71/+23)
  Since all Exynos processors share the same feature set, fold them in the
  implied features list for the subtarget.
  llvm-svn: 342583

* [AtomicExpandPass]: Add a hook for custom cmpxchg expansion in IR (Alex Bradbury, 2018-09-19, 2 files, -5/+8)
  This involves changing the shouldExpandAtomicCmpXchgInIR interface, but I
  have updated the in-tree backends using this hook (ARM, AArch64, Hexagon)
  so they will see no functional change. Previously this hook returned bool,
  but it now returns AtomicExpansionKind.
  This hook allows targets to select how a given cmpxchg is to be expanded.
  D48131 uses this to expand part-word cmpxchg to a target-specific intrinsic.
  See my associated RFC for more info on the motivation for this change
  <http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html>.
  Differential Revision: https://reviews.llvm.org/D48130
  llvm-svn: 342550

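  As a hedged sketch (a hypothetical free function, not the in-tree ARM,
  AArch64 or Hexagon code), the reworked hook lets a target return an
  expansion kind instead of a bool:

    #include "llvm/CodeGen/TargetLowering.h"
    using namespace llvm;

    // With LL/SC-style instructions (e.g. LDREX/STREX), expand cmpxchg to an
    // LL/SC loop in IR; otherwise request no IR-level expansion.
    static TargetLowering::AtomicExpansionKind
    pickCmpXchgExpansion(bool HasLLSC) {
      return HasLLSC ? TargetLowering::AtomicExpansionKind::LLSC
                     : TargetLowering::AtomicExpansionKind::None;
    }
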
* [ARM] Fix unwind information for floating point registers (Oliver Stannard, 2018-09-19, 1 file, -3/+7)
  Fixes the unwind information generated for floating-point registers.
  Previously, all padding registers were assumed to be four bytes wide. Now,
  the width of the register is used to specify the amount of padding.
  Patch by Jackson Woodruff!
  Differential revision: https://reviews.llvm.org/D51494
  llvm-svn: 342545

* Revert "[ARM] Cleanup ARM CGP isSupportedValue" (Volodymyr Sapsai, 2018-09-18, 1 file, -19/+42)
  This reverts r342395 as it caused error
  > Argument value type does not match pointer operand type!
  > %0 = atomicrmw volatile xchg i8* %_Value1, i32 1 monotonic, !dbg !25
  > i8in function atomic_flag_test_and_set
  > fatal error: error in backend: Broken function found, compilation aborted!
  on bot http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/
  More details are available at https://reviews.llvm.org/D52080
  llvm-svn: 342431

* [ARM] Cleanup ARM CGP isSupportedValue (Sam Parker, 2018-09-17, 1 file, -42/+19)
  isSupportedValue explicitly checked and accepted many types of value,
  primarily for debugging reasons. Remove most of these checks and do a bit
  of refactoring now that the pass is more stable. This also enables ZExts
  to be sources, but this has very little practical benefit at the moment,
  as extend instructions will still be introduced.
  Differential Revision: https://reviews.llvm.org/D52080
  llvm-svn: 342395

* [ARM] Disallow icmp with negative imm and overflow (Sam Parker, 2018-09-17, 1 file, -0/+11)
  We allow overflowing instructions if they're decreasing and only used by an
  unsigned compare. Add the extra condition that the icmp cannot be using a
  negative immediate.
  Differential Revision: https://reviews.llvm.org/D52102
  llvm-svn: 342392

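  A hedged sketch of the added condition, written as a hypothetical helper
  rather than the actual ARMCodeGenPrepare code:

    #include "llvm/IR/Constants.h"
    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    // An overflowing, decreasing value is only tolerated if the unsigned
    // compare that uses it does not compare against a negative immediate.
    static bool usesNegativeImmediate(const ICmpInst *Cmp) {
      for (const Value *Op : Cmp->operands())
        if (const auto *C = dyn_cast<ConstantInt>(Op))
          if (C->isNegative())
            return true;
      return false;
    }
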
* Revert r342210 "[ARM] bottom-top mul support in ARMParallelDSP" (Reid Kleckner, 2018-09-14, 1 file, -152/+27)
  It causes assertion failures while building Skia for Android in Chromium:
  https://ci.chromium.org/buildbot/chromium.clang/ToTAndroid/4550
  Reduction forthcoming.
  llvm-svn: 342260

* [ARM] bottom-top mul support in ARMParallelDSP (Sam Parker, 2018-09-14, 1 file, -27/+152)
  On failing to find sequences that can be converted into dual macs, try to
  find sequential 16-bit loads that are used by muls which we can then use
  smultb, smulbt, smultt with a wide load.
  Differential Revision: https://reviews.llvm.org/D51983
  llvm-svn: 342210

* [ARM] Allow truncs as sources in ARM CGP (Sam Parker, 2018-09-13, 1 file, -19/+23)
  We previously only allowed truncs as sinks, but now allow them as sources
  too. We do this by checking that the result type is the narrow type that
  we're trying to optimise for.
  Differential Revision: https://reviews.llvm.org/D51978
  llvm-svn: 342141

* [ARM] Fix FixConst for ARMCodeGenPrepare (Sam Parker, 2018-09-13, 1 file, -20/+3)
  Part of FixConsts wrongly assumes either an 8- or 16-bit constant, which can
  result in the wrong constants being generated during promotion.
  Differential Revision: https://reviews.llvm.org/D52032
  llvm-svn: 342140

* ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4. (Tim Northover, 2018-09-13, 4 files, -0/+22)
  The Technical Reference Manuals for these two CPUs state that branching to
  an unaligned 32-bit instruction incurs an extra pipeline reload penalty.
  That's bad.
  This also enables the optimization at -Os since it costs on average one
  byte per loop in return for 1 cycle per iteration, which is pretty good
  going.
  llvm-svn: 342127

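  A hedged sketch of the shape of such a change; the subtarget predicates and
  the call site are assumptions, not copied from the patch:

    // Inside a hypothetical ARMTargetLowering constructor: ask for 4-byte
    // alignment of loop headers on cores that pay a pipeline-reload penalty
    // for branching to an unaligned 32-bit instruction.
    if (Subtarget->isCortexM3() || Subtarget->isCortexM4()) // assumed predicates
      setPrefLoopAlignment(2); // log2 alignment: 1 << 2 == 4 bytes
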
* ARM: correct the relocation type for `bl` on WoA (Saleem Abdulrasool, 2018-09-13, 1 file, -1/+1)
  The `IMAGE_REL_ARM_BRANCH20T` applies only to a `b.w` instruction. A
  thumb-2 `bl` should be relocated using a `IMAGE_REL_ARM_BRANCH24T`.
  Correct the relocation that we emit in such a case.
  Resolves PR38620! Based on the patch by Jordan Rhee!
  llvm-svn: 342109

* [ARM] Tighten f64<->f16 conversion requirements (Diogo N. Sampaio, 2018-09-12, 1 file, -4/+8)
  Fix missing Requires fields.
  Patch by Bernard Ogden (bogden)
  Reviewers: SjoerdMeijer, javed.absar, t.p.northover
  Reviewed By: t.p.northover
  Differential Revision: https://reviews.llvm.org/D51631
  llvm-svn: 342061

* [ARM] Follow-up to rL342033 (Sam Parker, 2018-09-12, 1 file, -1/+1)
  Fixed typo which can cause segfault.
  llvm-svn: 342040

* [ARM] Exchange MAC operands in ARMParallelDSP (Sam Parker, 2018-09-12, 1 file, -115/+154)
  SMLAD and SMLALD instructions also come in the form of SMLADX and SMLALDX
  which perform an exchange on their second operand. To support this, more of
  the loads in the MAC candidates are compared for sequential access and a
  boolean value has been added to BinOpChain.
  AddMACCandidate has been refactored into a small pattern matching state
  machine to reduce the amount of duplicated code, but also to enable the
  matching to be more flexible. CreateParallelMACPairs now iterates through
  all the candidates to find parallel ones.
  Differential Revision: https://reviews.llvm.org/D51424
  llvm-svn: 342033

* [ARM] Allow bitcasts in ARMCodeGenPrepare (Sam Parker, 2018-09-12, 1 file, -5/+4)
  Allow bitcasts in the use-def chains, treating them as sources.
  Differential Revision: https://reviews.llvm.org/D50758
  llvm-svn: 342032

* [ARM] Add smlald support in ARMParallelDSP (Sam Parker, 2018-09-11, 1 file, -13/+41)
  Search from i64 reducing phis, as well as i32, to allow the generation of
  smlald instructions.
  Differential Revision: https://reviews.llvm.org/D51101
  llvm-svn: 341941

* [ARM] Enable ARMCodeGenPrepare by default (Sam Parker, 2018-09-11, 1 file, -1/+1)
  We've had the pass enabled downstream for a couple of weeks and it seems to
  be okay, so enable it by default.
  Differential Revision: https://reviews.llvm.org/D51920
  llvm-svn: 341932

* [Target] Untangle disassemblers (Benjamin Kramer, 2018-09-10, 1 file, -1/+1)
  Disassemblers cannot depend on main target headers. The same is true for
  MCTargetDesc, but there's a lot more cleanup needed for that.
  llvm-svn: 341822

* Fix typo in previous commit (JF Bastien, 2018-09-08, 1 file, -1/+1)
  llvm-svn: 341742

* ADT: add <bit> header, implement C++20 bit_cast, use (JF Bastien, 2018-09-08, 1 file, -13/+9)
  Summary: I saw a few places that were punning through a union of FP and
  integer, and that made me sad. Luckily, C++20 adds bit_cast for exactly
  that purpose. Implement our own version in ADT (without constexpr, leaving
  us a bit sad), and use it in the few places my grep-fu found silly union
  punning.
  This was originally committed as r341728 and reverted in r341730.
  Reviewers: javed.absar, steven_wu, srhines
  Subscribers: dexonsmith, llvm-commits
  Differential Revision: https://reviews.llvm.org/D51693
  llvm-svn: 341741

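  A minimal sketch of a memcpy-based bit_cast in the spirit described here;
  LLVM's actual llvm::bit_cast in ADT differs in details:

    #include <cstring>
    #include <type_traits>

    // Reinterpret the bytes of Src as a To without union type punning.
    template <class To, class From>
    To bit_cast(const From &Src) noexcept {
      static_assert(sizeof(To) == sizeof(From), "sizes must match");
      static_assert(std::is_trivially_copyable<From>::value,
                    "source must be trivially copyable");
      static_assert(std::is_trivially_copyable<To>::value,
                    "destination must be trivially copyable");
      To Dst;
      std::memcpy(&Dst, &Src, sizeof(To));
      return Dst;
    }

    // Usage: inspect the bit pattern of a float without a union.
    // unsigned Bits = bit_cast<unsigned>(1.0f);
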
* Revert "ADT: add <bit> header, implement C++20 bit_cast, use"JF Bastien2018-09-071-8/+13
| | | | | | Bots sad. Looks like missing std::is_trivially_copyable. llvm-svn: 341730
* ADT: add <bit> header, implement C++20 bit_cast, useJF Bastien2018-09-071-13/+8
| | | | | | | | | | | | Summary: I saw a few places that were punning through a union of FP and integer, and that made me sad. Luckily, C++20 adds bit_cast for exactly that purpose. Implement our own version in ADT (without constexpr, leaving us a bit sad), and use it in the few places my grep-fu found silly union punning. Reviewers: javed.absar Subscribers: dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51693 llvm-svn: 341728
* ARM: fix Thumb2 CodeGen for ldrex with folded frame-index. (Tim Northover, 2018-09-07, 5 files, -3/+12)
  Because t2LDREX (& t2STREX) were marked as AddrModeNone, but did allow a
  FrameIndex operand, rewriteT2FrameIndex asserted. This gives them a proper
  addressing-mode and tells the rewriter about it so that encodable offsets
  are exploited and others are rejected.
  Should fix PR38828.
  llvm-svn: 341642

* The initial .text section generated in object files was missing the
  SHF_ARM_PURECODE flag when being built with the -mexecute-only flag.
  (Eric Christopher, 2018-09-06, 2 files, -1/+31)
  All code sections of an ELF must have the flag set for the final .text
  section to be execute-only, otherwise the flag gets removed.
  A HasData flag is added to MCSection to aid in the determination that the
  section is empty. A virtual setTargetSectionFlags is added to
  MCELFObjectTargetWriter to allow subclasses to set target specific section
  flags to be added to sections, which we then use in the ARM backend to set
  SHF_ARM_PURECODE.
  Patch by Ivan Lozano!
  Reviewed By: echristo
  Differential Revision: https://reviews.llvm.org/D48792
  llvm-svn: 341593

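  A hedged sketch of the intent; the helper and the MC calls used here are
  assumptions, only the SHF_ARM_PURECODE flag and the goal come from the
  message above:

    #include "llvm/BinaryFormat/ELF.h"
    #include "llvm/MC/MCSectionELF.h"
    using namespace llvm;

    // Tag a text section as execute-only so the final .text can keep the
    // SHF_ARM_PURECODE flag.
    static void markPureCode(MCSectionELF &Sec) {
      if (Sec.getKind().isText())
        Sec.setFlags(Sec.getFlags() | ELF::SHF_ARM_PURECODE);
    }
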
* Remove FrameAccess struct from hasLoadFromStackSlot (Sander de Smalen, 2018-09-05, 1 file, -4/+8)
  This removes the FrameAccess struct that was added to the interface in
  D51537, since the PseudoValue from the MachineMemoryOperand can be safely
  cast to a FixedStackPseudoSourceValue.
  Reviewers: MatzeB, thegameg, javed.absar
  Reviewed By: thegameg
  Differential Revision: https://reviews.llvm.org/D51617
  llvm-svn: 341454

* [MinGW] Move code for indicating "potentially not DSO local" into
  shouldAssumeDSOLocal. NFC. (Martin Storsjo, 2018-09-04, 1 file, -3/+2)
  On Windows, if shouldAssumeDSOLocal returns false, it's either a dllimport
  reference, or a reference that we should treat as non-local and create a
  stub for.
  Clean up AArch64Subtarget::ClassifyGlobalReference a little while touching
  the flag handling relating to dllimport.
  Differential Revision: https://reviews.llvm.org/D51590
  llvm-svn: 341402

* Add header guards to some headers that are missing them (Argyrios Kyrtzidis, 2018-09-03, 1 file, -0/+5)
  Also adjust some of dsymutil's headers to put the header guards at the top,
  otherwise the compiler will not recognize them as header guards.
  llvm-svn: 341323

* Extend hasStoreToStackSlot with list of FI accesses. (Sander de Smalen, 2018-09-03, 1 file, -4/+12)
  For instructions that spill/fill to and from multiple frame-indices in a
  single instruction, hasStoreToStackSlot and hasLoadFromStackSlot should
  return an array of accesses, rather than just the first encounter of such
  an access. This better describes FI accesses for AArch64 (paired) LDP/STP
  instructions.
  Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB
  Reviewed By: MatzeB
  Differential Revision: https://reviews.llvm.org/D51537
  llvm-svn: 341301

* [MinGW] [ARM] Add stubs for potential automatic dllimported variables (Martin Storsjo, 2018-08-31, 4 files, -6/+33)
  The runtime pseudo relocations can't handle the ARM format embedded
  addresses in movw/movt pairs. By using stubs, the potentially dllimported
  addresses can be touched up by the runtime pseudo relocation framework.
  Differential Revision: https://reviews.llvm.org/D51450
  llvm-svn: 341176

* [ARM] Enable GEP offset splitting for 32-bit ARM. (Eli Friedman, 2018-08-30, 1 file, -0/+2)
  It has essentially the same benefit it has on 64-bit ARM: it substantially
  reduces the number of constants used by large GEP operations. Seems to be
  generally helpful across a few different codebases I've tried.
  Differential Revision: https://reviews.llvm.org/D51462
  llvm-svn: 341136

* [ARM] Adjust the feature set for Exynos (Evandro Menezes, 2018-08-30, 1 file, -0/+4)
  Enable `FeatureUseAA` for all Exynos processors.
  llvm-svn: 341101

* Make TargetInstrInfo::isCopyInstr return true for regular COPY-instructions (Alexander Ivchenko, 2018-08-30, 2 files, -6/+9)
  ..Move all target-dependent checks into new isCopyInstrImpl method. This
  change allows us to treat MoveReg-type instructions and generic COPY
  instruction in the same way
  Differential Revision: https://reviews.llvm.org/D49913
  llvm-svn: 341072

* [CodeGen] emit inline asm clobber list warnings for reserved (cont) (Ties Stuij, 2018-08-30, 2 files, -0/+7)
  Summary: This is a continuation of https://reviews.llvm.org/D49727
  Below the original text, current changes in the comments:

  Currently, in line with GCC, when specifying reserved registers like sp or
  pc on an inline asm() clobber list, we don't always preserve the original
  value across the statement. And in general, overwriting reserved registers
  can have surprising results.

  For example:

    extern int bar(int[]);

    int foo(int i) {
      int a[i]; // VLA
      asm volatile( "mov r7, #1" : : : "r7" );
      return 1 + bar(a);
    }

  Compiled for thumb, this gives:

    $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb
    ...
    foo:
      .fnstart
    @ %bb.0:            @ %entry
      .save {r4, r5, r6, r7, lr}
      push {r4, r5, r6, r7, lr}
      .setfp r7, sp, #12
      add r7, sp, #12
      .pad #4
      sub sp, #4
      movs r1, #7
      add.w r0, r1, r0, lsl #2
      bic r0, r0, #7
      sub.w r0, sp, r0
      mov sp, r0
      @APP
      mov.w r7, #1
      @NO_APP
      bl bar
      adds r0, #1
      sub.w r4, r7, #12
      mov sp, r4
      pop {r4, r5, r6, r7, pc}
    ...

  r7 is used as the frame pointer for thumb targets, and this function needs
  to restore the SP from the FP because of the variable-length stack
  allocation a. r7 is clobbered by the inline assembly (and r7 is included in
  the clobber list), but LLVM does not preserve the value of the frame
  pointer across the assembly block.

  This type of behavior is similar to GCC's and has been discussed on the
  bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No
  consensus seemed to have been reached on the way forward. Clang behavior
  has briefly been discussed on the CFE mailing list (starting here:
  http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html).

  I've opted for following Eli Friedman's advice to print warnings when there
  are reserved registers on the clobber list so as not to diverge from GCC
  behavior for now.

  The patch uses MachineRegisterInfo's target-specific knowledge of reserved
  registers, just before we convert the inline asm string in the AsmPrinter.

  If we find a reserved register, we print a warning:

    repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm]
      "mov r7, #1"
      ^

  Reviewers: efriedma, olista01, javed.absar
  Reviewed By: efriedma
  Subscribers: eraman, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D51165
  llvm-svn: 341062

* Fix "Q" and "R" inline assembly template modifiers for big-endian ArmFlorian Hahn2018-08-301-2/+14
| | | | | | | | | | | | | | Consider the endianness of the target when printing register names. This is in line with the documentation at http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers Patch by Jackson Woodruff <jackson.woodruff@arm.com> Reviewers: t.p.northover, echristo, javed.absar, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D49778 llvm-svn: 341052
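  A hedged example of the affected modifiers (illustrative code, not taken
  from the patch or its tests): for a 64-bit operand, %Q names the register
  holding the least significant half and %R the most significant half, so
  which physical register each expands to has to follow the target's
  endianness:

    // Copy a 64-bit value held in a register pair, addressing each half
    // explicitly. Only meaningful when compiled for 32-bit Arm.
    unsigned long long copy64(unsigned long long X) {
      unsigned long long Y;
      asm("mov %Q0, %Q1\n\t"
          "mov %R0, %R1"
          : "=&r"(Y)
          : "r"(X));
      return Y;
    }
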
* [ARM] Lower llvm.ctlz.i32 to a libcall when clz is not available. (Eli Friedman, 2018-08-22, 1 file, -1/+3)
  The inline sequence is very long (about 70 bytes on Thumb1), so it's not
  really a good idea to inline it, especially when optimizing for size.
  Differential Revision: https://reviews.llvm.org/D47917
  llvm-svn: 340458

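  A hedged illustration of what the libcall form amounts to; the commit does
  not name the runtime routine, so __clzsi2 (the usual compiler-rt/libgcc
  count-leading-zeros helper) is an assumption here:

    extern "C" int __clzsi2(int X); // compiler-rt/libgcc helper; undefined for X == 0

    // One short call instead of the ~70-byte inline Thumb1 bit-twiddling
    // sequence mentioned above.
    int leadingZeros(unsigned X) {
      return X ? __clzsi2((int)X) : 32;
    }
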
* [ARM] Avoid injecting constant islands in movw+movt pairs on Windows (Martin Storsjo, 2018-08-22, 1 file, -0/+16)
  On Windows, movw+movt pairs with relocations are handled with a single
  relocation that covers them both. Therefore we can't inject anything
  between these instructions, otherwise the relocation (which in LLVM only is
  treated as the movw instruction's relocation, while the movt instruction's
  relocation is dropped) will end up bogus.
  These instructions are bundled up until right before the constant islands
  pass, making this effectively the only place that can split them apart.
  Differential Revision: https://reviews.llvm.org/D51032
  llvm-svn: 340451

* [ARM] Move machine operand target flags to ARMBaseInstrInfo (Martin Storsjo, 2018-08-22, 4 files, -35/+35)
  This makes sure the flags are available for use for thumb MIR as well.
  A test that requires this will be added in the next commit.
  llvm-svn: 340450

* [ARM] Handle all-ones mask explicitly in targetShrinkDemandedConstant. (Eli Friedman, 2018-08-22, 1 file, -4/+11)
  This avoids a potential infinite loop setting and unsetting bits in the
  mask.
  Reduced from a failure on the polly-aosp bot.
  Differential Revision: https://reviews.llvm.org/D51066
  llvm-svn: 340446

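  A hedged sketch of the special case as a hypothetical helper, not the
  in-tree targetShrinkDemandedConstant code:

    #include <cstdint>

    // An all-ones AND mask cannot be shrunk any further; recognising it up
    // front avoids repeatedly setting and unsetting bits in the mask.
    bool maskIsAllOnes(uint32_t Mask) {
      return Mask == ~0u;
    }
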
* [ARM] Rotated operand patterns for *xtb16 (Sam Parker, 2018-08-22, 2 files, -0/+16)
  Add intrinsic isel patterns for sxtb16, sxtab16, uxtb16 and uxtab16 so that
  they can perform a ror.
  Differential Revision: https://reviews.llvm.org/D51034
  llvm-svn: 340405

* [AArch64] Add Tiny Code Model for AArch64 (David Green, 2018-08-22, 2 files, -0/+4)
  This adds the plumbing for the Tiny code model for the AArch64 backend.
  This, instead of loading addresses through the normal ADRP;ADD pair used in
  the Small model, uses a single ADR. The 21 bit range of an ADR means that
  the code and its statically defined symbols need to be within 1MB of each
  other.
  This makes it mostly interesting for embedded applications where we want to
  fit as much as we can in as small a space as possible.
  Differential Revision: https://reviews.llvm.org/D49673
  llvm-svn: 340397

* [ARM/AArch64] Support FP16 +fp16fml instructions (Bernard Ogden, 2018-08-17, 7 files, -3/+95)
  Add +fp16fml feature for new FP16 instructions, which are a mandatory part
  of FP16 from v8.4-A and an optional part of FP16 from v8.2-A.
  It doesn't seem to be possible to model this in LLVM, but the relationship
  between the options is handled by the related clang patch.
  In keeping with what I think is the usual practice, the fp16fml extension
  is accepted regardless of base architecture version.
  Builds on/replaces Sjoerd Meijer's patch to add these instructions at
  https://reviews.llvm.org/D49839.
  Differential Revision: https://reviews.llvm.org/D50228
  llvm-svn: 340013