summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/ARM
Commit message (Collapse)AuthorAgeFilesLines
* Merging r341642:Hans Wennborg2018-09-105-3/+12
| | | | | | | | | | | | | | | | | ------------------------------------------------------------------------ r341642 | tnorthover | 2018-09-07 11:21:25 +0200 (Fri, 07 Sep 2018) | 8 lines ARM: fix Thumb2 CodeGen for ldrex with folded frame-index. Because t2LDREX (& t2STREX) were marked as AddrModeNone, but did allow a FrameIndex operand, rewriteT2FrameIndex asserted. This gives them a proper addressing-mode and tells the rewriter about it so that encodable offsets are exploited and others are rejected. Should fix PR38828. ------------------------------------------------------------------------ llvm-svn: 341783
* Revert r338354 "[ARM] Revert r337821"Reid Kleckner2018-07-311-1/+1
| | | | | | | | | | | | | | | | | Disable ARMCodeGenPrepare by default again. It is causing verifier failues in V8 that look like: Duplicate integer as switch case switch i32 %trunc, label %if.end13 [ i32 0, label %cleanup36 i32 0, label %if.then8 ], !dbg !4981 i32 0 fatal error: error in backend: Broken function found, compilation aborted! I will continue reducing the test case and send it along. llvm-svn: 338452
* [ARM] Allow automatically deducing the thumb instruction size for .instMartin Storsjo2018-07-311-3/+14
| | | | | | | | This matches GAS, that allows unsuffixed .inst for thumb. Differential Revision: https://reviews.llvm.org/D49937 llvm-svn: 338357
* [ARM] Support the .inst directive for MachO and COFF targetsMartin Storsjo2018-07-312-7/+43
| | | | | | | | | | Contrary to ELF, we don't add any markers that distinguish data generated with .short/.long from normal instructions, so the .inst directive only adds compatibility with assembly that uses it. Differential Revision: https://reviews.llvm.org/D49936 llvm-svn: 338356
* [ARM] Revert r337821Sam Parker2018-07-311-1/+1
| | | | | | | Re-enabling ARMCodeGenPrepare by default after failing to reproduce the bootstrap issues that I was concerned it was causing. llvm-svn: 338354
* Remove trailing spaceFangrui Song2018-07-3019-46/+46
| | | | | | sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h} llvm-svn: 338293
* Fix uninitialized read in ARM's PrintAsmOperandThomas Preud'homme2018-07-301-2/+3
| | | | | | | | | | | | | | | | | Summary: Fix read of uninitialized RC variable in ARM's PrintAsmOperand when hasRegClassConstraint returns false. This was causing inline-asm-operand-implicit-cast test to fail in r338206. Reviewers: t.p.northover, weimingz, javed.absar, chill Reviewed By: chill Subscribers: chill, eraman, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D49984 llvm-svn: 338268
* [ARM] Fix over-alignment in arguments that are HA of 128-bit vectorsPetr Pavlu2018-07-301-5/+6
| | | | | | | | | | | | | | | | | | | | Code in `CC_ARM_AAPCS_Custom_Aggregate()` is responsible for handling homogeneous aggregates for `CC_ARM_AAPCS_VFP`. When an aggregate ends up fully on stack, the function tries to pack all resulting items of the aggregate as tightly as possible according to AAPCS. Once the first item was laid out, the alignment used for consecutive items was the size of one item. This logic went wrong for 128-bit vectors because their alignment is normally only 64 bits, and so could result in inserting unexpected padding between the first and second element. The patch fixes the problem by updating the alignment with the item size only if this results in reducing it. Differential Revision: https://reviews.llvm.org/D49720 llvm-svn: 338233
* DAG: Add calling convention argument to calling convention funcsMatt Arsenault2018-07-281-1/+1
| | | | | | | | This seems like a pretty glaring omission, and AMDGPU wants to treat kernels differently from other calling conventions. llvm-svn: 338194
* [ARM] Add new target feature to fuse literal generationEvandro Menezes2018-07-273-19/+55
| | | | | | | | | | This feature enables the fusion of such operations on Cortex A57 and Cortex A72, as recommended in their Software Optimisation Guides, sections 4.14 and 4.11, respectively. Differential revision: https://reviews.llvm.org/D49563 llvm-svn: 338147
* Add missing 'override', fixing compilation with some compilers since SVN r337950Martin Storsjo2018-07-251-1/+1
| | | | llvm-svn: 337952
* [COFF] Hoist constant pool handling from X86AsmPrinter into AsmPrinterMartin Storsjo2018-07-252-1/+12
| | | | | | | | | | | | | | | | | | | In SVN r334523, the first half of comdat constant pool handling was hoisted from X86WindowsTargetObjectFile (which despite the name only was used for msvc targets) into the arch independent TargetLoweringObjectFileCOFF, but the other half of the handling was left behind in X86AsmPrinter::GetCPISymbol. With only half of the handling in place, inconsistent comdat sections/symbols are created, causing issues with both GNU binutils (avoided for X86 in SVN r335918) and with the MS linker, which would complain like this: fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4 Differential Revision: https://reviews.llvm.org/D49644 llvm-svn: 337950
* [ARM] Prefer lsls+lsrs over lsls+ands or lsrs+ands in Thumb1.Eli Friedman2018-07-251-0/+81
| | | | | | | | | | | | | Saves materializing the immediate for the "ands". Corresponding patterns exist for lsrs+lsls, but that seems less common in practice. Now implemented as a DAGCombine. Differential Revision: https://reviews.llvm.org/D49585 llvm-svn: 337945
* [ARM] Disable ARMCodeGenPrepare by defaultSam Parker2018-07-241-1/+1
| | | | | | | | ARM Stage 2 builders have been suspiciously broken since the pass was committed. Disabling to hopefully fix the bots and give me time to debug. llvm-svn: 337821
* [ARM] Use unique_ptr to fix memory leak introduced in r337701Fangrui Song2018-07-231-11/+9
| | | | llvm-svn: 337714
* OpChain has subclasses, so add a virtual destructor.Jordan Rupprecht2018-07-231-0/+1
| | | | | | | | | | | | | | | Summary: OpChain has subclasses, so add a virtual destructor. This fixes an issue when deleting subclasses of OpChain (see MatchSMLAD() specifically) in r337701. Reviewers: javed.absar Subscribers: llvm-commits, SjoerdMeijer, samparker Differential Revision: https://reviews.llvm.org/D49681 llvm-svn: 337713
* [ARM] Follow-up to r337709.Matt Morehouse2018-07-231-2/+0
| | | | | | Fix double-free. llvm-svn: 337711
* [ARM] Add doFinalization() to ARMCodeGenPrepare pass.Matt Morehouse2018-07-231-0/+6
| | | | | | | Attempt to fix the leak introduced in r337687 and make sanitizer buildbots green again. llvm-svn: 337709
* [ARM][NFC] ParallelDSP reorganisationSam Parker2018-07-231-88/+103
| | | | | | | | | | | | | | | | | In preparing to allow ARMParallelDSP pass to parallelise more than smlads, I've restructed some elements: - The ParallelMAC struct has been renamed to BinOpChain. - The BinOpChain struct holds two value lists: LHS and RHS, as well as inheriting from the OpChain base class. - The OpChain struct holds all the values of the represented chain and has had the memory locations functionality inserted into it. - ParallelMACList becomes OpChainList and it now holds pointers instead of objects. Differential Revision: https://reviews.llvm.org/D49020 llvm-svn: 337701
* [ARM] ARMCodeGenPrepare backend passSam Parker2018-07-234-0/+757
| | | | | | | | | | | | | | | | | | | | | | Arm specific codegen prepare is implemented to perform type promotion on icmp operands, which can enable the removal of uxtb and uxth (unsigned extend) instructions. This is possible because performing type promotion before ISel alleviates this duty from the DAG builder which has to perform legalisation, but has a limited view on data ranges. The pass visits any instruction operand of an icmp and creates a worklist to traverse the use-def tree to determine whether the values can simply be promoted. Our concern is values in the registers overflowing the narrow (i8, i16) data range, so instructions marked with nuw can be promoted easily. For add and sub instructions, we are able to use the parallel dsp instructions to operate on scalar data types and avoid overflowing bits. Underflowing adds and subs are also permitted when the result is only used by an unsigned icmp. Differential Revision: https://reviews.llvm.org/D48832 llvm-svn: 337687
* [ARM] Add new feature to enable optimizing the VFP registersEvandro Menezes2018-07-203-2/+15
| | | | | | | | | Enable the optimization of operations on DPR and SPR via a feature instead of checking the target. Differential revision: https://reviews.llvm.org/D49463 llvm-svn: 337575
* ARM: switch armv7em MachO triple to hard-float defaults and libcalls.Tim Northover2018-07-191-0/+2
| | | | | | | | | We were emitting incorrect calls to libm functions that LLVM had decided it knew about because the default is soft-float. Recommitted without breaking ELF this time. llvm-svn: 337450
* Revert "ARM: switch armv7em triple to hard-float defaults and libcalls."Tim Northover2018-07-181-1/+0
| | | | | | This reverts commit r337385 until it can be targeted at MachO only. llvm-svn: 337424
* ARM: stop explicitly marking armv7k libcalls as hard-float. NFC.Tim Northover2018-07-181-7/+0
| | | | | | | Since the triple's default is hard float, the libcalls will already use VFP registers. llvm-svn: 337386
* ARM: switch armv7em triple to hard-float defaults and libcalls.Tim Northover2018-07-181-0/+1
| | | | | | | We were emitting incorrect calls to libm functions that LLVM had decided it knew about because the default is soft-float. llvm-svn: 337385
* ARM: deduplicate hard-float detection code. NFC.Tim Northover2018-07-184-12/+12
| | | | | | | | ARMSubtarget had a copy/pasted block to determine whether the target was hard-float, but it just delegated to triple features anyway so it's better at the TargetMachine level. llvm-svn: 337384
* [ARM] ParallelDSP: multiple reduction stmts in loopSjoerd Meijer2018-07-111-40/+75
| | | | | | | | | | This fixes an issue that we were not properly supporting multiple reduction stmts in a loop, and not generating SMLADs for these cases. The alias analysis checks were done too early, making it too conservative. Differential revision: https://reviews.llvm.org/D49125 llvm-svn: 336795
* [ARM] Treat cmn immediates as legal in isLegalICmpImmediate.Eli Friedman2018-07-101-4/+6
| | | | | | | | | | | | | The original code attempted to do this, but the std::abs() call didn't actually do anything due to implicit type conversions. Fix the type conversions, and perform the correct check for negative immediates. This probably has very little practical impact, but it's worth fixing just to avoid confusion in the future, I think. Differential Revision: https://reviews.llvm.org/D48907 llvm-svn: 336742
* [ARM] ParallelDSP: added statistics, NFC.Sjoerd Meijer2018-07-061-3/+7
| | | | | | | | | Added statistics for the number of SMLAD instructions created, and als renamed the pass name to -arm-parallel-dsp. Differential Revision: https://reviews.llvm.org/D48971 llvm-svn: 336441
* [AArch64][ARM] Armv8.4-A: Trace synchronization barrier instructionSjoerd Meijer2018-07-066-0/+98
| | | | | | | | This adds the Armv8.4-A Trace synchronization barrier (TSB) instruction. Differential Revision: https://reviews.llvm.org/D48918 llvm-svn: 336418
* [NEON] Fix combining of vldx_dup intrinsics with updating of base addressesIvan A. Kosarev2018-07-051-0/+6
| | | | | | | | | | | | | Resolves: Unsupported ARM Neon intrinsics in Target-specific DAG combine function for VLDDUP https://bugs.llvm.org/show_bug.cgi?id=38031 Related diff: D48439 Differential Revision: https://reviews.llvm.org/D48920 llvm-svn: 336325
* [ARM] ParallelDSP: only support i16 loads for nowSjoerd Meijer2018-07-051-28/+25
| | | | | | | | | We were miscompiling i8 loads, so reject them as unsupported narrow operations for now. Differential Revision: https://reviews.llvm.org/D48944 llvm-svn: 336319
* [ARM] [Assembler] Support negative immediates: cover few missing casesVolodymyr Turanskyy2018-07-042-2/+25
| | | | | | | | | | | | | | | | | | | | | | | | Support for negative immediates was implemented in https://reviews.llvm.org/rL298380, however few instruction options were missing. This change adds negative immediates support and respective tests for the following: ADD ADDS ADDS.W AND.W ANDS BIC.W BICS BICS.W SUB SUBS SUBS.W Differential Revision: https://reviews.llvm.org/D48649 llvm-svn: 336286
* [ARM] Fix inconsistent declaration parameter name in r336195Fangrui Song2018-07-031-1/+1
| | | | llvm-svn: 336223
* [ARM][NFC] Refactor sequential access for DSPSam Parker2018-07-031-18/+27
| | | | | | | | | | With a view to support parallel operations that have their results stored to memory, refactor the consecutive access helper out so it could support stores instructions. Differential Revision: https://reviews.llvm.org/D48872 llvm-svn: 336195
* [ARM] Fix PR37382: Don't optimize mul.with.overflow on thumbv6m.Vadzim Dambrouski2018-07-021-2/+6
| | | | | | | | | | | | Reviewers: efriedma, rogfer01, javed.absar Reviewed By: efriedma, rogfer01 Subscribers: kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D48846 llvm-svn: 336144
* [UnrollAndJam] New Unroll and Jam passDavid Green2018-07-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a simple implementation of the unroll-and-jam classical loop optimisation. The basic idea is that we take an outer loop of the form: for i.. ForeBlocks(i) for j.. SubLoopBlocks(i, j) AftBlocks(i) Instead of doing normal inner or outer unrolling, we unroll as follows: for i... i+=2 ForeBlocks(i) ForeBlocks(i+1) for j.. SubLoopBlocks(i, j) SubLoopBlocks(i+1, j) AftBlocks(i) AftBlocks(i+1) Remainder Loop So we have unrolled the outer loop, then jammed the two inner loops into one. This can lead to a simpler inner loop if memory accesses can be shared between the now jammed loops. To do this we have to prove that this is all safe, both for the memory accesses (using dependence analysis) and that ForeBlocks(i+1) can move before AftBlocks(i) and SubLoopBlocks(i, j). Differential Revision: https://reviews.llvm.org/D41953 llvm-svn: 336062
* [ARM][AArch64] Armv8.4-A EnablementSjoerd Meijer2018-06-294-1/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initial patch adding assembly support for Armv8.4-A. Besides adding v8.4 as a supported architecture to the usual places, this also adds target features for the different crypto algorithms. Armv8.4-A introduced new crypto algorithms, made them optional, and allows different combinations: - none of the v8.4 crypto functions are supported, which is independent of the implementation of the Armv8.0 SHA1 and SHA2 instructions. - the v8.4 SHA512 and SHA3 support is implemented, in this case the Armv8.0 SHA1 and SHA2 instructions must also be implemented. - the v8.4 SM3 and SM4 support is implemented, which is independent of the implementation of the Armv8.0 SHA1 and SHA2 instructions. - all of the v8.4 crypto functions are supported, in this case the Armv8.0 SHA1 and SHA2 instructions must also be implemented. The v8.4 crypto instructions are added to AArch64 only, and not AArch32, and are made optional extensions to Armv8.2-A. The user-facing Clang options will map on these new target features, their naming will be compatible with GCC and added in follow-up patches. The Armv8.4-A instruction sets can be downloaded here: https://developer.arm.com/products/architecture/a-profile/exploration-tools Differential Revision: https://reviews.llvm.org/D48625 llvm-svn: 335953
* [ARM] Assert that ARMDAGToDAGISel creates valid UBFX/SBFX nodes.Eli Friedman2018-06-281-0/+4
| | | | | | | | | | | | We don't ever check these again (unless you're using -fno-integrated-as), so make sure the extracted bits are well-defined. I don't think it's possible to trigger any of the assertions on trunk, but it's difficult to prove. (The first one depends on DAGCombine to minimize the number of set bits in AND masks; I think the others are mathematically impossible to hit.) llvm-svn: 335931
* [ARM] Add missing Thumb2 assembler diagnostics.Eli Friedman2018-06-281-62/+116
| | | | | | | | | | Mostly just adding checks for Thumb2 instructions which correspond to ARM instructions which already had diagnostics. While I'm here, also fix ARM-mode strd to check the input registers correctly. Differential Revision: https://reviews.llvm.org/D48610 llvm-svn: 335909
* Remove unnecessary semicolon. NFCI.Simon Pilgrim2018-06-281-2/+2
| | | | | | Fixes -Wpedantic warning. llvm-svn: 335901
* SelectionDAGBuilder, mach-o: Skip trap after noreturn call (for Mach-O)Matthias Braun2018-06-281-1/+3
| | | | | | | | | | | | | Add NoTrapAfterNoreturn target option which skips emission of traps behind noreturn calls even if TrapUnreachable is enabled. Enable the feature on Mach-O to save code size; Comments suggest it is not possible to enable it for the other users of TrapUnreachable. rdar://41530228 DifferentialRevision: https://reviews.llvm.org/D48674 llvm-svn: 335877
* [ARM] Parallel DSP PassSjoerd Meijer2018-06-285-1/+624
| | | | | | | | | | | | | | | | | Armv6 introduced instructions to perform 32-bit SIMD operations. The purpose of this pass is to do some straightforward IR pattern matching to create ACLE DSP intrinsics, which map on these 32-bit SIMD operations. Currently, only the SMLAD instruction gets recognised. This instruction performs two multiplications with 16-bit operands, and stores the result in an accumulator. We will follow this up with patches to recognise SMLAD in more cases, and also to generate other DSP instructions (like e.g. SADD16). Patch by: Sam Parker and Sjoerd Meijer Differential Revision: https://reviews.llvm.org/D48128 llvm-svn: 335850
* s/TablesChecked/TableChecked/ after r335823Hans Wennborg2018-06-281-1/+1
| | | | llvm-svn: 335831
* Unify sorted asserts to use the existing atomic patternBenjamin Kramer2018-06-281-3/+3
| | | | | | | These are all benign races and only visible in !NDEBUG. tsan complains about it, but a simple atomic bool is sufficient to make it happy. llvm-svn: 335823
* [NEON] Support vldNq intrinsics in AArch32 (LLVM part)Ivan A. Kosarev2018-06-275-70/+262
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for the q versions of the dup (load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for example. Currently, non-q versions of the dup intrinsics are implemented in clang by generating IR that first loads the elements of the structure into the first lane with the lane (to-single-lane) intrinsics, and then propagating it other lanes. There are at least two problems with this approach. First, there are no double-spaced to-single-lane byte-element instructions. For example, there is no such instruction as 'vld2.8 { d0[0], d2[0] }, [r0]'. That means we cannot rely on the to-single-lane intrinsics and instructions to implement the q versions of the dup intrinsics. Note that to-all-lanes instructions do support all sizes of data items, including bytes. The second problem with the current approach is that we need a separate vdup instruction to propagate the structure to each lane. So for vld4q_dup_f16() we would need four vdup instructions in addition to the initial vld instruction. This patch introduces dup LLVM intrinsics and reworks handling of the currently supported (non-q) NEON dup intrinsics to expand them into those LLVM intrinsics, thus eliminating the need for using to-single-lane intrinsics and instructions. Additionally, this patch adds support for u64 and s64 dup NEON intrinsics. These are marked as Arch64-only in the ARM NEON Reference, but it seems there are no reasons to not support them in AArch32 mode. Please correct, if that is wrong. That's what we generate with this patch applied: vld2q_dup_f16: vld2.16 {d0[], d2[]}, [r0] vld2.16 {d1[], d3[]}, [r0] vld3q_dup_f16: vld3.16 {d0[], d2[], d4[]}, [r0] vld3.16 {d1[], d3[], d5[]}, [r0] vld4q_dup_f16: vld4.16 {d0[], d2[], d4[], d6[]}, [r0] vld4.16 {d1[], d3[], d5[], d7[]}, [r0] Differential Revision: https://reviews.llvm.org/D48439 llvm-svn: 335733
* [X86,ARM] Retain split-stack prolog check for sibling callsThan McIntosh2018-06-261-2/+4
| | | | | | | | | | | | | | | | | | | Summary: If a routine with no stack frame makes a sibling call, we need to preserve the stack space check even if the local stack frame is empty, since the call target could be a "no-split" function (in which case the linker needs to be able to fix up the prolog sequence in order to switch to a larger stack). This fixes PR37807. Reviewers: cherry, javed.absar Subscribers: srhines, llvm-commits Differential Revision: https://reviews.llvm.org/D48444 llvm-svn: 335604
* ARM: correctly decode VFP instructions following unpredictable t2ITTim Northover2018-06-261-0/+2
| | | | | | | | | When the condition code for an IT instruction is "AL" we get strange "15" predicates on subsequent instructions. These are dealt with for most instructions by treating them as "ARMCC::AL", but VFP takes a different path which didn't have this code. llvm-svn: 335594
* ARM: diagnose unpredictable IT instructionsTim Northover2018-06-262-1/+19
| | | | | | | | | | | IT instructions are allowed to have the 'AL' predicate, but it must never result in an 'NV' predicated instruction. Essentially this means that all branches must be 't' rather than 'e' if the predicate is 'AL'. This patch adds a diagnostic for this during assembly (error because parsing hits an assertion if allowed to continue) and an annotation during disassembly. llvm-svn: 335593
* Recommit of r335326, with the test fixed that I missed.Sjoerd Meijer2018-06-221-3/+6
| | | | llvm-svn: 335331
OpenPOWER on IntegriCloud