path: root/llvm/lib/Target/ARM
Commit message, Author, Age, Files, Lines
...
* [ARM] TypeSize lower bound for ARMCodeGenPrepareSam Parker2018-08-151-1/+1
| | | | | | | | | We only try to promote types that are smaller than 16 bits, but we also need to check that the type is not less than 8 bits. Differential Revision: https://reviews.llvm.org/D50769 llvm-svn: 339770
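The bound described above can be sketched as a simple range check. This is only an illustration: the real pass inspects LLVM types rather than raw bit counts, and `isPromotableWidth` and the default upper limit of 16 are hypothetical names and values taken from the wording of the commit message, not from the source.

```cpp
// Hypothetical sketch: a width qualifies for promotion only when it is
// at least 8 bits (the new lower bound) and still below the upper limit.
inline bool isPromotableWidth(unsigned bits, unsigned upperLimit = 16) {
  const unsigned lowerLimit = 8;  // lower bound added by this change
  return bits >= lowerLimit && bits < upperLimit;
}
```

Under these assumptions an i1 value is rejected, an i8 value is accepted, and an i16 value is left alone.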
* [ARM] Allow signed icmps in ARMCodeGenPrepareSam Parker2018-08-151-22/+44
| | | | | | | | | | | Treat signed icmps as 'sinks', allowing them to be in the use-def tree, enabling more promotions to be performed. As a sink, any promoted incoming values need to be truncated before being used by the signed icmp. Differential Revision: https://reviews.llvm.org/D50067 llvm-svn: 339755
* [ARM] Allow pointer values in ARMCodeGenPrepareSam Parker2018-08-151-18/+30
| | | | | | | | | | | | Add pointers to the list of allowed types, but don't try to promote them. Also fixed a bug with the promotion of undef values, so a new value is now created instead of mutating in place. We also now only promote if there's an instruction in the use-def chains other than the icmp, sinks and sources. Differential Revision: https://reviews.llvm.org/D50054 llvm-svn: 339754
* [SDAG] Remove the reliance on MI's allocation strategy forChandler Carruth2018-08-142-35/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array. Prior to this change, the memory management was deeply confusing here -- The way the MI was built relied on the `SelectionDAG` allocating memory for these arrays of pointers using the `MachineFunction`'s allocator so that the raw pointer to the array could be blindly copied into an eventual `MachineInstr`. This creates a hard coupling between how `MachineInstr`s allocate their array of `MachineMemOperand` pointers and how the `MachineSDNode` does. This change is motivated in large part by a change I am making to how `MachineFunction` allocates these pointers, but it seems like a layering improvement as well. This would run the risk of increasing allocations overall, but I've implemented an optimization that should avoid that by storing a single `MachineMemOperand` pointer directly instead of allocating anything. This is expected to be a net win because the vast majority of uses of these only need a single pointer. As a side-effect, this makes the API for updating a `MachineSDNode` and a `MachineInstr` reasonably different which seems nice to avoid unexpected coupling of these two layers. We can map between them, but we shouldn't be *surprised* at where that occurs. =] Differential Revision: https://reviews.llvm.org/D50680 llvm-svn: 339740
* [ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly.Eli Friedman2018-08-142-3/+23
| | | | | | | | | | | | | | | | | | | | | Intentionally excluding nodes from the DAGCombine worklist is likely to lead to weird optimizations and infinite loops, so it's generally a bad idea. To avoid the infinite loops, fix DAGCombine to use the isDesirableToCommuteWithShift target hook before performing the transforms in question, and implement the target hook in the ARM backend to disable the transforms in question. Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a reduced testcase for that bug. But we should have sufficient test coverage for PerformSHLSimplify given that we're not playing weird tricks with the worklist. I can try to bugpoint it if necessary, though.) Differential Revision: https://reviews.llvm.org/D50667 llvm-svn: 339734
* [ARM] ParallelDSP: add option to enable/disable the passSjoerd Meijer2018-08-141-0/+6
| | | | | | Differential Revision: https://reviews.llvm.org/D50511 llvm-svn: 339645
* [ARM] Added FP16 VREV Vector Intrinsic CodeGen supportLuke Geeson2018-08-131-0/+2
| | | | llvm-svn: 339546
* Fix unused lambda capture warning from r339472.Eli Friedman2018-08-101-1/+1
| | | | llvm-svn: 339479
* [ARM] Adjust AND immediates to make them cheaper to select.Eli Friedman2018-08-103-0/+85
| | | | | | | | | | | | | | | | | | | | | | | LLVM normally prefers to minimize the number of bits set in an AND immediate, but that doesn't always match the available ARM instructions. In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer a two-instruction sequence movs+ands or movs+bics. Some potential improvements are outlined in ARMTargetLowering::targetShrinkDemandedConstant, but it seems to work pretty well already. The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX instruction due to a larger-than-expected mask. (It's orthogonal, in some sense, but as far as I can tell it's either impossible or nearly impossible to reproduce the bug without this change.) According to my testing, this seems to consistently improve codesize by a small amount by forming bic more often for ISD::AND with an immediate. Differential Revision: https://reviews.llvm.org/D50030 llvm-svn: 339472
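The Thumb1 preference order described above can be sketched as a classifier over the final mask value. This is a rough illustration, not the logic of `targetShrinkDemandedConstant` (which also has to reason about demanded bits); `AndLowering` and `classifyThumb1AndMask` are hypothetical names. The only hardware fact it relies on is that a Thumb1 `movs` takes an 8-bit immediate.

```cpp
#include <cstdint>

// Hypothetical labels for the instruction sequences named in the commit.
enum class AndLowering { Uxtb, Uxth, MovsAnds, MovsBics, Other };

// Pick the cheapest Thumb1 sequence for `x & mask`, in the order the
// commit message lists them.
inline AndLowering classifyThumb1AndMask(uint32_t mask) {
  if (mask == 0xFFu) return AndLowering::Uxtb;     // single uxtb
  if (mask == 0xFFFFu) return AndLowering::Uxth;   // single uxth
  if (mask <= 0xFFu) return AndLowering::MovsAnds; // movs #mask ; ands
  if (~mask <= 0xFFu) return AndLowering::MovsBics; // movs #~mask ; bics
  return AndLowering::Other;                       // needs something else
}
```

For example, a mask of `0xFFFFFF00` has an inverse of `0xFF`, which fits the `movs` immediate, so the sketch picks the `bics` form.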
* [ARM] Disallow zexts in ARMCodeGenPrepareSam Parker2018-08-101-165/+109
| | | | | | | | | | | | | | | | | Enabling ARMCodeGenPrepare by default caused a whole load of failures. This is due to zexts and truncs not being handled properly. ZExts are messy so it's just easier to disable for now and truncs are allowed only as 'sinks'. I still need to figure out why allowing them as 'sources' causes so many failures. The other main changes are that we are explicit in the types that we are converting to, it's now always 'TypeSize'. Type support is also now performed while checking for valid opcodes as it was unnecessarily complicated having the checks at different stages. I've moved the tests around too, so we have the zext and truncs in their own file as well as the overflowing opcode tests. Differential Revision: https://reviews.llvm.org/D50518 llvm-svn: 339432
* [ARM] Adjust the feature set for ExynosEvandro Menezes2018-08-091-0/+36
| | | | | | | | | Enable `FeatureZCZeroing`, `FeatureHasSlowFPVMLx`, `FeatureExpandMLx`, `FeatureProfUnpredicate`, `FeatureSlowVDUP32`, `FeatureSlowVGETLNi32`, `FeatureSplatVFPToNeon`, `FeatureHasRetAddrStack`, `FeatureSlowFPBrcc` for all Exynos processors. llvm-svn: 339356
* [ARM] Replace processor check with featureEvandro Menezes2018-08-093-1/+15
| | | | | | | Add new feature, `FeatureUseWideStrideVFP`, that replaces the need for a processor check. Otherwise, NFC. llvm-svn: 339354
* [ARM] FP16: codegen support for VTRNSjoerd Meijer2018-08-091-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D50454 llvm-svn: 339340
* [ARM] Avoid spilling lr with Thumb1 tail calls.Eli Friedman2018-08-081-2/+7
| | | | | | | | | | | | | | | Normally, if any registers are spilled, we prefer to spill lr on Thumb1 so we can fold the "bx lr" into the "pop". However, if there are tail calls involved, restoring lr is expensive, so skip the optimization in that case. The spill of r7 in the new test also isn't necessary, but that's mostly orthogonal to this patch. (It's the same code in ARMFrameLowering, but it's not related to tail calls.) Differential Revision: https://reviews.llvm.org/D49459 llvm-svn: 339283
* [ARM] FP16: codegen support for VEXTSjoerd Meijer2018-08-081-6/+8
| | | | | | Differential Revision: https://reviews.llvm.org/D50427 llvm-svn: 339241
* [ARM] FP16: vector vmov and vdup supportSjoerd Meijer2018-08-081-0/+13
| | | | | | | | This adds codegen support for the vmov_n_f16 and vdup_n_f16 variants. Differential Revision: https://reviews.llvm.org/D50329 llvm-svn: 339238
* [ARM] FP16: vector VMUL variantsSjoerd Meijer2018-08-081-2/+14
| | | | | | | | This adds codegen support for the vmul_lane_f16 and vmul_n_f16 variants. Differential Revision: https://reviews.llvm.org/D50326 llvm-svn: 339232
* [ARM] FP16: support vector INT_TO_FP and FP_TO_INTSjoerd Meijer2018-08-081-7/+35
| | | | | | | | This adds codegen support for the different vcvt_f16 variants. Differential Revision: https://reviews.llvm.org/D50393 llvm-svn: 339227
* [ARM] FP16: support the vector vmin and vmax variantsSjoerd Meijer2018-08-081-0/+12
| | | | | | Differential Revision: https://reviews.llvm.org/D50238 llvm-svn: 339221
* [ARM] FP16: codegen support for VACGTSjoerd Meijer2018-08-071-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D50236 llvm-svn: 339148
* ARM-MachO: don't add Thumb bit for addend to non-external relocation.Tim Northover2018-08-061-0/+1
| | | | | | | | | ld64 supplies its own Thumb bit for Thumb functions, and intentionally zeroes out that part of any addend in an object file. But it only does that for symbols marked N_EXT -- i.e. external symbols. So LLVM should avoid setting that extra bit in other cases. llvm-svn: 339007
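The rule in this commit can be sketched as a small predicate on the relocation target. The struct and function names below are hypothetical and do not correspond to the MachO writer's actual data structures; the sketch only captures the stated condition that the Thumb bit goes into the addend for external (N_EXT) Thumb symbols and nowhere else.

```cpp
#include <cstdint>

// Hypothetical summary of the symbol a relocation refers to.
struct SymbolInfo {
  bool isThumbFunc;  // symbol is a Thumb function
  bool isExternal;   // symbol is marked N_EXT
};

// Set bit 0 of the addend only when the linker is expected to clear and
// re-supply it, i.e. for external Thumb functions.
inline uint32_t relocationAddend(uint32_t addend, const SymbolInfo &sym) {
  if (sym.isThumbFunc && sym.isExternal)
    return addend | 1u;
  return addend;
}
```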
* [ARM] FP16: support vector zip and unzipSjoerd Meijer2018-08-031-0/+4
| | | | | | | | This is addressing PR38404. Differential Revision: https://reviews.llvm.org/D50186 llvm-svn: 338835
* [ARM] FP16: support VFMASjoerd Meijer2018-08-031-0/+6
| | | | | | This is addressing PR38404. llvm-svn: 338830
* [GlobalISel] Rewrite CallLowering::lowerReturn to accept multiple VRegs per ValueAlexander Ivchenko2018-08-022-16/+26
| | | | | | | | | | This is a logical continuation of https://reviews.llvm.org/D46018 (r332449) Differential Revision: https://reviews.llvm.org/D49660 llvm-svn: 338685
* Revert r338354 "[ARM] Revert r337821"Reid Kleckner2018-07-311-1/+1
| | | | | | | | | | | | | | | Disable ARMCodeGenPrepare by default again. It is causing verifier failures in V8 that look like: Duplicate integer as switch case switch i32 %trunc, label %if.end13 [ i32 0, label %cleanup36 i32 0, label %if.then8 ], !dbg !4981 i32 0 fatal error: error in backend: Broken function found, compilation aborted! I will continue reducing the test case and send it along. llvm-svn: 338452
* [ARM] Allow automatically deducing the thumb instruction size for .instMartin Storsjo2018-07-311-3/+14
| | | | | | | | This matches GAS, that allows unsuffixed .inst for thumb. Differential Revision: https://reviews.llvm.org/D49937 llvm-svn: 338357
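One way such a size deduction could work is sketched below. It relies on the Thumb encoding rule that a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction, so any 16-bit encoding is below `0xE800` and any 32-bit encoding is at or above `0xE8000000`. The function name and the exact thresholds used by the assembler are assumptions here, not taken from the patch.

```cpp
#include <cstdint>

// Hypothetical helper: guess the size in bytes of an unsuffixed Thumb
// .inst operand. Returns 0 when the value is ambiguous and an explicit
// .inst.n or .inst.w would be needed.
inline unsigned guessThumbInstSize(uint64_t value) {
  if (value < 0xE800u)
    return 2;  // fits the 16-bit encoding space
  if (value >= 0xE8000000u && value <= 0xFFFFFFFFu)
    return 4;  // first halfword marks a 32-bit encoding
  return 0;    // cannot be decided from the value alone
}
```

With this sketch, `0xBF00` (a 16-bit `nop`) yields 2 and `0xF3AF8000` (the wide `nop.w`) yields 4.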
* [ARM] Support the .inst directive for MachO and COFF targetsMartin Storsjo2018-07-312-7/+43
| | | | | | | | | | Contrary to ELF, we don't add any markers that distinguish data generated with .short/.long from normal instructions, so the .inst directive only adds compatibility with assembly that uses it. Differential Revision: https://reviews.llvm.org/D49936 llvm-svn: 338356
* [ARM] Revert r337821Sam Parker2018-07-311-1/+1
| | | | | | | Re-enabling ARMCodeGenPrepare by default after failing to reproduce the bootstrap issues that I was concerned it was causing. llvm-svn: 338354
* Remove trailing spaceFangrui Song2018-07-3019-46/+46
| | | | | | sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h} llvm-svn: 338293
* Fix uninitialized read in ARM's PrintAsmOperandThomas Preud'homme2018-07-301-2/+3
| | | | | | | | | | | | | | | | | Summary: Fix read of uninitialized RC variable in ARM's PrintAsmOperand when hasRegClassConstraint returns false. This was causing inline-asm-operand-implicit-cast test to fail in r338206. Reviewers: t.p.northover, weimingz, javed.absar, chill Reviewed By: chill Subscribers: chill, eraman, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D49984 llvm-svn: 338268
* [ARM] Fix over-alignment in arguments that are HA of 128-bit vectorsPetr Pavlu2018-07-301-5/+6
| | | | | | | | | | | | | | | | | | | | Code in `CC_ARM_AAPCS_Custom_Aggregate()` is responsible for handling homogeneous aggregates for `CC_ARM_AAPCS_VFP`. When an aggregate ends up fully on stack, the function tries to pack all resulting items of the aggregate as tightly as possible according to AAPCS. Once the first item was laid out, the alignment used for consecutive items was the size of one item. This logic went wrong for 128-bit vectors because their alignment is normally only 64 bits, and so could result in inserting unexpected padding between the first and second element. The patch fixes the problem by updating the alignment with the item size only if this results in reducing it. Differential Revision: https://reviews.llvm.org/D49720 llvm-svn: 338233
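The fix described above can be sketched as a packing loop in which the alignment may only shrink to the item size after the first item is placed. This is a simplified model with hypothetical names (`packOnStack`, byte-based sizes); the real logic lives in `CC_ARM_AAPCS_Custom_Aggregate()` and works through the calling-convention state.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Round v up to the next multiple of a (a must be non-zero).
inline uint64_t alignUp(uint64_t v, uint64_t a) { return (v + a - 1) / a * a; }

// Hypothetical model: place `count` items of `size` bytes on the stack,
// starting at `start`, with an initial alignment of `align` bytes.
inline std::vector<uint64_t> packOnStack(uint64_t start, unsigned count,
                                         uint64_t size, uint64_t align) {
  std::vector<uint64_t> offsets;
  uint64_t offset = start;
  for (unsigned i = 0; i < count; ++i) {
    offset = alignUp(offset, align);
    offsets.push_back(offset);
    offset += size;
    // After the first item, pack tightly: use the item size as the
    // alignment only if that reduces it, never if it would increase it.
    align = std::min(align, size);
  }
  return offsets;
}
```

For two 16-byte vectors with 8-byte alignment starting at offset 8, this yields offsets 8 and 24. Setting the alignment to the item size unconditionally would have pushed the second vector to 32 and left 8 bytes of padding.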
* DAG: Add calling convention argument to calling convention funcsMatt Arsenault2018-07-281-1/+1
| | | | | | | | This seems like a pretty glaring omission, and AMDGPU wants to treat kernels differently from other calling conventions. llvm-svn: 338194
* [ARM] Add new target feature to fuse literal generationEvandro Menezes2018-07-273-19/+55
| | | | | | | | | | This feature enables the fusion of such operations on Cortex A57 and Cortex A72, as recommended in their Software Optimisation Guides, sections 4.14 and 4.11, respectively. Differential revision: https://reviews.llvm.org/D49563 llvm-svn: 338147
* Add missing 'override', fixing compilation with some compilers since SVN r337950Martin Storsjo2018-07-251-1/+1
| | | | llvm-svn: 337952
* [COFF] Hoist constant pool handling from X86AsmPrinter into AsmPrinterMartin Storsjo2018-07-252-1/+12
| | | | | | | | | | | | | | | | | | | In SVN r334523, the first half of comdat constant pool handling was hoisted from X86WindowsTargetObjectFile (which despite the name only was used for msvc targets) into the arch independent TargetLoweringObjectFileCOFF, but the other half of the handling was left behind in X86AsmPrinter::GetCPISymbol. With only half of the handling in place, inconsistent comdat sections/symbols are created, causing issues with both GNU binutils (avoided for X86 in SVN r335918) and with the MS linker, which would complain like this: fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4 Differential Revision: https://reviews.llvm.org/D49644 llvm-svn: 337950
* [ARM] Prefer lsls+lsrs over lsls+ands or lsrs+ands in Thumb1.Eli Friedman2018-07-251-0/+81
| | | | | | | | | | | | | Saves materializing the immediate for the "ands". Corresponding patterns exist for lsrs+lsls, but that seems less common in practice. Now implemented as a DAGCombine. Differential Revision: https://reviews.llvm.org/D49585 llvm-svn: 337945
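The equivalence behind this combine can be checked on plain integers: extracting a bit field with a shift and a mask gives the same result as a left shift followed by a logical right shift, and the shift pair needs no immediate to be materialized. The function names are hypothetical, and the sketch assumes 32-bit values with `1 <= width <= 31` and `lsb + width <= 32`.

```cpp
#include <cstdint>

// Shift-and-mask form: corresponds to lsrs followed by ands with a
// materialized mask constant.
inline uint32_t extractWithAnd(uint32_t x, unsigned lsb, unsigned width) {
  return (x >> lsb) & ((1u << width) - 1u);
}

// Two-shift form: corresponds to lsls followed by lsrs, no constant needed.
inline uint32_t extractWithShifts(uint32_t x, unsigned lsb, unsigned width) {
  return (x << (32u - lsb - width)) >> (32u - width);
}
```

For `x = 0xABCD1234`, `lsb = 4`, `width = 8`, both forms return `0x23`.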
* [ARM] Disable ARMCodeGenPrepare by defaultSam Parker2018-07-241-1/+1
| | | | | | | | ARM Stage 2 builders have been suspiciously broken since the pass was committed. Disabling to hopefully fix the bots and give me time to debug. llvm-svn: 337821
* [ARM] Use unique_ptr to fix memory leak introduced in r337701Fangrui Song2018-07-231-11/+9
| | | | llvm-svn: 337714
* OpChain has subclasses, so add a virtual destructor.Jordan Rupprecht2018-07-231-0/+1
| | | | | | | | | | | | | | | Summary: OpChain has subclasses, so add a virtual destructor. This fixes an issue when deleting subclasses of OpChain (see MatchSMLAD() specifically) in r337701. Reviewers: javed.absar Subscribers: llvm-commits, SjoerdMeijer, samparker Differential Revision: https://reviews.llvm.org/D49681 llvm-svn: 337713
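The underlying C++ rule can be shown with a minimal stand-in: deleting a derived object through a base pointer is only well defined when the base class has a virtual destructor. The types below are hypothetical simplifications, not the real `OpChain` and `BinOpChain` from ARMParallelDSP.

```cpp
#include <memory>

// Counts how many derived objects have been destroyed.
inline int &destroyedCount() {
  static int count = 0;
  return count;
}

// Stand-in base class: the virtual destructor makes deletion through a
// base pointer run the derived destructor.
struct OpChainLike {
  virtual ~OpChainLike() = default;
};

// Stand-in subclass with an observable destructor.
struct BinOpChainLike : OpChainLike {
  ~BinOpChainLike() override { ++destroyedCount(); }
};

// Create a derived object, destroy it through a base pointer, and report
// how many derived destructors have run so far.
inline int destroyThroughBase() {
  std::unique_ptr<OpChainLike> chain(new BinOpChainLike());
  chain.reset();
  return destroyedCount();
}
```

Without the `virtual` keyword on the base destructor, the `reset()` call would be undefined behaviour and the derived destructor would typically not run.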
* [ARM] Follow-up to r337709.Matt Morehouse2018-07-231-2/+0
| | | | | | Fix double-free. llvm-svn: 337711
* [ARM] Add doFinalization() to ARMCodeGenPrepare pass.Matt Morehouse2018-07-231-0/+6
| | | | | | | Attempt to fix the leak introduced in r337687 and make sanitizer buildbots green again. llvm-svn: 337709
* [ARM][NFC] ParallelDSP reorganisationSam Parker2018-07-231-88/+103
| | | | | | | | | | | | | | | In preparing to allow ARMParallelDSP pass to parallelise more than smlads, I've restructured some elements: - The ParallelMAC struct has been renamed to BinOpChain. - The BinOpChain struct holds two value lists: LHS and RHS, as well as inheriting from the OpChain base class. - The OpChain struct holds all the values of the represented chain and has had the memory locations functionality inserted into it. - ParallelMACList becomes OpChainList and it now holds pointers instead of objects. Differential Revision: https://reviews.llvm.org/D49020 llvm-svn: 337701
* [ARM] ARMCodeGenPrepare backend passSam Parker2018-07-234-0/+757
| | | | | | | | | | | | | | | | | | | | | | Arm specific codegen prepare is implemented to perform type promotion on icmp operands, which can enable the removal of uxtb and uxth (unsigned extend) instructions. This is possible because performing type promotion before ISel alleviates this duty from the DAG builder which has to perform legalisation, but has a limited view on data ranges. The pass visits any instruction operand of an icmp and creates a worklist to traverse the use-def tree to determine whether the values can simply be promoted. Our concern is values in the registers overflowing the narrow (i8, i16) data range, so instructions marked with nuw can be promoted easily. For add and sub instructions, we are able to use the parallel dsp instructions to operate on scalar data types and avoid overflowing bits. Underflowing adds and subs are also permitted when the result is only used by an unsigned icmp. Differential Revision: https://reviews.llvm.org/D48832 llvm-svn: 337687
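The overflow concern described above can be illustrated on plain integers. The sketch compares an i8-style add followed by an unsigned compare with the same computation carried out in 32 bits; the two agree exactly when the narrow add does not wrap, which is the guarantee `nuw` provides. The function names are hypothetical and the code models the arithmetic only, not the pass itself.

```cpp
#include <cstdint>

// Narrow form: the sum is truncated to 8 bits before the unsigned compare.
inline bool narrowCompare(uint8_t a, uint8_t b, uint8_t limit) {
  uint8_t sum = static_cast<uint8_t>(a + b);
  return sum < limit;
}

// Promoted form: the same operations performed on 32-bit values, with no
// extension instruction needed before the compare.
inline bool promotedCompare(uint8_t a, uint8_t b, uint8_t limit) {
  uint32_t sum = static_cast<uint32_t>(a) + static_cast<uint32_t>(b);
  return sum < static_cast<uint32_t>(limit);
}

// True when the 8-bit add would wrap, i.e. when nuw would not hold.
inline bool addWrapsI8(uint8_t a, uint8_t b) {
  return static_cast<uint32_t>(a) + static_cast<uint32_t>(b) > 0xFFu;
}
```

With `a = 100`, `b = 50`, `limit = 200` there is no wrap and both forms return true. With `a = 200`, `b = 100`, `limit = 50` the narrow sum wraps to 44 and compares true, while the promoted sum of 300 compares false, so promotion without further care would change the result.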
* [ARM] Add new feature to enable optimizing the VFP registersEvandro Menezes2018-07-203-2/+15
| | | | | | | | | Enable the optimization of operations on DPR and SPR via a feature instead of checking the target. Differential revision: https://reviews.llvm.org/D49463 llvm-svn: 337575
* ARM: switch armv7em MachO triple to hard-float defaults and libcalls.Tim Northover2018-07-191-0/+2
| | | | | | | | | We were emitting incorrect calls to libm functions that LLVM had decided it knew about because the default is soft-float. Recommitted without breaking ELF this time. llvm-svn: 337450
* Revert "ARM: switch armv7em triple to hard-float defaults and libcalls."Tim Northover2018-07-181-1/+0
| | | | | | This reverts commit r337385 until it can be targeted at MachO only. llvm-svn: 337424
* ARM: stop explicitly marking armv7k libcalls as hard-float. NFC.Tim Northover2018-07-181-7/+0
| | | | | | | Since the triple's default is hard float, the libcalls will already use VFP registers. llvm-svn: 337386
* ARM: switch armv7em triple to hard-float defaults and libcalls.Tim Northover2018-07-181-0/+1
| | | | | | | We were emitting incorrect calls to libm functions that LLVM had decided it knew about because the default is soft-float. llvm-svn: 337385
* ARM: deduplicate hard-float detection code. NFC.Tim Northover2018-07-184-12/+12
| | | | | | | | ARMSubtarget had a copy/pasted block to determine whether the target was hard-float, but it just delegated to triple features anyway so it's better at the TargetMachine level. llvm-svn: 337384
* [ARM] ParallelDSP: multiple reduction stmts in loopSjoerd Meijer2018-07-111-40/+75
| | | | | | | | | | This fixes an issue that we were not properly supporting multiple reduction stmts in a loop, and not generating SMLADs for these cases. The alias analysis checks were done too early, making it too conservative. Differential revision: https://reviews.llvm.org/D49125 llvm-svn: 336795