summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/ARM
Commit message (Collapse)AuthorAgeFilesLines
...
* [ARM] Sink zext/sext operands for add and sub to enable vsubl generation.Florian Hahn2019-03-062-0/+45
| | | | | | | | | | | | | | | This uses the infrastructure added in rL353152 to sink zext and sexts to sub/add users, to enable vsubl/vaddl generation when NEON is available. See https://bugs.llvm.org/show_bug.cgi?id=40025. Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D58063 llvm-svn: 355460
* [ARM] Fix select_cc lowering for fp16Oliver Stannard2019-03-051-7/+11
| | | | | | | | | | | When lowering a select_cc node where the true and false values are of type f16, we can't use a general conditional move because the FP16 instructions do not support conditional execution. Instead, we must ensure that the condition code is one of the four supported by the VSEL instruction. Differential revision: https://reviews.llvm.org/D58813 llvm-svn: 355385
* [ARM] Fix selection of VLDR.16 instruction with imm offsetOliver Stannard2019-03-041-10/+5
| | | | | | | | | | | | The isScaledConstantInRange function takes upper and lower bounds which are checked after dividing by the scale, so the bounds checks for half, single and double precision should all be the same. Previously, we had wrong bounds checks for half precision, so selected an immediate the instructions can't actually represent. Differential revision: https://reviews.llvm.org/D58822 llvm-svn: 355305
* [AArch64/ARM] Fix two compiler warnings in InstructionSelector, NFCIJonas Hahnfeld2019-03-041-5/+6
| | | | | | | | | | | 1) GCC complains that KnownValid is set but not used. 2) In ARMInstructionSelector::selectGlobal() the code is mixing "enumeral and non-enumeral type in conditional expression". Solve this by casting to unsigned which is the final type anyway. Differential Revision: https://reviews.llvm.org/D58834 llvm-svn: 355304
* [ARM] Fix FP16 stack loads/stores for Thumb2 with frame pointerOliver Stannard2019-03-011-2/+2
| | | | | | | | | | The new addressing mode added for the v8.2A FP16 instructions uses bit 8 of the immediate to encode the sign of the offset, like the other FP loads/stores, so need to be treated the same way. Differential revision: https://reviews.llvm.org/D58816 llvm-svn: 355201
* [ARM] Consider undefined-on-NaN conditions in checkVSELConstraintsOliver Stannard2019-03-011-5/+6
| | | | | | | | | | This function was not checking for the condition code variants which are undefined if either input is NaN, so we were missing selection of the VSEL instruction in some cases when using -fno-honor-nans or -ffast-math. Differential revision: https://reviews.llvm.org/D58812 llvm-svn: 355199
* [ARM GlobalISel] Support G_CTLZ for Thumb2Diana Picus2019-03-011-7/+0
| | | | | | Same as ARM mode but with different opcode. llvm-svn: 355191
* [Target][ARM] Add a usage for SrcSz to unbreak build-bots without assertionsKadir Cetinkaya2019-02-281-0/+1
| | | | llvm-svn: 355101
* Add support for computing "zext of value" in KnownBits. NFCIBjorn Pettersson2019-02-281-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The description of KnownBits::zext() and KnownBits::zextOrTrunc() has confusingly been telling that the operation is equivalent to zero extending the value we're tracking. That has not been true, instead the user has been forced to explicitly set the extended bits as known zero afterwards. This patch adds a second argument to KnownBits::zext() and KnownBits::zextOrTrunc() to control if the extended bits should be considered as known zero or as unknown. Reviewers: craig.topper, RKSimon Reviewed By: RKSimon Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58650 llvm-svn: 355099
* [ARM GlobalISel] Make arm_i32imm an IntImmLeafDiana Picus2019-02-282-15/+8
| | | | | | | | | | | | | This gets rid of some duplication in the TableGen definition, but it forces us to keep both a pointer and a reference to the subtarget in the ARMInstructionSelector. That is pretty ugly but it might be a reasonable trade-off, since the TableGen descriptions should outlive the code in the selector (or in the worst case we can update to use just the reference when we get rid of DAGISel). Differential Revision: https://reviews.llvm.org/D58031 llvm-svn: 355083
* [ARM GlobalISel] Support global variables for Thumb2Diana Picus2019-02-282-25/+73
| | | | | | | | | | | | | | | | | | | Add the same level of support as for ARM mode (i.e. still no TLS support). In most cases, it is sufficient to replace the opcodes with the t2-equivalent, but there are some idiosyncrasies that I decided to preserve because I don't understand the full implications: * For ARM we use LDRi12 to load from constant pools, but for Thumb we use t2LDRpci (I'm not sure if the ideal would be to use t2LDRi12 for Thumb as well, or to use LDRcp for ARM). * For Thumb we don't have an equivalent for MOV|LDRLIT_ga_pcrel_ldr, so we have to generate MOV|LDRLIT_ga_pcrel plus a load from GOT. The tests are in separate files because they're hard enough to read even without doubling the number of checks. llvm-svn: 355077
* [llvm-objdump] Implement -Mreg-names-raw/-std options.Igor Kudrin2019-02-263-6/+33
| | | | | | | | | | | | | | The --disassembler-options, or -M, are used to customize the disassembler and affect its output. The two implemented options allow selecting register names on ARM: * With -Mreg-names-raw, the disassembler uses rNN for all registers. * With -Mreg-names-std it prints sp, lr and pc for r13, r14 and r15, which is the default behavior of llvm-objdump. Differential Revision: https://reviews.llvm.org/D57680 llvm-svn: 354870
* [ARM] Add Cortex-M35PLuke Cheeseman2019-02-261-0/+10
| | | | | | | | | | - Add LLVM backend support for Cortex-M35P - Documentation can be found at https://developer.arm.com/products/processors/cortex-m/cortex-m35p Differentail Revision: https://reviews.llvm.org/D57763 llvm-svn: 354868
* [ARM] Be super conservative about atomicsPhilip Reames2019-02-261-2/+5
| | | | | | | | | As requested during review of D57601 <https://reviews.llvm.org/D57601> https://reviews.llvm.org/D57601, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that. Differential Revision: https://reviews.llvm.org/D58490 Note: D58498 landed in several pieces as individual backends were approved. This is the last chunk. llvm-svn: 354845
* [ARM] Add some more missing T1 opcodes for the peephole optimisierDavid Green2019-02-251-12/+24
| | | | | | | | | | | This adds a few extra Thumb1 opcodes to improve the peephole opimisers ability to remove redundant cmp instructions. tADC and tSBC require a small fixup to prevent MOVS being moved past the instruction, giving the wrong flags. Differential Revision: https://reviews.llvm.org/D58281 llvm-svn: 354791
* [AArch64] Add support for Cortex-A76 and Cortex-A76AELuke Cheeseman2019-02-253-0/+20
| | | | | | | | - Add LLVM backend support for Cortex-A76 and Cortex-A76AE - Documentation can be found at https://developer.arm.com/products/processors/cortex-a/cortex-a76 llvm-svn: 354788
* [ARM] Make fullfp16 instructions not conditionalisable.Simon Tatham2019-02-254-8/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | More or less all the instructions defined in the v8.2a full-fp16 extension are defined as UNPREDICTABLE if you put them in an IT block (Thumb) or use with any condition other than AL (ARM). LLVM didn't know that, and was happy to conditionalise them. In order to force these instructions to count as not predicable, I had to make a small Tablegen change. The code generation back end mostly decides if an instruction was predicable by looking for something it can identify as a predicate operand; there's an isPredicable bit flag that overrides that check in the positive direction, but nothing that overrides it in the negative direction. (I considered the alternative approach of actually removing the predicate operand from those instructions, but thought that it would be more painful overall for instructions differing only in data type to have different shapes of operand list. This way, the only code that has to notice the difference is the if-converter.) So I've added an isUnpredicable bit alongside isPredicable, and set that bit on the right subset of FP16 instructions, and also on the VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be unpredicable for all data types. I've included a couple of representative regression tests, both of which previously caused an fp16 instruction to be conditionalised in ARM state and (with -arm-no-restrict-it) to be put in an IT block in Thumb. Reviewers: SjoerdMeijer, t.p.northover, efriedma Reviewed By: efriedma Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57823 llvm-svn: 354768
* [ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPsDavid Green2019-02-221-11/+54
| | | | | | | | | | | | This adds a number of missing Thumb1 opcodes so that the peephole optimiser can remove redundant CMP instructions. Reapplying this after the first attempt broke non-thumb1 code as the t2ADDri instruction can be used with frame indices. In thumb1 we use tADDframe. Differential Revision: https://reviews.llvm.org/D57833 llvm-svn: 354667
* [ARM GlobalISel] Support floating point for Thumb2Diana Picus2019-02-221-29/+29
| | | | | | | | | | | | | | | This is exactly the same as arm mode, so for the instruction selector tests we just extract them to a new file and run with the same checks for both arm and thumb mode. For the legalizer we need to update the tests for soft float a bit, but only because BL and tBL are slightly different. We could be pedantic and check that we get a well-formed BL for arm mode and a tBL for thumb, but for the purposes of the legalizer test it's sufficient to just skip over the predicate operands in the checks. Also note that we have the pedantic checks in the divmod test, so we're covered. llvm-svn: 354665
* [ARM GlobalISel] Support G_FRAME_INDEX for Thumb2Diana Picus2019-02-212-2/+5
| | | | | | Same as arm mode. llvm-svn: 354579
* Revert 354564: [ARM] Add some missing thumb1 opcodes to enable peephole ↵David Green2019-02-211-54/+12
| | | | | | | | | optimisation of CMPs I believe it's causing bootstrap failures for A32 code. I'll take a look at what's wrong. llvm-svn: 354569
* [ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPsDavid Green2019-02-211-12/+54
| | | | | | | | | This adds a number of missing Thumb1 opcodes so that the peephole optimiser can remove redundant CMP instructions. Differential Revision: https://reviews.llvm.org/D57833 llvm-svn: 354564
* [ARM] Negative constants mishandled in ARM CGPSam Parker2019-02-211-5/+5
| | | | | | | | | | | | | During type promotion, sometimes we convert negative an add with a negative constant into a sub with a positive constant. The loop that performs this transformation has two issues: - it iterates over a set, causing non-determinism. - it breaks, instead of continuing, when it finds the first non-negative operand. Differential Revision: https://reviews.llvm.org/D58452 llvm-svn: 354557
* [ARM GlobalISel] Support G_PHI for Thumb2Diana Picus2019-02-191-5/+5
| | | | | | Same as arm mode. llvm-svn: 354310
* [ARM GlobalISel] Style fix. NFCIDiana Picus2019-02-151-1/+5
| | | | | | | | Add the opcode for ADDrr / t2ADDrr to the Opcode cache, as we did for all other opcodes where the handling is otherwise the same between arm mode and thumb2. llvm-svn: 354115
* [ARM GlobalISel] Support branches for Thumb2Diana Picus2019-02-152-9/+17
| | | | | | Just like arm mode, but with different opcodes. llvm-svn: 354113
* [ARM CGP] Fix ConvertTruncsSam Parker2019-02-151-8/+17
| | | | | | | | | | | | | ConvertTruncs is used to replace a trunc for an AND mask, however this function wasn't working as expected. By performing the change later, we can create a wide type integer mask instead of a narrow -1 value, which could then be simply removed (incorrectly). Because we now perform this action later, it's necessary to cache the trunc type before we perform the promotion. Differential Revision: https://reviews.llvm.org/D57686 llvm-svn: 354108
* GlobalISel: Add alignment to LegalityQuery MMOsMatt Arsenault2019-02-141-6/+6
| | | | | | | This allows targets to specify the minimum alignment required for the load/store. llvm-svn: 354071
* [ARM] Ensure we update the correct flags in the peephole optimiserDavid Green2019-02-141-2/+5
| | | | | | | | | | | | | | | | The Arm peephole optimiser code keeps track of both an MI and a SubAdd that can be used to optimise away a CMP. In the rare case that both are found and not ruled-out as valid, we could end up setting the flags on the wrong one. Instead make sure we are using SubAdd if it exists, as it will be closer to the CMP. The testcase here is a little theoretical, with a dead def of cpsr. It should hopefully show the point. Differential Revision: https://reviews.llvm.org/D58176 llvm-svn: 354018
* [ARM GlobalISel] Support G_SELECT for Thumb2Diana Picus2019-02-132-5/+13
| | | | | | Same as arm mode, but slightly different opcodes. llvm-svn: 353938
* [ARM] Add v8m.base pattern for add negative immSam Parker2019-02-111-0/+5
| | | | | | | | | | | The v8m.base ISA contains movw, which can operate on an unsigned 16-bit value. Add the pattern that converts an add with a negative value, that could fit into 16-bits when negated, into a sub with that positive value. Differential Revision: https://reviews.llvm.org/D57942 llvm-svn: 353692
* [ARM] LoadStoreOptimizer: reoder limitSjoerd Meijer2019-02-111-1/+6
| | | | | | | | | | | | | | | The whole design of generating LDMs/STMs is fragile and unreliable: it depends on rescheduling here in the LoadStoreOptimizer that isn't register pressure aware and regalloc that isn't aware of generating LDMs/STMs. This patch adds a (hidden) option to control the total number of instructions that can be re-ordered. I appreciate this looks only a tiny bit better than a hard-coded constant, but at least it allows more easy experimentation with different values for now. Ideally we calculate this reorder limit based on some heuristics, and take register pressure into account. I might be looking into that next. Differential Revision: https://reviews.llvm.org/D57954 llvm-svn: 353678
* [ARM] LoadStoreOptimizer: just a clean-up. NFC.Sjoerd Meijer2019-02-111-35/+25
| | | | | | Differential Revision: https://reviews.llvm.org/D57955 llvm-svn: 353670
* Implementation of asm-goto support in LLVMCraig Topper2019-02-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | This patch accompanies the RFC posted here: http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html This patch adds a new CallBr IR instruction to support asm-goto inline assembly like gcc as used by the linux kernel. This instruction is both a call instruction and a terminator instruction with multiple successors. Only inline assembly usage is supported today. This also adds a new INLINEASM_BR opcode to SelectionDAG and MachineIR to represent an INLINEASM block that is also considered a terminator instruction. There will likely be more bug fixes and optimizations to follow this, but we felt it had reached a point where we would like to switch to an incremental development model. Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii Differential Revision: https://reviews.llvm.org/D53765 llvm-svn: 353563
* [ARM] Add OptMinSize to ARMSubtargetSam Parker2019-02-0815-47/+60
| | | | | | | | | | | | In many places in the backend, we like to know whether we're optimising for code size and this is performed by checking the current machine function attributes. A subtarget is created on a per-function basis, so it's possible to know when we're compiling for code size on construction so record this in the new object. Differential Revision: https://reviews.llvm.org/D57812 llvm-svn: 353501
* [LSR] Generate cross iteration indexesSam Parker2019-02-071-0/+6
| | | | | | | | | | | | | | Modify GenerateConstantOffsetsImpl to create offsets that can be used by indexed addressing modes. If formulae can be generated which result in the constant offset being the same size as the recurrence, we can generate a pre-indexed access. This allows the pointer to be updated via the single pre-indexed access so that (hopefully) no add/subs are required to update it for the next iteration. For small cores, this can significantly improve performance DSP-like loops. Differential Revision: https://reviews.llvm.org/D55373 llvm-svn: 353403
* [ARM GlobalISel] Support G_ICMP for Thumb2Diana Picus2019-02-072-12/+25
| | | | | | | Mark as legal and use the t2* equivalents of the arm mode instructions, e.g. t2CMPrr instead of plain CMPrr. llvm-svn: 353392
* [ARM] Reformat isRedundantFlagInstr for D57833. NFCDavid Green2019-02-071-8/+4
| | | | llvm-svn: 353386
* [ARM GlobalISel] Support G_GEP for Thumb2Diana Picus2019-02-052-3/+3
| | | | | | Same as ARM, but use a different opcode in the instruction selection. llvm-svn: 353151
* [AsmPrinter] Remove hidden flag -print-schedule.Andrea Di Biagio2019-02-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes hidden codegen flag -print-schedule effectively reverting the logic originally committed as r300311 (https://llvm.org/viewvc/llvm-project?view=revision&revision=300311). Flag -print-schedule was originally introduced by r300311 to address PR32216 (https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better testing of schedule model instruction latencies/throughputs". These days, we can use llvm-mca to test scheduling models. So there is no longer a need for flag -print-schedule in LLVM. The main use case for PR32216 is now addressed by llvm-mca. Flag -print-schedule is mainly used for debugging purposes, and it is only actually used by x86 specific tests. We already have extensive (latency and throughput) tests under "test/tools/llvm-mca" for X86 processor models. That means, most (if not all) existing -print-schedule tests for X86 are redundant. When flag -print-schedule was first added to LLVM, several files had to be modified; a few APIs gained new arguments (see for example method MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained a couple of getSchedInfoStr() methods. Method getSchedInfoStr() had to originally work for both MCInst and MachineInstr. The original implmentation of getSchedInfoStr() introduced a subtle layering violation (reported as PR37160 and then fixed/worked-around by r330615). In retrospect, that new API could have been designed more optimally. We can always query MCSchedModel to get the latency and throughput. More importantly, the "sched-info" string should not have been generated by the subtarget. Note, r317782 fixed an issue where "print-schedule" didn't work very well in the presence of inline assembly. That commit is also reverted by this change. Differential Revision: https://reviews.llvm.org/D57244 llvm-svn: 353043
* [ARM] Mark 255 and 65535 as cheap for Thumb1 "And"David Green2019-02-041-3/+7
| | | | | | | | | | This prevents Constant Hoisting from pulling the constant out of the block, allowing us to still produce LDRH/UXTH nodes. LDRB/UXTB (255) is already cheap by the default getIntImmCost, but I've added it for clarity. Differential Revision: https://reviews.llvm.org/D57671 llvm-svn: 353040
* [opaque pointer types] Pass value type to GetElementPtr creation.James Y Knight2019-02-011-3/+5
| | | | | | | | | This cleans up all GetElementPtr creation in LLVM to explicitly pass a value type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57173 llvm-svn: 352913
* [opaque pointer types] Pass value type to LoadInst creation.James Y Knight2019-02-011-3/+3
| | | | | | | | | This cleans up all LoadInst creation in LLVM to explicitly pass the value type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57172 llvm-svn: 352911
* [ARM] Thumb2: ConstantMaterializationCostSjoerd Meijer2019-01-311-2/+4
| | | | | | | | | | | | | | | | Constants can also be materialised using the negated value and a MVN, and this case seem to have been missed for Thumb2. To check the constant materialisation costs, we now call getT2SOImmVal twice, once for the original constant and then also for its negated value, and this function checks if the constant can both be splatted or rotated. This was revealed by a test that optimises for minsize: instead of a LDR literal pool load and having a literal pool entry, just a MVN with an immediate is smaller (and also faster). Differential Revision: https://reviews.llvm.org/D57327 llvm-svn: 352737
* [SelectionDAG] Codesize: don't expand SHIFT to SHIFT_PARTSSjoerd Meijer2019-01-311-0/+6
| | | | | | | | | | | | | | | | And instead just generate a libcall. My motivating example on ARM was a simple: shl i64 %A, %B for which the code bloat is quite significant. For other targets that also accept __int128/i128 such as AArch64 and X86, it is also beneficial for these cases to generate a libcall when optimising for minsize. On these 64-bit targets, the 64-bits shifts are of course unaffected because the SHIFT/SHIFT_PARTS lowering operation action is not set to custom/expand. Differential Revision: https://reviews.llvm.org/D57386 llvm-svn: 352736
* GlobalISel: Allow bitcount ops to have different result typeMatt Arsenault2019-01-311-4/+8
| | | | | | For AMDGPU the result is always 32-bit for 64-bit inputs. llvm-svn: 352717
* GlobalISel: Fix creating MMOs with align 0Matt Arsenault2019-01-311-3/+3
| | | | llvm-svn: 352712
* [ARM] Use sub for negative offset load/store in thumb1David Green2019-01-292-6/+47
| | | | | | | | | | | This attempts to optimise negative values used in load/store operands a little. We currently try to selct them as rr, materialising the negative constant using a MOV/MVN pair. This instead selects ri with an immediate of 0, forcing the add node to become a simpler sub. Differential Revision: https://reviews.llvm.org/D57121 llvm-svn: 352475
* [ARM] Deduplicate table generated CC analysis codeReid Kleckner2019-01-286-275/+324
| | | | | | | Create ARMCallingConv.cpp and emit code for calling convention analysis from there. llvm-svn: 352431
* Remove no longer needed Arm specific LICENSE.TXT file.Arnaud A. de Grandmaison2019-01-281-47/+0
| | | | | | | | | As the codebase is now under the Apache 2.0 license with LLVM Exceptions, and all Arm's contributions, past or future, are under that new license, this Arm specific LICENSE.TXT is no longer needed, thus removing it. llvm-svn: 352376
OpenPOWER on IntegriCloud