path: root/llvm/lib/Target/ARM
Commit message | Author | Age | Files | Lines
...
* [ARM] Be super conservative about atomics | Philip Reames | 2019-02-26 | 1 | -2/+5
    As requested during review of D57601 <https://reviews.llvm.org/D57601>, be equally conservative
    for atomic MMOs as for volatile MMOs in all in-tree backends. At the moment, all atomic MMOs
    are also volatile, but I'm about to change that.
    Differential Revision: https://reviews.llvm.org/D58490
    Note: D58498 landed in several pieces as individual backends were approved. This is the last chunk.
    llvm-svn: 354845
* [ARM] Add some more missing T1 opcodes for the peephole optimiser | David Green | 2019-02-25 | 1 | -12/+24
    This adds a few extra Thumb1 opcodes to improve the peephole optimiser's ability to remove
    redundant cmp instructions. tADC and tSBC require a small fixup to prevent MOVS being moved
    past the instruction, giving the wrong flags.
    Differential Revision: https://reviews.llvm.org/D58281
    llvm-svn: 354791
* [AArch64] Add support for Cortex-A76 and Cortex-A76AE | Luke Cheeseman | 2019-02-25 | 3 | -0/+20
    - Add LLVM backend support for Cortex-A76 and Cortex-A76AE
    - Documentation can be found at https://developer.arm.com/products/processors/cortex-a/cortex-a76
    llvm-svn: 354788
* [ARM] Make fullfp16 instructions not conditionalisable. | Simon Tatham | 2019-02-25 | 4 | -8/+62
    More or less all the instructions defined in the v8.2a full-fp16 extension are defined as
    UNPREDICTABLE if you put them in an IT block (Thumb) or use with any condition other than AL
    (ARM). LLVM didn't know that, and was happy to conditionalise them.
    In order to force these instructions to count as not predicable, I had to make a small Tablegen
    change. The code generation back end mostly decides if an instruction was predicable by looking
    for something it can identify as a predicate operand; there's an isPredicable bit flag that
    overrides that check in the positive direction, but nothing that overrides it in the negative
    direction. (I considered the alternative approach of actually removing the predicate operand
    from those instructions, but thought that it would be more painful overall for instructions
    differing only in data type to have different shapes of operand list. This way, the only code
    that has to notice the difference is the if-converter.)
    So I've added an isUnpredicable bit alongside isPredicable, and set that bit on the right
    subset of FP16 instructions, and also on the VSEL, VMAXNM/VMINNM and VRINT[ANPM] families,
    which should be unpredicable for all data types.
    I've included a couple of representative regression tests, both of which previously caused an
    fp16 instruction to be conditionalised in ARM state and (with -arm-no-restrict-it) to be put
    in an IT block in Thumb.
    Reviewers: SjoerdMeijer, t.p.northover, efriedma
    Reviewed By: efriedma
    Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D57823
    llvm-svn: 354768
* [ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPs | David Green | 2019-02-22 | 1 | -11/+54
    This adds a number of missing Thumb1 opcodes so that the peephole optimiser can remove
    redundant CMP instructions.
    This reapplies the patch after the first attempt broke non-thumb1 code: the t2ADDri instruction
    can be used with frame indices, whereas in thumb1 we use tADDframe.
    Differential Revision: https://reviews.llvm.org/D57833
    llvm-svn: 354667
* [ARM GlobalISel] Support floating point for Thumb2 | Diana Picus | 2019-02-22 | 1 | -29/+29
    This is exactly the same as arm mode, so for the instruction selector tests we just extract
    them to a new file and run with the same checks for both arm and thumb mode.
    For the legalizer we need to update the tests for soft float a bit, but only because BL and
    tBL are slightly different. We could be pedantic and check that we get a well-formed BL for
    arm mode and a tBL for thumb, but for the purposes of the legalizer test it's sufficient to
    just skip over the predicate operands in the checks. Also note that we have the pedantic
    checks in the divmod test, so we're covered.
    llvm-svn: 354665
* [ARM GlobalISel] Support G_FRAME_INDEX for Thumb2 | Diana Picus | 2019-02-21 | 2 | -2/+5
    Same as arm mode.
    llvm-svn: 354579
* Revert 354564: [ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPs | David Green | 2019-02-21 | 1 | -54/+12
    I believe it's causing bootstrap failures for A32 code. I'll take a look at what's wrong.
    llvm-svn: 354569
* [ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPs | David Green | 2019-02-21 | 1 | -12/+54
    This adds a number of missing Thumb1 opcodes so that the peephole optimiser can remove
    redundant CMP instructions.
    Differential Revision: https://reviews.llvm.org/D57833
    llvm-svn: 354564
* [ARM] Negative constants mishandled in ARM CGP | Sam Parker | 2019-02-21 | 1 | -5/+5
    During type promotion, sometimes we convert an add with a negative constant into a sub with a
    positive constant. The loop that performs this transformation has two issues:
    - it iterates over a set, causing non-determinism.
    - it breaks, instead of continuing, when it finds the first non-negative operand.
    Differential Revision: https://reviews.llvm.org/D58452
    llvm-svn: 354557
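    For context, a minimal IR sketch of the add-to-sub rewrite mentioned above (illustrative
    only; the function and values are not taken from the patch):

        ; Illustrative only: the shape of the rewrite performed during type promotion.
        define i8 @before(i8 %x) {
          %r = add i8 %x, -3   ; add with a negative constant ...
          ret i8 %r
        }

        define i8 @after(i8 %x) {
          %r = sub i8 %x, 3    ; ... rewritten as a sub with the positive constant
          ret i8 %r
        }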
* [ARM GlobalISel] Support G_PHI for Thumb2 | Diana Picus | 2019-02-19 | 1 | -5/+5
    Same as arm mode.
    llvm-svn: 354310
* [ARM GlobalISel] Style fix. NFCI | Diana Picus | 2019-02-15 | 1 | -1/+5
    Add the opcode for ADDrr / t2ADDrr to the Opcode cache, as we did for all other opcodes where
    the handling is otherwise the same between arm mode and thumb2.
    llvm-svn: 354115
* [ARM GlobalISel] Support branches for Thumb2 | Diana Picus | 2019-02-15 | 2 | -9/+17
    Just like arm mode, but with different opcodes.
    llvm-svn: 354113
* [ARM CGP] Fix ConvertTruncs | Sam Parker | 2019-02-15 | 1 | -8/+17
    ConvertTruncs is used to replace a trunc with an AND mask; however, this function wasn't
    working as expected. By performing the change later, we can create a wide-type integer mask
    instead of a narrow -1 value, which could then be simply removed (incorrectly). Because we now
    perform this action later, it's necessary to cache the trunc type before we perform the
    promotion.
    Differential Revision: https://reviews.llvm.org/D57686
    llvm-svn: 354108
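    As a reader's note (not part of the commit message), this is roughly what modelling the trunc
    as a mask looks like; emitting the mask in the wide type keeps it from looking like an
    all-ones value that could be dropped:

        ; Illustrative only: a trunc/zext pair and the equivalent wide-type mask.
        define i32 @with_trunc(i32 %x) {
          %t = trunc i32 %x to i16
          %z = zext i16 %t to i32
          ret i32 %z
        }

        define i32 @with_mask(i32 %x) {
          %m = and i32 %x, 65535   ; wide i32 mask; a narrow 'and i16 %t, -1' would look removable
          ret i32 %m
        }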
* GlobalISel: Add alignment to LegalityQuery MMOs | Matt Arsenault | 2019-02-14 | 1 | -6/+6
    This allows targets to specify the minimum alignment required for the load/store.
    llvm-svn: 354071
* [ARM] Ensure we update the correct flags in the peephole optimiser | David Green | 2019-02-14 | 1 | -2/+5
    The Arm peephole optimiser code keeps track of both an MI and a SubAdd that can be used to
    optimise away a CMP. In the rare case that both are found and not ruled out as valid, we could
    end up setting the flags on the wrong one. Instead make sure we are using SubAdd if it exists,
    as it will be closer to the CMP.
    The testcase here is a little theoretical, with a dead def of cpsr. It should hopefully show
    the point.
    Differential Revision: https://reviews.llvm.org/D58176
    llvm-svn: 354018
* [ARM GlobalISel] Support G_SELECT for Thumb2 | Diana Picus | 2019-02-13 | 2 | -5/+13
    Same as arm mode, but with slightly different opcodes.
    llvm-svn: 353938
* [ARM] Add v8m.base pattern for add negative imm | Sam Parker | 2019-02-11 | 1 | -0/+5
    The v8m.base ISA contains movw, which can operate on an unsigned 16-bit value. Add the pattern
    that converts an add with a negative value that could fit into 16 bits when negated into a sub
    with that positive value.
    Differential Revision: https://reviews.llvm.org/D57942
    llvm-svn: 353692
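    A hedged sketch of the kind of input this pattern targets (the selection shown in the
    comments is illustrative, not quoted from the patch):

        ; Illustrative only: the negated constant fits movw's unsigned 16-bit range.
        define i32 @adjust(i32 %x) {
          %r = add i32 %x, -40000
          ret i32 %r
        }
        ; On v8m.base this can now be selected along the lines of
        ;   movw rT, #40000
        ;   subs rD, rN, rT
        ; rather than materialising the negative constant itself.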
* [ARM] LoadStoreOptimizer: reorder limit | Sjoerd Meijer | 2019-02-11 | 1 | -1/+6
    The whole design of generating LDMs/STMs is fragile and unreliable: it depends on rescheduling
    here in the LoadStoreOptimizer that isn't register-pressure aware, and regalloc that isn't
    aware of generating LDMs/STMs.
    This patch adds a (hidden) option to control the total number of instructions that can be
    re-ordered. I appreciate this looks only a tiny bit better than a hard-coded constant, but at
    least it allows easier experimentation with different values for now. Ideally we calculate
    this reorder limit based on some heuristics, and take register pressure into account. I might
    be looking into that next.
    Differential Revision: https://reviews.llvm.org/D57954
    llvm-svn: 353678
* [ARM] LoadStoreOptimizer: just a clean-up. NFC. | Sjoerd Meijer | 2019-02-11 | 1 | -35/+25
    Differential Revision: https://reviews.llvm.org/D57955
    llvm-svn: 353670
* Implementation of asm-goto support in LLVM | Craig Topper | 2019-02-08 | 1 | -1/+2
    This patch accompanies the RFC posted here:
    http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html
    This patch adds a new CallBr IR instruction to support asm-goto inline assembly like gcc as
    used by the linux kernel. This instruction is both a call instruction and a terminator
    instruction with multiple successors. Only inline assembly usage is supported today.
    This also adds a new INLINEASM_BR opcode to SelectionDAG and MachineIR to represent an
    INLINEASM block that is also considered a terminator instruction.
    There will likely be more bug fixes and optimizations to follow this, but we felt it had
    reached a point where we would like to switch to an incremental development model.
    Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii
    Differential Revision: https://reviews.llvm.org/D53765
    llvm-svn: 353563
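    A minimal sketch of what the new instruction looks like in IR (the assembly string,
    constraints, and labels below are invented for illustration, written in the typed-pointer
    syntax of the time):

        define void @foo(i32 %x) {
        entry:
          callbr void asm sideeffect "cbz $0, ${1:l}", "r,X"(i32 %x, i8* blockaddress(@foo, %err))
              to label %cont [label %err]
        cont:
          ret void
        err:
          ret void
        }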
* [ARM] Add OptMinSize to ARMSubtarget | Sam Parker | 2019-02-08 | 15 | -47/+60
    In many places in the backend, we like to know whether we're optimising for code size, and
    this is performed by checking the current machine function attributes. A subtarget is created
    on a per-function basis, so it's possible to know when we're compiling for code size on
    construction, so record this in the new object.
    Differential Revision: https://reviews.llvm.org/D57812
    llvm-svn: 353501
* [LSR] Generate cross iteration indexes | Sam Parker | 2019-02-07 | 1 | -0/+6
    Modify GenerateConstantOffsetsImpl to create offsets that can be used by indexed addressing
    modes. If formulae can be generated which result in the constant offset being the same size as
    the recurrence, we can generate a pre-indexed access. This allows the pointer to be updated
    via the single pre-indexed access so that (hopefully) no add/subs are required to update it
    for the next iteration. For small cores, this can significantly improve the performance of
    DSP-like loops.
    Differential Revision: https://reviews.llvm.org/D55373
    llvm-svn: 353403
* [ARM GlobalISel] Support G_ICMP for Thumb2 | Diana Picus | 2019-02-07 | 2 | -12/+25
    Mark as legal and use the t2* equivalents of the arm mode instructions, e.g. t2CMPrr instead
    of plain CMPrr.
    llvm-svn: 353392
* [ARM] Reformat isRedundantFlagInstr for D57833. NFC | David Green | 2019-02-07 | 1 | -8/+4
    llvm-svn: 353386
* [ARM GlobalISel] Support G_GEP for Thumb2 | Diana Picus | 2019-02-05 | 2 | -3/+3
    Same as ARM, but use a different opcode in the instruction selection.
    llvm-svn: 353151
* [AsmPrinter] Remove hidden flag -print-schedule. | Andrea Di Biagio | 2019-02-04 | 1 | -2/+2
    This patch removes hidden codegen flag -print-schedule, effectively reverting the logic
    originally committed as r300311
    (https://llvm.org/viewvc/llvm-project?view=revision&revision=300311).
    Flag -print-schedule was originally introduced by r300311 to address PR32216
    (https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better testing of
    schedule model instruction latencies/throughputs".
    These days, we can use llvm-mca to test scheduling models. So there is no longer a need for
    flag -print-schedule in LLVM. The main use case for PR32216 is now addressed by llvm-mca. Flag
    -print-schedule is mainly used for debugging purposes, and it is only actually used by
    x86-specific tests. We already have extensive (latency and throughput) tests under
    "test/tools/llvm-mca" for X86 processor models. That means most (if not all) existing
    -print-schedule tests for X86 are redundant.
    When flag -print-schedule was first added to LLVM, several files had to be modified; a few
    APIs gained new arguments (see for example method MCAsmStreamer::EmitInstruction), and
    MCSubtargetInfo/TargetSubtargetInfo gained a couple of getSchedInfoStr() methods. Method
    getSchedInfoStr() originally had to work for both MCInst and MachineInstr. The original
    implementation of getSchedInfoStr() introduced a subtle layering violation (reported as
    PR37160 and then fixed/worked around by r330615). In retrospect, that new API could have been
    designed more optimally. We can always query MCSchedModel to get the latency and throughput.
    More importantly, the "sched-info" string should not have been generated by the subtarget.
    Note, r317782 fixed an issue where "print-schedule" didn't work very well in the presence of
    inline assembly. That commit is also reverted by this change.
    Differential Revision: https://reviews.llvm.org/D57244
    llvm-svn: 353043
* [ARM] Mark 255 and 65535 as cheap for Thumb1 "And" | David Green | 2019-02-04 | 1 | -3/+7
    This prevents Constant Hoisting from pulling the constant out of the block, allowing us to
    still produce LDRH/UXTH nodes. LDRB/UXTB (255) is already cheap by the default getIntImmCost,
    but I've added it for clarity.
    Differential Revision: https://reviews.llvm.org/D57671
    llvm-svn: 353040
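    A reader's sketch (not from the commit) of the masks this keeps next to their uses:

        ; Illustrative only: masks that should stay in the block so they can fold
        ; into an extend rather than being hoisted as materialised constants.
        define i32 @low8(i32 %x) {
          %r = and i32 %x, 255      ; selectable as uxtb
          ret i32 %r
        }

        define i32 @low16(i32 %x) {
          %r = and i32 %x, 65535    ; selectable as uxth
          ret i32 %r
        }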
* [opaque pointer types] Pass value type to GetElementPtr creation. | James Y Knight | 2019-02-01 | 1 | -3/+5
    This cleans up all GetElementPtr creation in LLVM to explicitly pass a value type rather than
    deriving it from the pointer's element-type.
    Differential Revision: https://reviews.llvm.org/D57173
    llvm-svn: 352913
* [opaque pointer types] Pass value type to LoadInst creation. | James Y Knight | 2019-02-01 | 1 | -3/+3
    This cleans up all LoadInst creation in LLVM to explicitly pass the value type rather than
    deriving it from the pointer's element-type.
    Differential Revision: https://reviews.llvm.org/D57172
    llvm-svn: 352911
* [ARM] Thumb2: ConstantMaterializationCost | Sjoerd Meijer | 2019-01-31 | 1 | -2/+4
    Constants can also be materialised using the negated value and a MVN, and this case seems to
    have been missed for Thumb2. To check the constant materialisation costs, we now call
    getT2SOImmVal twice, once for the original constant and then also for its negated value, and
    this function checks whether the constant can be either splatted or rotated.
    This was revealed by a test that optimises for minsize: instead of a LDR literal-pool load and
    having a literal pool entry, just a MVN with an immediate is smaller (and also faster).
    Differential Revision: https://reviews.llvm.org/D57327
    llvm-svn: 352737
* [SelectionDAG] Codesize: don't expand SHIFT to SHIFT_PARTS | Sjoerd Meijer | 2019-01-31 | 1 | -0/+6
    And instead just generate a libcall. My motivating example on ARM was a simple:
        shl i64 %A, %B
    for which the code bloat is quite significant. For other targets that also accept
    __int128/i128, such as AArch64 and X86, it is also beneficial for these cases to generate a
    libcall when optimising for minsize. On these 64-bit targets, the 64-bit shifts are of course
    unaffected because the SHIFT/SHIFT_PARTS lowering operation action is not set to
    custom/expand.
    Differential Revision: https://reviews.llvm.org/D57386
    llvm-svn: 352736
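    Expanding the motivating one-liner into a complete function for context (a sketch; the
    libcall named in the comment is the usual AEABI helper, an assumption rather than something
    stated in the commit):

        define i64 @shift(i64 %A, i64 %B) {
          %r = shl i64 %A, %B
          ret i64 %r
        }
        ; At minsize on ARM this can now be lowered to a single runtime call
        ; (e.g. the AEABI helper __aeabi_llsl) instead of the long inline
        ; SHIFT_PARTS sequence.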
* GlobalISel: Allow bitcount ops to have different result type | Matt Arsenault | 2019-01-31 | 1 | -4/+8
    For AMDGPU the result is always 32-bit for 64-bit inputs.
    llvm-svn: 352717
* GlobalISel: Fix creating MMOs with align 0 | Matt Arsenault | 2019-01-31 | 1 | -3/+3
    llvm-svn: 352712
* [ARM] Use sub for negative offset load/store in thumb1 | David Green | 2019-01-29 | 2 | -6/+47
    This attempts to optimise negative values used in load/store operands a little. We currently
    try to select them as rr, materialising the negative constant using a MOV/MVN pair. This
    instead selects ri with an immediate of 0, forcing the add node to become a simpler sub.
    Differential Revision: https://reviews.llvm.org/D57121
    llvm-svn: 352475
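    A reader's sketch of the sort of access this affects (illustrative only, typed-pointer
    syntax):

        ; Illustrative only: a load at a negative offset from a pointer.
        define i32 @load_back(i32* %p) {
          %addr = getelementptr inbounds i32, i32* %p, i32 -2
          %v = load i32, i32* %addr
          ret i32 %v
        }
        ; Instead of materialising -8 with a MOV/MVN pair and using a register-register
        ; load, the offset can be applied with a sub of the positive value and the load
        ; selected with an immediate offset of 0.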
* [ARM] Deduplicate table generated CC analysis code | Reid Kleckner | 2019-01-28 | 6 | -275/+324
    Create ARMCallingConv.cpp and emit code for calling convention analysis from there.
    llvm-svn: 352431
* Remove no longer needed Arm specific LICENSE.TXT file. | Arnaud A. de Grandmaison | 2019-01-28 | 1 | -47/+0
    As the codebase is now under the Apache 2.0 license with LLVM Exceptions, and all Arm's
    contributions, past or future, are under that new license, this Arm specific LICENSE.TXT is no
    longer needed, thus removing it.
    llvm-svn: 352376
* [ARM GlobalISel] Support integer division for Thumb2 | Diana Picus | 2019-01-28 | 1 | -19/+21
    Support G_SDIV, G_UDIV, G_SREM and G_UREM. The only significant difference between arm and
    thumb mode is that we need to check a different subtarget feature.
    llvm-svn: 352346
* [ARM GlobalISel] Support shifts for Thumb2 | Diana Picus | 2019-01-25 | 1 | -4/+4
    Same as ARM. On this occasion we split some of the instruction select tests for more
    complicated instructions into their own files, so we can reuse them for ARM and Thumb mode.
    Likewise for the legalizer tests.
    llvm-svn: 352188
* [ARM GlobalISel] Remove rebase artifact from r351882. NFC | Diana Picus | 2019-01-25 | 1 | -3/+0
    r351882 introduced some superfluous calls to mark G_INTTOPTR and G_PTRTOINT as legal (looks
    like a rebase mishap). Remove them.
    llvm-svn: 352187
* Revert r351938 "[ARM] Alter the register allocation order for minsize on Thumb2" | Reid Kleckner | 2019-01-23 | 1 | -27/+4
    This change caused fatal backend errors when compiling a file in libvpx for Android.
    llvm-svn: 351979
* [ARM] Alter the register allocation order for minsize on Thumb2 | David Green | 2019-01-23 | 1 | -4/+27
    Currently in Arm code, we allocate LR first, under the assumption that it needs to be saved
    anyway. Unfortunately this has the disadvantage that it will require any instructions using it
    to be the longer thumb2 instructions, not the shorter thumb1 ones.
    This switches the order when we are optimising for minsize, returning to the default order so
    that more lower registers can be used. It can end up requiring more pushed registers, but on
    average produces smaller code.
    Differential Revision: https://reviews.llvm.org/D56008
    llvm-svn: 351938
* [ARM][CGP] Check trunc type before replacing | Sam Parker | 2019-01-23 | 1 | -7/+13
    In the last stage of type promotion, we replace any zext that uses a new trunc with the
    operand of the trunc. This was okay when we only allowed one type to be optimised, but now
    it's the case that the trunc may be needed to produce a narrower type than the one we were
    optimising for. So we need to check this before doing the replacement.
    Differential Revision: https://reviews.llvm.org/D57041
    llvm-svn: 351935
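    For illustration (not from the commit), the kind of case where the trunc's type matters:

        ; Illustrative only: the zext is fed by a trunc to a type narrower than the
        ; one being promoted; replacing %z with %wide would drop the truncation.
        define i32 @keep_trunc(i32 %wide) {
          %t = trunc i32 %wide to i8
          %z = zext i8 %t to i32
          ret i32 %z            ; equals %wide & 255, not %wide
        }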
* GlobalISel: Allow shift amount to be a different type | Matt Arsenault | 2019-01-22 | 1 | -1/+6
    For AMDGPU the shift amount is never 64-bit, and this needs to use a 32-bit shift. X86 uses
    i8, but seemed to be hacking around this before.
    llvm-svn: 351882
* Reapply "IR: Add fp operations to atomicrmw"Matt Arsenault2019-01-221-0/+3
| | | | | | | This reapplies commits r351778 and r351782 with RISCV test fixes. llvm-svn: 351850
* Revert r351778: IR: Add fp operations to atomicrmw | Chandler Carruth | 2019-01-22 | 1 | -3/+0
    This broke the RISCV build, and even with that fixed, one of the RISCV tests behaves
    surprisingly differently with asserts than without, leaving no clear test pattern to use.
    Generally it seems bad for the IR to differ substantially due to asserts (as in, an alloca is
    used with asserts that isn't needed without!), and nothing I did would simply fix it, so I'm
    reverting back to green.
    This also required reverting the RISCV build fix in r351782.
    llvm-svn: 351796
* IR: Add fp operations to atomicrmw | Matt Arsenault | 2019-01-22 | 1 | -0/+3
    Add just fadd/fsub for now.
    llvm-svn: 351778
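    The new forms look like this in IR (a minimal example in the typed-pointer syntax of the
    time, not taken from the commit):

        define float @fetch_add(float* %p, float %v) {
          %old = atomicrmw fadd float* %p, float %v seq_cst
          ret float %old
        }

        define float @fetch_sub(float* %p, float %v) {
          %old = atomicrmw fsub float* %p, float %v seq_cst
          ret float %old
        }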
* [ARM] Combine ands+lsls to lsls+lsrs for Thumb1. | Eli Friedman | 2019-01-22 | 1 | -4/+60
    This patch may seem familiar... but my previous patch handled the equivalent lsls+and, not
    this case. Usually instcombine puts the "and" after the shift, so this case doesn't come up.
    However, if the shift comes out of a GEP, it won't get canonicalized by instcombine, and
    DAGCombine doesn't have an equivalent transform.
    This also modifies isDesirableToCommuteWithShift to suppress DAGCombine transforms which would
    make the overall code worse.
    I'm not really happy adding a bunch of code to handle this, but it would probably be tricky to
    substantially improve the behavior of DAGCombine here.
    Differential Revision: https://reviews.llvm.org/D56032
    llvm-svn: 351776
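    A reader's sketch of the and-before-shift pattern this targets (illustrative values, not from
    the patch):

        ; Illustrative only: the mask is applied before the shift, e.g. when the
        ; shift comes from a GEP's element-size scaling, so instcombine never moves
        ; the 'and' after it.
        define i32 @scaled_index(i32 %x) {
          %m = and i32 %x, 127
          %s = shl i32 %m, 2
          ret i32 %s
        }
        ; On Thumb1, materialising the 127 mask needs a movs+ands, making three
        ; instructions; the same value can be produced with just two shifts:
        ;   lsls rX, rX, #25
        ;   lsrs rX, rX, #23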
* Update the file headers across all of the LLVM projects in the monorepo | Chandler Carruth | 2019-01-19 | 124 | -496/+372
    to reflect the new license.
    We understand that people may be surprised that we're moving the header entirely to discuss
    the new license. We checked this carefully with the Foundation's lawyer and we believe this is
    the correct approach.
    Essentially, all code in the project is now made available by the LLVM project under our new
    license, so you will see that the license headers include that license only. Some of our
    contributors have contributed code under our old license, and accordingly, we have retained a
    copy of our old license notice in the top-level files in each project and repository.
    llvm-svn: 351636
* Fix capitalization. NFC | Diana Picus | 2019-01-17 | 1 | -6/+6
    llvm-svn: 351425