path: root/llvm/lib/Target/AArch64
* [AArch64 MachineCombine] Enhance/Add support for general reassociation to reduce the critical path
  Haicheng Wu, 2016-01-07 (2 files, -11/+50)

  Allow fadd/fmul to be reassociated in aarch64.

  llvm-svn: 257024
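  As an illustration (the function below is hypothetical, not from the patch),
  this is the kind of floating-point chain the machine combiner can now
  reassociate when fast-math allows it, cutting the critical path from three
  dependent fadds to two:

    // Sketch: a serial chain of depth 3.
    double chain(double a, double b, double c, double d) {
      // Reassociated by the combiner to (a + b) + (c + d), so the two
      // inner adds can execute in parallel (depth 2).
      return ((a + b) + c) + d;
    }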
* Extract helper function to merge MemoryOperand lists [NFC]
  Philip Reames, 2016-01-06 (1 file, -22/+4)

  In the discussion on http://reviews.llvm.org/D15730, Andy pointed out we had
  a utility function for merging MMO lists. Since it turned out we actually had
  two copies, and there's another review in progress
  (http://reviews.llvm.org/D15230) which needs the same, extract it into a
  utility function and clean up the interfaces to make it easier to use with a
  MachineInstrBuilder.

  I introduced a pair here to track size and allocation together. I think we
  should probably move in the direction of the MachineOperandsRef helper class,
  but I'm leaving that for further work. I want to get the poison state
  introduced before I make major changes to the interface.

  Differential Revision: http://reviews.llvm.org/D15757

  llvm-svn: 256909
* Delete trailing whitespace; NFC
  Junmo Park, 2016-01-06 (2 files, -8/+8)

  llvm-svn: 256908
* Delete trailing whitespace; NFC
  Junmo Park, 2016-01-06 (1 file, -4/+4)

  llvm-svn: 256906
* [AArch64] Add support for Samsung Exynos-M1
  MinSeong Kim, 2016-01-05 (2 files, -1/+19)

  Adds core tuning support for the new Samsung Exynos-M1 core (ARMv8-A).

  Differential Revision: http://reviews.llvm.org/D15663

  llvm-svn: 256828
* Remove extra whitespace. NFC.
  Junmo Park, 2016-01-05 (1 file, -2/+2)

  llvm-svn: 256820
* [AArch64] Optimize some simple TBZ/TBNZ cases.
  Geoff Berry, 2016-01-04 (1 file, -0/+100)

  Summary:
  Add some AArch64 dag combines to optimize some simple TBZ/TBNZ cases:

    (tbz (and x, m), b) -> (tbz x, b)
    (tbz (shl x, c), b) -> (tbz x, b-c)
    (tbz (shr x, c), b) -> (tbz x, b+c)
    (tbz (xor x, -1), b) -> (tbnz x, b)

  Reviewers: jmolloy, mcrosier, t.p.northover

  Subscribers: aemerson, rengolin, llvm-commits

  Differential Revision: http://reviews.llvm.org/D15702

  llvm-svn: 256765
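  For illustration only (the sources and bit indices below are hypothetical),
  these are C-level tests the new combines let the backend lower to a single
  bit-test branch instead of an extra and/shift:

    extern void taken();
    void f(unsigned long x) {
      if (x & (1UL << 37)) taken();         // (and x, m): test bit 37 of x directly
      if ((x << 3) & (1UL << 40)) taken();  // (shl x, 3): bit 40 - 3 = bit 37 of x
      if (~x & (1UL << 5)) taken();         // (xor x, -1): inverted test of bit 5
    }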
* Remove extra forward declarations and scrub includes for all in tree InstPrinters. NFC
  Craig Topper, 2015-12-25 (2 files, -4/+1)

  llvm-svn: 256427
* [AArch64] Promote loads from stores
  Jun Bum Lim, 2015-12-22 (1 file, -3/+280)

  This is a recommit of r256004, which was reverted in r256160. The issue was
  the incorrect promotion for half and byte loads transformed into mov
  instructions. This fix will replace half and byte type loads only with bit
  field extracts.

  Original commit message:
  This change promotes load instructions which directly read from stores by
  replacing them with mov instructions. If the store is wider than the load,
  the load will be replaced with a bitfield extract. For example:

    STRWui %W1, %X0, 1
    %W0 = LDRHHui %X0, 3
  becomes
    STRWui %W1, %X0, 1
    %W0 = UBFMWri %W1, 16, 31

  llvm-svn: 256249
* [AArch64] Add additional extract-extend patterns for smov
  Matthew Simpson, 2015-12-21 (1 file, -0/+7)

  This patch adds to the target description two additional patterns for
  matching extract-extend operations to SMOV. The patterns catch the
  v16i8-to-i64 and v8i16-to-i64 cases. The existing patterns miss these cases
  because the extracted elements must first be legalized to i32, resulting in
  any_extend nodes.

  This was originally implemented as a DAG combine (r255895), but was reverted
  due to failing out-of-tree tests.

  llvm-svn: 256176
* Revert "[AArch64] Promote loads from stores"Jun Bum Lim2015-12-211-280/+3
| | | | | | This reverts commit r256004 due to a failure in cortex-a53. llvm-svn: 256160
* [AArch64] Enable PostRAScheduler for AArch64 generic build.
  Chad Rosier, 2015-12-21 (1 file, -1/+2)

  Disable post-ra scheduler for perturbed tests to appease the bots and to
  preserve the history of the tests.

  http://reviews.llvm.org/D15652

  llvm-svn: 256158
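  For context, a minimal sketch of the subtarget hook involved;
  enablePostRAScheduler() is the real TargetSubtargetInfo override that gates
  the pass, but the predicate names in the body are illustrative, not the
  committed code:

    bool AArch64Subtarget::enablePostRAScheduler() const {
      // Hypothetical gating: opt in for the generic model and the cores
      // the patch was benchmarked on.
      return isGeneric() || isCortexA53() || isCortexA57();
    }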
* [AArch64] Promote loads from stores
  Jun Bum Lim, 2015-12-18 (1 file, -3/+280)

  This change promotes load instructions which directly read from stores by
  replacing them with mov instructions. If the store is wider than the load,
  the load will be replaced with a bitfield extract. For example:

    STRWui %W1, %X0, 1
    %W0 = LDRHHui %X0, 3
  becomes
    STRWui %W1, %X0, 1
    %W0 = UBFMWri %W1, 16, 31

  llvm-svn: 256004
* Revert "[AArch64] Add DAG combine for extract extend pattern"Matthew Simpson2015-12-171-19/+1
| | | | | | | This reverts commit r255895. The patch breaks internal tests. Reverting until a fix is ready. llvm-svn: 255928
* Revert "[AArch64] Enable PostRAScheduler for AArch64 generic build"Rafael Espindola2015-12-171-2/+1
| | | | | | This reverts commit r255896. It broke the tests. llvm-svn: 255899
* [AArch64] Enable PostRAScheduler for AArch64 generic build
  MinSeong Kim, 2015-12-17 (1 file, -1/+2)

  This patch enables the PostRAScheduler specifically for the AArch64 generic
  build, which is beneficial from the performance perspective. Speedups of up
  to 2% to 7% are observed for some benchmarks on A57 and A53, and benchmarks
  from the LLVM test-suite did not regress.

  Differential Revision: http://reviews.llvm.org/D15557

  llvm-svn: 255896
* [AArch64] Add DAG combine for extract extend pattern
  Matthew Simpson, 2015-12-17 (1 file, -1/+19)

  This patch adds a DAG combine for
  (any_extend (extract_vector_elt v, i)) -> (extract_vector_elt v, i).
  The combine enables us to better match some SMOV patterns.

  Differential Revision: http://reviews.llvm.org/D15515

  llvm-svn: 255895
* AArch64: Simplify emitEpilogue() and related code; NFC
  Matthias Braun, 2015-12-17 (1 file, -24/+25)

  This is in preparation for an upcoming patch.

  llvm-svn: 255872
* [AArch64] Simplify some TRI/TII getters. NFC.
  Ahmed Bougacha, 2015-12-16 (1 file, -7/+6)

  We don't need static_casts when we use the right Subtarget.

  llvm-svn: 255836
* CXX_FAST_TLS calling convention: performance improvement for AArch64.
  Manman Ren, 2015-12-16 (7 files, -3/+104)

  The access function has a short entry and a short exit, and the
  initialization block is only run the first time. To improve the performance,
  we want to have a short frame at the entry and exit.

  We explicitly handle most of the CSRs via copies. Only the CSRs that are not
  handled via copies will be in CSR_SaveList. Frame lowering and
  prologue/epilogue insertion will generate a short frame in the entry and exit
  according to CSR_SaveList. The majority of the CSRs will be handled by the
  register allocator, which will try to spill and reload them in the
  initialization block.

  We add CSRsViaCopy; it will be explicitly handled during lowering.

  1> We first set FunctionLoweringInfo->SplitCSR if conditions are met (the
     target supports it for the given machine function and the function has
     only return exits). We also call TLI->initializeSplitCSR to perform
     initialization.
  2> We call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
     virtual registers at the beginning of the entry block and copies from
     virtual registers to CSRsViaCopy at the beginning of the exit blocks.
  3> We also need to make sure the explicit copies will not be eliminated.

  The target-independent portion was committed as r255353.

  rdar://problem/23557469
  Differential Revision: http://reviews.llvm.org/D15341

  llvm-svn: 255821
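  A hedged sketch of the hook behind step 1> above; supportSplitCSR is the
  TargetLowering hook the scheme relies on, and the body shown is an
  illustrative guess at the conditions, not necessarily the committed code:

    bool AArch64TargetLowering::supportSplitCSR(MachineFunction *MF) const {
      // Only CXX_FAST_TLS access functions qualify, and only when no
      // unwinding can observe the split save/restore.
      return MF->getFunction()->getCallingConv() == CallingConv::CXX_FAST_TLS &&
             MF->getFunction()->hasFnAttribute(Attribute::NoUnwind);
    }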
* Remove dead function AArch64TargetLowering::getFunctionAlignment. NFC.
  Geoff Berry, 2015-12-14 (2 files, -8/+0)

  Reviewers: t.p.northover, jmolloy, mcrosier

  Subscribers: aemerson, rengolin, llvm-commits

  Differential Revision: http://reviews.llvm.org/D15458

  llvm-svn: 255509
* Normalize MBB's successors' probabilities in several locations.
  Cong Hou, 2015-12-13 (1 file, -2/+2)

  This patch adds some missing calls to MBB::normalizeSuccProbs() in several
  locations where it should be called. Those places were found by checking
  whether the sum of successors' probabilities is approximately one in the
  MachineBlockPlacement pass with some instrumented code (not in this patch).

  Differential revision: http://reviews.llvm.org/D15259

  llvm-svn: 255455
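  A small usage sketch (the wrapper is hypothetical; normalizeSuccProbs() and
  setSuccProbability() are the MachineBasicBlock APIs of that era):

    // After editing any outgoing edge, renormalize so the block's successor
    // probabilities again sum to (approximately) one.
    void updateEdge(llvm::MachineBasicBlock &MBB,
                    llvm::MachineBasicBlock::succ_iterator I,
                    llvm::BranchProbability NewProb) {
      MBB.setSuccProbability(I, NewProb);
      MBB.normalizeSuccProbs();
    }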
* Revert r248483, r242546, r242545, and r242409 - absdiff intrinsics
  Hal Finkel, 2015-12-11 (2 files, -37/+26)

  After much discussion, ending here:
  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html
  it has been decided that, instead of having the vectorizer directly generate
  special absdiff and horizontal-add intrinsics, we'll recognize the relevant
  reduction patterns during CodeGen. Accordingly, these intrinsics are not
  needed (the operations they represent can be pattern matched, as is already
  done in some backends). Thus, we're backing these out in favor of the current
  development work.

  r248483 - Codegen: Fix llvm.*absdiff semantic.
  r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
  r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
  r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for
            absolute difference operation

  llvm-svn: 255387
* CodeGen: Redo analyzePhysRegs() and computeRegisterLiveness()
  Matthias Braun, 2015-12-11 (1 file, -2/+2)

  computeRegisterLiveness() was broken in that it reported dead for a register
  even if a subregister was alive. I assume this was because the results of
  analyzePhysRegs() are hard to understand with respect to subregisters.

  This commit changes the results of analyzePhysRegs() (= struct PhysRegInfo)
  to be clearly understandable, and also renames the fields to avoid silent
  breakage of third-party code (and improve the grammar). It fixes all (two)
  users of computeRegisterLiveness() in llvm by reenabling it and removing
  workarounds for the bug.

  This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033.

  Differential Revision: http://reviews.llvm.org/D15320

  llvm-svn: 255362
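  A usage sketch of the repaired query (the wrapper is hypothetical; the API
  and the LQR_Dead result are real):

    // With this fix, LQR_Dead really means neither the register nor any of
    // its subregisters is live before MI.
    bool isX0DeadBefore(const llvm::MachineBasicBlock &MBB,
                        llvm::MachineBasicBlock::const_iterator MI,
                        const llvm::TargetRegisterInfo *TRI) {
      return MBB.computeRegisterLiveness(TRI, llvm::AArch64::X0, MI) ==
             llvm::MachineBasicBlock::LQR_Dead;
    }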
* Start replacing vector_extract/vector_insert with extractelt/insertelt
  Matt Arsenault, 2015-12-11 (1 file, -5/+5)

  These are redundant pairs of nodes defined for
  INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT.
  insertelement/extractelement are slightly closer to the corresponding C++
  node names and have stricter type checking, so prefer them.

  Update targets to only use these nodes where it is trivial to do so. AArch64,
  ARM, and Mips all have various type errors on simple replacement, so they
  will need work to fix. Example from AArch64:

    def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8),
              (i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>;

  This is trying to do sext_inreg i8, i8.

  llvm-svn: 255359
* Fix fptosi, fptoui from f16 vectors to i8, i16 vectors
  Pirama Arumuga Nainar, 2015-12-10 (1 file, -0/+10)

  Summary:
  Convert f16 vectors to corresponding f32 vectors before doing the conversion
  to int. Add tests for v4f16, v8f16.

  Reviewers: ab, jmolloy

  Subscribers: llvm-commits, srhines

  Differential Revision: http://reviews.llvm.org/D14936

  llvm-svn: 255263
* [AArch64] Fix FP16 vector instructions that should only accept low registers
  Oliver Stannard, 2015-12-09 (1 file, -3/+3)

  llvm-svn: 255113
* [AArch64][ARM] Don't base interleaved op legality on type alloc size.
  Ahmed Bougacha, 2015-12-09 (2 files, -5/+5)

  Otherwise, we think that most types that look like they'd fit in a legal
  vector type are legal (so, basically, *any* vector type with a size between
  33 and 128 bits, I think, since we use pow2 alignment; e.g., v2i25,
  v3f32, ...).

  DataLayout::getTypeAllocSize rounds up based on alignment. When checking for
  target intrinsic legality, that's not what we want: if rounding makes a
  difference, the type isn't legal, and the target intrinsics shouldn't be
  used, as they are always assumed legal.

  One could make the argument that alloc size is ultimately the most relevant
  here, since we're dealing with LD/ST intrinsics. That's only true if we did
  legalize them though; that's a problem for another day.

  Use DataLayout::getTypeSizeInBits instead of getTypeAllocSizeInBits.
  Type::getSizeInBits can't be used because that'd gratuitously break pointer
  vector support.

  Some of these uses are currently fine, because we only hit them when the type
  is already known legal (e.g., r114454). Update them for consistency. It's
  faster to avoid the rounding anyway!

  llvm-svn: 255089
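  A hedged illustration of the distinction (the helper is hypothetical; the
  numbers are for <3 x float>, whose alloc size rounds up to its 128-bit
  alignment):

    // getTypeSizeInBits(v3f32) == 96, but getTypeAllocSizeInBits(v3f32) == 128,
    // so an alloc-size check would wrongly treat v3f32 as a legal 128-bit type.
    bool looksLegalForInterleave(const llvm::DataLayout &DL, llvm::Type *Ty) {
      uint64_t Bits = DL.getTypeSizeInBits(Ty); // deliberately not alloc size
      return Bits == 64 || Bits == 128;
    }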
* Define selection for v4f16, v8f16 scalar_to_vector
  Pirama Arumuga Nainar, 2015-12-08 (1 file, -0/+5)

  Summary:
  This fixes a failure when trying to select

    insertelement <4 x half> undef, half %a, i64 0

  which gets transformed into a scalar_to_vector node. The accompanying v4 and
  v8 tests fail instruction selection without this patch.

  Reviewers: ab, jmolloy

  Subscribers: srhines, llvm-commits

  Differential Revision: http://reviews.llvm.org/D15322

  llvm-svn: 255072
* [AArch64] Add ARMv8.2-A FP16 vector instructions
  Oliver Stannard, 2015-12-08 (4 files, -231/+501)

  ARMv8.2-A adds 16-bit floating point versions of all existing SIMD
  floating-point instructions. This is an optional extension, so all of these
  instructions require the FeatureFullFP16 subtarget feature.

  Note that VFP without SIMD is not a valid combination for any version of
  ARMv8-A, but I have ensured that these instructions all depend on both
  FeatureNEON and FeatureFullFP16 for consistency.

  The ".2h" vector type specifier is now legal (for the scalar pairwise
  reduction instructions), so some unrelated tests have been modified, as
  different error messages are emitted. This is not a problem as the invalid
  operands are still caught.

  llvm-svn: 255010
* [CXX TLS calling convention] Add support for AArch64.
  Manman Ren, 2015-12-08 (2 files, -0/+13)

  rdar://9001553

  llvm-svn: 254978
* Replace uint16_t with the MCPhysReg typedef in many places. A lot of physical register arrays already use this typedef.
  Craig Topper, 2015-12-05 (1 file, -16/+16)

  llvm-svn: 254843
* [EarlyCSE] IsSimple vs IsVolatile naming clarification (NFC)
  Philip Reames, 2015-12-05 (1 file, -2/+2)

  When the notion of target-specific memory intrinsics was introduced to
  EarlyCSE, the commit confused the notions of volatile and simple memory
  access. Since I'm about to start working on this area, clean up the naming so
  that patches aren't horribly confusing. Note that the actual implementation
  was always bailing if the load or store wasn't simple.

  Reminder:
  - "volatile" - C++ volatile, can't remove any memory operations, but in
    principle unordered
  - "ordered" - imposes ordering constraints on other nearby memory operations
  - "atomic" - can't be split or sheared. In LLVM terms, all "ordered"
    operations are also atomic, so the predicate "isAtomic" is often used.
  - "simple" - a load which is none of the above. These are normal loads and
    what most of the optimizer works with.

  llvm-svn: 254805
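  The taxonomy above maps directly onto LoadInst predicates; a minimal sketch
  of the gate the message says the implementation always applied (the helper
  name is hypothetical):

    // isSimple() is true only when the load is neither volatile nor atomic
    // (and therefore not "ordered" either); EarlyCSE bails on anything else.
    bool safeToCSE(const llvm::LoadInst *LI) {
      return LI->isSimple();
    }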
* [AArch64] Expand vector SDIVREM/UDIVREM operations.
  Chad Rosier, 2015-12-04 (1 file, -0/+4)

  http://reviews.llvm.org/D15214
  Patch by Ana Pazos <apazos@codeaurora.org>!

  llvm-svn: 254773
* AArch64FastISel: Use cbz/cbnz to branch on i1
  Matthias Braun, 2015-12-03 (1 file, -61/+25)

  In the case of a conditional branch without a preceding cmp, we used to emit
  an "and; cmp; b.eq/b.ne" sequence; use tbz/tbnz instead.

  Differential Revision: http://reviews.llvm.org/D15122

  llvm-svn: 254621
* [AArch64]: Add support for Cortex-A35
  Christof Douma, 2015-12-02 (2 files, -1/+11)

  Adds support for the new Cortex-A35 ARMv8-A core.

  llvm-svn: 254503
* AArch64: fix 128-bit shifts
  Tim Northover, 2015-12-02 (1 file, -33/+54)

  We mustn't introduce a shift of exactly 64 bits for any inputs, since that's
  an UNDEF value (and worse, it's not what you want with the natural AArch64
  implementation).

  The generated code is pretty horrific, but I couldn't come up with an
  obviously better alternative (if the amount is constant, EXTR could help).
  Turns out 128-bit shifts are just nasty.

  rdar://22491037

  llvm-svn: 254475
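  An illustrative scalar expansion (not the committed DAG lowering) of the
  constraint: shift a 128-bit value left without ever shifting a 64-bit word
  by exactly 64, which would be undefined:

    #include <cstdint>
    #include <utility>
    // Returns {lo, hi} of ({hi,lo} << n) for 0 <= n < 128.
    std::pair<uint64_t, uint64_t> shl128(uint64_t lo, uint64_t hi, unsigned n) {
      if (n == 0) return {lo, hi};  // avoids the lo >> 64 below
      if (n < 64)                   // both sub-shifts stay within [1, 63]
        return {lo << n, (hi << n) | (lo >> (64 - n))};
      return {0, lo << (n - 64)};   // n in [64, 128): low word fully vacated
    }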
* [AArch64] Fix a corner case in BitField select
  Weiming Zhao, 2015-12-01 (1 file, -5/+11)

  Summary:
  When there are no useful bits, BitWidth becomes 0 and APInt will not be
  happy. See https://llvm.org/bugs/show_bug.cgi?id=25571

  We can just mark the operand as IMPLICIT_DEF if none of its bits are used.

  Reviewers: t.p.northover, jmolloy

  Subscribers: gberry, jmolloy, mgrang, aemerson, llvm-commits, rengolin

  Differential Revision: http://reviews.llvm.org/D14803

  llvm-svn: 254440
* [AArch64] Add ARMv8.2-A Statistical Profiling Extension
  Oliver Stannard, 2015-12-01 (9 files, -1/+170)

  The Statistical Profiling Extension is an optional extension to ARMv8.2-A.
  Since it is an optional extension, I have added the FeatureSPE subtarget
  feature to control it. The assembler-visible parts of this extension are the
  new "psb csync" instruction, which is equivalent to "hint #17", and a number
  of system registers.

  Differential Revision: http://reviews.llvm.org/D15021

  llvm-svn: 254401
* [AArch64] Add ARMv8.2-A FP16 scalar instructions
  Oliver Stannard, 2015-11-27 (3 files, -43/+222)

  ARMv8.2-A adds 16-bit floating point versions of all existing VFP
  floating-point instructions. This is an optional extension, so all of these
  instructions require the FeatureFullFP16 subtarget feature.

  Most of these instructions are the same as the 32- and 64-bit versions, but
  with the type field (bits 23-22) set to 0b11. Previously the top bit of the
  size field was always 0, so the instruction classes only provided a 1-bit
  size field, which I have widened to 2 bits.

  Differential Revision: http://reviews.llvm.org/D15014

  llvm-svn: 254198
* [AArch64] Add ARMv8.2-A new AT instruction variants
  Oliver Stannard, 2015-11-26 (3 files, -1/+32)

  ARMv8.2-A adds new variants of the "at" (address translate) system
  instruction, which take the PSTATE.PAN bit (added in ARMv8.1-A). These are a
  required part of ARMv8.2-A, so no additional subtarget features are required.

  Differential Revision: http://reviews.llvm.org/D15018

  llvm-svn: 254159
* [AArch64] Add ARMv8.2-A UAO PSTATE bit
  Oliver Stannard, 2015-11-26 (5 files, -3/+17)

  ARMv8.2-A adds a new PSTATE bit, PSTATE.UAO, which allows the LDTR/STTR
  instructions to behave the same as LDR/STR with respect to execute-only pages
  at higher privilege levels. New variants of the MSR/MRS instructions are
  added to allow reading and writing this bit. It is a required part of
  ARMv8.2-A, so no additional subtarget features are required.

  Differential Revision: http://reviews.llvm.org/D15020

  llvm-svn: 254157
* [AArch64] Add ARMv8.2-A persistent memory instruction
  Oliver Stannard, 2015-11-26 (3 files, -3/+18)

  ARMv8.2-A adds the "dc cvap" instruction, which is a system instruction that
  cleans caches to the point of persistence (for systems that have persistent
  memory). It is a required part of ARMv8.2-A, so no additional subtarget
  features are required.

  Differential Revision: http://reviews.llvm.org/D15016

  llvm-svn: 254156
* [AArch64] Add ARMv8.2-A ID_A64MMFR2_EL1 register
  Oliver Stannard, 2015-11-26 (2 files, -0/+2)

  ARMv8.2-A adds a new ID register, ID_A64MMFR2_EL1, which behaves in the same
  way as ID_A64MMFR0_EL1 and ID_A64MMFR1_EL1. It is a required part of
  ARMv8.2-A, so no additional subtarget features are required.

  Differential Revision: http://reviews.llvm.org/D15017

  llvm-svn: 254155
* [AArch64] Add subtarget features for ARMv8.2-A
  Oliver Stannard, 2015-11-26 (4 files, -5/+20)

  This adds subtarget features for ARMv8.2-A, which builds on (and requires the
  features from) ARMv8.1-A. Most assembler-visible features of ARMv8.2-A are
  system instructions, and are all required parts of the architecture, so they
  just depend on the HasV8_2aOps subtarget feature. There is also one large,
  optional feature, which adds 16-bit floating point versions of all existing
  floating-point instructions (VFP and SIMD); this is represented by the
  FeatureFullFP16 subtarget feature.

  Differential Revision: http://reviews.llvm.org/D15013

  llvm-svn: 254154
* Expose isXxxConstant() functions from SelectionDAGNodes.h (NFC)
  Artyom Skrobov, 2015-11-25 (1 file, -25/+13)

  Summary:
  Many target lowerings copy-paste the code to test SDValues for known
  constants. This code can instead be shared in SelectionDAG.cpp, and reused in
  the targets.

  Reviewers: MatzeB, andreadb, tstellarAMD

  Subscribers: arsenm, jyknight, llvm-commits

  Differential Revision: http://reviews.llvm.org/D14945

  llvm-svn: 254085
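  A hedged before/after sketch of the copy-paste these helpers replace (the
  wrapper names are hypothetical; isNullConstant comes from
  SelectionDAGNodes.h):

    // Before: open-coded in each target lowering.
    bool isZeroOld(llvm::SDValue V) {
      auto *C = llvm::dyn_cast<llvm::ConstantSDNode>(V);
      return C && C->isNullValue();
    }
    // After: the shared query.
    bool isZeroNew(llvm::SDValue V) { return llvm::isNullConstant(V); }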
* Let SelectionDAG start to use probability-based interface to add successors.
  Cong Hou, 2015-11-24 (1 file, -5/+5)

  The patch in http://reviews.llvm.org/D13745 is broken into four parts:

  1. New interfaces without functional changes.
  2. Use new interfaces in SelectionDAG, while in other passes treat
     probabilities as weights.
  3. Use new interfaces in all other passes.
  4. Remove old interfaces.

  This is the second patch above. In this patch SelectionDAG starts to use
  probability-based interfaces in MBB to add successors, but other MC passes
  are still using weight-based interfaces. Therefore, we need to maintain a
  correct weight list in MBB even when probability-based interfaces are used.
  This is done by updating the weight list in probability-based interfaces,
  treating the numerator of probabilities as weights.

  This change affects many test cases that check successor weight values. I
  will update those test cases once this patch looks good to you.

  Differential revision: http://reviews.llvm.org/D14361

  llvm-svn: 253965
* [AArch64] Merge narrow zero stores to a wider store
  Jun Bum Lim, 2015-11-20 (1 file, -16/+80)

  This change merges adjacent zero stores into a wider single store.
  For example:

    strh wzr, [x0]
    strh wzr, [x0, #2]
  becomes
    str wzr, [x0]

  This will fix PR25410.

  llvm-svn: 253711
* [AArch64] Refactoring aarch64-ldst-opt. NFC.
  Jun Bum Lim, 2015-11-19 (1 file, -16/+13)

  Summary:
  * Rename isSmallTypeLdMerge() to isNarrowLoad().
  * Rename NumSmallTypeMerged to NumNarrowTypePromoted.
  * Use Subtarget defined as a member variable.

  llvm-svn: 253587
* [AArch64] Extend merging narrow loads into a wider load
  Jun Bum Lim, 2015-11-19 (1 file, -26/+107)

  This change extends r251438 to handle more narrow load promotions, including
  byte type, unscaled, and signed. For example, this change will convert:

    ldursh w1, [x0, #-2]
    ldurh  w2, [x0, #-4]
  into
    ldur w2, [x0, #-4]
    asr  w1, w2, #16
    and  w2, w2, #0xffff

  llvm-svn: 253577