summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AArch64
Commit message (Collapse)AuthorAgeFilesLines
* ScheduleDAGInstrs: In functions with tail calls PseudoSourceValues are not ↵Arnold Schwaighofer2015-05-081-1/+3
| | | | | | | | | | | | | | | | | | | | non-aliasing distinct objects The code that builds the dependence graph assumes that two PseudoSourceValues don't alias. In a tail calling function two FixedStackObjects might refer to the same location. Worse 'immutable' fixed stack objects like function arguments are not immutable and will be clobbered. Change this so that a load from a FixedStackObject is not invariant in a tail calling function and don't return a PseudoSourceValue for an instruction in tail calling functions when building the dependence graph so that we handle function arguments conservatively. Fix for PR23459. rdar://20740035 llvm-svn: 236916
* Change getTargetNodeName() to produce compiler warnings for missing cases, ↵Matthias Braun2015-05-072-4/+5
| | | | | | fix them llvm-svn: 236775
* [AArch64] Fix sext/zext folding in address arithmetic.Pete Cooper2015-05-071-29/+32
| | | | | | | | We were accidentally folding a sign/zero extend in to address arithmetic in a different BB when the extend wasn't available there. Cross BB fast-isel isn't safe, so restrict this to only when the extend is in the same BB as the use. llvm-svn: 236764
* [X86] Disable loop unrolling in loop vectorization pass when VF is 1.Wei Mi2015-05-062-2/+2
| | | | | | | | | | | | | The patch disabled unrolling in loop vectorization pass when VF==1 on x86 architecture, by setting MaxInterleaveFactor to 1. Unrolling in loop vectorization pass may introduce the cost of overflow check, memory boundary check and extra prologue/epilogue code when regular unroller will unroll the loop another time. Disable it when VF==1 remove the unnecessary cost on x86. The same can be done for other platforms after verifying interleaving/memory bound checking to be not perf critical on those platforms. Differential Revision: http://reviews.llvm.org/D9515 llvm-svn: 236613
* [ShrinkWrap] Add (a simplified version) of shrink-wrapping.Quentin Colombet2015-05-052-9/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces a new pass that computes the safe point to insert the prologue and epilogue of the function. The interest is to find safe points that are cheaper than the entry and exits blocks. As an example and to avoid regressions to be introduce, this patch also implements the required bits to enable the shrink-wrapping pass for AArch64. ** Context ** Currently we insert the prologue and epilogue of the method/function in the entry and exits blocks. Although this is correct, we can do a better job when those are not immediately required and insert them at less frequently executed places. The job of the shrink-wrapping pass is to identify such places. ** Motivating example ** Let us consider the following function that perform a call only in one branch of a if: define i32 @f(i32 %a, i32 %b) { %tmp = alloca i32, align 4 %tmp2 = icmp slt i32 %a, %b br i1 %tmp2, label %true, label %false true: store i32 %a, i32* %tmp, align 4 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp) br label %false false: %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ] ret i32 %tmp.0 } On AArch64 this code generates (removing the cfi directives to ease readabilities): _f: ; @f ; BB#0: stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething LBB0_2: ; %false mov sp, x29 ldp x29, x30, [sp], #16 ret With shrink-wrapping we could generate: _f: ; @f ; BB#0: cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething add sp, x29, #16 ; =16 ldp x29, x30, [sp], #16 LBB0_2: ; %false ret Therefore, we would pay the overhead of setting up/destroying the frame only if we actually do the call. ** Proposed Solution ** This patch introduces a new machine pass that perform the shrink-wrapping analysis (See the comments at the beginning of ShrinkWrap.cpp for more details). It then stores the safe save and restore point into the MachineFrameInfo attached to the MachineFunction. This information is then used by the PrologEpilogInserter (PEI) to place the related code at the right place. This pass runs right before the PEI. Unlike the original paper of Chow from PLDI’88, this implementation of shrink-wrapping does not use expensive data-flow analysis and does not need hack to properly avoid frequently executed point. Instead, it relies on dominance and loop properties. The pass is off by default and each target can opt-in by setting the EnableShrinkWrap boolean to true in their derived class of TargetPassConfig. This setting can also be overwritten on the command line by using -enable-shrink-wrap. Before you try out the pass for your target, make sure you properly fix your emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not necessarily the entry block. ** Design Decisions ** 1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but for debugging and clarity I thought it was best to have its own file. 2. Right now, we only support one save point and one restore point. At some point we can expand this to several save point and restore point, the impacted component would then be: - The pass itself: New algorithm needed. - MachineFrameInfo: Hold a list or set of Save/Restore point instead of one pointer. - PEI: Should loop over the save point and restore point. Anyhow, at least for this first iteration, I do not believe this is interesting to support the complex cases. We should revisit that when we motivating examples. Differential Revision: http://reviews.llvm.org/D9210 <rdar://problem/3201744> llvm-svn: 236507
* [AArch64][FastISel] Variant of the logical instructions that use two inputQuentin Colombet2015-05-011-1/+1
| | | | | | | | registers cannot write on SP. rdar://problem/20748715 llvm-svn: 236352
* [AArch64][FastISel] Fix the setting of kill flags for MUL -> UMULH sequences.Quentin Colombet2015-05-011-2/+8
| | | | | | rdar://problem/20748715 llvm-svn: 236346
* [AArch64] Fix bad register class constraint in fast-isel for TST instruction.Quentin Colombet2015-04-301-1/+4
| | | | | | rdar://problem/20748715 llvm-svn: 236273
* AArch64: add BFC alias for the BFI/BFM instructions.Tim Northover2015-04-302-4/+69
| | | | | | | | | Unlike 32-bit ARM, AArch64 can use wzr/xzr to implement this without the need for a separate instruction. rdar://18679590 llvm-svn: 236245
* [AArch64] Refactor out codes that depend on specific CS save sequence.Manman Ren2015-04-291-19/+28
| | | | | | No functionality change. llvm-svn: 236143
* IR: Give 'DI' prefix to debug info metadataDuncan P. N. Exon Smith2015-04-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Finish off PR23080 by renaming the debug info IR constructs from `MD*` to `DI*`. The last of the `DIDescriptor` classes were deleted in r235356, and the last of the related typedefs removed in r235413, so this has all baked for about a week. Note: If you have out-of-tree code (like a frontend), I recommend that you get everything compiling and tests passing with the *previous* commit before updating to this one. It'll be easier to keep track of what code is using the `DIDescriptor` hierarchy and what you've already updated, and I think you're extremely unlikely to insert bugs. YMMV of course. Back to *this* commit: I did this using the rename-md-di-nodes.sh upgrade script I've attached to PR23080 (both code and testcases) and filtered through clang-format-diff.py. I edited the tests for test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns were off-by-three. It should work on your out-of-tree testcases (and code, if you've followed the advice in the previous paragraph). Some of the tests are in badly named files now (e.g., test/Assembler/invalid-mdcompositetype-missing-tag.ll should be 'dicompositetype'); I'll come back and move the files in a follow-up commit. llvm-svn: 236120
* Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes"Sergey Dmitrouk2015-04-284-309/+357
| | | | | | | | | | | | | | | | | | | | | | | | | [DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235989
* Revert "[DebugInfo] Add debug locations to constant SD nodes"Daniel Jasper2015-04-284-357/+309
| | | | | | | This breaks a test: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870 llvm-svn: 235987
* [DebugInfo] Add debug locations to constant SD nodesSergey Dmitrouk2015-04-284-309/+357
| | | | | | | | | | | | | | | | | | | | | | | This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235977
* [MC] Use LShr for constant evaluation of ">>" on ELF/arm64--darwin.Ahmed Bougacha2015-04-281-0/+4
| | | | | | | | | | This matches other assemblers and is less unexpected (e.g. PR23227). On ELF, I tried binutils gas v2.24 and nasm 2.10.09, and they both agree on LShr. On COFF, I couldn't get my hands on an assembler yet, so don't change the behavior. For now, don't change it on non-AArch64 Darwin either, as the other assembler is gas v1.38, which does an AShr. llvm-svn: 235963
* [AArch64] Also combine vector selects fed by non-i1 SETCCs.Ahmed Bougacha2015-04-271-3/+15
| | | | | | | | | | | | | | After legalization, scalar SETCC has an i32 result type on AArch64. The i1 requirement seems too conservative, replace it with an assert. This also means that we now can run after legalization. That should also be fine, since the ops legalizer runs again after each combine, and all types created all have the same sizes as the (legal) inputs. Exposed by r235917; while there, robustize its tests (bsl also uses the register it defines). llvm-svn: 235922
* [AArch64] Don't assert when combining (v3f32 select (setcc f64)).Ahmed Bougacha2015-04-271-0/+6
| | | | | | | | When the setcc has f64 operands, we can't build a vector setcc mask to feed a vselect, because f64 doesn't divide v3f32 evenly. Just bail out when that happens. llvm-svn: 235917
* [AsmPrinter] Make AsmPrinter's OutStreamer member a unique_ptr.Lang Hames2015-04-241-16/+16
| | | | | | | AsmPrinter owns the OutStreamer, so an owning pointer makes sense here. Using a reference for this is crufty. llvm-svn: 235752
* [AArch64] Add nvcast patterns for v4f16 and v8f16Pirama Arumuga Nainar2015-04-231-0/+8
| | | | | | | | | | | | | | | | | | Summary: Constant stores of f16 vectors can create NvCast nodes from various operand types to v4f16 or v8f16 depending on patterns in the stored constants. This patch adds nvcast rules with v4f16 and v8f16 values. AArchISelLowering::LowerBUILD_VECTOR has the details on which constant patterns generate the nvcast nodes. Reviewers: jmolloy, srhines, ab Subscribers: rengolin, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D9201 llvm-svn: 235610
* [AArch64] Handle vec4, vec8, vec16 *itofp for halfPirama Arumuga Nainar2015-04-231-0/+10
| | | | | | | | | | | | | | | | | | | | | | Summary: Set operation action for SINT_TO_FP and UINT_TO_FP nodes with v4i32, v8i8, v8i16 inputs to allow promotion of v4f16 results. Add tests for sitofp and uitofp for vec4, vec8, vec16, and i8, i16, i32, and i64 vectors. Only missing tests are for v16i8 and v16i16 as the shift operations are too complicated to write a proper check sequence. The conversions from v4i64 to v4f16 do not depend on this patch - v4i64 is split and the conversion gets handled while lowering v2i64. I am adding a test here for completeness. Reviewers: aemerson, rengolin, ab, jmolloy, srhines Subscribers: rengolin, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D9166 llvm-svn: 235609
* [AArch64] Use MachineRegisterInfo instead of LiveIntervals to calculate ↵Pete Cooper2015-04-221-4/+4
| | | | | | | | | | | | liveness. NFC. The CondOpt pass currently uses LiveIntervals to set the dead flag on a def. This patch uses MachineRegisterInfo::use_empty instead as that is equivalent to the def being dead. This removes an instance of LiveIntervals in the pass manager pipeline and saves 3.8% of compile time on llc conpiled for AArch64. Reviewed by Chad Rosier and Zhaoshi. llvm-svn: 235532
* [AArch64] Disable complex GEP optimization by default.James Molloy2015-04-221-1/+1
| | | | | | | | Enough concerns were raised that this optimization is pessimising some code patterns. The obvious fix, to add a Reassociate run afterwards, causes even more pessimisation in some cases due to fewer complex addressing modes being matched. As there isn't a trivial fix for this, backing this out by default until someone gets a chance to fix the addressing mode matcher. llvm-svn: 235491
* [AArch64] LORID_EL1 register must be treated as read-onlyVladimir Sukharev2015-04-201-3/+5
| | | | | | | | | | | | Patch by: John Brawn Reviewers: jmolloy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9105 llvm-svn: 235314
* [AArch64] Don't force MVT::Untyped when selecting LD1LANEpost.Ahmed Bougacha2015-04-171-1/+1
| | | | | | | | | | | | | | | | | | | The result is either an Untyped reg sequence, on ldN with N > 1, or just the type of the input vector, on ld1. Don't force Untyped. Instead, just use the type of the reg sequence. This mirrors the behavior of createTuple, which feeds the LD1*_POST. The narrow code path wasn't actually covered by tests, because V64 insert_vector_elt are widened to V128 before the LD1LANEpost combine has the chance to run, usually. The only case where it does run on V64 vectors is if the vector ops legalizer ran. So, tickle the code with a ctpop. Fixes PR23265. llvm-svn: 235243
* [AArch64] Avoid vector->load dependency cycles when creating LD1*post.Ahmed Bougacha2015-04-171-0/+7
| | | | | | | | They would break the SelectionDAG. Note that the opposite load->vector dependency is already obvious in: (LD1*post vec, ..) llvm-svn: 235224
* [mc] Clean up emission of byte sequencesBenjamin Kramer2015-04-171-4/+1
| | | | | | No functional change intended. llvm-svn: 235178
* [AArch64] Don't assert on f16 in DUP PerfectShuffle generator.Ahmed Bougacha2015-04-161-1/+1
| | | | | | | | Found by code inspection, but breaking i16 at least breaks other tests. They aren't checking this in particular though, so also add some explicit tests for the already working types. llvm-svn: 235148
* Disable AArch64 fast-isel on big-endian call vector returns.Pete Cooper2015-04-161-0/+5
| | | | | | | | A big-endian vector return needs a byte-swap which we aren't doing right now. For now just bail on these cases to get correctness back. llvm-svn: 235133
* [AArch64] Add v8.1a "Virtualization Host Extensions"Vladimir Sukharev2015-04-162-1/+59
| | | | | | | | | | | | Reviewers: t.p.northover Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8500 Patch by: Tom Coxon llvm-svn: 235107
* [AArch64] Add v8.1a "Limited Ordering Regions" extensionVladimir Sukharev2015-04-165-0/+40
| | | | | | | | | | | | Reviewers: t.p.northover Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8499 Patch by: Tom Coxon llvm-svn: 235105
* [AArch64] Add v8.1a "Privileged Access Never" extensionVladimir Sukharev2015-04-162-3/+18
| | | | | | | | | | Reviewers: jmolloy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8498 llvm-svn: 235104
* [AArch64] Handle Cyclone-specific register in common wayVladimir Sukharev2015-04-162-29/+5
| | | | | | | | | | | | Reviewers: jmolloy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8584 Patch by: Tom Coxon llvm-svn: 235102
* [AArch64] Follow-up to: Refactor AArch64NamedImmMapper to become dependent ↵Vladimir Sukharev2015-04-161-665/+665
| | | | | | | | | | on subtarget features Fixed compilation with clang on some buildbots with "-Werror -Wmissing-field-initializers" Related to: http://reviews.llvm.org/rL235089 llvm-svn: 235099
* [AArch64] Refactor AArch64NamedImmMapper to become dependent on subtarget ↵Vladimir Sukharev2015-04-165-20/+46
| | | | | | | | | | | | | | | | | | | | | | features. In order to introduce v8.1a-specific entities, Mappers should be aware of SubtargetFeatures available. This patch introduces refactoring, that will then allow to easily introduce: - v8.1-specific "pan" PState for PStateMapper (PAN extension) - v8.1-specific sysregs for SysRegMapper (LOR,VHE extensions) Reviewers: jmolloy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8496 Patch by Tom Coxon llvm-svn: 235089
* [AArch64] Fix invalid use of references to BuildMI.James Molloy2015-04-161-3/+3
| | | | | | | | This was found in GCC PR65773 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65773). We shouldn't be taking a reference to the temporary that BuildMI returns, we must copy it. llvm-svn: 235088
* Change range-based for-loops to be -Wrange-loop-analysis clean.Richard Trieu2015-04-151-1/+1
| | | | | | No functionality change. llvm-svn: 234963
* Use raw_pwrite_stream in the object writer/streamer.Rafael Espindola2015-04-147-15/+18
| | | | | | The ELF object writer will take advantage of that in the next commit. llvm-svn: 234950
* [AArch64] Allow non-standard INS/DUP encodingsBradley Smith2015-04-142-10/+10
| | | | | | | | | | The ARMv8 ARMARM states that for these instructions in A64 state: "Unspecified bits in "imm5" are ignored but should be set to zero by an assembler.", (imm4 for INS). Make the disassembler accept any encoding with these ignored bits set to 1. llvm-svn: 234896
* DebugInfo: Gut DIVariable and DIGlobalVariableDuncan P. N. Exon Smith2015-04-141-2/+2
| | | | | | | | | | Gut all the non-pointer API from the variable wrappers, except an implicit conversion from `DIGlobalVariable` to `DIDescriptor`. Note that if you're updating out-of-tree code, `DIVariable` wraps `MDLocalVariable` (`MDVariable` is a common base class shared with `MDGlobalVariable`). llvm-svn: 234840
* Allow memory intrinsics to be tail callsKrzysztof Parzyszek2015-04-131-3/+4
| | | | llvm-svn: 234764
* Use 'override/final' instead of 'virtual' for overridden methodsAlexander Kornienko2015-04-113-3/+3
| | | | | | | | | | | | | | The patch is generated using clang-tidy misc-use-override check. This command was used: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \ -checks='-*,misc-use-override' -header-filter='llvm|clang' \ -j=32 -fix -format http://reviews.llvm.org/D8925 llvm-svn: 234679
* [CodeGen] Split -enable-global-merge into ARM and AArch64 options.Ahmed Bougacha2015-04-111-1/+8
| | | | | | | | | | | | | Currently, there's a single flag, checked by the pass itself. It can't force-enable the pass (and is on by default), because it might not even have been created, as that's the targets decision. Instead, have separate explicit flags, so that the decision is consistently made in the target. Keep the flag as a last-resort "force-disable GlobalMerge" for now, for backwards compatibility. llvm-svn: 234666
* [AArch64] Strengthen the code for the prologue insertion.Quentin Colombet2015-04-101-0/+2
| | | | | | | | | The spilled registers are pristine and thus, correctly handled by the register scavenger and so on, but the liveness information is strictly speaking wrong at this point. Fix that. llvm-svn: 234664
* [AArch64] Changes some SchedAlias to WriteRes for Cortex-A57.Chad Rosier2015-04-101-3/+8
| | | | | | | | | | | | Using SchedAliases is convenient and works well for latency and resource lookup for instructions. However, this creates an entry in AArch64WriteLatencyTable with a WriteResourceID of 0, breaking any SchedReadAdvance since the lookup will fail. http://reviews.llvm.org/D8043 Patch by Dave Estes <cestes@codeaurora.org>! llvm-svn: 234594
* [AArch64] Adjusts Cortex-A57 machine model to handle zero shift.Chad Rosier2015-04-101-0/+9
| | | | | | | http://reviews.llvm.org/D8043 Patch by Dave Estes <cestes@codeaurora.org>! llvm-svn: 234593
* Reduce dyn_cast<> to isa<> or cast<> where possible.Benjamin Kramer2015-04-101-1/+1
| | | | | | No functional change intended. llvm-svn: 234586
* [AArch64] Promote f16 operations to f32.Ahmed Bougacha2015-04-101-8/+50
| | | | | | | | | | | | | | | | | | | | For the most common ones (such as fadd), we already did the promotion. Do the same thing for all the others. Currently, we'll just crash/assert on all these operations, as there's no hardware or libcall support whatsoever. f16 (half) is specified as an interchange - not arithmetic - format, and is expected to be promoted to single-precision for arithmetic operations. While there, teach the legalizer about promoting some of the (mostly floating-point) operations that we never needed before. Differential Revision: http://reviews.llvm.org/D8648 See related discussion on the thread for: http://reviews.llvm.org/D8755 llvm-svn: 234550
* [AArch64][FastISel] Fix integer extend optimization.Juergen Ributzka2015-04-091-5/+6
| | | | | | | | | | | | | | The integer extend optimization tries to fold the extend into the load instruction. This requires us to identify if the extend has already been emitted or not and act accordingly on it. The check that was originally performed for this was not sufficient. Besides checking the ValueMap for a mapped register we also need to check if the virtual register has already an associated machine instruction that defines it. This fixes rdar://problem/20470788. llvm-svn: 234529
* clang-format bits of code to make a followup patch easy to read.Rafael Espindola2015-04-094-6/+6
| | | | llvm-svn: 234519
* [AArch64] Add support for dynamic stack alignmentKristof Beyls2015-04-094-43/+172
| | | | | | Differential Revision: http://reviews.llvm.org/D8876 llvm-svn: 234471
OpenPOWER on IntegriCloud