summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AArch64
Commit message (Collapse)AuthorAgeFilesLines
* livePhysRegs: Pass MBB by reference in addLive{Ins|Outs}(); NFCMatthias Braun2016-05-031-2/+2
| | | | | | | The block must no be nullptr for the addLiveIns()/addLiveOuts() function. llvm-svn: 268340
* LivePhysRegs: Automatically determine presence of pristine regs.Matthias Braun2016-05-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | Remove the AddPristinesAndCSRs parameters from addLiveIns()/addLiveOuts(). We need to respect pristine registers after prologue epilogue insertion, Seeing that we got this wrong in at least two commits already, we should rather pay the small price to query MachineFrameInfo for it. There are three cases that did not set AddPristineAndCSRs to true even after register allocation: - ExecutionDepsFix: live-out registers are used as a hint that the register is used soon. This is not true for pristine registers so use the new addLiveOutsNoPristines() to maintain this behaviour. - SystemZShortenInst: Not setting AddPristineAndCSRs to true looks like a bug, should do the right thing automatically now. - StackMapLivenessAnalysis: Not adding pristine registers looks like a bug to me. Added a FIXME comment but maintain the current behaviour as a change may need to get coordinated with GC runtimes. llvm-svn: 268336
* Cleanup comments. NFC.Chad Rosier2016-05-022-3/+4
| | | | llvm-svn: 268236
* Cleanup comments. NFC.Chad Rosier2016-05-021-4/+3
| | | | llvm-svn: 268235
* [CodeGen] Default CTTZ_ZERO_UNDEF/CTLZ_ZERO_UNDEF to Expand in ↵Craig Topper2016-04-281-7/+0
| | | | | | TargetLoweringBase. This is what the majority of the targets want and removes a bunch of code. Set it to Legal explicitly in the few cases where that's the desired behavior. llvm-svn: 267853
* [AArch64] Expand CTTZ for all vector types.Craig Topper2016-04-281-0/+9
| | | | llvm-svn: 267837
* [AArch64] Set AddPristinesAndCSRs to expandCMP_SWAP LivePhysRegs.Ahmed Bougacha2016-04-271-2/+2
| | | | | | | | | We run after PEI. Found via inspection; no obvious testcase. Follow-up to r266339. llvm-svn: 267780
* [AArch64] Set correct successors in CMPXCHG pseudo expansion.Ahmed Bougacha2016-04-271-2/+4
| | | | | | | | | | | transferSuccessors() would LoadCmpBB a successor of DoneBB, whereas it should be a successor of the original MBB. Follow-up to r266339. Unfortunately, it's tricky to catch this in the verifier. llvm-svn: 267779
* [DAGCombiner] Follow coding convention for function name (NFC)Gerolf Hoflehner2016-04-272-2/+2
| | | | llvm-svn: 267745
* Add parentheses to silence buildbot warningMatthew Simpson2016-04-271-2/+2
| | | | llvm-svn: 267734
* [TTI] Add hook for vector extract with extensionMatthew Simpson2016-04-272-0/+58
| | | | | | | | | | | | | | | This change adds a new hook for estimating the cost of vector extracts followed by zero- and sign-extensions. The motivating example for this change is the SMOV and UMOV instructions on AArch64. These instructions move data from vector to general purpose registers while performing the corresponding extension (sign-extend for SMOV and zero-extend for UMOV) at the same time. For these operations, TargetTransformInfo can assume the extensions are free and only report the cost of the vector extract. The SLP vectorizer has been updated to make use of the new hook. Differential Revision: http://reviews.llvm.org/D18523 llvm-svn: 267725
* [CodeGen] Add getBuildVector and getSplatBuildVector helpers. NFCI.Ahmed Bougacha2016-04-261-14/+12
| | | | | | Differential Revision: http://reviews.llvm.org/D17176 llvm-svn: 267606
* [AArch64] Expand v1i64 and v2i64 ctlz.Craig Topper2016-04-261-0/+3
| | | | | | The default is legal, which results in 'Cannot select' errors. llvm-svn: 267522
* Remove MinLatency in SchedMachineModel. NFC.Junmo Park2016-04-262-2/+0
| | | | | | | | | | | Summary: We don't use MinLatency any more since r184032. Reviewers: atrick, hfinkel, mcrosier Differential Revision: http://reviews.llvm.org/D19474 llvm-svn: 267502
* Add optimization bisect opt-in calls for AArch64 passesAndrew Kaylor2016-04-2512-0/+34
| | | | | | Differential Revision: http://reviews.llvm.org/D19394 llvm-svn: 267479
* Re-apply r267206 with a fix for the encoding problem: when the immediate ofQuentin Colombet2016-04-251-3/+14
| | | | | | | | | | | | | | | | | | log2(Mask) is smaller than 32, we must use the 32-bit variant because the 64-bit variant cannot encode it. Therefore, set the subreg part accordingly. [AArch64] Fix optimizeCondBranch logic. The opcode for the optimized branch does not depend on the size of the activate bits in the AND masks, but the AND opcode itself. Indeed, we need to use a X or W variant based on the AND variant not based on whether the mask fits into the related variant. Otherwise, we may end up using the W variant of the optimized branch for 64-bit register inputs! This fixes the last make check verifier issues for AArch64: PR27479. llvm-svn: 267465
* Fix typo in comment. NFCNick Lewycky2016-04-241-1/+1
| | | | llvm-svn: 267354
* [MachineCombiner] Support for floating-point FMA on ARM64 (re-commit r267098)Gerolf Hoflehner2016-04-244-35/+557
| | | | | | | | | | | | | | | | | | | The original patch caused crashes because it could derefence a null pointer for SelectionDAGTargetInfo for targets that do not define it. Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are: - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64. - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The targets decides whether the machine combiner should optimize for throughput or latency. - Supports for fmadd, f(n)msub, fmla, fmls patterns - On by default at O3 ffast-math llvm-svn: 267328
* Revert "[AArch64] Fix optimizeCondBranch logic."Renato Golin2016-04-231-5/+5
| | | | | | This reverts commit r267206, as it broke self-hosting on AArch64. llvm-svn: 267294
* [AArch64] Fix optimizeCondBranch logic.Quentin Colombet2016-04-221-5/+5
| | | | | | | | | | | | | The opcode for the optimized branch does not depend on the size of the activate bits in the AND masks, but the AND opcode itself. Indeed, we need to use a X or W variant based on the AND variant not based on whether the mask fits into the related variant. Otherwise, we may end up using the W variant of the optimized branch for 64-bit register inputs! This fixes the last make check verifier issues for AArch64: PR27479. llvm-svn: 267206
* [AArch64] When creating MRS instruction, make sure the destination register isQuentin Colombet2016-04-221-2/+1
| | | | | | | | declared as a definition. This fixes the machine verifier error for CodeGen/AArch64/nzcv-save.ll. llvm-svn: 267185
* [AArch64][AdvSIMDScalar] Update the kill flags correctly.Quentin Colombet2016-04-221-30/+45
| | | | | | | | | | | | | | | | | | | We used to simply set the kill flags to true when transforming a scalar instruction to a vector one. SrcScalar1 = copy SrcVector1 ... = opScalar SrcScalar1 => SrcScalar1 = copy SrcVector1 ... = opVector SrcVector1<kill> This is obviously wrong. The proper update consists in: 1. Propagate the kill status from the copy to the new opVector 2. Reset the kill status on the copy, since the live-range of SrcVector1 got extended. This fixes some of the machine verifier errors for AArch64 with make check. llvm-svn: 267180
* Revert r267098 - [MachineCombiner] Support for floating-point FMA on ARM64Daniel Sanders2016-04-224-557/+35
| | | | | | It introduced buildbot failures on clang-cmake-mips, clang-ppc64le-linux, among others. llvm-svn: 267127
* [MachineCombiner] Support for floating-point FMA on ARM64Gerolf Hoflehner2016-04-224-35/+557
| | | | | | | | | | | | | | | | Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are: - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64. - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The targets decides whether the machine combiner should optimize for throughput or latency. - Supports for fmadd, f(n)msub, fmla, fmls patterns - On by default at O3 ffast-math llvm-svn: 267098
* [RegisterBankInfo] Change the API for the verify methods.Quentin Colombet2016-04-211-1/+1
| | | | | | | Return bool instead of void so that it is natural to put the calls into asserts. llvm-svn: 267033
* [AArch64][CodeGen] Fix of PR27158: incorrect peephole optimization in ↵Evgeny Astigeevich2016-04-212-75/+157
| | | | | | | | | | | | | | | | AArch64InstrInfo::optimizeCompareInstr AArch64InstrInfo::optimizeCompareInstr has bug PR27158 which causes generation of incorrect code. A compare instruction is substituted with another instruction which does not produce the same flags as the original compare instruction. This patch contains: 1. Fix of the bug. 2. A regression test in MIR. 3. A new test to check that SUBS is replaced by SUB. Differential Revision: http://reviews.llvm.org/D18838 llvm-svn: 266969
* [AArch64] [ARM] Make a target-independent llvm.thread.pointer intrinsic.Marcin Koscielnicki2016-04-191-3/+3
| | | | | | | | | | | | | | Both AArch64 and ARM support llvm.<arch>.thread.pointer intrinsics that just return the thread pointer. I have a pending patch that does the same for SystemZ (D19054), and there are many more targets that could benefit from one. This patch merges the ARM and AArch64 intrinsics into a single target independent one that will also be used by subsequent targets. Differential Revision: http://reviews.llvm.org/D19098 llvm-svn: 266818
* [SSP, 2/2] Create llvm.stackguard() intrinsic and lower it to LOAD_STACK_GUARDTim Shen2016-04-191-1/+3
| | | | | | | | | | | | | | | | | | | | | | | With this change, ideally IR pass can always generate llvm.stackguard call to get the stack guard; but for now there are still IR form stack guard customizations around (see getIRStackGuard()). Future SSP customization should go through LOAD_STACK_GUARD. There is a behavior change: stack guard values are not CSEed anymore, since we should never reuse the value in case that it has been spilled (and corrupted). See ssp-guard-spill.ll. This also cause the change of stack size and codegen in X86 and AArch64 test cases. Ideally we'd like to know if the guard created in llvm.stackprotector() gets spilled or not. If the value is spilled, discard the value and reload stack guard; otherwise reuse the value. This can be done by teaching register allocator to know how to rematerialize LOAD_STACK_GUARD and force a rematerialization (which seems hard), or check for spilling in expandPostRAPseudo. It only makes sense when the stack guard is a global variable, which requires more instructions to load. Anyway, this seems to go out of the scope of the current patch. llvm-svn: 266806
* [NFC] Header cleanupMehdi Amini2016-04-1810-11/+0
| | | | | | | | | | | | | | Removed some unused headers, replaced some headers with forward class declarations. Found using simple scripts like this one: clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap' Patch by Eugene Kosov <claprix@yandex.ru> Differential Revision: http://reviews.llvm.org/D19219 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266595
* [AArch64] Add load/store pair instructions to getMemOpBaseRegImmOfsWidth().Chad Rosier2016-04-151-5/+46
| | | | | | | | | This improves AA in the MI schduler when reason about paired instructions. Phabricator Revision: http://reviews.llvm.org/D17098 PR26358 llvm-svn: 266462
* [AArch64] Add MMOs to callee-save load/store instructions.Geoff Berry2016-04-151-2/+15
| | | | | | | | | | | | | | Summary: Without MMOs, the callee-save load/store instructions were treated as volatile by the MI post-RA scheduler and AArch64LoadStoreOptimizer. Reviewers: t.p.northover, mcrosier Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17661 llvm-svn: 266439
* [MachineScheduler]Add support for store clusteringJun Bum Lim2016-04-152-4/+16
| | | | | | | | | | | | Perform store clustering just like load clustering. This change add StoreClusterMutation in machine-scheduler. To control StoreClusterMutation, added enableClusterStores() in TargetInstrInfo.h. This is enabled only on AArch64 for now. This change also add support for unscaled stores which were not handled in getMemOpBaseRegImmOfs(). llvm-svn: 266437
* Use MVT instead of EVT to remove a bunch of unnecessary calls to getSimpleVT.Craig Topper2016-04-152-53/+52
| | | | llvm-svn: 266414
* [GlobalISel] Move GISelAccessor class into public headersTom Stellard2016-04-144-48/+15
| | | | | | | | | | Reviewers: qcolombet Subscribers: joker.eph, vkalintiris, llvm-commits Differential Revision: http://reviews.llvm.org/D19120 llvm-svn: 266348
* [GlobalISel] Coding style and whitespace fixesTom Stellard2016-04-142-6/+6
| | | | | | | | | | Reviewers: qcolombet Subscribers: joker.eph, llvm-commits, vkalintiris Differential Revision: http://reviews.llvm.org/D19119 llvm-svn: 266342
* AArch64: expand cmpxchg after regalloc at -O0.Tim Northover2016-04-144-4/+314
| | | | | | | | | | | | | | | | | | | FastRegAlloc works only at the basic-block level and spills all live-out registers. Unfortunately for a stack-based cmpxchg near the spill slots, this can perpetually clear the exclusive monitor, which means the cmpxchg will never succeed. I believe the only way to handle this within LLVM is by expanding the loop post-regalloc. We don't want this in general because it severely limits the optimisations that can be done, so we limit this to -O0 compilations. It's an ugly hack, and about the one good point in the whole mess is that we can treat all cmpxchg operations in the most naive way possible (seq_cst, no clrex faff) without affecting correctness. Should fix PR25526. llvm-svn: 266339
* TargetLowering: Factor out common code for tail call eligibility checking; NFCMatthias Braun2016-04-141-21/+3
| | | | llvm-svn: 266270
* AArch64: Use a callee save registers for swiftself parametersMatthias Braun2016-04-133-15/+44
| | | | | | | | | | | | | | | | | | It is very likely that the swiftself parameter is alive throughout most functions function so putting it into a callee save register should avoid spills for the callers with only a minimum amount of extra spills in the callees. Currently the generated code is correct but unnecessarily spills and reloads arguments passed in callee save registers, I will address this in upcoming patches. This also adds a missing check that for tail calls the preserved value of the caller must be the same as the callees parameter. Differential Revision: http://reviews.llvm.org/D19007 llvm-svn: 266251
* [AArch64] Disable LDP/STP for quadsEvandro Menezes2016-04-131-0/+14
| | | | | | | | | Disable LDP/STP for quads on Exynos M1 as they are not as efficient as pairs of regular LDR/STR. Patch by Abderrazek Zaafrani <a.zaafrani@samsung.com>. llvm-svn: 266223
* AArch64: don't create instructions that write to xzr/wzr twice.Tim Northover2016-04-131-0/+8
| | | | | | | | These are unpredictable even on AArch64. Patch by Yichao Yu. llvm-svn: 266206
* [AArch64] Fuse AES{D,E}/AESMC for Exynos M1. (NFC)Evandro Menezes2016-04-121-1/+4
| | | | llvm-svn: 266144
* AArch64: Drive-by cleanupMatthias Braun2016-04-121-3/+2
| | | | llvm-svn: 266035
* Swift Calling Convention: swifterror target support.Manman Ren2016-04-115-6/+70
| | | | | | Differential Revision: http://reviews.llvm.org/D18716 llvm-svn: 265997
* [SSP] Remove llvm.stackprotectorcheck.Tim Shen2016-04-082-3/+3
| | | | | | | | | | This is a cleanup patch for SSP support in LLVM. There is no functional change. llvm.stackprotectorcheck is not needed, because SelectionDAG isn't actually lowering it in SelectBasicBlock; rather, it adds check code in FinishBasicBlock, ignoring the position where the intrinsic is inserted (See FindSplitPointForStackProtector()). llvm-svn: 265851
* [AArch64] Fix a typo in the register class to register bank mapping.Quentin Colombet2016-04-071-1/+1
| | | | | | For GPR family we want the GPR register bank, not FPR! llvm-svn: 265743
* [AArch64] Get rid of some GlobalISel ifdefs.Quentin Colombet2016-04-071-3/+1
| | | | llvm-svn: 265725
* [AArch64] gcc does not like litteral without quotes even on preprocessor macros.Quentin Colombet2016-04-071-1/+1
| | | | llvm-svn: 265720
* [AArch64][CallLowering] Do not build the API if GlobalISel is not built.Quentin Colombet2016-04-072-14/+5
| | | | | | | This gets rid of some ifdefs and dummy implementations that were here just to fill the blanks. llvm-svn: 265719
* [GlobalISel] Add RegBankSelect hooks into the pass pipeline.Quentin Colombet2016-04-071-0/+6
| | | | | | | Now, RegBankSelect will happen after the IRTranslation and the target may optionally add additional passes in between. llvm-svn: 265716
* [RegisterBank] Rename RegisterBank::contains into RegisterBank::covers.Quentin Colombet2016-04-071-4/+4
| | | | llvm-svn: 265695
OpenPOWER on IntegriCloud