summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AArch64
Commit message (Collapse)AuthorAgeFilesLines
* [AArch64] Set correct successors in CMPXCHG pseudo expansion.Ahmed Bougacha2016-04-271-1/+1
| | | | | | | | | | | transferSuccessors() would LoadCmpBB a successor of DoneBB, whereas it should be a successor of the original MBB. Follow-up to r266339. Unfortunately, it's tricky to catch this in the verifier. llvm-svn: 267779
* [AArch64] Expand v1i64 and v2i64 ctlz.Craig Topper2016-04-261-0/+28
| | | | | | The default is legal, which results in 'Cannot select' errors. llvm-svn: 267522
* Re-apply r267206 with a fix for the encoding problem: when the immediate ofQuentin Colombet2016-04-251-5/+52
| | | | | | | | | | | | | | | | | | log2(Mask) is smaller than 32, we must use the 32-bit variant because the 64-bit variant cannot encode it. Therefore, set the subreg part accordingly. [AArch64] Fix optimizeCondBranch logic. The opcode for the optimized branch does not depend on the size of the activate bits in the AND masks, but the AND opcode itself. Indeed, we need to use a X or W variant based on the AND variant not based on whether the mask fits into the related variant. Otherwise, we may end up using the W variant of the optimized branch for 64-bit register inputs! This fixes the last make check verifier issues for AArch64: PR27479. llvm-svn: 267465
* [MachineCombiner] Support for floating-point FMA on ARM64 (re-commit r267098)Gerolf Hoflehner2016-04-242-0/+264
| | | | | | | | | | | | | | | | | | | The original patch caused crashes because it could derefence a null pointer for SelectionDAGTargetInfo for targets that do not define it. Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are: - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64. - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The targets decides whether the machine combiner should optimize for throughput or latency. - Supports for fmadd, f(n)msub, fmla, fmls patterns - On by default at O3 ffast-math llvm-svn: 267328
* Revert "[AArch64] Fix optimizeCondBranch logic."Renato Golin2016-04-231-1/+1
| | | | | | This reverts commit r267206, as it broke self-hosting on AArch64. llvm-svn: 267294
* [AArch64] Fix optimizeCondBranch logic.Quentin Colombet2016-04-221-1/+1
| | | | | | | | | | | | | The opcode for the optimized branch does not depend on the size of the activate bits in the AND masks, but the AND opcode itself. Indeed, we need to use a X or W variant based on the AND variant not based on whether the mask fits into the related variant. Otherwise, we may end up using the W variant of the optimized branch for 64-bit register inputs! This fixes the last make check verifier issues for AArch64: PR27479. llvm-svn: 267206
* MachineScheduler: Limit the size of the ready list.Matthias Braun2016-04-221-0/+1
| | | | | | | | | Avoid quadratic complexity in unusually large basic blocks by limiting the size of the ready lists. Differential Revision: http://reviews.llvm.org/D19349 llvm-svn: 267189
* [AArch64] When creating MRS instruction, make sure the destination register isQuentin Colombet2016-04-221-1/+1
| | | | | | | | declared as a definition. This fixes the machine verifier error for CodeGen/AArch64/nzcv-save.ll. llvm-svn: 267185
* [AArch64][AdvSIMDScalar] Update the kill flags correctly.Quentin Colombet2016-04-221-4/+4
| | | | | | | | | | | | | | | | | | | We used to simply set the kill flags to true when transforming a scalar instruction to a vector one. SrcScalar1 = copy SrcVector1 ... = opScalar SrcScalar1 => SrcScalar1 = copy SrcVector1 ... = opVector SrcVector1<kill> This is obviously wrong. The proper update consists in: 1. Propagate the kill status from the copy to the new opVector 2. Reset the kill status on the copy, since the live-range of SrcVector1 got extended. This fixes some of the machine verifier errors for AArch64 with make check. llvm-svn: 267180
* Revert r267098 - [MachineCombiner] Support for floating-point FMA on ARM64Daniel Sanders2016-04-222-264/+0
| | | | | | It introduced buildbot failures on clang-cmake-mips, clang-ppc64le-linux, among others. llvm-svn: 267127
* [MachineCombiner] Support for floating-point FMA on ARM64Gerolf Hoflehner2016-04-222-0/+264
| | | | | | | | | | | | | | | | Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are: - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64. - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The targets decides whether the machine combiner should optimize for throughput or latency. - Supports for fmadd, f(n)msub, fmla, fmls patterns - On by default at O3 ffast-math llvm-svn: 267098
* Try to fix UNRESOLVED: LLVM :: CodeGen/AArch64/arm64-regress-opt-cmp.s on bots.Nico Weber2016-04-221-0/+1
| | | | | | | | This test used to write a .s file until r266971 fixed that. But on most bots, the .s file still exists. Add an rm statement to clean up the bots. In a few days, this statement can go away again. llvm-svn: 267095
* add tests for disguised fabs/fnegSanjay Patel2016-04-211-0/+29
| | | | llvm-svn: 267053
* DAGCombiner: Reduce 64-bit BFE pattern to pattern on 32-bit componentMatt Arsenault2016-04-211-1/+1
| | | | | | | If the extracted bits are restricted to the upper half or lower half, this can be truncated. llvm-svn: 267024
* Updated a test not to produce an empty s-file.Evgeny Astigeevich2016-04-211-1/+1
| | | | llvm-svn: 266971
* [AArch64][CodeGen] Fix of PR27158: incorrect peephole optimization in ↵Evgeny Astigeevich2016-04-212-0/+64
| | | | | | | | | | | | | | | | AArch64InstrInfo::optimizeCompareInstr AArch64InstrInfo::optimizeCompareInstr has bug PR27158 which causes generation of incorrect code. A compare instruction is substituted with another instruction which does not produce the same flags as the original compare instruction. This patch contains: 1. Fix of the bug. 2. A regression test in MIR. 3. A new test to check that SUBS is replaced by SUB. Differential Revision: http://reviews.llvm.org/D18838 llvm-svn: 266969
* [LLVM] Remove unwanted --check-prefix=CHECK from unit tests. NFC.Mandeep Singh Grang2016-04-1915-15/+15
| | | | | | | | | | | | Summary: Removed unwanted --check-prefix=CHECK from numerous unit tests. Reviewers: t.p.northover, dblaikie, uweigand, MatzeB, tstellarAMD, mcrosier Subscribers: mcrosier, dsanders Differential Revision: http://reviews.llvm.org/D19279 llvm-svn: 266834
* [AArch64] [ARM] Make a target-independent llvm.thread.pointer intrinsic.Marcin Koscielnicki2016-04-191-2/+2
| | | | | | | | | | | | | | Both AArch64 and ARM support llvm.<arch>.thread.pointer intrinsics that just return the thread pointer. I have a pending patch that does the same for SystemZ (D19054), and there are many more targets that could benefit from one. This patch merges the ARM and AArch64 intrinsics into a single target independent one that will also be used by subsequent targets. Differential Revision: http://reviews.llvm.org/D19098 llvm-svn: 266818
* [SSP, 2/2] Create llvm.stackguard() intrinsic and lower it to LOAD_STACK_GUARDTim Shen2016-04-191-1/+5
| | | | | | | | | | | | | | | | | | | | | | | With this change, ideally IR pass can always generate llvm.stackguard call to get the stack guard; but for now there are still IR form stack guard customizations around (see getIRStackGuard()). Future SSP customization should go through LOAD_STACK_GUARD. There is a behavior change: stack guard values are not CSEed anymore, since we should never reuse the value in case that it has been spilled (and corrupted). See ssp-guard-spill.ll. This also cause the change of stack size and codegen in X86 and AArch64 test cases. Ideally we'd like to know if the guard created in llvm.stackprotector() gets spilled or not. If the value is spilled, discard the value and reload stack guard; otherwise reuse the value. This can be done by teaching register allocator to know how to rematerialize LOAD_STACK_GUARD and force a rematerialization (which seems hard), or check for spilling in expandPostRAPseudo. It only makes sense when the stack guard is a global variable, which requires more instructions to load. Anyway, this seems to go out of the scope of the current patch. llvm-svn: 266806
* [AArch64] Add load/store pair instructions to getMemOpBaseRegImmOfsWidth().Chad Rosier2016-04-152-1/+35
| | | | | | | | | This improves AA in the MI schduler when reason about paired instructions. Phabricator Revision: http://reviews.llvm.org/D17098 PR26358 llvm-svn: 266462
* Fix test to require Asserts since it uses debug output.Geoff Berry2016-04-151-0/+1
| | | | llvm-svn: 266448
* [PR27284] Reverse the ownership between DICompileUnit and DISubprogram.Adrian Prantl2016-04-152-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently each Function points to a DISubprogram and DISubprogram has a scope field. For member functions the scope is a DICompositeType. DIScopes point to the DICompileUnit to facilitate type uniquing. Distinct DISubprograms (with isDefinition: true) are not part of the type hierarchy and cannot be uniqued. This change removes the subprograms list from DICompileUnit and instead adds a pointer to the owning compile unit to distinct DISubprograms. This would make it easy for ThinLTO to strip unneeded DISubprograms and their transitively referenced debug info. Motivation ---------- Materializing DISubprograms is currently the most expensive operation when doing a ThinLTO build of clang. We want the DISubprogram to be stored in a separate Bitcode block (or the same block as the function body) so we can avoid having to expensively deserialize all DISubprograms together with the global metadata. If a function has been inlined into another subprogram we need to store a reference the block containing the inlined subprogram. Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script that updates LLVM IR testcases to the new format. http://reviews.llvm.org/D19034 <rdar://problem/25256815> llvm-svn: 266446
* llvm/test/CodeGen/AArch64/arm64-csldst-mmo.ll requires +Asserts.NAKAMURA Takumi2016-04-151-0/+1
| | | | llvm-svn: 266443
* [AArch64] Add MMOs to callee-save load/store instructions.Geoff Berry2016-04-151-0/+23
| | | | | | | | | | | | | | Summary: Without MMOs, the callee-save load/store instructions were treated as volatile by the MI post-RA scheduler and AArch64LoadStoreOptimizer. Reviewers: t.p.northover, mcrosier Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17661 llvm-svn: 266439
* [MachineScheduler]Add support for store clusteringJun Bum Lim2016-04-154-18/+167
| | | | | | | | | | | | Perform store clustering just like load clustering. This change add StoreClusterMutation in machine-scheduler. To control StoreClusterMutation, added enableClusterStores() in TargetInstrInfo.h. This is enabled only on AArch64 for now. This change also add support for unscaled stores which were not handled in getMemOpBaseRegImmOfs(). llvm-svn: 266437
* AArch64: expand cmpxchg after regalloc at -O0.Tim Northover2016-04-141-0/+75
| | | | | | | | | | | | | | | | | | | FastRegAlloc works only at the basic-block level and spills all live-out registers. Unfortunately for a stack-based cmpxchg near the spill slots, this can perpetually clear the exclusive monitor, which means the cmpxchg will never succeed. I believe the only way to handle this within LLVM is by expanding the loop post-regalloc. We don't want this in general because it severely limits the optimisations that can be done, so we limit this to -O0 compilations. It's an ugly hack, and about the one good point in the whole mess is that we can treat all cmpxchg operations in the most naive way possible (seq_cst, no clrex faff) without affecting correctness. Should fix PR25526. llvm-svn: 266339
* AArch64: Use a callee save registers for swiftself parametersMatthias Braun2016-04-131-21/+59
| | | | | | | | | | | | | | | | | | It is very likely that the swiftself parameter is alive throughout most functions function so putting it into a callee save register should avoid spills for the callers with only a minimum amount of extra spills in the callees. Currently the generated code is correct but unnecessarily spills and reloads arguments passed in callee save registers, I will address this in upcoming patches. This also adds a missing check that for tail calls the preserved value of the caller must be the same as the callees parameter. Differential Revision: http://reviews.llvm.org/D19007 llvm-svn: 266251
* [AArch64] Disable LDP/STP for quadsEvandro Menezes2016-04-132-0/+79
| | | | | | | | | Disable LDP/STP for quads on Exynos M1 as they are not as efficient as pairs of regular LDR/STR. Patch by Abderrazek Zaafrani <a.zaafrani@samsung.com>. llvm-svn: 266223
* Cleanup Store Merging in UseAA caseNirav Dave2016-04-131-0/+64
| | | | | | | | | | | | | | This patch fixes a bug (PR26827) when using anti-aliasing in store merging. This sets the chain users of the component stores to point to the new store instead of the component stores chain parent. Reviewers: jyknight Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18909 llvm-svn: 266217
* AArch64: don't create instructions that write to xzr/wzr twice.Tim Northover2016-04-131-3/+3
| | | | | | | | These are unpredictable even on AArch64. Patch by Yichao Yu. llvm-svn: 266206
* Recommit r265547, and r265610,r265639,r265657 on top of it, plusWei Mi2016-04-131-514/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | two fixes with one about error verify-regalloc reported, and another about live range update of phi after rematerialization. r265547: Replace analyzeSiblingValues with new algorithm to fix its compile time issue. The patch is to solve PR17409 and its duplicates. analyzeSiblingValues is a N x N complexity algorithm where N is the number of siblings generated by reg splitting. Although it causes siginificant compile time issue when N is large, it is also important for performance since it removes redundent spills and enables rematerialization. To solve the compile time issue, the patch removes analyzeSiblingValues and replaces it with lower cost alternatives containing two parts. The first part creates a new spill hoisting method in postOptimization of register allocation. It does spill hoisting at once after all the spills are generated instead of inside every instance of selectOrSplit. The second part queries the define expr of the original register for rematerializaiton and keep it always available during register allocation even if it is already dead. It deletes those dead instructions only in postOptimization. With the two parts in the patch, it can remove analyzeSiblingValues without sacrificing performance. Patches on top of r265547: r265610 "Fix the compare-clang diff error introduced by r265547." r265639 "Fix the sanitizer bootstrap error in r265547." r265657 "InlineSpiller.cpp: Escap \@ in r265547. [-Wdocumentation]" Differential Revision: http://reviews.llvm.org/D15302 Differential Revision: http://reviews.llvm.org/D18934 Differential Revision: http://reviews.llvm.org/D18935 Differential Revision: http://reviews.llvm.org/D18936 llvm-svn: 266162
* [ScheduleDAGInstrs] Handle instructions with multiple MMOsGeoff Berry2016-04-121-0/+23
| | | | | | | | | | | | | | | | | | | | | Summary: In getUnderlyingObjectsForInstr(): Don't give up on instructions with multiple MMOs, instead look through all the MMOs and if they all meet the conservative criteria previously used for single MMO instructions, then return all of the underlying objects derived from the MMOs. The change to ScheduleDAGInstrs::buildSchedGraph() is needed to avoid the case where multiple underlying objects are present and are related in such a way that successive iterations of the loop end up adding a dependency from an instruction to itself. Reviewers: atrick, hfinkel Subscribers: MatzeB, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18093 llvm-svn: 266084
* [AArch64] Add test cases for the repairing of physical registers.Quentin Colombet2016-04-121-0/+48
| | | | llvm-svn: 266030
* [AArch64] Add a test case for the propagation of register banks throughQuentin Colombet2016-04-121-0/+49
| | | | | | phis. llvm-svn: 266028
* [AArch64] Add a test case for the repairing of definitions.Quentin Colombet2016-04-121-0/+28
| | | | llvm-svn: 266026
* [AArch64] Test that RegBankSelect inserts the proper copies to fix theQuentin Colombet2016-04-121-0/+53
| | | | | | register bank assignments. llvm-svn: 266021
* swifterror: fix up a testing case.Manman Ren2016-04-111-5/+5
| | | | llvm-svn: 266000
* Swift Calling Convention: swifterror target support.Manman Ren2016-04-111-0/+385
| | | | | | Differential Revision: http://reviews.llvm.org/D18716 llvm-svn: 265997
* [AArch64] Add a test case for the default mapping of RegBankSelect.Quentin Colombet2016-04-081-0/+49
| | | | llvm-svn: 265811
* Revert r265547 "Recommit r265309 after fixed an invalid memory reference bug ↵Hans Wennborg2016-04-081-0/+514
| | | | | | | | | | | | | happened" It caused PR27275: "ARM: Bad machine code: Using an undefined physical register" Also reverting the following commits that were landed on top: r265610 "Fix the compare-clang diff error introduced by r265547." r265639 "Fix the sanitizer bootstrap error in r265547." r265657 "InlineSpiller.cpp: Escap \@ in r265547. [-Wdocumentation]" llvm-svn: 265790
* Recommit r265309 after fixed an invalid memory reference bug happenedWei Mi2016-04-061-514/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | when DenseMap growed and moved memory. I verified it fixed the bootstrap problem on x86_64-linux-gnu but I cannot verify whether it fixes the bootstrap error on clang-ppc64be-linux. I will watch the build-bot result closely. Replace analyzeSiblingValues with new algorithm to fix its compile time issue. The patch is to solve PR17409 and its duplicates. analyzeSiblingValues is a N x N complexity algorithm where N is the number of siblings generated by reg splitting. Although it causes siginificant compile time issue when N is large, it is also important for performance since it removes redundent spills and enables rematerialization. To solve the compile time issue, the patch removes analyzeSiblingValues and replaces it with lower cost alternatives containing two parts. The first part creates a new spill hoisting method in postOptimization of register allocation. It does spill hoisting at once after all the spills are generated instead of inside every instance of selectOrSplit. The second part queries the define expr of the original register for rematerializaiton and keep it always available during register allocation even if it is already dead. It deletes those dead instructions only in postOptimization. With the two parts in the patch, it can remove analyzeSiblingValues without sacrificing performance. Differential Revision: http://reviews.llvm.org/D15302 llvm-svn: 265547
* Faster stack-protector for Android/AArch64.Evgeniy Stepanov2016-04-051-0/+19
| | | | | | | Bionic has a defined thread-local location for the stack protector cookie. Emit a direct load instead of going through __stack_chk_guard. llvm-svn: 265481
* [AArch64][Test] Do not override the suffixes for test cases.Quentin Colombet2016-04-052-3/+1
| | | | llvm-svn: 265441
* Don't delete empty preheaders in CodeGenPrepare if it would create a ↵Chuang-Yu Cheng2016-04-051-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | critical edge Presently, CodeGenPrepare deletes all nearly empty (only phi and branch) basic blocks. This pass can delete loop preheaders which frequently creates critical edges. A preheader can be a convenient place to spill registers to the stack. If the entrance to a loop body is a critical edge, then spills may occur in the loop body rather than immediately before it. This patch protects loop preheaders from deletion in CodeGenPrepare even if they are nearly empty. Since the patch alters the CFG, it affects a large number of test cases. In most cases, the changes are merely cosmetic (basic blocks have different names or instruction orders change slightly). I am somewhat concerned about the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not deleted, then the MIPS backend does not take advantage of a branch delay slot. Consequently, I would like some close review by a MIPS expert. The patch also partially subsumes D16893 from George Burgess IV. George correctly notes that CodeGenPrepare does not actually preserve the dominator tree. I think the dominator tree was usually not valid when CodeGenPrepare ran, but I am using LoopInfo to mark preheaders, so the dominator tree is now always valid before CodeGenPrepare. Author: Tom Jablin (tjablin) Reviewers: hfinkel george.burgess.iv vkalintiris dsanders kbarton cycheng http://reviews.llvm.org/D16984 llvm-svn: 265397
* ARM, AArch64, X86: Check preserved registers for tail calls.Matthias Braun2016-04-041-0/+24
| | | | | | | | | | | | | | | | We can only perform a tail call to a callee that preserves all the registers that the caller needs to preserve. This situation happens with calling conventions like preserver_mostcc or cxx_fast_tls. It was explicitely handled for fast_tls and failing for preserve_most. This patch generalizes the check to any calling convention. Related to rdar://24207743 Differential Revision: http://reviews.llvm.org/D18680 llvm-svn: 265329
* Revert r265309 and r265312 because they caused some errors I need to ↵Wei Mi2016-04-041-0/+514
| | | | | | investigate. llvm-svn: 265317
* Replace analyzeSiblingValues with new algorithm to fix its compileWei Mi2016-04-041-514/+0
| | | | | | | | | | | | | | | | | | | | | | | | | time issue. The patch is to solve PR17409 and its duplicates. analyzeSiblingValues is a N x N complexity algorithm where N is the number of siblings generated by reg splitting. Although it causes siginificant compile time issue when N is large, it is also important for performance since it removes redundent spills and enables rematerialization. To solve the compile time issue, the patch removes analyzeSiblingValues and replaces it with lower cost alternatives containing two parts. The first part creates a new spill hoisting method in postOptimization of register allocation. It does spill hoisting at once after all the spills are generated instead of inside every instance of selectOrSplit. The second part queries the define expr of the original register for rematerializaiton and keep it always available during register allocation even if it is already dead. It deletes those dead instructions only in postOptimization. With the two parts in the patch, it can remove analyzeSiblingValues without sacrificing performance. Differential Revision: http://reviews.llvm.org/D15302 llvm-svn: 265309
* AArch64: avoid clobbering SP for dead MOVimm pseudos.Tim Northover2016-04-011-0/+46
| | | | | | | | We were producing ORR, which actually defines a GPR32sp rather than a GPR32. Should fix PR23209. llvm-svn: 265198
* testcase gardening: update the emissionKind enum to the new syntax. (NFC)Adrian Prantl2016-04-012-2/+2
| | | | llvm-svn: 265081
* Move the DebugEmissionKind enum from DIBuilder into DICompileUnit.Adrian Prantl2016-03-311-1/+1
| | | | | | | | | | | | | This mostly cosmetic patch moves the DebugEmissionKind enum from DIBuilder into DICompileUnit. DIBuilder is not the right place for this enum to live in — a metadata consumer should not have to include DIBuilder.h. I also added a Verifier check that checks that the emission kind of a DICompileUnit is actually legal. http://reviews.llvm.org/D18612 <rdar://problem/25427165> llvm-svn: 265077
OpenPOWER on IntegriCloud