summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AArch64
Commit message (Collapse)AuthorAgeFilesLines
* [AArch64] Disable narrow load merge by defaultJun Bum Lim2016-05-201-3/+3
| | | | | | | | | | | | | | Summary: As this optimization converts two loads into one load with two shift instructions, it could potentially hurt performance if a loop is arithmetic operation intensive. Reviewers: t.p.northover, mcrosier, jmolloy Subscribers: evandro, jmolloy, aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D20172 llvm-svn: 270251
* [ARM, AArch64] Match additional patterns to ldN instructionsMatthew Simpson2016-05-192-0/+98
| | | | | | | | | | | | | | When matching an interleaved load to an ldN pattern, the interleaved access pass checks that all users of the load are shuffles. If the load is used by an instruction other than a shuffle, the pass gives up and an ldN is not generated. This patch considers users of the load that are extractelement instructions. It attempts to modify the extracts to use one of the available shuffles rather than the load. After the transformation, the load is only used by shuffles and will then be matched with an ldN pattern. Differential Revision: http://reviews.llvm.org/D20250 llvm-svn: 270142
* [AArch64 ] Generate a BFXIL from 'or (and X, Mask0Imm),(and Y, Mask1Imm)'.Chad Rosier2016-05-191-0/+67
| | | | | | | | | | | | | | | | | | Mask0Imm and ~Mask1Imm must be equivalent and one of the MaskImms is a shifted mask (e.g., 0x000ffff0). Both 'and's must have a single use. This changes code like: and w8, w0, #0xffff000f and w9, w1, #0x0000fff0 orr w0, w9, w8 into lsr w8, w1, #4 bfi w0, w8, #4, #12 llvm-svn: 270063
* Fix an assert in SelectionDAGBuilder when processing inline asmRenato Golin2016-05-176-6/+6
| | | | | | | | | | | | | | When processing inline asm that contains errors, make sure we can recover gracefully by creating an UNDEF SDValue for the inline asm statement before returning from SelectionDAGBuilder::visitInlineAsm. This is necessary for consumers that don't exit on the first error that is emitted (e.g. clang) and that would assert later on. Fixes PR24071. Patch by Diana Picus. llvm-svn: 269811
* [llc] New diagnostic handlerRenato Golin2016-05-166-6/+6
| | | | | | | | | | | | | | | | | | | | | | Without a diagnostic handler installed, llc's behaviour is to exit on the first error that it encounters. This is very different from the behaviour of clang and other front ends, which try to gather as many errors as possible before exiting. This commit adds a diagnostic handler to llc, allowing it to find and report more than one error. The old behaviour is preserved under a flag (-exit-on-error). Some of the tests fail with the new diagnostic handler, so they have to use the new flag in order to run under the previous behaviour. Some of these are known bugs, others need further investigation. Ideally, we should fix the tests and remove the flag at some point in the future. Reapplied after fixing the LLDB build that was broken due to the new DiagnosticSeverity in LLVMContext.h, and fixed an UB in the new change. Patch by Diana Picus. llvm-svn: 269655
* Revert "[llc] New diagnostic handler"Renato Golin2016-05-146-6/+6
| | | | | | | | | | | | This reverts commit r269563. Even though now it passes all LLDB bots after a local fix, there's a new buildbot it fails with tests that we hadn't seen locally: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/15647 Adding those tests to the list to investigate. llvm-svn: 269568
* [llc] New diagnostic handlerRenato Golin2016-05-146-6/+6
| | | | | | | | | | | | | | | | | | | | | | Without a diagnostic handler installed, llc's behaviour is to exit on the first error that it encounters. This is very different from the behaviour of clang and other front ends, which try to gather as many errors as possible before exiting. This commit adds a diagnostic handler to llc, allowing it to find and report more than one error. The old behaviour is preserved under a flag (-exit-on-error). Some of the tests fail with the new diagnostic handler, so they have to use the new flag in order to run under the previous behaviour. Some of these are known bugs, others need further investigation. Ideally, we should fix the tests and remove the flag at some point in the future. Reapplied after fixing the LLDB build that was broken due to the new DiagnosticSeverity in LLVMContext.h. Patch by Diana Picus. llvm-svn: 269563
* Revert "[ARM,AArch64] NFC. Add extra test cases for bswap lowering."Renato Golin2016-05-131-39/+0
| | | | | | This reverts commit r269425, as it fails on Windows (Thumb only). llvm-svn: 269451
* add support for -print-imm-hex for AArch64Paul Osmialowski2016-05-1339-285/+285
| | | | | | | | | | | | | | | | | | | | | | | | | Most immediates are printed in Aarch64InstPrinter using 'formatImm' macro, but not all of them. Implementation contains following rules: - floating point immediates are always printed as decimal - signed integer immediates are printed depends on flag settings (for negative values 'formatImm' macro prints the value as i.e -0x01 which may be convenient when imm is an address or offset) - logical immediates are always printed as hex - the 64-bit immediate for advSIMD, encoded in "a:b:c:d:e:f:g:h" is always printed as hex - the 64-bit immedaite in exception generation instructions like: brk, dcps1, dcps2, dcps3, hlt, hvc, smc, svc is always printed as hex - the rest of immediates is printed depends on availability of -print-imm-hex Signed-off-by: Maciej Gabka <maciej.gabka@arm.com> Signed-off-by: Paul Osmialowski <pawel.osmialowski@arm.com> Differential Revision: http://reviews.llvm.org/D16929 llvm-svn: 269446
* Revert "[llc] New diagnostic handler"Renato Golin2016-05-136-6/+6
| | | | | | | | This reverts commit r269428, as it breaks the LLDB build. We need to understand how to change LLDB in the same way as LLC before landing this again. llvm-svn: 269432
* [llc] New diagnostic handlerRenato Golin2016-05-136-6/+6
| | | | | | | | | | | | | | | | | | | Without a diagnostic handler installed, llc's behaviour is to exit on the first error that it encounters. This is very different from the behaviour of clang and other front ends, which try to gather as many errors as possible before exiting. This commit adds a diagnostic handler to llc, allowing it to find and report more than one error. The old behaviour is preserved under a flag (-exit-on-error). Some of the tests fail with the new diagnostic handler, so they have to use the new flag in order to run under the previous behaviour. Some of these are known bugs, others need further investigation. Ideally, we should fix the tests and remove the flag at some point in the future. Patch by Diana Picus. llvm-svn: 269428
* [ARM,AArch64] NFC. Add extra test cases for bswap lowering.Renato Golin2016-05-131-0/+39
| | | | | | | | These tests were sitting in Phab for many months. They're good tests and should be in. Patch by Charlie Turner. llvm-svn: 269425
* [AArch64] Remove command-line option use for testing.Chad Rosier2016-05-121-1/+1
| | | | | | | The EXTR combine has been in tree for over 2 years without complain, so go ahead and remove the option. llvm-svn: 269292
* [SelectionDAG] Attempt to split BITREVERSE vector legalization into BSWAP ↵Simon Pilgrim2016-05-121-0/+1
| | | | | | | | | | | | | | and BITREVERSE stages For BITREVERSE, bit shifting/masking every bit in a vector element is a very lengthy procedure. If the input vector type is a whole multiple of bytes wide then we can split this into a BSWAP shuffle stage (to reverse at the byte level) and then a BITREVERSE stage applied to each byte. Most vector capable targets can efficiently BSWAP using shuffles resulting in a considerable reduction in instructions. With this patch targets would only need to implement a target specific vXi8 BITREVERSE implementation to efficiently reverse most legal vector types. Differential Revision: http://reviews.llvm.org/D19978 llvm-svn: 269290
* [AArch64] Add support for unscaled narrow stores in getUsefulBitsForUse.Chad Rosier2016-05-121-0/+38
| | | | llvm-svn: 269263
* [AArch64] Improve getUsefulBitsForUse for narrow stores.Chad Rosier2016-05-111-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | For narrow stores (e.g., strb, srth) we know the upper bits of the register are unused/not useful. In some cases we can use this information to eliminate unnecessary instructions. For example, without this patch we generate (from the 2nd test case): ldr w8, [x0] and w8, w8, #0xfff0 bfxil w8, w2, #16, #4 strh w8, [x1] and after the patch the 'and' is removed: ldr w8, [x0] bfxil w8, w2, #16, #4 strh w8, [x1] ret During the lowering of the bitfield insert instruction the 'and' is eliminated because we know the upper 16-bits that are masked off are unused and the lower 4-bits that are masked off are overwritten by the insert itself. Therefore, the 'and' is unnecessary. Differential Revision: http://reviews.llvm.org/D20175 llvm-svn: 269226
* [AArch64] Fix DAG selection for cmps for fp16 typeWeiming Zhao2016-05-111-0/+12
| | | | | | | | | | | | Summary: When emitting comparison for fp16, in addition to promote the LHS and RHS to fp32, we need to change the VT as well. Reviewers: t.p.northover Subscribers: t.p.northover, aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D19922 llvm-svn: 269151
* AArch64: allow vN to represent 64-bit registers in inline asm.Tim Northover2016-05-101-0/+14
| | | | | | | | Unlike xN/wN, the size of vN is genuinely ambiguous in the assembly, so we should try to infer what was intended from the type. But only down to 64-bits (vN can never represent sN, hN or bN). llvm-svn: 269132
* [AArch64] Implement lowering of the X constraint on AArch64Silviu Baranga2016-05-092-0/+169
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This implements the lowering of the X constraint on AArch64. The default behaviour of the X constraint lowering is to restrict it to "f". This is a problem because the "f" constraint is not implemented on AArch64 and would be too restrictive anyway. Therefore, the AArch64 hook will lower this to "w" (if the operand is a floating point or vector) or "r" otherwise. The implementation is similar with the one added for ARM (r267411). This is the AArch64 side of the fix for http://llvm.org/PR26493 Reviewers: rengolin Subscribers: aemerson, rengolin, llvm-commits, t.p.northover Differential Revision: http://reviews.llvm.org/D19967 llvm-svn: 268907
* [AArch64] Fix test to specify triple and disable post-RA scheduling.Geoff Berry2016-05-061-1/+1
| | | | | | | This should fix bot breakage caused by r268746: [AArch64] Combine callee-save and local stack SP adjustment instructions. llvm-svn: 268752
* [AArch64] Combine callee-save and local stack SP adjustment instructions.Geoff Berry2016-05-0613-85/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: If a function needs to allocate both callee-save stack memory and local stack memory, we currently decrement/increment the SP in two steps: first for the callee-save area, and then for the local stack area. This changes the code to allocate them both at once at the very beginning/end of the function. This has two benefits: 1) there is one fewer sub/add micro-op in the prologue/epilogue 2) the stack adjustment instructions act as a scheduling barrier, so moving them to the very beginning/end of the function increases post-RA scheduler's ability to move instructions (that only depend on argument registers) before any of the callee-save stores This change can cause an increase in instructions if the original local stack SP decrement could be folded into the first store to the stack. This occurs when the first local stack store is to stack offset 0. In this case we are trading off one more sub instruction for one fewer sub micro-op (along with benefits (2) and (3) above). Reviewers: t.p.northover Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18619 llvm-svn: 268746
* [CodeGen] Round [SU]INT_TO_FP result when promoting from f16.Ahmed Bougacha2016-05-061-0/+28
| | | | | | | | | | | | If we don't, values that aren't precisely representable in f16 could be used as-is in a promoted f32 operation, which would produce incorrect results. AArch64 had the correct behavior; add a focused test. Fixes http://llvm.org/PR26871 llvm-svn: 268700
* [ValueTracking] Improve isImpliedCondition for matching LHS and Imm RHSs.Chad Rosier2016-05-052-7/+9
| | | | llvm-svn: 268636
* [AArch64] Use the reciprocal estimation machineryEvandro Menezes2016-05-042-0/+237
| | | | | | | This patch adds support for estimating the square root, its reciprocal and division or reciprocal using the combiner generic reciprocal machinery. llvm-svn: 268539
* Fix uppercase typoMatthias Braun2016-05-031-1/+1
| | | | llvm-svn: 268362
* AArch64/optimizeCondBranch: Remove earlier kill flag when forming TBZMatthias Braun2016-05-031-0/+48
| | | | | | | This fixes -verify-machineinstrs complaints when compiling test-suite/SingleSource/Benchmarks/Shootout-C++/wordfreq.cpp llvm-svn: 268360
* [AArch64] Set correct successors in CMPXCHG pseudo expansion.Ahmed Bougacha2016-04-271-1/+1
| | | | | | | | | | | transferSuccessors() would LoadCmpBB a successor of DoneBB, whereas it should be a successor of the original MBB. Follow-up to r266339. Unfortunately, it's tricky to catch this in the verifier. llvm-svn: 267779
* [AArch64] Expand v1i64 and v2i64 ctlz.Craig Topper2016-04-261-0/+28
| | | | | | The default is legal, which results in 'Cannot select' errors. llvm-svn: 267522
* Re-apply r267206 with a fix for the encoding problem: when the immediate ofQuentin Colombet2016-04-251-5/+52
| | | | | | | | | | | | | | | | | | log2(Mask) is smaller than 32, we must use the 32-bit variant because the 64-bit variant cannot encode it. Therefore, set the subreg part accordingly. [AArch64] Fix optimizeCondBranch logic. The opcode for the optimized branch does not depend on the size of the activate bits in the AND masks, but the AND opcode itself. Indeed, we need to use a X or W variant based on the AND variant not based on whether the mask fits into the related variant. Otherwise, we may end up using the W variant of the optimized branch for 64-bit register inputs! This fixes the last make check verifier issues for AArch64: PR27479. llvm-svn: 267465
* [MachineCombiner] Support for floating-point FMA on ARM64 (re-commit r267098)Gerolf Hoflehner2016-04-242-0/+264
| | | | | | | | | | | | | | | | | | | The original patch caused crashes because it could derefence a null pointer for SelectionDAGTargetInfo for targets that do not define it. Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are: - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64. - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The targets decides whether the machine combiner should optimize for throughput or latency. - Supports for fmadd, f(n)msub, fmla, fmls patterns - On by default at O3 ffast-math llvm-svn: 267328
* Revert "[AArch64] Fix optimizeCondBranch logic."Renato Golin2016-04-231-1/+1
| | | | | | This reverts commit r267206, as it broke self-hosting on AArch64. llvm-svn: 267294
* [AArch64] Fix optimizeCondBranch logic.Quentin Colombet2016-04-221-1/+1
| | | | | | | | | | | | | The opcode for the optimized branch does not depend on the size of the activate bits in the AND masks, but the AND opcode itself. Indeed, we need to use a X or W variant based on the AND variant not based on whether the mask fits into the related variant. Otherwise, we may end up using the W variant of the optimized branch for 64-bit register inputs! This fixes the last make check verifier issues for AArch64: PR27479. llvm-svn: 267206
* MachineScheduler: Limit the size of the ready list.Matthias Braun2016-04-221-0/+1
| | | | | | | | | Avoid quadratic complexity in unusually large basic blocks by limiting the size of the ready lists. Differential Revision: http://reviews.llvm.org/D19349 llvm-svn: 267189
* [AArch64] When creating MRS instruction, make sure the destination register isQuentin Colombet2016-04-221-1/+1
| | | | | | | | declared as a definition. This fixes the machine verifier error for CodeGen/AArch64/nzcv-save.ll. llvm-svn: 267185
* [AArch64][AdvSIMDScalar] Update the kill flags correctly.Quentin Colombet2016-04-221-4/+4
| | | | | | | | | | | | | | | | | | | We used to simply set the kill flags to true when transforming a scalar instruction to a vector one. SrcScalar1 = copy SrcVector1 ... = opScalar SrcScalar1 => SrcScalar1 = copy SrcVector1 ... = opVector SrcVector1<kill> This is obviously wrong. The proper update consists in: 1. Propagate the kill status from the copy to the new opVector 2. Reset the kill status on the copy, since the live-range of SrcVector1 got extended. This fixes some of the machine verifier errors for AArch64 with make check. llvm-svn: 267180
* Revert r267098 - [MachineCombiner] Support for floating-point FMA on ARM64Daniel Sanders2016-04-222-264/+0
| | | | | | It introduced buildbot failures on clang-cmake-mips, clang-ppc64le-linux, among others. llvm-svn: 267127
* [MachineCombiner] Support for floating-point FMA on ARM64Gerolf Hoflehner2016-04-222-0/+264
| | | | | | | | | | | | | | | | Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are: - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64. - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The targets decides whether the machine combiner should optimize for throughput or latency. - Supports for fmadd, f(n)msub, fmla, fmls patterns - On by default at O3 ffast-math llvm-svn: 267098
* Try to fix UNRESOLVED: LLVM :: CodeGen/AArch64/arm64-regress-opt-cmp.s on bots.Nico Weber2016-04-221-0/+1
| | | | | | | | This test used to write a .s file until r266971 fixed that. But on most bots, the .s file still exists. Add an rm statement to clean up the bots. In a few days, this statement can go away again. llvm-svn: 267095
* add tests for disguised fabs/fnegSanjay Patel2016-04-211-0/+29
| | | | llvm-svn: 267053
* DAGCombiner: Reduce 64-bit BFE pattern to pattern on 32-bit componentMatt Arsenault2016-04-211-1/+1
| | | | | | | If the extracted bits are restricted to the upper half or lower half, this can be truncated. llvm-svn: 267024
* Updated a test not to produce an empty s-file.Evgeny Astigeevich2016-04-211-1/+1
| | | | llvm-svn: 266971
* [AArch64][CodeGen] Fix of PR27158: incorrect peephole optimization in ↵Evgeny Astigeevich2016-04-212-0/+64
| | | | | | | | | | | | | | | | AArch64InstrInfo::optimizeCompareInstr AArch64InstrInfo::optimizeCompareInstr has bug PR27158 which causes generation of incorrect code. A compare instruction is substituted with another instruction which does not produce the same flags as the original compare instruction. This patch contains: 1. Fix of the bug. 2. A regression test in MIR. 3. A new test to check that SUBS is replaced by SUB. Differential Revision: http://reviews.llvm.org/D18838 llvm-svn: 266969
* [LLVM] Remove unwanted --check-prefix=CHECK from unit tests. NFC.Mandeep Singh Grang2016-04-1915-15/+15
| | | | | | | | | | | | Summary: Removed unwanted --check-prefix=CHECK from numerous unit tests. Reviewers: t.p.northover, dblaikie, uweigand, MatzeB, tstellarAMD, mcrosier Subscribers: mcrosier, dsanders Differential Revision: http://reviews.llvm.org/D19279 llvm-svn: 266834
* [AArch64] [ARM] Make a target-independent llvm.thread.pointer intrinsic.Marcin Koscielnicki2016-04-191-2/+2
| | | | | | | | | | | | | | Both AArch64 and ARM support llvm.<arch>.thread.pointer intrinsics that just return the thread pointer. I have a pending patch that does the same for SystemZ (D19054), and there are many more targets that could benefit from one. This patch merges the ARM and AArch64 intrinsics into a single target independent one that will also be used by subsequent targets. Differential Revision: http://reviews.llvm.org/D19098 llvm-svn: 266818
* [SSP, 2/2] Create llvm.stackguard() intrinsic and lower it to LOAD_STACK_GUARDTim Shen2016-04-191-1/+5
| | | | | | | | | | | | | | | | | | | | | | | With this change, ideally IR pass can always generate llvm.stackguard call to get the stack guard; but for now there are still IR form stack guard customizations around (see getIRStackGuard()). Future SSP customization should go through LOAD_STACK_GUARD. There is a behavior change: stack guard values are not CSEed anymore, since we should never reuse the value in case that it has been spilled (and corrupted). See ssp-guard-spill.ll. This also cause the change of stack size and codegen in X86 and AArch64 test cases. Ideally we'd like to know if the guard created in llvm.stackprotector() gets spilled or not. If the value is spilled, discard the value and reload stack guard; otherwise reuse the value. This can be done by teaching register allocator to know how to rematerialize LOAD_STACK_GUARD and force a rematerialization (which seems hard), or check for spilling in expandPostRAPseudo. It only makes sense when the stack guard is a global variable, which requires more instructions to load. Anyway, this seems to go out of the scope of the current patch. llvm-svn: 266806
* [AArch64] Add load/store pair instructions to getMemOpBaseRegImmOfsWidth().Chad Rosier2016-04-152-1/+35
| | | | | | | | | This improves AA in the MI schduler when reason about paired instructions. Phabricator Revision: http://reviews.llvm.org/D17098 PR26358 llvm-svn: 266462
* Fix test to require Asserts since it uses debug output.Geoff Berry2016-04-151-0/+1
| | | | llvm-svn: 266448
* [PR27284] Reverse the ownership between DICompileUnit and DISubprogram.Adrian Prantl2016-04-152-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently each Function points to a DISubprogram and DISubprogram has a scope field. For member functions the scope is a DICompositeType. DIScopes point to the DICompileUnit to facilitate type uniquing. Distinct DISubprograms (with isDefinition: true) are not part of the type hierarchy and cannot be uniqued. This change removes the subprograms list from DICompileUnit and instead adds a pointer to the owning compile unit to distinct DISubprograms. This would make it easy for ThinLTO to strip unneeded DISubprograms and their transitively referenced debug info. Motivation ---------- Materializing DISubprograms is currently the most expensive operation when doing a ThinLTO build of clang. We want the DISubprogram to be stored in a separate Bitcode block (or the same block as the function body) so we can avoid having to expensively deserialize all DISubprograms together with the global metadata. If a function has been inlined into another subprogram we need to store a reference the block containing the inlined subprogram. Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script that updates LLVM IR testcases to the new format. http://reviews.llvm.org/D19034 <rdar://problem/25256815> llvm-svn: 266446
* llvm/test/CodeGen/AArch64/arm64-csldst-mmo.ll requires +Asserts.NAKAMURA Takumi2016-04-151-0/+1
| | | | llvm-svn: 266443
* [AArch64] Add MMOs to callee-save load/store instructions.Geoff Berry2016-04-151-0/+23
| | | | | | | | | | | | | | Summary: Without MMOs, the callee-save load/store instructions were treated as volatile by the MI post-RA scheduler and AArch64LoadStoreOptimizer. Reviewers: t.p.northover, mcrosier Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17661 llvm-svn: 266439
OpenPOWER on IntegriCloud