summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AArch64
Commit message (Collapse)AuthorAgeFilesLines
...
* [AArch64] Allow loads with imp-def to be handled in getMemOpBaseRegImmOfsWidth()Jun Bum Lim2016-03-311-1/+1
| | | | | | | | | | | | | | Summary: This change will allow loads with imp-def to be clustered in machine-scheduler pass. areMemAccessesTriviallyDisjoint() can also handle loads with imp-def. Reviewers: mcrosier, jmolloy, t.p.northover Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18665 llvm-svn: 265051
* [AArch64] Handle missing store pair opportunityJun Bum Lim2016-03-311-8/+22
| | | | | | | | | | | | | | | | | | | | Summary: This change will handle missing store pair opportunity where the first store instruction stores zero followed by the non-zero store. For example, this change will convert : str wzr, [x8] str w1, [x8, #4] into: stp wzr, w1, [x8] Reviewers: jmolloy, t.p.northover, mcrosier Subscribers: flyingforyou, aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18570 llvm-svn: 265021
* Upgrade some wildly anachronistic debug info in testcases.Adrian Prantl2016-03-291-3/+3
| | | | llvm-svn: 264797
* Swift Calling Convention: add swiftself attribute.Manman Ren2016-03-291-0/+29
| | | | | | Differential Revision: http://reviews.llvm.org/D17866 llvm-svn: 264754
* [AArch64] Do not lower scalar sdiv/udiv to a shifts + mul sequence when ↵Haicheng Wu2016-03-281-0/+45
| | | | | | | | optimizing for minsize Mimic what x86 does when optimizing sdiv/udiv for minsize. llvm-svn: 264606
* Prevent construction of cycle in DAG store mergeNirav Dave2016-03-251-0/+41
| | | | | | | | | | | | | | | | | | | | | When merging stores in DAGCombiner, add check to ensure that no dependenices exist that would cause the construction of a cycle in our DAG. This may happen if one store has a data dependence on another instruction (e.g. a load) which itself has a (chain) dependence on another store being merged. These stores cannot be merged safely and doing so results in a cycle that is discovered in LegalizeDAG. This test is only done in cases where Antialias analysis is used (UseAA) as non-AA store merge candidates will be merged logically after all loads which have been checked to not alias. Reviewers: ahatanak, spatel, niravd, arsenm, hfinkel, tstellarAMD, jyknight Subscribers: llvm-commits, tberghammer, danalbert, srhines Differential Revision: http://reviews.llvm.org/D18336 llvm-svn: 264461
* [CXX_FAST_TLS] Disable tail call when calling conventions are mismatched.Manman Ren2016-03-181-0/+13
| | | | | | | Since CXX_FAST_TLS has a bigger set of CSRs, we don't tail call when caller and callee have mismatched calling conventions. llvm-svn: 263856
* [CXX_FAST_TLS] fix issues with O0 on ARM, AArch64 and X86.Manman Ren2016-03-181-0/+106
| | | | | | | Since at O0, explicit copies via SplitCSR may not be removed even if they are unnecessary, we choose not to use SplitCSR at O0. llvm-svn: 263855
* [AArch64] Enable more load clustering in the MI Scheduler.Chad Rosier2016-03-181-0/+99
| | | | | | | | | | | | | This patch adds unscaled loads and sign-extend loads to the TII getMemOpBaseRegImmOfs API, which is used to control clustering in the MI scheduler. This is done to create more opportunities for load pairing. I've also added the scaled LDRSWui instruction, which was missing from the scaled instructions. Finally, I've added support in shouldClusterLoads for clustering adjacent sext and zext loads that too can be paired by the load/store optimizer. Differential Revision: http://reviews.llvm.org/D18048 llvm-svn: 263819
* [AArch64] Move GlobalISel test cases into a GlobalISel subdirectoryQuentin Colombet2016-03-151-0/+0
| | | | llvm-svn: 263572
* [AArch64] Break the dependency between FP and SP when possible.Chad Rosier2016-03-144-11/+11
| | | | | | | | | | | | | When the SP in not changed because of realignment/VLAs etc., we restore the SP by using the previous value of SP and not the FP. Breaking the dependency will help in cases when the epilog of a callee is close to the epilog of the caller; for then "sub sp, fp, #" depends on the load restoring the FP in the epilog of the callee. http://reviews.llvm.org/D18060 Patch by Aditya Kumar and Evandro Menezes. llvm-svn: 263458
* [AArch64] Don't blindly lower f16/f128 FCCMPs.Ahmed Bougacha2016-03-111-0/+52
| | | | | | | | | Instead, extend f16 (like we do when lowering a standalone SETCC), and let f128 be legalized to the RT calls. Fixes PR26803. llvm-svn: 263301
* [IRTranslator] Translate unconditional branches.Quentin Colombet2016-03-111-1/+23
| | | | llvm-svn: 263265
* AArch64: only try to use scaled fcvt ops on legal vector types.Tim Northover2016-03-101-0/+8
| | | | | | | Before we ended up calling getSimpleVectorType on a <3 x float>, which asserted. llvm-svn: 263169
* Fix testicase to turn buildbot green. NFC.Balaram Makam2016-03-102-3/+3
| | | | llvm-svn: 263154
* [AArch64] Optimize compare and branch sequence when the compare's constant ↵Balaram Makam2016-03-102-1/+52
| | | | | | | | | | | | | | | | | | | | | | | operand is power of 2 Summary: Peephole optimization that generates a single TBZ/TBNZ instruction for test and branch sequences like in the example below. This handles the cases that miss folding of AND into TBZ/TBNZ during ISelLowering of BR_CC Examples: and w8, w8, #0x400 cbnz w8, L1 to tbnz w8, #10, L1 Reviewers: MatzeB, jmolloy, mcrosier, t.p.northover Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17942 llvm-svn: 263136
* Add support for a preserve_most calling convention to the AArch64 backend.Roman Levenstein2016-03-101-0/+40
| | | | | | | | | | This change adds a support for a preserve_most calling convention to the AArch64 backend, similar to how it was done for X86-64. There is also a subsequent patch on top of this one to add a tail-calls support for this calling convention. Differential Revision: http://reviews.llvm.org/D18016 llvm-svn: 263092
* [AArch64] Disable the MI scheduler to turn bots green after r262942.Chad Rosier2016-03-081-4/+4
| | | | llvm-svn: 262944
* [AArch64][GlobalISel] Add a test case for the IRTranslator.Quentin Colombet2016-03-081-0/+18
| | | | llvm-svn: 262898
* [AArch64] fold 'isPositive' vector integer operations (PR26819)Sanjay Patel2016-03-031-15/+7
| | | | | | | | | | | | | | | | This is one of the cases shown in: https://llvm.org/bugs/show_bug.cgi?id=26819 Shift and negate is what InstCombine prefers to produce (and I tried to make it do more of that in http://reviews.llvm.org/rL262424 ), so we should recognize that pattern as something that might come from autovectorization even if it's unlikely to be produced from C NEON intrinsics. The patch is based on the x86 equivalent: http://reviews.llvm.org/rL262036 Differential Revision: http://reviews.llvm.org/D17834 llvm-svn: 262623
* Making rem_crash.ll target-specificRenato Golin2016-03-031-0/+257
| | | | | | | | | | This test failed in some ARM bots after a divmod change because it was running on a native llc, instead of targeted one. This makes sure the test is target-specific (as intended), and also copies to ARM and AArch64 directories. If it is also supposed to work on other architectures, I'll leave as an exercise to the respective maintainers. llvm-svn: 262620
* [AArch64] add tests to demonstrate existing codegen for PR26819Sanjay Patel2016-03-021-0/+65
| | | | llvm-svn: 262540
* [AArch64] Enable non-leaf frame pointer elimination.Geoff Berry2016-03-0214-46/+42
| | | | | | | | | | | | | | | | | | | | Summary: This change enables frame pointer elimination in non-leaf functions. The -fomit-frame-pointer option still needs to be used when compiling via clang (or an equivalent method of not setting the 'no-frame-pointer-elim*' function attributes if generating llvm IR via some other method) to take advantage of this optimization. This change should be NFC when compiling via clang without -fomit-frame-pointer. Reviewers: t.p.northover Subscribers: aemerson, rengolin, tberghammer, qcolombet, llvm-commits, danalbert, mcrosier, srhines Differential Revision: http://reviews.llvm.org/D17730 llvm-svn: 262495
* Revert "[AArch64] Fix isLegalAddImmediate() to return true for valid ↵Geoff Berry2016-03-011-46/+0
| | | | | | | | | | negative values." Revert r262248 in an attempt to fix the clang-native-aarch64-full bot and to investigate a performance regression in SingleSource/Benchmarks/CoyoteBench/huffbench llvm-svn: 262388
* [AArch64] Fix isLegalAddImmediate() to return true for valid negative values.Geoff Berry2016-02-291-0/+46
| | | | | | | | | | Reviewers: t.p.northover, jmolloy Subscribers: mcrosier, aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D17463 llvm-svn: 262248
* Fix a bug in isVectorReductionOp() in SelectionDAGBuilder.cpp that may cause ↵Cong Hou2016-02-261-0/+52
| | | | | | assertion failure on AArch64. llvm-svn: 262091
* Reapply r262054 with triple fix.Paul Robinson2016-02-261-15/+15
| | | | llvm-svn: 262069
* Revert r262054 on one file that fails sometimes.Paul Robinson2016-02-261-14/+14
| | | | llvm-svn: 262060
* Fix tests that used CHECK-NEXT-NOT and CHECK-DAG-NOT.Paul Robinson2016-02-262-16/+18
| | | | | | | | FileCheck actually doesn't support combo suffixes. Differential Revision: http://reviews.llvm.org/D17588 llvm-svn: 262054
* [CodeGenPrepare] Remove load-based heuristicJunmo Park2016-02-251-2/+3
| | | | | | | | | | | | | | Summary: Both the hardware and LLVM have changed since 2012. Now, load-based heuristic don't show big differences any more on OoO cores. There is no notable regressons and improvements on spec2000/2006. (Cortex-A57, Core i5). Reviewers: spatel, zansari Differential Revision: http://reviews.llvm.org/D16836 llvm-svn: 261809
* [AArch64] Generate csinv instruction more oftenGeoff Berry2016-02-232-64/+93
| | | | | | | | | | Reviewers: t.p.northover, jmolloy Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17546 llvm-svn: 261675
* [AArch64] Fix fastcc -tailcallopt epilog code generation.Geoff Berry2016-02-231-0/+92
| | | | | | | | | | | | | | Summary: Fix a bug in epilog generation where the incoming stack arguments were not being popped for fastcc functions when -tailcallopt was passed. Reviewers: t.p.northover, mcrosier, jmolloy, rengolin Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D16894 llvm-svn: 261650
* [AArch64][ShrinkWrap] Fix bug in prolog clobbering live reg when shrink ↵Geoff Berry2016-02-191-0/+89
| | | | | | | | | | | | | | wrapping. Summary: See bug https://llvm.org/bugs/show_bug.cgi?id=26642 Reviewers: qcolombet, t.p.northover Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17350 llvm-svn: 261349
* LegalizeDAG: Fix ExpandFCOPYSIGN assuming the same type on both inputsMatthias Braun2016-02-191-0/+23
| | | | llvm-svn: 261306
* Bug fix: use dyn_cast_or_null instead of dyn_castLawrence Hu2016-02-191-0/+23
| | | | | | Differential Revision: http://reviews.llvm.org/D17154 llvm-svn: 261299
* When printing MIR, output to errs() rather than outs().Justin Lebar2016-02-192-3/+3
| | | | | | | | | | | | | | | | | | | | | Summary: Without this, this command $ llvm-run llc -stop-after machine-cp -o - <( echo '' ) outputs an error, because we close stdout twice -- once when closing the file opened for "-o", and again when closing outs(). Also clarify in the outs() definition that you can't ever call it if you want to open your own raw_fd_ostream on stdout. Reviewers: jroelofs, tstellarAMD Subscribers: jholewinski, qcolombet, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D17422 llvm-svn: 261286
* AArch64: always clear kill flags up to last eliminated copyTim Northover2016-02-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | After r261154, we were only clearing flags if the known-zero register was originally live-in to the basic block, but we have to do it even if not when more than one COPY has been eliminated, otherwise the user of the first COPY may still have <kill> marked. E.g. BB#N: %X0 = COPY %XZR STRXui %X0<kill>, <fi#0> %X0 = COPY %XZR STRXui %X0<kill>, <fi#1> We can eliminate both copies, X0 is not live-in, but we must clear the kill on the first store. Unfortunately, I've been unable to come up with a non-fragile test for this. I've only seen it in the wild with regalloc-created spills, and attempts to reproduce that in a reasonable way run afoul of COPY coalescing. Even volatile asm clobbers were moved around. Should fix the aarch64 bot though. llvm-svn: 261175
* AArch64: improve redundant copy elimination.Tim Northover2016-02-171-4/+23
| | | | | | | | | | | | | | | Mostly, this fixes the bug that if the CBZ guaranteed Xn but Wn was used, we didn't sort out the use-def chain properly. I've also made it check more than just the last instruction for a compatible CBZ (so it can cope without fallthroughs). I'd have liked to do that separately, but it's helps writing the test. Finally, I removed some custom loops in favour of MachineInstr helpers and refactored the control flow to flatten it and avoid possibly quadratic iterations in blocks with many copies. NFC for these, just a general tidy-up. llvm-svn: 261154
* [AArch64] Add pass to remove redundant copy after RAJun Bum Lim2016-02-161-0/+75
| | | | | | | | | | | | | | | | | | | | | Summary: This change will add a pass to remove unnecessary zero copies in target blocks of cbz/cbnz instructions. E.g., the copy instruction in the code below can be removed because the cbz jumps to BB1 when x0 is zero : BB0: cbz x0, .BB1 BB1: mov x0, xzr Jun Reviewers: gberry, jmolloy, HaoLiu, MatzeB, mcrosier Subscribers: mcrosier, mssimpso, haicheng, bmakam, llvm-commits, aemerson, rengolin Differential Revision: http://reviews.llvm.org/D16203 llvm-svn: 261004
* [AArch64] Reduce number of callee-save save/restores.Geoff Berry2016-02-124-19/+226
| | | | | | | | | | | | | | | | | | | | | Summary: Before this change, callee-save registers would be rounded up to even pairs of GPRs and FPRs. This change eliminates these extra padding load/stores, though it does keep the stack allocation the same size unless both the GPR and FPR sets have an odd size, in which case one full pair stack slot (16 bytes) is saved. This optimization cannot currently be done for MachO targets since they rely on a fast-path .debug_frame equivalent that can only encode callee-save registers as pairs. Reviewers: t.p.northover, rengolin, mcrosier, jmolloy Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17000 llvm-svn: 260689
* [AArch64] Add support for Qualcomm Kryo CPU.Chad Rosier2016-02-123-0/+3
| | | | | | Machine model description by Dave Estes <cestes@codeaurora.org>. llvm-svn: 260686
* [AArch64] Merge two adjacent str WZR into str XZRJun Bum Lim2016-02-121-0/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This change merges adjacent 32 bit zero stores into a 64 bit zero store. e.g., str wzr, [x0] str wzr, [x0, #4] becomes str xzr, [x0] Therefore, four adjacent 32 bit zero stores will be a single stp. e.g., str wzr, [x0] str wzr, [x0, #4] str wzr, [x0, #8] str wzr, [x0, #12] becomes stp xzr, xzr, [x0] Reviewers: mcrosier, jmolloy, gberry, t.p.northover Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D16933 llvm-svn: 260682
* [AArch64] Improve load/store optimizer to handle LDUR + LDR.Chad Rosier2016-02-111-0/+125
| | | | | | | | | | | | | This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs. This is a reapplication of r259812, which had an incorrect assert. The test_stur_str_no_assert() test is a reduced version of the issue hit in the AArch64 self-host. PR24465 llvm-svn: 260523
* [AArch64] AArch64LoadStoreOptimizer: fix bug in pre-inc check iteratorGeoff Berry2016-02-091-2/+2
| | | | | | | | | | | | | | | Summary: Fix case where a pre-inc/dec load/store would not be formed if the add/sub that forms the inc/dec part of the operation was the first instruction in the block being examined. Reviewers: mcrosier, jmolloy, t.p.northover, junbuml Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D16785 llvm-svn: 260275
* AArch64: match correct order in subtraction pattern.Tim Northover2016-02-081-1/+11
| | | | | | | The accumulator in multiply-and-subtract instructions is actually subtracted *from* so these patterns were computing the wrong value. llvm-svn: 260131
* SelectionDAG: Lower some range metadata to AssertZextMatt Arsenault2016-02-081-0/+44
| | | | | | | | | | If a range has a lower bound of 0, add an AssertZext from the nearest floor power of two. This allows operations with some workitem intrinsics with known maximum ranges to use fast 24-bit multiplies. llvm-svn: 260109
* Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3)."Renato Golin2016-02-051-105/+0
| | | | | | This reverts commit r259812 as it broke AArch64 self-hosting. llvm-svn: 259881
* [AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3).Chad Rosier2016-02-041-0/+105
| | | | | | | | | | | | | | | This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs. PR24465 http://reviews.llvm.org/D12116 Many thanks to Ahmed and Michael for fixes and code review. This is a reapplication of r246769 and r259790. The tramp3d failure was caused by an incorrect refactoring in the patch. Specifically, we weren't always properly clearing the SExtIdx flag. llvm-svn: 259812
* [AArch64] Multiply extended 32-bit ints with `[U|S]MADDL'Silviu Baranga2016-02-041-0/+52
| | | | | | | | | | | | | | | | | | | | | | | During instruction selection, the AArch64 backend can recognise the following pattern and generate an [U|S]MADDL instruction, i.e. a multiply of two 32-bit operands with a 64-bit result: (mul (sext i32), (sext i32)) However, when one of the operands is constant, the sign extension gets folded into the constant in SelectionDAG::getNode(). This means that the instruction selection sees this: (mul (sext i32), i64) ...which doesn't match the pattern. Sign-extension and 64-bit multiply instructions are generated, which are slower than one 32-bit multiply. Add a pattern to match this and generate the correct instruction, for both signed and unsigned multiplies. Patch by Chris Diamand! llvm-svn: 259800
* Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR."Chad Rosier2016-02-041-105/+0
| | | | | | This reverts commit r259790. tramp3d-v4 is still having problems. llvm-svn: 259795
OpenPOWER on IntegriCloud