summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AArch64
Commit message (Collapse)AuthorAgeFilesLines
* Add tests for shrink wrapping and VLAsMomchil Velikov2018-04-181-0/+95
| | | | | | Differential revision: https://reviews.llvm.org/D45727 llvm-svn: 330253
* MachO: trap unreachable instructionsTim Northover2018-04-131-0/+7
| | | | | | | Debugability is more important than saving 4 bytes to let us to fall through to nonense. llvm-svn: 330073
* Revert r329956, "AArch64: Introduce a DAG combine for folding offsets into ↵Peter Collingbourne2018-04-136-172/+68
| | | | | | | | | | addresses." Caused a hang and eventually an assertion failure in LTO builds of 7zip-benchmark on aarch64 iOS targets. http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/ llvm-svn: 330063
* AArch64: Introduce a DAG combine for folding offsets into addresses.Peter Collingbourne2018-04-126-68/+172
| | | | | | | | | | | This is a code size win in code that takes offseted addresses frequently, such as C++ constructors that typically need to compute an offseted address of a vtable. This reduces the size of Chromium for Android's .text section by 108KB. Differential Revision: https://reviews.llvm.org/D45199 llvm-svn: 329956
* [AArch64] Move AFI->setRedZone(false) to top of emitPrologueJessica Paquette2018-04-121-5/+35
| | | | | | | | | | | | | AFI->setRedZone(false) was put in the wrong place before, and so it only fired on functions that didn't have stack frames. This moves that to the top of emitPrologue to make sure that every function without a redzone has it set correctly. This also adds a function representing one of the early exit cases (GHC calling convention) to the MachineOutliner noredzone test to ensure that we can outline from functions like these, where we never use a redzone. llvm-svn: 329922
* revert r328921 - [DAGCombine] (float)((int) f) --> ftrunc (PR36617)Sanjay Patel2018-04-121-4/+8
| | | | | | | This change is exposing UB in source code - as was warned/predicted. :) See D44909 for discussion. Reverting while we figure out how to fix things. llvm-svn: 329920
* [NFC] fix trivial typos in documents and commentsHiroshi Inoue2018-04-121-1/+1
| | | | | | "is is" -> "is", "if if" -> "if", "or or" -> "or" llvm-svn: 329878
* [FastISel] Disable local value sinking by defaultReid Kleckner2018-04-117-13/+13
| | | | | | | | | | | | | | | | | | This is causing compilation timeouts on code with long sequences of local values and calls (i.e. foo(1); foo(2); foo(3); ...). It turns out that code coverage instrumentation is a great way to create sequences like this, which how our users ran into the issue in practice. Intel has a tool that detects these kinds of non-linear compile time issues, and Andy Kaylor reported it as PR37010. The current sinking code scans the whole basic block once per local value sink, which happens before emitting each call. In theory, local values should only be introduced to be used by instructions between the current flush point and the last flush point, so we should only need to scan those instructions. llvm-svn: 329822
* [AArch64] Add test case for r329797Francis Visoiu Mistrih2018-04-111-0/+17
| | | | | | Forgot to add a test case in the previous commit. llvm-svn: 329805
* [AArch64][Falkor] Fix bug in Falkor HWPF collision avoidance pass.Geoff Berry2018-04-101-0/+25
| | | | | | | | | | | | | | | | Summary: When inserting MOVs to avoid Falkor HWPF collisions, the non-base register operand of load instructions (e.g. a register offset) was not being considered live, so it could potentially have been used as a scratch register, clobbering the actual offset value. Reviewers: mcrosier Subscribers: rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45502 llvm-svn: 329761
* [AArch64] Fix isel failure when BUILD_PAIR nodes are left over.Amara Emerson2018-04-101-0/+13
| | | | | | rdar://39175175 llvm-svn: 329743
* Revert r329611, "AArch64: Allow offsets to be folded into addresses with ELF."Peter Collingbourne2018-04-1011-191/+127
| | | | | | Caused a build failure in check-tsan. llvm-svn: 329718
* [AArch64] Use FP to access the emergency spill slotFrancis Visoiu Mistrih2018-04-102-2/+23
| | | | | | | | | | | | | | | | | | | | | In the presence of variable-sized stack objects, we always picked the base pointer when resolving frame indices if it was available. This makes us hit an assert where we can't reach the emergency spill slot if it's too far away from the base pointer. Since on AArch64 we decide to place the emergency spill slot at the top of the frame, it makes more sense to use FP to access it. The changes here don't affect only emergency spill slots but all the frame indices. The goal here is to try to choose between FP, BP and SP so that we minimize the offset and avoid scavenging, or worse, asserting when trying to access a slot allocated by the scavenger. Previously discussed here: https://reviews.llvm.org/D40876. Differential Revision: https://reviews.llvm.org/D45358 llvm-svn: 329691
* AArch64: Allow offsets to be folded into addresses with ELF.Peter Collingbourne2018-04-0911-127/+191
| | | | | | | | | | | | | | | This is a code size win in code that takes offseted addresses frequently, such as C++ constructors that typically need to compute an offseted address of a vtable. It reduces the size of Chromium for Android's .text section by 46KB, or 56KB with ThinLTO (which exposes more opportunities to use a direct access rather than a GOT access). Because the addend range is limited in COFF and Mach-O, this is enabled for ELF only. Differential Revision: https://reviews.llvm.org/D45199 llvm-svn: 329611
* Remove MachineLoopInfo dependency from AsmPrinter.Michael Zolotukhin2018-04-093-9/+1
| | | | | | | | | | | | | | | | | | | Summary: Currently MachineLoopInfo is used in only two places: 1) for computing IsBasicBlockInsideInnermostLoop field of MCCodePaddingContext, and it is never used. 2) in emitBasicBlockLoopComments, which is called only if `isVerbose()` is true. Despite that, we currently have a dependency on MachineLoopInfo, which makes pass manager to compute it and MachineDominator Tree. This patch removes the use (1) and makes the use (2) lazy, thus avoiding some redundant recomputations. Reviewers: opaparo, gadi.haber, rafael, craig.topper, zvi Subscribers: rengolin, javed.absar, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D44812 llvm-svn: 329542
* [DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst))Guozhi Wei2018-04-071-0/+14
| | | | | | | | | | | | | | In our real world application, we found the following optimization is missed in DAGCombiner (zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst)) If the user of original zext is an add, it may enable further lea optimization on x86. This patch add a new function CombineZExtLogicopShiftLoad to do this optimization. Differential Revision: https://reviews.llvm.org/D44402 llvm-svn: 329516
* Reapply ARM: Do not spill CSR to stack on entry to noreturn functionsTim Northover2018-04-071-2/+2
| | | | | | | | | | | | | | | | | | Should fix UBSan bot by also checking there's no "uwtable" attribute before skipping. Otherwise the unwind table will be useless since its moves expect CSRs to actually be preserved. A noreturn nounwind function can be expected to never return in any way, and by never returning it will also never have to restore any callee-saved registers for its caller. This makes it possible to skip spills of those registers during function entry, saving some stack space and time in the process. This is rather useful for embedded targets with limited stack space. Should fix PR9970. Patch mostly by myeisha (pmb). llvm-svn: 329494
* Revert "ARM: Do not spill CSR to stack on entry to noreturn functions"Vitaly Buka2018-04-071-2/+2
| | | | | | | | Breaks ubsan test TestCases/Misc/missing_return.cpp on ARM This reverts commit r329287 llvm-svn: 329486
* ARM: Do not spill CSR to stack on entry to noreturn functionsTim Northover2018-04-051-2/+2
| | | | | | | | | | | | | | A noreturn nounwind function can be expected to never return in any way, and by never returning it will also never have to restore any callee-saved registers for its caller. This makes it possible to skip spills of those registers during function entry, saving some stack space and time in the process. This is rather useful for embedded targets with limited stack space. Should fix PR9970. Patch by myeisha (pmb). llvm-svn: 329287
* AArch64: Implement support for the shadowcallstack attribute.Peter Collingbourne2018-04-041-0/+47
| | | | | | | | | | | | The implementation of shadow call stack on aarch64 is quite different to the implementation on x86_64. Instead of reserving a segment register for the shadow call stack, we reserve the platform register, x18. Any function that spills lr to sp also spills it to the shadow call stack, a pointer to which is stored in x18. Differential Revision: https://reviews.llvm.org/D45239 llvm-svn: 329236
* [AArch64] Add patterns matching (fabs (fsub x y)) to (fabd x y)John Brawn2018-04-043-0/+106
| | | | | | Differential Revision: https://reviews.llvm.org/D44573 llvm-svn: 329163
* [MachineOutliner] Keep track of fns that use a redzone in AArch64FunctionInfoJessica Paquette2018-04-031-0/+47
| | | | | | | | | | | | This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It returns true if the function is known to use a redzone, false if it is known to not use a redzone, and no value otherwise. This removes the requirement to pass -mno-red-zone when outlining for AArch64. https://reviews.llvm.org/D45189 llvm-svn: 329120
* [MachineOutliner][NFC] Make outlined functions have internal linkageJessica Paquette2018-04-031-14/+14
| | | | | | | | | | | | The linkage type on outlined functions was private before. This meant that if you set a breakpoint in an outlined function, the debugger wouldn't be able to give a sane name to the outlined function. This commit changes the linkage type to internal and updates any tests that relied on the prefixes on the names of outlined functions. llvm-svn: 329116
* [AArch64] Reserve x18 register on FuchsiaPetr Hosek2018-04-013-6/+3
| | | | | | | | This register is reserved as a platform register on Fuchsia. Differential Revision: https://reviews.llvm.org/D45105 llvm-svn: 328950
* [DAGCombine] (float)((int) f) --> ftrunc (PR36617)Sanjay Patel2018-03-311-8/+4
| | | | | | | | | | fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, so replace a pair of casts with the equivalent node. We don't have to account for special cases (NaN, INF) because out-of-range casts are undefined. Differential Revision: https://reviews.llvm.org/D44909 llvm-svn: 328921
* [SelectionDAG] Removing FABS folding from DAGCombinerSanjay Patel2018-03-301-4/+9
| | | | | | | | | | | | | | The code has bugs dealing with -0.0. Since D44550 introduced FABS pattern folding in InstCombine, this patch removes the now-redundant code that causes https://bugs.llvm.org/show_bug.cgi?id=36600. Patch by Mikhail Dvoretckii! Differential Revision: https://reviews.llvm.org/D44683 llvm-svn: 328872
* [MachineCopyPropagation] Handle COPY with overlapping source/dest.Eli Friedman2018-03-301-0/+32
| | | | | | | | | | | | | | | | | | | MachineCopyPropagation::CopyPropagateBlock has a bunch of special handling for COPY instructions. This handling assumes that COPY instructions do not modify the source of the copy; this is wrong if the COPY destination overlaps the source. To fix the bug, check explicitly for this situation, and fall back to the generic instruction handling. This bug can't happen for most register classes because they don't have this sort of overlap, but there are a few register classes where this is possible. The testcase uses the AArch64 QQQQ register class. Differential Revision: https://reviews.llvm.org/D44911 llvm-svn: 328851
* [MachineOutliner] Simplify call outlining + require valid callee save info ↵Jessica Paquette2018-03-282-2/+2
| | | | | | | | | | for call outlining This commit simplifies the call outlining logic by removing references to the Function associated with the callee. To do this, it requires that valid callee save info is available to the outliner. llvm-svn: 328719
* [AArch64] add ftrunc tests; NFCSanjay Patel2018-03-281-0/+47
| | | | | | As suggested in D44909. llvm-svn: 328683
* [MachineOutliner] AArch64: Don't outline ADRPs with un-outlinable operandsJessica Paquette2018-03-271-0/+41
| | | | | | | | | If an ADRP appears with, say, a CPI operand, we shouldn't outline it. This moves the check for unsafe operands so that it occurs before the special-case for ADRPs. Also add a test for outlining ADRPs. llvm-svn: 328674
* Fix a reoccuring typo in load-combine testsArtur Pilipenko2018-03-271-2/+2
| | | | | | | | | | | %tmp = bitcast i32* %arg to i8* %tmp1 = getelementptr inbounds i8, i8* %tmp, i32 0 - %tmp2 = load i8, i8* %tmp, align 1 + %tmp2 = load i8, i8* %tmp1, align 1 This doesn't change the semantics of the tests but makes use of %tmp1 which was originally intended. llvm-svn: 328642
* Use .set instead of = when printing assignment in assembly outputKrzysztof Parzyszek2018-03-273-16/+16
| | | | | | | | | On Hexagon "x = y" is a syntax used in most instructions, and is not treated as a directive. Differential Revision: https://reviews.llvm.org/D44256 llvm-svn: 328635
* [AArch64] Don't reduce the width of loads if it prevents combining a shiftJohn Brawn2018-03-232-2/+307
| | | | | | | | | | | | | | | Loads and stores can only shift the offset register by the size of the value being loaded, but currently the DAGCombiner will reduce the width of the load if it's followed by a trunc making it impossible to later combine the shift. Solve this by implementing shouldReduceLoadWidth for the AArch64 backend and make it prevent the width reduction if this is what would happen, though do allow it if reducing the load width will let us eliminate a later sign or zero extend. Differential Revision: https://reviews.llvm.org/D44794 llvm-svn: 328321
* [GlobalISel] Fix legalizer combine to not use illegal input G_EXTRACT.Amara Emerson2018-03-231-3/+3
| | | | | | | This was being masked because GISel is enabled by default for -O0 and the abort was disabled. Modified test to explicitly enable abort. llvm-svn: 328311
* State that CFG is preserved in 'Falkor HW Prefetch Fix Late Phase'.Michael Zolotukhin2018-03-221-2/+0
| | | | | | That removes some redundant recomputations from the passes pipeline. llvm-svn: 328272
* Reapply "[test] Add tests for llc passes pipelines." with a fix for bots ↵Michael Zolotukhin2018-03-222-0/+240
| | | | | | with expensive checks on. llvm-svn: 328267
* [CodeGen] Add a new pass for PostRA sinkJun Bum Lim2018-03-222-0/+387
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This pass sinks COPY instructions into a successor block, if the COPY is not used in the current block and the COPY is live-in to a single successor (i.e., doesn't require the COPY to be duplicated). This avoids executing the the copy on paths where their results aren't needed. This also exposes additional opportunites for dead copy elimination and shrink wrapping. These copies were either not handled by or are inserted after the MachineSink pass. As an example of the former case, the MachineSink pass cannot sink COPY instructions with allocatable source registers; for AArch64 these type of copy instructions are frequently used to move function parameters (PhyReg) into virtual registers in the entry block.. For the machine IR below, this pass will sink %w19 in the entry into its successor (%bb.1) because %w19 is only live-in in %bb.1. ``` %bb.0: %wzr = SUBSWri %w1, 1 %w19 = COPY %w0 Bcc 11, %bb.2 %bb.1: Live Ins: %w19 BL @fun %w0 = ADDWrr %w0, %w19 RET %w0 %bb.2: %w0 = COPY %wzr RET %w0 ``` As we sink %w19 (CSR in AArch64) into %bb.1, the shrink-wrapping pass will be able to see %bb.0 as a candidate. With this change I observed 12% more shrink-wrapping candidate and 13% more dead copies deleted in spec2000/2006/2017 on AArch64. Reviewers: qcolombet, MatzeB, thegameg, mcrosier, gberry, hfinkel, john.brawn, twoh, RKSimon, sebpop, kparzysz Reviewed By: sebpop Subscribers: evandro, sebpop, sfertile, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D41463 llvm-svn: 328237
* [GISel]: Fix incorrect IRTranslation while translating null pointer typesAditya Nandakumar2018-03-225-5/+9
| | | | | | | | | | | | | | | https://reviews.llvm.org/D44762 Currently IRTranslator produces %vreg17<def>(p0) = G_CONSTANT 0; instead we should build %vreg16(s64) = G_CONSTANT 0 %vreg17(p0) = G_INTTOPTR %vreg16 reviewed by @aemerson. llvm-svn: 328218
* Revert "[test] Add tests for llc passes pipelines."Jonas Devlieghere2018-03-222-239/+0
| | | | | | | This reverts r328159 because the two AArch64 tests fail on GreenDragon: http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/11030/ llvm-svn: 328188
* [test] Add tests for llc passes pipelines.Michael Zolotukhin2018-03-212-0/+239
| | | | | | | This is basically an extension of existing test test/CodeGen/X86/O0-pipeline.ll introduced in r302608. llvm-svn: 328159
* [AArch64] Add vmulxh_lane fp16 vector intrinsicAbderrazek Zaafrani2018-03-201-0/+30
| | | | | | https://reviews.llvm.org/D44591 llvm-svn: 328035
* [AArch64] add fabs tests for PR36600; NFCSanjay Patel2018-03-201-0/+33
| | | | llvm-svn: 327995
* [MachineOutliner] AArch64: Emit CFI instructions when outlining callsJessica Paquette2018-03-191-0/+67
| | | | | | | | | | | When outlining calls, the outliner needs to update CFI to ensure that, say, exception handling works. This commit adds that functionality and adds a test just for call outlining. Call outlining stuff in machine-outliner.mir should be moved into machine-outliner-calls.mir in a later commit. llvm-svn: 327917
* [ARM, AArch64] Check the no-stack-arg-probe attribute for dynamic stack probesMartin Storsjo2018-03-192-0/+29
| | | | | | | | | | | | | This extends the use of this attribute on ARM and AArch64 from SVN r325900 (where it was only checked for fixed stack allocations on ARM/AArch64, but for all stack allocations on X86). This also adds a testcase for the existing use of disabling the fixed stack probe with the attribute on ARM and AArch64. Differential Revision: https://reviews.llvm.org/D44291 llvm-svn: 327897
* [MachineOutliner] Make KILLs invisibleJessica Paquette2018-03-161-0/+3
| | | | | | | | | At the point the outliner runs, KILLs don't impact anything, but they're still considered unique instructions. This commit makes them invisible like DebugValues so that they can still be outlined without impacting outlining decisions. llvm-svn: 327760
* [AArch64] Codegen tests for the Armv8.2-A FP16 intrinsicsSjoerd Meijer2018-03-157-0/+893
| | | | | | | | | This is a follow up of the AArch64 FP16 intrinsics work; the codegen tests had not been added yet. Differential Revision: https://reviews.llvm.org/D44510 llvm-svn: 327624
* [FastISel] Sink local value materializations to first useReid Kleckner2018-03-147-28/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Local values are constants, global addresses, and stack addresses that can't be folded into the instruction that uses them. For example, when storing the address of a global variable into memory, we need to materialize that address into a register. FastISel doesn't want to materialize any given local value more than once, so it generates all local value materialization code at EmitStartPt, which always dominates the current insertion point. This allows it to maintain a map of local value registers, and it knows that the local value area will always dominate the current insertion point. The downside is that local value instructions are always emitted without a source location. This is done to prevent jumpy line tables, but it means that the local value area will be considered part of the previous statement. Consider this C code: call1(); // line 1 ++global; // line 2 ++global; // line 3 call2(&global, &local); // line 4 Today we end up with assembly and line tables like this: .loc 1 1 callq call1 leaq global(%rip), %rdi leaq local(%rsp), %rsi .loc 1 2 addq $1, global(%rip) .loc 1 3 addq $1, global(%rip) .loc 1 4 callq call2 The LEA instructions in the local value area have no source location and are treated as being on line 1. Stepping through the code in a debugger and correlating it with the assembly won't make much sense, because these materializations are only required for line 4. This is actually problematic for the VS debugger "set next statement" feature, which effectively assumes that there are no registers live across statement boundaries. By sinking the local value code into the statement and fixing up the source location, we can make that feature work. This was filed as https://bugs.llvm.org/show_bug.cgi?id=35975 and https://crbug.com/793819. This change is obviously not enough to make this feature work reliably in all cases, but I felt that it was worth doing anyway because it usually generates smaller, more comprehensible -O0 code. I measured a 0.12% regression in code generation time with LLC on the sqlite3 amalgamation, so I think this is worth doing. There are some special cases worth calling out in the commit message: 1. local values materialized for phis 2. local values used by no-op casts 3. dead local value code Local values can be materialized for phis, and this does not show up as a vreg use in MachineRegisterInfo. In this case, if there are no other uses, this patch sinks the value to the first terminator, EH label, or the end of the BB if nothing else exists. Local values may also be used by no-op casts, which adds the register to the RegFixups table. Without reversing the RegFixups map direction, we don't have enough information to sink these instructions. Lastly, if the local value register has no other uses, we can delete it. This comes up when fastisel tries two instruction selection approaches and the first materializes the value but fails and the second succeeds without using the local value. Reviewers: aprantl, dblaikie, qcolombet, MatzeB, vsk, echristo Subscribers: dotdash, chandlerc, hans, sdardis, amccarth, javed.absar, zturner, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D43093 llvm-svn: 327581
* [CodeGen] Use MIR syntax for MachineMemOperand printingFrancis Visoiu Mistrih2018-03-143-20/+20
| | | | | | | | | | Get rid of the "; mem:" suffix and use the one we use in MIR: ":: (load 2)". rdar://38163529 Differential Revision: https://reviews.llvm.org/D42377 llvm-svn: 327580
* [AArch64] Emit CSR loads in the same order as storesFrancis Visoiu Mistrih2018-03-141-0/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Optionally allow the order of restoring the callee-saved registers in the epilogue to be reversed. The flag -reverse-csr-restore-seq generates the following code: ``` stp x26, x25, [sp, #-64]! stp x24, x23, [sp, #16] stp x22, x21, [sp, #32] stp x20, x19, [sp, #48] ; [..] ldp x24, x23, [sp, #16] ldp x22, x21, [sp, #32] ldp x20, x19, [sp, #48] ldp x26, x25, [sp], #64 ret ``` Note how the CSRs are restored in the same order as they are saved. One exception to this rule is the last `ldp`, which allows us to merge the stack adjustment and the ldp into a post-index ldp. This is done by first generating: ldp x26, x27, [sp] add sp, sp, #64 which gets merged by the arm64 load store optimizer into ldp x26, x25, [sp], #64 The flag is disabled by default. llvm-svn: 327569
* [AArch64] Keep track of MIFlags in the LoadStoreOptimizerFrancis Visoiu Mistrih2018-03-141-0/+99
| | | | | | | | | | | | | | | | Merging: * $x26, $x25 = frame-setup LDPXi $sp, 0 * $sp = frame-destroy ADDXri $sp, 64, 0 into an LDPXpost should preserve the flags from both instructions as following: * frame-setup frame-destroy LDPXpost Differential Revision: https://reviews.llvm.org/D44446 llvm-svn: 327533
OpenPOWER on IntegriCloud