summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AArch64
Commit message (Collapse)AuthorAgeFilesLines
* [GlobalISel][AArch64] Legalize v8s8 loadsJessica Paquette2019-04-181-0/+25
| | | | | | | | Add legalizer support for loads of v8s8 and update legalize-load-store.mir. Differential Revision: https://reviews.llvm.org/D60877 llvm-svn: 358714
* [GlobalISel] Add legalization support for non-power-2 loads and storesAmara Emerson2019-04-172-20/+49
| | | | | | | | | | Legalize things like i24 load/store by splitting them into smaller power of 2 operations. This matches how SelectionDAG handles these operations. Differential Revision: https://reviews.llvm.org/D59971 llvm-svn: 358613
* Re-commit r357452: SimplifyCFG SinkCommonCodeFromPredecessors: Also sink ↵Hans Wennborg2019-04-163-65/+65
| | | | | | | | | | | | | | | | | | | function calls without used results (PR41259) The original commit caused false positives from AddressSanitizer's use-after-scope checks, which have now been fixed in r358478. > The code was previously checking that candidates for sinking had exactly > one use or were a store instruction (which can't have uses). This meant > we could sink call instructions only if they had a use. > > That limitation seemed a bit arbitrary, so this patch changes it to > "instruction has zero or one use" which seems more natural and removes > the need to special-case stores. > > Differential revision: https://reviews.llvm.org/D59936 llvm-svn: 358483
* [AArch64][GlobalISel] Don't do extending loads combine for non-pow-2 types.Amara Emerson2019-04-152-1/+38
| | | | | | | | Since non-pow-2 types are going to get split up into multiple loads anyway, don't do the [SZ]EXTLOAD combine for those and save us trouble later in legalization. llvm-svn: 358458
* Codegen: Fixed perf branch_weights in couple of tests. NFC.Yevgeny Rouban2019-04-151-2/+2
| | | | | | This is need to pass future checks of perf branch_weights metadata. llvm-svn: 358384
* [GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with ↵Amara Emerson2019-04-1513-65/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | constants only. Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't be degraded. This change also improves the IRTranslator so that in most places, but not all, it creates constants using the MIRBuilder directly instead of first creating a new destination vreg and then creating a constant. By doing this, the buildConstant() method can just return the vreg of an existing G_CONSTANT instead of having to create a COPY from it. I measured a 0.2% improvement in compile time and a 0.9% improvement in code size at -O0 ARM64. Compile time: Program base cse diff test-suite...ark/tramp3d-v4/tramp3d-v4.test 9.04 9.12 0.8% test-suite...Mark/mafft/pairlocalalign.test 2.68 2.66 -0.7% test-suite...-typeset/consumer-typeset.test 5.53 5.51 -0.4% test-suite :: CTMark/lencod/lencod.test 5.30 5.28 -0.3% test-suite :: CTMark/Bullet/bullet.test 25.82 25.76 -0.2% test-suite...:: CTMark/ClamAV/clamscan.test 6.92 6.90 -0.2% test-suite...TMark/7zip/7zip-benchmark.test 34.24 34.17 -0.2% test-suite :: CTMark/SPASS/SPASS.test 6.25 6.24 -0.1% test-suite...:: CTMark/sqlite3/sqlite3.test 1.66 1.66 -0.1% test-suite :: CTMark/kimwitu++/kc.test 13.61 13.60 -0.0% Geomean difference -0.2% Code size: Program base cse diff test-suite...-typeset/consumer-typeset.test 1315632 1266480 -3.7% test-suite...:: CTMark/ClamAV/clamscan.test 1313892 1297508 -1.2% test-suite :: CTMark/lencod/lencod.test 1439504 1423112 -1.1% test-suite...TMark/7zip/7zip-benchmark.test 2936980 2904172 -1.1% test-suite :: CTMark/Bullet/bullet.test 3478276 3445460 -0.9% test-suite...ark/tramp3d-v4/tramp3d-v4.test 8082868 8033492 -0.6% test-suite :: CTMark/kimwitu++/kc.test 3870380 3853972 -0.4% test-suite :: CTMark/SPASS/SPASS.test 1434904 1434896 -0.0% test-suite...Mark/mafft/pairlocalalign.test 764528 764528 0.0% test-suite...:: CTMark/sqlite3/sqlite3.test 782092 782092 0.0% Geomean difference -0.9% Differential Revision: https://reviews.llvm.org/D60580 llvm-svn: 358369
* [AArch64][GlobalISel] Enable copy elision in the pre-legalizer combine and ↵Amara Emerson2019-04-131-0/+32
| | | | | | | | | | | | | fix a crash. This enables the simple copy combine that already exists in the CombinerHelper. However, it exposed a bug in the GISelChangeObserver where it wouldn't clear a set of MIs to process, and so would end up causing a crash when deleted MIs were being added to the combiner worklist again. Differential Revision: https://reviews.llvm.org/D60579 llvm-svn: 358318
* [GlobalISel] Fix a crash when handling an invalid MVT during call lowering.Amara Emerson2019-04-121-0/+7
| | | | | | | | This crash was introduced in r358032 as we try to construct an EVT from an MVT in order to find the register type for the calling conv. Fall back instead of trying to do this with an invalid MVT coming from i256. llvm-svn: 358314
* [AArch64][GlobalISel] Fix a crash when selecting shufflevectors with an ↵Amara Emerson2019-04-121-0/+51
| | | | | | | | | | | undef mask element. If a shufflevector's mask vector has an element with "undef" then the generic instruction defining that element register is a G_IMPLICT_DEF instead of G_CONSTANT. This fixes the selector to handle this case, and for now assumes that undef just means zero. In future we'll optimize this case properly. llvm-svn: 358312
* [AArch64][GlobalISel] Flesh out vector load/store support for more types.Amara Emerson2019-04-115-71/+440
| | | | | | | Some of these were legalizing into smaller vector types unnecessarily, others were simply not supported yet. llvm-svn: 358223
* [AArch64][GlobalISel] Legalization and ISel support for load/stores of ↵Amara Emerson2019-04-115-16/+193
| | | | | | | | | | | | | | | | vectors of pointers. Loads and store of values with type like <2 x p0> currently don't get imported because SelectionDAG has no knowledge of pointer types. To leverage the existing support for vector load/stores, we can bitcast the value to have s64 element types instead. We do this as a custom legalization. This patch also adds support for general loads of <2 x s64>, and relaxes some type conditions on selecting G_BITCAST. Differential Revision: https://reviews.llvm.org/D60534 llvm-svn: 358221
* [AArch64] Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64Diogo N. Sampaio2019-04-111-0/+10
| | | | | | | | | | | | | | | | Summary: Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64 Reviewers: pbarrio, DavidSpickett, LukeGeeson Reviewed By: LukeGeeson Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60259 llvm-svn: 358171
* [AArch64][GlobalISel] Make <2 x p0> = G_BUILD_VECTOR legal.Amara Emerson2019-04-102-144/+50
| | | | | | The existing isel support already works for p0 once the legalizer accepts it. llvm-svn: 358144
* [AArch64][GlobalISel] Add legalizer support for <8 x s16> and <16 x s8> G_ADD.Amara Emerson2019-04-102-0/+107
| | | | llvm-svn: 358143
* [AArch64][GlobalISel] Scalarize vector SDIV.Amara Emerson2019-04-101-4/+33
| | | | llvm-svn: 358142
* GlobalISel: Support legalizing G_CONSTANT with irregular breakdownMatt Arsenault2019-04-101-8/+0
| | | | llvm-svn: 358109
* [AArch64] Teach getTestBitOperand to look through ANY_EXTENDSCraig Topper2019-04-101-5/+10
| | | | | | | | This patch teach getTestBitOperand to look through ANY_EXTENDs when the extended bits aren't used. The test case changed here is based what D60358 did to test16 in tbz-tbnz.ll. So this patch will avoid that regression. Differential Revision: https://reviews.llvm.org/D60482 llvm-svn: 358108
* [AArch64] Add lowering pattern for scalar fp16 facge and facgtDiogo N. Sampaio2019-04-101-0/+24
| | | | | | | | | | | | | | | | Summary: The fp16 scalar version of facge and facgt requires a custom patter matching, as the result type is not the same width of the operands. Reviewers: olista01, javed.absar, pbarrio Reviewed By: javed.absar Subscribers: kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60212 llvm-svn: 358083
* [AArch64][GlobalISel] Add isel support for vector G_ICMP and G_ASHR & G_SHLAmara Emerson2019-04-092-0/+3470
| | | | | | | | | | | | | | | | The selection for G_ICMP is unfortunately not currently importable from SDAG due to the use of custom SDNodes. To support this, this selection method has an opcode table which has been generated by a script, indexed by various instruction properties. Ideally in future we will have a GISel native selection patterns that we can write in tablegen to improve on this. For selection of some types we also need support for G_ASHR and G_SHL which are generated as a result of legalization. This patch also adds support for them, generating the same code as SelectionDAG currently does. Differential Revision: https://reviews.llvm.org/D60436 llvm-svn: 358035
* [AArch64][GlobalISel] Legalize vector G_ICMP.Amara Emerson2019-04-093-5/+1929
| | | | | | | | Selection support will be coming in a later patch. Differential Revision: https://reviews.llvm.org/D60435 llvm-svn: 358034
* [AArch64][GlobalISel] Add legalization for some vector G_SHL and G_ASHR.Amara Emerson2019-04-092-1/+32
| | | | | | | | This is needed for some future support for vector ICMP. Differential Revision: https://reviews.llvm.org/D60433 llvm-svn: 358033
* [GlobalISel][AArch64] Allow CallLowering to handle types which are normallyAmara Emerson2019-04-092-0/+44
| | | | | | | | | | | required to be passed as different register types. E.g. <2 x i16> may need to be passed as a larger <2 x i32> type, so formal arg lowering needs to be able truncate it back. Likewise, when dealing with returns of these types, they need to be widened in the appropriate way back. Differential Revision: https://reviews.llvm.org/D60425 llvm-svn: 358032
* [AArch64] Add test case to show missed opportunity to remove a shift before ↵Craig Topper2019-04-091-0/+21
| | | | | | | | tbnz when the shift has been zero extended from i32 to i64. NFC This pattern showed up in D60358 and it was suggested I had a test and fix that separately. llvm-svn: 358030
* [SelectionDAG] Add fcmp UNDEF handling to SelectionDAG::FoldSetCCSimon Pilgrim2019-04-051-7/+10
| | | | | | | | | | Second half of PR40800, this patch adds DAG undef handling to fcmp instructions to match the behavior in llvm::ConstantFoldCompareInstruction, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.). This involves a lot of tweaking to reduced tests as bugpoint loves to reduce fcmp arguments to undef........ Differential Revision: https://reviews.llvm.org/D60006 llvm-svn: 357765
* Revert r357452 - 'SimplifyCFG SinkCommonCodeFromPredecessors: Also sink ↵David L. Jones2019-04-043-65/+65
| | | | | | | | | | function calls without used results (PR41259)' This revision causes tests to fail under ASAN. Since the cause of the failures is not clear (could be ASAN, could be a Clang bug, could be a bug in this revision), the safest course of action seems to be to revert while investigating. llvm-svn: 357667
* [AArch64][GlobalISel] Legalize G_FEXP2Jessica Paquette2019-04-033-1/+262
| | | | | | | | | Same as G_EXP. Add a test, and update legalizer-info-validation.mir and f16-instructions.ll. Differential Revision: https://reviews.llvm.org/D60165 llvm-svn: 357605
* [DAGCombiner] loosen restrictions for moving shuffles after vector binopSanjay Patel2019-04-031-3/+3
| | | | | | | | | | | | There are 3 changes to make this correspond to the same transform in instcombine: 1. Remove the legality check - we can't create anything less legal than we started with. 2. Ease the use restriction, so we only bail out if both operands have >1 use. 3. Ease the use restriction for binops with a repeated operand (eg, mul x, x). As discussed in D60150, there's a scalarization opportunity that will be made easier by allowing this transform more generally. llvm-svn: 357580
* [GlobalISel] Add IRTranslator support for llvm.stacksave and llvm.stackrestoreJessica Paquette2019-04-021-0/+12
| | | | | | | | Also update arm64-irtranslator.ll. Differential Revision: https://reviews.llvm.org/D60140 llvm-svn: 357538
* [AArch64][GlobalISel] Select llvm.aarch64.stlxr(i64, i64*)Jessica Paquette2019-04-022-0/+36
| | | | | | | | | | | | | This adds partial instruction selection support for llvm.aarch64.stlxr. It also factors out selection for G_INTRINSIC_W_SIDE_EFFECTS into its own function. The new function removes the restriction that the intrinsic ID on the G_INTRINSIC_W_SIDE_EFFECTS be on operand 0. Also add a test, and add a GISel line to arm64-ldxr-stxr.ll. Differential Revision: https://reviews.llvm.org/D60100 llvm-svn: 357518
* Enforce StackID definition in PEISander de Smalen2019-04-022-0/+80
| | | | | | | | | | | | | | | There are various places in LLVM where the definition of StackID is not properly honoured, for example in PEI where objects with a StackID > 0 are allocated on the default stack (StackID0). This patch enforces that PEI only considers allocating objects to StackID 0. Reviewers: arsenm, thegameg, MatzeB Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D60062 llvm-svn: 357460
* SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without ↵Hans Wennborg2019-04-023-65/+65
| | | | | | | | | | | | | | | | used results (PR41259) The code was previously checking that candidates for sinking had exactly one use or were a store instruction (which can't have uses). This meant we could sink call instructions only if they had a use. That limitation seemed a bit arbitrary, so this patch changes it to "instruction has zero or one use" which seems more natural and removes the need to special-case stores. Differential revision: https://reviews.llvm.org/D59936 llvm-svn: 357452
* [AArch64][GlobalISe] Select STRQui for stores into v264s instead of scalarizingJessica Paquette2019-04-012-4/+24
| | | | | | | | | This improves selection for vector stores into v2s64s. Before we just scalarized them, but we can just use a STRQui instead. Differential Revision: https://reviews.llvm.org/D60083 llvm-svn: 357432
* [GlobalISel][AArch64] Add isel support for G_INSERT_VECTOR_ELT on v2s32sJessica Paquette2019-03-292-0/+108
| | | | | | | | | This adds support for v2s32 vector inserts, and updates the selection + regbankselect tests for G_INSERT_VECTOR_ELT. Differential Revision: https://reviews.llvm.org/D59910 llvm-svn: 357318
* [AArch64] Regenerate half precision testsSimon Pilgrim2019-03-291-24/+51
| | | | | | Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) llvm-svn: 357286
* Switch lowering: exploit unreachable fall-through when lowering case range ↵Hans Wennborg2019-03-291-9/+7
| | | | | | | | | | | | | | | | | | | | cluster In the example below, we would previously emit two range checks, one for cases 1--3 and one for 4--6. This patch makes us exploit the fact that the fall-through is unreachable and only one range check is necessary. switch i32 %i, label %default [ i32 1, label %bb1 i32 2, label %bb1 i32 3, label %bb1 i32 4, label %bb2 i32 5, label %bb2 i32 6, label %bb2 ] default: unreachable llvm-svn: 357252
* Revert r356996 "[DAG] Avoid smart constructor-based dangling nodes."Nirav Dave2019-03-271-1/+2
| | | | | | | This patch appears to trigger very large compile time increases in halide builds. llvm-svn: 357116
* [GlobalISel] Fix legalizer artifact combiner from crashing with invalid dead ↵Amara Emerson2019-03-272-13/+82
| | | | | | | | | | | | | | | | | | | | instructions. The artifact combiners push instructions which have been marked for deletion onto an list for the legalizer to deal with on return. However, for trunc(ext) combines the combiner routine recursively calls itself. When it does this the dead instructions list may not be empty, and the other combiners don't expect to be dealing with essentially invalid MIR (multiple vreg defs etc). This change fixes it by ensuring that the dead instructions are processed on entry into tryCombineInstruction. As a result, this fix exposed a few places in tests where G_TRUNC instructions were not being deleted even though they were dead. Differential Revision: https://reviews.llvm.org/D59892 llvm-svn: 357101
* Re-commit r355490 "[CodeGen] Omit range checks from jump tables when ↵Hans Wennborg2019-03-271-0/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | lowering switches with unreachable default" Original commit by Ayonam Ray. This commit adds a regression test for the issue discovered in the previous commit: that the range check for the jump table can only be omitted if the fall-through destination of the jump table is unreachable, which isn't necessarily true just because the default of the switch is unreachable. This addresses the missing optimization in PR41242. > During the lowering of a switch that would result in the generation of a > jump table, a range check is performed before indexing into the jump > table, for the switch value being outside the jump table range and a > conditional branch is inserted to jump to the default block. In case the > default block is unreachable, this conditional jump can be omitted. This > patch implements omitting this conditional branch for unreachable > defaults. > > Differential Revision: https://reviews.llvm.org/D52002 > Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev llvm-svn: 357067
* [DAG] Avoid smart constructor-based dangling nodes.Nirav Dave2019-03-261-2/+1
| | | | | | | | | | | | | | | Various SelectionDAG non-combine operations (e.g. the getNode smart constructor and legalization) may leave dangling nodes by applying optimizations or not fully pruning unused result values. This can result in nodes that are never added to the worklist and therefore can not be pruned. Add a node inserter as the current node deleter to make sure such nodes have the chance of being pruned. Many minor changes, mostly positive. llvm-svn: 356996
* [AArch64] Prefer "mov" over "orr" to materialize constants.Eli Friedman2019-03-2561-264/+262
| | | | | | | | | | | | | This is generally more readable due to the way the assembler aliases work. (This causes a lot of test changes, but it's not really as scary as it looks at first glance; it's just mechanically changing a bunch of checks for orr to check for mov instead.) Differential Revision: https://reviews.llvm.org/D59720 llvm-svn: 356954
* [SelectionDAG] Add icmp UNDEF handling to SelectionDAG::FoldSetCCSimon Pilgrim2019-03-251-6/+2
| | | | | | | | | | First half of PR40800, this patch adds DAG undef handling to icmp instructions to match the behaviour in llvm::ConstantFoldCompareInstruction and SimplifyICmpInst, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.). This involved a lot of tweaking to reduced tests as bugpoint loves to reduce icmp arguments to undef........ Differential Revision: https://reviews.llvm.org/D59363 llvm-svn: 356938
* [AArch64, ARM] Add support for Exynos M5Evandro Menezes2019-03-226-0/+6
| | | | | | Add Exynos M5 support and test cases. llvm-svn: 356793
* [AArch64] Split the neon.addp intrinsic into integer and fp variants.Amara Emerson2019-03-215-45/+22
| | | | | | | | | | | | | | | | | | | This is the result of discussions on the list about how to deal with intrinsics which require codegen to disambiguate them via only the integer/fp overloads. It causes problems for GlobalISel as some of that information is lost during translation, while with other operations like IR instructions the information is encoded into the instruction opcode. This patch changes clang to emit the new faddp intrinsic if the vector operands to the builtin have FP element types. LLVM IR AutoUpgrade has been taught to upgrade existing calls to aarch64.neon.addp with fp vector arguments, and we remove the workarounds introduced for GlobalISel in r355865. This is a more permanent solution to PR40968. Differential Revision: https://reviews.llvm.org/D59655 llvm-svn: 356722
* GlobalISel: Fix RegBankSelect for REG_SEQUENCEMatt Arsenault2019-03-211-7/+4
| | | | | | | | | | | | | The AArch64 test was broken since the result register already had a set register class, so this test was a no-op. The mapping verify call would fail because the result size is not the same as the inputs like in a copy or phi. The AMDGPU testcases are half broken and introduce illegal VGPR->SGPR copies which need much more work to handle correctly (same for phis), but add them as a baseline. llvm-svn: 356713
* [AArch64] Allow -mattr=tpidr-el[1|2|3]Oliver Stannard2019-03-211-0/+9
| | | | | | | | | | | Added subtarget features for AArch64 to use TPIDR_EL[1|2|3] as the TLS base register, rather than the default TPIDR_EL0. Patch by Philip Derrin! Differential revision: https://reviews.llvm.org/D54685 llvm-svn: 356657
* [AArch64][GlobalISel] Add an optimization to select vector DUP instructions.Amara Emerson2019-03-191-0/+110
| | | | | | | | | This adds pattern matching for the insert+shufflevector sequence so we can generate dup instructions instead of the current TBL sequence. Differential Revision: https://reviews.llvm.org/D59558 llvm-svn: 356526
* [AArch64][GlobalISel] Make v4s32 G_IMPLICIT_DEF legal.Amara Emerson2019-03-191-6/+2
| | | | llvm-svn: 356525
* RegAllocFast: Remove early selection loop, the spill calculation will report ↵Matt Arsenault2019-03-195-46/+47
| | | | | | | | | | | | | | | | cost 0 anyway for free regs The 2nd loop calculates spill costs but reports free registers as cost 0 anyway, so there is little benefit from having a separate early loop. Surprisingly this is not NFC, as many register are marked regDisabled so the first loop often picks up later registers unnecessarily instead of the first one available in the allocation order... Patch by Matthias Braun llvm-svn: 356499
* [SelectionDAG] Handle unary SelectPatternFlavor for ABS case in ↵Simon Pilgrim2019-03-191-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SelectionDAGBuilder::visitSelect These changes are related to PR37743 and include: SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node. Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner. Add promoting the integer ABS node in the LegalizeIntegerType. Expand-based legalization of integer result for the ABS nodes. Expand-based legalization of ABS vector operations. Add some integer abs testcases for different typesizes for Thumb arch Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to: tmp = (SRA, Hi, 31) Lo = (UADDO tmp, Lo) Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1)) Lo = (XOR tmp, Lo) The "detectZextAbsDiff" function is changed for the recognition of pattern with the ABS node. Given a ABS node, detect the following pattern: (ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))). Change integer abs testcases for codegen with the ABS node support for AArch64. Indicate that the ABS is legal for the i64 type when the NEON is supported. Change the integer abs testcases to show changing of codegen. Add combine and legalization of ABS nodes for Thumb arch. Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition. For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743 Patch by: @ikulagin (Ivan Kulagin) Differential Revision: https://reviews.llvm.org/D49837 llvm-svn: 356468
* [AArch64] Small fix for getIntImmCostAdhemerval Zanella2019-03-181-0/+106
| | | | | | | | | | | It uses the generic AArch64_IMM::expandMOVImm to get the correct number of instruction used in immediate materialization. Reviewers: efriedma Differential Revision: https://reviews.llvm.org/D58461 llvm-svn: 356391
OpenPOWER on IntegriCloud