path: root/llvm/test/CodeGen/ARM64
* AArch64/ARM64: move ARM64 into AArch64's place (Tim Northover, 2014-05-24; 284 files, -54263/+0)
  This commit starts with a "git mv ARM64 AArch64" and continues out from there, renaming the C++ classes, intrinsics, and other target-local objects for consistency. "ARM64" test directories are also moved, and tests that began their life in ARM64 use an arm64 triple, while those from AArch64 use an aarch64 triple. Both should be equivalent, though. This finishes the AArch64 merge, and everyone should feel free to continue committing as normal now. llvm-svn: 209577
* ARM64: extract a 32-bit subreg when selecting an inreg extend (Tim Northover, 2014-05-24; 1 file, -2/+135)
  After the load/store refactoring, we were sometimes trying to feed a GPR64 into a 32-bit register offset operand. This failed in copyPhysReg. llvm-svn: 209566
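  As a hedged illustration of the pattern involved (an assumption, not the committed test): the offset below is a sign-extended i32, so selection must use a W subregister with an sxtw extend rather than feeding the full X register into the offset operand:

    define i64 @load_sext_offset(i64* %base, i32 %idx) {
      %ext = sext i32 %idx to i64
      %addr = getelementptr i64* %base, i64 %ext
      %val = load i64* %addr
      ret i64 %val
    }
    ; Expected selection (roughly): ldr x0, [x0, w1, sxtw #3]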
* Revert part of "Fix broken FileCheck prefixes" (Nico Rieck, 2014-05-23; 2 files, -11/+11)
  This reverts part of commit r209538. llvm-svn: 209544
* Fix broken FileCheck prefixes (Nico Rieck, 2014-05-23; 3 files, -12/+12)
  llvm-svn: 209538
* [ARM64] Fix a bug in shuffle vector lowering to generate correct vext ISD with swapped input vectors. (Jiangning Liu, 2014-05-23; 1 file, -0/+172)
  llvm-svn: 209495
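  A hedged sketch of the kind of shuffle this targets (illustrative, not the committed test): the mask starts in the second source and wraps into the first, so EXT must be emitted with its inputs swapped:

    define <8 x i8> @ext_swapped(<8 x i8> %a, <8 x i8> %b) {
      ; Indices 10-15 select %b's elements 2-7, then 0-1 select %a's
      ; elements 0-1, i.e. an EXT over the concatenation (%b : %a).
      %s = shufflevector <8 x i8> %a, <8 x i8> %b,
             <8 x i32> <i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 0, i32 1>
      ret <8 x i8> %s
    }
    ; Expected (roughly): ext v0.8b, v1.8b, v0.8b, #2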
* Test comment commit. (Dave Estes, 2014-05-21; 1 file, -3/+2)
  llvm-svn: 209306
* Fix test added in r209242: llc shouldn't create files in source tree (Alexey Samsonov, 2014-05-20; 1 file, -1/+1)
  llvm-svn: 209252
* [ARM64] PR19792: Fix cycle in DAG after performPostLD1Combine (Adam Nemet, 2014-05-20; 1 file, -0/+40)
  Povray and dealII currently assert with "Overran sorted position" in AssignTopologicalOrder. The problem is that performPostLD1Combine can introduce cycles. Consider:

    (insert_vector_elt
      (INSERT_SUBREG undef,
                     (load (add %vreg0, Constant<8>), undef),  <= A
                     TargetConstant<2>),
      (load %vreg0, undef),                                    <= B
      Constant<1>)

  This is turned into a LD1LANEpost node. However, the address in A is not a valid user of the post-incremented address of B in LD1LANEpost. llvm-svn: 209242
* [ARM64] Adds Cortex-A53 scheduling support for vector load/store post. (Chad Rosier, 2014-05-19; 1 file, -0/+15)
  Patch by Dave Estes <cestes@codeaurora.org>! PR19761, http://reviews.llvm.org/D3829 llvm-svn: 209176
* [ARM64] Split tbz/tbnz into W/X register variant (Bradley Smith, 2014-05-19; 1 file, -2/+2)
  llvm-svn: 209134
* SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the bswap not. (Benjamin Kramer, 2014-05-19; 1 file, -0/+11)
  - On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
  - On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
  - On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.
  llvm-svn: 209123
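  A hedged example of the case left un-unrolled on ARM/ARM64 (an illustration, not the committed test):

    declare <4 x i32> @llvm.bswap.v4i32(<4 x i32>)

    define <4 x i32> @bswap_v4i32(<4 x i32> %v) {
      ; Byte-swapping each 32-bit lane is exactly a byte shuffle,
      ; which the ARM64 shuffle matcher recognises as REV32.
      %r = call <4 x i32> @llvm.bswap.v4i32(<4 x i32> %v)
      ret <4 x i32> %r
    }
    ; Expected (roughly): rev32 v0.16b, v0.16b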
* [ARM64] Increases the Sched Model accuracy for Cortex-A53. (Chad Rosier, 2014-05-16; 2 files, -2/+21)
  Patch by Dave Estes <cestes@codeaurora.org>, http://reviews.llvm.org/D3769 llvm-svn: 209001
* Revert "Implement global merge optimization for global variables." (Rafael Espindola, 2014-05-16; 1 file, -88/+0)
  This reverts commit r208934. The patch depends on aliases to GEPs with non-zero offsets. That is not supported and fairly broken. The good news is that GlobalAlias is being redesigned and will have support for offsets, so this patch should be a nice match for it. llvm-svn: 208978
* TableGen: fix operand counting for aliases (Tim Northover, 2014-05-16; 1 file, -1/+1)
  TableGen has a fairly dubious heuristic to decide whether an alias should be printed: does the alias have fewer operands than the real instruction? This is bad enough (particularly with no way to override it), but it should at least be calculated consistently for both strings. This patch implements that logic: first get the *correct* string for the variant, in the same way as the Matcher, without guessing; then count the number of whitespace chars. There are basically four changes this brings about after the previous commits; all of these appear to be good, so I have changed the tests:
  + ARM64: we print "neg X, Y" instead of "sub X, xzr, Y".
  + ARM64: we skip implicit "uxtx" and "uxtw" modifiers.
  + Sparc: we print "mov A, B" instead of "or %g0, A, B".
  + Sparc: we print "fcmpX A, B" instead of "fcmpX %fcc0, A, B".
  llvm-svn: 208969
* [ARM64] Implement NEON post-increment LD1(lane) and post-increment LD1R. (Hao Liu, 2014-05-16; 1 file, -1/+485)
  llvm-svn: 208955
* Implement global merge optimization for global variables. (Jiangning Liu, 2014-05-15; 1 file, -0/+88)
  This commit implements two command line switches, -global-merge-on-external and -global-merge-aligned; both of them are false by default, so this optimization is disabled by default for all targets. For ARM64, some back-end behaviors need to be tuned to get this optimization further enabled. llvm-svn: 208934
* ARM64: print correct aliases for NEON mov & mvn instructions (Tim Northover, 2014-05-15; 4 files, -24/+24)
  In all cases, if a "mov" alias exists, it is the canonical form of the instruction. Now that TableGen can support aliases containing syntax variants, we can enable them and improve the quality of the asm output. llvm-svn: 208874
* TableGen/ARM64: print aliases even if they have syntax variants. (Tim Northover, 2014-05-15; 3 files, -29/+29)
  To get at least one use of the change (and some actual tests) in with its commit, I've enabled the AArch64 & ARM64 NEON mov aliases. llvm-svn: 208867
* [ARM64-BE] Fix byte order of CIE and FDE frames for exception handling (Christian Pirker, 2014-05-14; 1 file, -0/+73)
  Reviewed at http://reviews.llvm.org/D3741 llvm-svn: 208792
* Folding into CSEL when there is ZEXT between SETCC and ADD (Weiming Zhao, 2014-05-13; 1 file, -0/+11)
  Normally, patterns like (add x, (setcc cc ...)) will be folded into (csel x, x+1, not cc). However, if there is a ZEXT after the SETCC, they won't be folded. This patch recognizes the ZEXT and allows the generation of CSINC. This fixes bug 19680. llvm-svn: 208660
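  A hedged sketch of the newly folded shape (illustrative, not the committed test):

    define i64 @add_zext_cmp(i64 %a, i64 %b, i64 %x) {
      %cmp = icmp eq i64 %a, %b
      %ext = zext i1 %cmp to i64
      %add = add i64 %x, %ext
      ret i64 %add
    }
    ; Expected (roughly): cmp   x0, x1
    ;                     csinc x0, x2, x2, ne   ; x2 if ne, else x2 + 1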
* [DAGCombiner] Split up an indexed load if only the base pointer value is live (Adam Nemet, 2014-05-12; 1 file, -0/+29)
  Right now the load may not get DCE'd because of the side-effect of updating the base pointer. This can happen if we lower a read-modify-write of an illegal larger type (e.g. i48) such that the modification only affects one of the subparts (the lower i32 part but not the higher i16 part). See the testcase. In order to spot the dead load we need to revisit it when SimplifyDemandedBits decides that the value of the load is masked off. This is the CommitTargetLoweringOpt piece. I checked compile time with ARM64 by sending SPEC bitcode files through llc. No measurable change. Fixes <rdar://problem/16031651> llvm-svn: 208640
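  A hedged reconstruction of the scenario (an assumption, not the actual testcase): a read-modify-write of an i48 where the loaded low half is fully masked off, so once the indexed load is split into subparts, the low-part load becomes dead:

    define void @rmw_i48(i48* %p) {
      ; The and masks off the low 32 bits of the loaded value, so only
      ; the high i16 subpart of the load is actually live.
      %v = load i48* %p
      %m = and i48 %v, -4294967296
      %n = or i48 %m, 1
      store i48 %n, i48* %p
      ret void
    }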
* [Test] Trim unnecessary .c and .cpp from config.suffix in lit.local.cfg (Adam Nemet, 2014-05-12; 1 file, -1/+1)
  Tested by comparing make check VERBOSE=1 before and after to make sure no tests are missed. (VERBOSE=1 prints the list of tests.) Only one test :( remains where .cpp is required:

    tools/llvm-cov/range_based_for.cpp: // RUN: llvm-cov range_based_for.cpp | FileCheck %s --check-prefix=STDOUT

  The topic was discussed in this thread: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140428/214905.html llvm-svn: 208621
* TableGen: use PrintMethods to print more aliases (Tim Northover, 2014-05-12; 11 files, -108/+105)
  llvm-svn: 208607
* ARM64: fix SELECT_CC lowering in absence of NaNs. (Tim Northover, 2014-05-10; 1 file, -2/+15)
  We were swapping the true & false results while testing for FMAX/FMIN, but not putting them back to the original state if the later checks failed. Should fix PR19700. llvm-svn: 208469
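  A hedged illustration of the affected shape (not the committed test), assuming no-NaNs codegen (e.g. llc's -enable-no-nans-fp-math):

    define float @select_max(float %a, float %b) {
      ; With NaNs ignored, this compare-and-select may match FMAX; the bug
      ; was in un-swapping the operands when the FMAX/FMIN checks failed.
      %cmp = fcmp ogt float %a, %b
      %sel = select i1 %cmp, float %a, float %b
      ret float %sel
    }
    ; Expected (roughly): fmax s0, s0, s1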
* [ARM64-BE] Teach fast-isel how to set up sub-word stack arguments for big endian calls. (James Molloy, 2014-05-08; 1 file, -0/+9)
  SelectionDAG already knows about this, but fast-isel was ignorant. llvm-svn: 208307
* ARM64: make sure FastISel emits SSA MachineInstrs (Tim Northover, 2014-05-08; 1 file, -8/+19)
  We need to use a temporary register for a 2-step operation like REM. llvm-svn: 208297
* AArch64/ARM64: Port NEON post-increment load/store with 2/3/4 vectors to ARM64 backend. (Hao Liu, 2014-05-08; 1 file, -0/+5077)
  llvm-svn: 208284
* [ARM64][fast-isel] Disable target specific optimizations at -O0. (Chad Rosier, 2014-05-07; 2 files, -3/+5)
  Functionally, this patch disables the dead register elimination pass and the load/store pair optimization pass at -O0. The ILP optimizations don't require the optimization level to be checked because the call to addILPOpts is predicated with the necessary check. The AdvSIMDScalar pass is disabled by default at all optimization levels; this patch leaves that pass disabled by default. Also, move command-line options into ARM64TargetMachine.cpp and add a few additional flags to aid in debugging. This fixes an issue with the -debug-pass=Structure flag where passes were printed, but not actually run (i.e., the AdvSIMDScalar pass). llvm-svn: 208223
* AArch64/ARM64: optimise vector selects & enable test (Tim Northover, 2014-05-07; 1 file, -0/+206)
  When performing a scalar comparison that feeds into a vector select, it's actually better to do the comparison on the vector side: the scalar route would be "CMP -> CSEL -> DUP", while the vector route is "CM -> DUP", since the vector comparisons are all mask based. llvm-svn: 208210
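  A hedged sketch of the shape being optimised (illustrative, not the committed test):

    define <4 x i32> @vsel(i32 %a, i32 %b, <4 x i32> %x, <4 x i32> %y) {
      ; The scalar compare feeds every lane of the select, so building a
      ; vector compare mask avoids the scalar CSEL step entirely.
      %cmp = icmp eq i32 %a, %b
      %sel = select i1 %cmp, <4 x i32> %x, <4 x i32> %y
      ret <4 x i32> %sel
    }
    ; Roughly: a CM* compare mask + DUP feeding BSL, instead of CMP + CSEL + DUP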
* [ARM64-BE] Fix fast-isel, and add appropriate RUN lines to appropriate tests. (James Molloy, 2014-05-07; 3 files, -1/+4)
  llvm-svn: 208200
* [ARM64-BE] Fix variable-argument saving. (James Molloy, 2014-05-07; 1 file, -0/+58)
  llvm-svn: 208199
* [ARM64-BE] Implement the lane-twiddling logic at AAPCS boundaries for big endian. (James Molloy, 2014-05-07; 2 files, -0/+1946)
  The AAPCS states that values passed in registers must have a value as though they had been loaded with "LDR". LDR is equivalent to "LD1.64 vX.1D" - that is, loading scalars to vector registers and loading 1-element vectors is equivalent. The logic implemented here is to ensure that at all call boundaries and during formal argument lowering all vectors are treated as their bitwidth-based floating point scalar counterpart, which is always one of f64 or f128 (v2i32 -> f64, v4i32 -> f128, etc.). A BITCAST is inserted so that the appropriate REV will be generated during code generation. llvm-svn: 208198
* [ARM64-BE] Implement the crazy bitcast handling for big endian vectors. (James Molloy, 2014-05-07; 1 file, -0/+1100)
  Because we've canonicalised on using LD1/ST1, every time we do a bitcast between vector types we must do an equivalent lane reversal. Consider a simple memory load followed by a bitconvert then a store:

    v0 = load v2i32
    v1 = BITCAST v2i32 v0 to v4i16
    store v4i16 v2

  In big endian mode every memory access has an implicit byte swap. LDR and STR do a 64-bit byte swap, whereas LD1/ST1 do a byte swap per lane - that is, they treat the vector as a sequence of elements to be byte-swapped. The two pairs of instructions are fundamentally incompatible. We've decided to use LD1/ST1 only, to simplify compiler implementation. LD1/ST1 perform the equivalent of a sequence of LDR/STR + REV. This makes the original code sequence:

    v0 = load v2i32
    v1 = REV v2i32 (implicit)
    v2 = BITCAST v2i32 v1 to v4i16
    v3 = REV v4i16 v2 (implicit)
    store v4i16 v3

  But this is now broken - the value stored is different to the value loaded due to lane reordering. To fix this, on every BITCAST we must perform two other REVs:

    v0 = load v2i32
    v1 = REV v2i32 (implicit)
    v2 = REV v2i32
    v3 = BITCAST v2i32 v2 to v4i16
    v4 = REV v4i16
    v5 = REV v4i16 v4 (implicit)
    store v4i16 v5

  This means an extra two instructions, but in most cases the two REV instructions can actually be combined into one. For example: (REV64_2s (REV64_4h X)) === (REV32_4h X). There is also no 128-bit REV instruction, which must be synthesized with an EXT instruction. Most bitconverts require some sort of conversion. The only exceptions are: a) identity conversions - vNfX <-> vNiX; b) single-lane-to-scalar - v1fX <-> fX or v1iX <-> iX. Even though there are hundreds of changed lines, I have a fairly high confidence that they are somewhat correct. The changes to add two REV instructions per bitcast were pretty mechanical, and once I'd done that I threw the resulting .td at a script I wrote which combined the two REVs together (and added an EXT instruction, for f128) based on an instruction description I gave it. This was much less prone to error than doing it all manually; plus, my brain would not just have melted but would have vapourised. llvm-svn: 208194
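  An illustrative, hedged IR reduction of the situation (not the committed test): on a big endian target, the bitcast below is where the compensating REV pair (often combinable into a single REV32.4h) gets inserted:

    define void @bitcast_roundtrip(<2 x i32>* %src, <4 x i16>* %dst) {
      %v = load <2 x i32>* %src
      %c = bitcast <2 x i32> %v to <4 x i16>
      store <4 x i16> %c, <4 x i16>* %dst
      ret void
    }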
* [ARM64-BE] Make big endian (scalar) argument passing work correctly. (James Molloy, 2014-05-07; 1 file, -2/+2)
  This completes the port of r204814 (cpirker: "AArch64_BE function argument passing for ARM ABI") from AArch64 to ARM64, and fixes a bunch of issues found during later development along the way. The biggest of these was that the alignment fixup logic wasn't replicated into all the places it should have been. llvm-svn: 208192
* Implementing named register intrinsics (Renato Golin, 2014-05-06; 3 files, -0/+51)
  This patch implements the infrastructure to use named register constructs in programs that need access to specific registers (bare metal, kernels, etc.). So far, only the stack pointer is supported as a technology preview, but as it is, the intrinsic can already support all non-allocatable registers from any architecture. llvm-svn: 208104
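  A hedged usage sketch (written in the era's metadata syntax; illustrative only):

    declare i64 @llvm.read_register.i64(metadata)

    define i64 @get_sp() {
      ; Reads the named register; only the stack pointer was supported
      ; as a technology preview at this point.
      %sp = call i64 @llvm.read_register.i64(metadata !0)
      ret i64 %sp
    }

    !0 = metadata !{metadata !"sp"}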
* [ARM64] Enable alignment control option in front-end for ARM64. (Kevin Qin, 2014-05-06; 1 file, -0/+1)
  This is the modification in the llvm part. llvm-svn: 208074
* Convert a CodeGen test into an MC test. (Rafael Espindola, 2014-05-05; 1 file, -161/+0)
  llvm-svn: 207971
* [ARM64] Correctly select ANDWri in FastISel. (Joey Gouly, 2014-05-03; 1 file, -1/+1)
  http://reviews.llvm.org/D3598 llvm-svn: 207917
* DAGCombine: prevent formation of illegal ConstantFP nodes. (Tim Northover, 2014-05-02; 1 file, -0/+14)
  llvm-svn: 207850
* AArch64/ARM64: add patterns for post-indexed ST1 ops. (Tim Northover, 2014-05-02; 1 file, -0/+211)
  llvm-svn: 207840
* AArch64/ARM64: support indexed loads/stores on vector types. (Tim Northover, 2014-05-02; 1 file, -0/+402)
  While post-indexed LD1/ST1 instructions do exist for vector loads, this patch makes use of the more flexible addressing modes in LDR/STR instructions. llvm-svn: 207838
* [ARM64] Prefer generation of bzero on Darwin only (Bradley Smith, 2014-05-01; 1 file, -5/+12)
  llvm-svn: 207760
* AArch64/ARM64: print BFM instructions as BFI or BFXIL (Tim Northover, 2014-05-01; 2 files, -20/+20)
  The canonical form of the BFM instruction is always one of the more explicit extract or insert operations, which makes reading output much easier. llvm-svn: 207752
* [ARM64] Prevent bit extraction from being adjusted by a following shift (Weiming Zhao, 2014-04-30; 1 file, -0/+13)
  For a pattern like ((x >> C1) & Mask) << C2, the DAG combiner may convert it into (x >> (C1-C2)) & (Mask << C2), which makes pattern matching of ubfx more difficult. For example, given

    %shr = lshr i64 %x, 4
    %and = and i64 %shr, 15
    %arrayidx = getelementptr inbounds [8 x [64 x i64]]* @arr, i64 0, i64 2, i64 %and
    %0 = load i64* %arrayidx

  with the current shift folding it takes 3 instrs to compute the base address:

    lsr x8, x0, #1
    and x8, x8, #0x78
    add x8, x9, x8

  Using ubfx, it only needs 2 instrs:

    ubfx x8, x0, #4, #4
    add x8, x9, x8, lsl #3

  This fixes bug 19589. llvm-svn: 207702
* ARM64: print fp immediates without using scientific notation. (Tim Northover, 2014-04-30; 3 files, -6/+6)
  llvm-svn: 207669
* [ARM64][fast-isel] Fast-isel doesn't know how to handle f128. (Chad Rosier, 2014-04-30; 1 file, -1/+33)
  llvm-svn: 207659
* ARM64: print lsr instead of lsrv for variable shifts (etc.) (Tim Northover, 2014-04-30; 2 files, -18/+18)
  The canonical syntax for shifts by a variable amount does not end with 'v', but that syntax should be supported as an alias (presumably for legacy reasons). llvm-svn: 207649
* AArch64/ARM64: use HS instead of CS & LO instead of CC. (Tim Northover, 2014-04-30; 2 files, -16/+16)
  On instructions using the NZCV register, a couple of conditions have dual representations: HS/CS and LO/CC (meaning unsigned-higher-or-same/carry-set and unsigned-lower/carry-clear). The first of these is more descriptive in most circumstances, so we should print it. llvm-svn: 207644
* ARM64: use hex immediates for movz/movk instructions (Tim Northover, 2014-04-30; 13 files, -112/+112)
  Since these are mostly used in "lsl #16", "lsl #32", "lsl #48" combinations to piece together an immediate in 16-bit chunks, hex is probably the most appropriate format. llvm-svn: 207635
* ARM64: hexify printing of various immediate operands (Tim Northover, 2014-04-30; 7 files, -17/+17)
  This is mostly aimed at the NEON logical operations and MOVI/MVNI (since they accept weird shifts which are more naturally understandable in hex notation). Also changes BRK/HINT etc., which is probably a neutral change, but easier than the alternative. llvm-svn: 207634