path: root/llvm/lib/Target/ARM/ARMISelLowering.cpp
...
* Convert SelectionDAG::getVTList to use ArrayRef (Craig Topper, 2014-04-16, 1 file, -2/+2)

  llvm-svn: 206357
* Make consistent use of MCPhysReg instead of uint16_t throughout the tree. (Craig Topper, 2014-04-04, 1 file, -2/+2)

  llvm-svn: 205610
* Tidy up. Trailing whitespace. (Jim Grosbach, 2014-04-03, 1 file, -2/+2)

  llvm-svn: 205583
* ARM: tell LLVM about zext properties of ldrexb/ldrexh (Tim Northover, 2014-04-03, 1 file, -0/+14)

  Implementing this via ComputeMaskedBits has two advantages:
  + It actually works. DAGISel doesn't deal with the chains properly in the
    previous pattern-based solution, so they never trigger.
  + The information can be used in other DAG combines, as well as the trivial
    "get rid of truncs", for example when the trunc is in a different basic
    block.

  rdar://problem/16227836
  llvm-svn: 205540
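  A minimal IR sketch (not from the commit) of the redundant masking this lets
  the DAG combiner fold away; the overloaded intrinsic form below matches the
  ldrex intrinsics of this era but should be treated as an assumption:

    declare i32 @llvm.arm.ldrex.p0i8(i8*)

    define i32 @load_exclusive_byte(i8* %addr) {
      %raw = call i32 @llvm.arm.ldrex.p0i8(i8* %addr)
      ; ldrexb zero-extends its result, so bits 8-31 of %raw are known
      ; zero and this mask becomes dead once known-bits reports that
      %val = and i32 %raw, 255
      ret i32 %val
    }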
* ARM: expand atomic ldrex/strex loops in IR (Tim Northover, 2014-04-03, 1 file, -747/+3)

  The previous situation, where ATOMIC_LOAD_WHATEVER nodes were expanded at
  MachineInstr emission time, had grown to be extremely large and involved, to
  account for the subtly different code needed for the various flavours
  (8/16/32/64 bit, cmpxchg/add/minmax).

  Moving this transformation into the IR clears up the code substantially, and
  makes future optimisations much easier:
  1. an atomicrmw followed by using the *new* value can be more efficient. As
     an IR pass, simple CSE could handle this efficiently.
  2. Making use of cmpxchg success/failure orderings only has to be done in
     one (simpler) place.
  3. The common "cmpxchg; did we store?" idiom can be exposed to optimisation.

  I intend to gradually improve this situation within the ARM backend and make
  sure there are no hidden issues before moving the code out into CodeGen to
  be shared with (at least ARM64/AArch64, though I think PPC & Mips could
  benefit too).

  llvm-svn: 205525
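  As a rough sketch (not from the commit), an atomicrmw add expanded at the IR
  level becomes an explicit ldrex/strex loop along these lines:

    loop:
      %old = call i32 @llvm.arm.ldrex.p0i32(i32* %addr)
      %new = add i32 %old, %incr
      ; strex returns 0 on success, 1 if the exclusive monitor was lost
      %failed = call i32 @llvm.arm.strex.p0i32(i32 %new, i32* %addr)
      %retry = icmp ne i32 %failed, 0
      br i1 %retry, label %loop, label %done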
* [ARM] When generating a vpaddl node, the input lane type is not always the type of the add operation, since extract_vector_elt can perform an extend operation (Silviu Baranga, 2014-04-03, 1 file, -2/+5)

  Get the input lane type from the vector on which we're performing the vpaddl
  operation, and extend or truncate it to the output type of the original add
  node.

  llvm-svn: 205523
* ARM: update subtarget information for Windows on ARM (Saleem Abdulrasool, 2014-04-02, 1 file, -1/+2)

  Update the subtarget information for Windows on ARM. This enables using the
  MC layer to target Windows on ARM.

  llvm-svn: 205459
* ARM: Add support for segmented stacks (Oliver Stannard, 2014-04-02, 1 file, -0/+2)

  Patch by Alex Crichton, ILyoan, Luqman Aden and Svetoslav.

  llvm-svn: 205430
* ARM: add intrinsics for the v8 ldaex/stlex (Tim Northover, 2014-03-26, 1 file, -0/+4)

  We've already got versions without the barriers, so this just adds IR-level
  support for generating the new v8 ones.

  rdar://problem/16227836
  llvm-svn: 204813
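  A sketch of the new intrinsics at the IR level; the exact overload suffixes
  are an assumption, following the barrier-free ldrex/strex naming:

    %old  = call i32 @llvm.arm.ldaex.p0i32(i32* %addr)            ; load-acquire exclusive
    %fail = call i32 @llvm.arm.stlex.p0i32(i32 %new, i32* %addr)  ; store-release exclusive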
* ARM: no need to update SplatBits as it is not used (Arnaud A. de Grandmaison, 2014-03-23, 1 file, -3/+0)

  llvm-svn: 204575
* Prune includes in ARM target. (Craig Topper, 2014-03-22, 1 file, -2/+0)

  llvm-svn: 204548
* ARM: honour -f{no-,}optimize-sibling-calls (Saleem Abdulrasool, 2014-03-11, 1 file, -2/+4)

  Use the options in the ARMISelLowering to control whether tail calls are
  optimised or not. Previously, this option was entirely ignored on the ARM
  target and only honoured on x86. This option is mostly useful in profiling
  scenarios. The default remains that tail call optimisations will be applied.

  llvm-svn: 203577
* ARM: remove ancient -arm-tail-calls option (Saleem Abdulrasool, 2014-03-11, 1 file, -8/+2)

  This option is from 2010, designed to work around a linker issue on Darwin
  for ARM. According to grosbach this is no longer an issue and this option
  can safely be removed.

  llvm-svn: 203576
* ARM: simplify EmitAtomicBinary64 (Tim Northover, 2014-03-11, 1 file, -23/+19)

  ATOMIC_STORE operations always get here as a lowered ATOMIC_SWAP, so there's
  no need for any code to handle them specially.

  There should be no functionality change, so no tests.

  llvm-svn: 203567
* IR: add a second ordering operand to cmpxchg for failure (Tim Northover, 2014-03-11, 1 file, -4/+4)

  The syntax for "cmpxchg" should now look something like:

    cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic

  where the second ordering argument gives the required semantics in the case
  that no exchange takes place. It should be no stronger than the first
  ordering constraint and cannot be either "release" or "acq_rel" (since no
  store will have taken place).

  rdar://problem/15996804
  llvm-svn: 203559
* ARM: Correctly align arguments after a byval struct is passed on the stack (Oliver Stannard, 2014-03-05, 1 file, -41/+88)

  llvm-svn: 202985
* [C++11] Replace llvm::next and llvm::prior with std::next and std::prev. (Benjamin Kramer, 2014-03-02, 1 file, -15/+8)

  Remove the old functions.

  llvm-svn: 202636
* Use 16 byte stack alignment for NaCl on ARM (Mark Seaborn, 2014-02-16, 1 file, -5/+4)

  NaCl's ARM ABI uses 16 byte stack alignment, so set that in ARMSubtarget.cpp.

  Using 16 byte alignment exposes an issue in code generation in which a
  varargs function leaves a 4 byte gap between the values of r1-r3 saved to
  the stack and the following arguments that were passed on the stack.
  (Previously, this code only needed to support 4 byte and 8 byte alignment.)

  With this issue, llc generated:

    varargs_func:
        sub   sp, sp, #16
        push  {lr}
        sub   sp, sp, #12
        add   r0, sp, #16   // Should be 20
        stm   r0, {r1, r2, r3}
        ldr   r0, .LCPI0_0  // Address of va_list
        add   r1, sp, #16
        str   r1, [r0]
        bl    external_func

  Fix the bug by checking for "Align > 4". Also simplify the code by using
  OffsetToAlignment(), and update comments.

  Differential Revision: http://llvm-reviews.chandlerc.com/D2677

  llvm-svn: 201497
* ARM: use natural LLVM IR for vshll instructions (Tim Northover, 2014-02-10, 1 file, -19/+0)

  Similarly to the vshrn instructions, these are simple zext/sext + shl
  operations. Using normal LLVM IR should allow for better code, and more
  sharing with the AArch64 backend.

  llvm-svn: 201093
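  For instance (a hypothetical sketch, not from the commit), a vshll.u8 by the
  element width can be written in ordinary IR as an extend followed by a shift:

    %ext = zext <8 x i8> %in to <8 x i16>
    %res = shl <8 x i16> %ext, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>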
* ARM: use LLVM IR to represent the vshrn operation (Tim Northover, 2014-02-10, 1 file, -5/+0)

  vshrn is just the combination of a right shift and a truncate (and the
  limits on the immediate value actually mean the signedness of the shift
  doesn't matter). Using that representation allows us to get rid of an
  ARM-specific intrinsic, share more code with AArch64 and hopefully get
  better code out of the mid-end optimisers.

  llvm-svn: 201085
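  In IR terms the pattern is just a shift plus a truncate; a hypothetical
  sketch of a vshrn.i16 by 8:

    %shr = lshr <8 x i16> %in, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
    %res = trunc <8 x i16> %shr to <8 x i8>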
* Add address space argument to allowsUnalignedMemoryAccess. (Matt Arsenault, 2014-02-05, 1 file, -3/+4)

  On R600, some address spaces have more strict alignment requirements than
  others.

  llvm-svn: 200887
* [TLI] Add a new hook to TargetLowering to query the target if a load of a constant should be converted to simply the constant itself (Juergen Ributzka, 2014-01-28, 1 file, -0/+12)

  Before this patch we used getIntImmCost from TargetTransformInfo to
  determine if a load of a constant should be converted to just a constant,
  but the threshold for this was set to an arbitrary value. This value works
  well for the two targets (X86 and ARM) that implement this target-hook, but
  it isn't target-independent at all.

  Now targets have the possibility to decide directly if this optimization
  should be performed. The default value is set to false to preserve the
  current behavior. The target hook has been moved to TargetLowering, which
  removed the last use and need of TargetTransformInfo in SelectionDAG.

  llvm-svn: 200271
* Fix known typos (Alp Toker, 2014-01-24, 1 file, -2/+2)

  Sweep the codebase for common typos. Includes some changes to visible
  function names that were misspelt.

  llvm-svn: 200018
* Switch the NEON register class from QPR to DPair. (Jakob Stoklund Olesen, 2014-01-14, 1 file, -1/+1)

  The already allocatable DPair superclass contains odd-even D register pairs
  in addition to the even-odd pairs in the QPR register class. There is no
  reason to constrain the set of D register pairs that can be used for NEON
  values. Any NEON instructions that require a Q register will automatically
  constrain the register class to QPR.

  The allocation order for DPair begins with the QPR registers, so register
  allocation is unlikely to change much.

  llvm-svn: 199186
* ARM MachO: sort out isTargetDarwin/isTargetIOS/... checks. (Tim Northover, 2014-01-06, 1 file, -10/+10)

  The ARM backend has been using most of the MachO related subtarget checks
  almost interchangeably, and since the only target it's had to run on has
  been IOS (which is all three of MachO, Darwin and IOS) it's worked out OK
  so far. But we'd like to support embedded targets under the "*-*-none-macho"
  triple, which means everything starts falling apart and inconsistent
  behaviours emerge.

  This patch should pick a reasonably sensible set of behaviours for the new
  triple (and any others that come along, with luck). Some choices were
  debatable (notably FP == r7 or r11), but we can revisit those later when
  deficiencies become apparent.

  llvm-svn: 198617
* Remove unnecessary #includes. (Bill Wendling, 2014-01-06, 1 file, -1/+0)

  llvm-svn: 198585
* Refactor function that checks that __builtin_returnaddress's argument is constant (Bill Wendling, 2014-01-06, 1 file, -4/+1)

  This moves the check up into the parent class so that all targets can use it
  without having to copy (and keep in sync) the same error message.

  llvm-svn: 198579
* Emit an error message if the value passed to __builtin_returnaddress isn't a constant (Bill Wendling, 2014-01-05, 1 file, -0/+7)

  __builtin_returnaddress requires that the value passed to it be a constant.
  However, at -O0 even a constant expression may not be converted to a
  constant. Emit an error message instead of crashing.

  llvm-svn: 198531
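  At the IR level the corresponding intrinsic takes an immediate operand; a
  minimal sketch of the only valid form:

    ; the operand of llvm.returnaddress must be a constant integer
    %ra = call i8* @llvm.returnaddress(i32 0)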
* [aarch32] fix bug 18268: Incorrect condition of vsel (Weiming Zhao, 2013-12-18, 1 file, -1/+1)

  Given vsel_cc op1, op2: since VSEL has no LE/LT forms, generating a vsel for
  such a selection requires inverting cc and swapping op1 and op2. To invert
  cc, both the L/G and E bits should be flipped.

  llvm-svn: 197615
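  A hypothetical IR sketch of the shape involved:

    %cmp = fcmp olt float %x, %y
    ; no VSEL.LT encoding exists, so this lowers as VSELGE with %a and %b
    ; swapped; inverting LT to GE flips both the L/G and E condition bits
    %r = select i1 %cmp, float %a, float %b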
* Correct word hyphenations (Alp Toker, 2013-12-05, 1 file, -1/+1)

  This patch tries to avoid unrelated changes other than fixing a few
  hyphen-related ambiguities and contractions in nearby lines.

  llvm-svn: 196471
* ARM: decide whether to use movw/movt based on "minsize" attribute. (Tim Northover, 2013-12-02, 1 file, -2/+1)

  llvm-svn: 196102
* ARM: add pseudo-instructions for lit-pool global materialisation (Tim Northover, 2013-12-02, 1 file, -58/+13)

  These are used by MachO only at the moment, and (much like the existing
  MOVW/MOVT set) work around the fact that the labels used in the actual
  instructions often contain PC-dependent components, which means that
  repeatedly materialising the same global can't be CSEed.

  With small modifications, it could be adapted to how ELF finds the address
  of _GLOBAL_OFFSET_TABLE_, which would give similar benefits in PIC mode
  there.

  llvm-svn: 196090
* Darwin-ARM: use movw/movt for static relocations (Tim Northover, 2013-11-26, 1 file, -3/+1)

  llvm-svn: 195759
* ARM: remove special cases for Darwin dynamic-no-pic mode. (Tim Northover, 2013-11-25, 1 file, -5/+6)

  These are handled almost identically to static mode (and ELF's global
  address materialisation), except that a symbol may have "$non_lazy_ptr"
  appended. This can be handled by passing appropriate flags along with the
  instruction instead of using entirely separate pseudo-instructions.

  llvm-svn: 195655
* ARM: produce friendly error for invalid inline asm (Tim Northover, 2013-11-14, 1 file, -0/+4)

  We used to perform an invalid operation on an MVT and crash, which wasn't
  much fun.

  Patch by Oliver Stannard.

  llvm-svn: 194714
* Enable optimization of sin / cos pair into call to __sincos_stret for iOS7+. (Bob Wilson, 2013-11-03, 1 file, -0/+77)

  rdar://12856873
  Patch by Evan Cheng, with a fix for rdar://13209539 by Tilmann Scheller

  llvm-svn: 193942
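  Roughly, the pattern being combined (a sketch, not from the commit):

    ; two separate libcalls on the same argument ...
    %s = call double @sin(double %x)
    %c = call double @cos(double %x)
    ; ... become a single __sincos_stret runtime call that computes both
    ; results at once (returned per the platform ABI)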
* Legalize: Improve legalization of long vector extends. (Jim Grosbach, 2013-10-31, 1 file, -55/+0)

  When an extend more than doubles the size of the elements (e.g., a zext from
  v16i8 to v16i32), the normal legalization method of splitting the vectors
  will run into problems, as by the time the destination vector is legal, the
  source vector is illegal. The end result is the operation often becoming
  scalarized, with the typical horrible performance. For example, on x86_64,
  the simple input of:

    define void @bar(<16 x i8> %a, <16 x i32>* %p) nounwind {
      %tmp = zext <16 x i8> %a to <16 x i32>
      store <16 x i32> %tmp, <16 x i32>* %p
      ret void
    }

  generates:

        .section  __TEXT,__text,regular,pure_instructions
        .section  __TEXT,__const
        .align    5
    LCPI0_0:
        .long  255  ## 0xff
        .long  255  ## 0xff
        .long  255  ## 0xff
        .long  255  ## 0xff
        .long  255  ## 0xff
        .long  255  ## 0xff
        .long  255  ## 0xff
        .long  255  ## 0xff
        .section  __TEXT,__text,regular,pure_instructions
        .globl    _bar
        .align    4, 0x90
    _bar:
        vpunpckhbw  %xmm0, %xmm0, %xmm1
        vpunpckhwd  %xmm0, %xmm1, %xmm2
        vpmovzxwd   %xmm1, %xmm1
        vinsertf128 $1, %xmm2, %ymm1, %ymm1
        vmovaps     LCPI0_0(%rip), %ymm2
        vandps      %ymm2, %ymm1, %ymm1
        vpmovzxbw   %xmm0, %xmm3
        vpunpckhwd  %xmm0, %xmm3, %xmm3
        vpmovzxbd   %xmm0, %xmm0
        vinsertf128 $1, %xmm3, %ymm0, %ymm0
        vandps      %ymm2, %ymm0, %ymm0
        vmovaps     %ymm0, (%rdi)
        vmovaps     %ymm1, 32(%rdi)
        vzeroupper
        ret

  So instead we can check if there are legal types that enable us to split
  more cleverly when the input vector is already legal, such that we don't
  turn it into an illegal type. If the extend is such that it's more than
  doubling the size of the input, we check whether
  - the number of vector elements is even,
  - the source type is legal,
  - the type of a split source is illegal,
  - the type of an extended (by doubling element size) source is legal, and
  - the type of that extended source when split is legal.

  If the conditions are met, instead of just splitting both the destination
  and the source types, we create an extend that only goes up one "step"
  (doubling the element width), and then continue legalizing the rest of the
  operation normally. The result is that this operates as a new, more
  efficient termination condition for the loop of "split the operation until
  the destination type is legal."

  With this change, the above example now compiles to:

    _bar:
        vpxor       %xmm1, %xmm1, %xmm1
        vpunpcklbw  %xmm1, %xmm0, %xmm2
        vpunpckhwd  %xmm1, %xmm2, %xmm3
        vpunpcklwd  %xmm1, %xmm2, %xmm2
        vinsertf128 $1, %xmm3, %ymm2, %ymm2
        vpunpckhbw  %xmm1, %xmm0, %xmm0
        vpunpckhwd  %xmm1, %xmm0, %xmm3
        vpunpcklwd  %xmm1, %xmm0, %xmm0
        vinsertf128 $1, %xmm3, %ymm0, %ymm0
        vmovaps     %ymm0, 32(%rdi)
        vmovaps     %ymm2, (%rdi)
        vzeroupper
        ret

  This generalizes a custom lowering that was added a while back to the ARM
  backend. That lowering is no longer necessary, and is removed. The testcases
  for it, however, provide excellent ARM tests for this change and so remain.

  rdar://14735100
  llvm-svn: 193727
* Struct byval cleanup: add helper functions to reduce code duplication. (Manman Ren, 2013-10-29, 1 file, -180/+117)

  Helper functions are added:
    emitPostLd: emit a post-increment load operation with given size.
    emitPostSt: emit a post-increment store operation with given size.

  No functionality change.

  llvm-svn: 193656
* ARM: don't expand atomicrmw inline on Cortex-M0 (Tim Northover, 2013-10-25, 1 file, -9/+10)

  There's a barrier instruction so that should still be used, but most actual
  atomic operations are going to need a platform decision on the correct
  behaviour (either nop if single-threaded or OS-support otherwise).

  rdar://problem/15287210
  llvm-svn: 193399
* ARM: Tweak usage of '*vfp' compiler_rt functions. (Jim Grosbach, 2013-10-24, 1 file, -1/+2)

  Only use them if the subtarget has ARM mode, as these routines are
  implemented as ARM code.

  rdar://15302004
  llvm-svn: 193381
* Remove class abstraction from ARM struct byval lowering (David Peixotto, 2013-10-24, 1 file, -553/+262)

  This commit changes the struct byval lowering for arm to use inline checks
  for the subtarget instead of a class abstraction to represent the
  differences. The class abstraction was judged to be too much code for this
  task.

  No intended functionality change.

  llvm-svn: 193357
* ARM: Use non-VFP softcalls on embedded Darwinish targets (Tim Northover, 2013-10-24, 1 file, -1/+1)

  The compiler-rt functions __adddf3vfp and so on exist purely to allow Thumb1
  code to make use of VFP instructions by switching back to ARM mode; they
  make no sense for M-class processors which don't even have an ARM mode.

  Given that justification, in practice this is a platform ABI decision so the
  actual check is based on that rather than CPU features.

  rdar://problem/15302004
  llvm-svn: 193327
* PR17309: ARM backend incorrectly lowers COPY_STRUCT_BYVAL_I32 for thumb1 targets (David Peixotto, 2013-10-17, 1 file, -7/+108)

  This commit implements the correct lowering of the COPY_STRUCT_BYVAL_I32
  pseudo-instruction for thumb1 targets. Previously, the lowering of
  COPY_STRUCT_BYVAL_I32 generated the post-increment forms of ldr/ldrh/ldrb
  instructions. Thumb1 does not have the post-increment form of these
  instructions, so the generated assembly contained invalid instructions.

  Passing the generated assembly to gcc caused it to complain with an error
  like this:

    Error: cannot honor width suffix -- `ldrb r3,[r0],#1'

  and the integrated assembler would generate an object file with an invalid
  instruction encoding.

  This commit contains a small test case that demonstrates the problem with
  thumb1 targets as well as an expanded test case that more thoroughly tests
  the lowering of byval struct passing for arm, thumb1, and thumb2 targets.

  llvm-svn: 192916
* Refactor lowering for COPY_STRUCT_BYVAL_I32 (David Peixotto, 2013-10-17, 1 file, -170/+460)

  This commit refactors the lowering of the COPY_STRUCT_BYVAL_I32
  pseudo-instruction in the ARM backend. We introduce a new helper class that
  encapsulates all of the operations needed during the lowering. The
  operations are implemented for each subtarget in different subclasses.
  Currently only arm and thumb2 subtargets are supported.

  This refactoring was done to easily implement support for thumb1 subtargets.
  This initial patch does not add support for thumb1, but is only a
  refactoring. A follow-on patch will implement the support for thumb1
  subtargets.

  No intended functionality change.

  llvm-svn: 192915
* Struct byval: fix a copy-paste error for thumb2. (Manman Ren, 2013-10-15, 1 file, -1/+1)

  PR17309

  llvm-svn: 192730
* Struct byval: use the correct alignment for loads generated to load from struct byval to registers (Manman Ren, 2013-10-07, 1 file, -1/+2)

  We used to pass 0, which means the alignment of PtrVT. Even when the
  alignment of the struct is smaller than 4, the LOADs would have alignment
  of 4, and further optimizations could combine the LOADs into an ldm, which
  would cause a crash.

  The fix is to pass the alignment of the struct byval.

  rdar://problem/15144402
  llvm-svn: 192126
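  A hypothetical IR shape that hits this: a byval struct whose alignment is
  below 4, where the loads emitted for the argument copy must inherit that
  alignment:

    %struct.S = type <{ i8, i32 }>                 ; packed, align 1
    declare void @callee(%struct.S* byval align 1)
    ; the loads moving the argument into registers must use align 1; claiming
    ; align 4 would let later combines merge them into an ldm that faults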
* ARM: do not add a regmask for TAILJUMPs (Matthias Braun, 2013-10-04, 1 file, -16/+18)

  The jump doesn't really kill the registers; the following call does, but we
  never get back anyway. This avoids some verify-machineinstrs problems when
  TAILJUMPs are if-converted.

  llvm-svn: 191962
* ARM: support interrupt attribute (Tim Northover, 2013-10-01, 1 file, -1/+55)

  This function attribute modifies the callee-saved register list and the
  function epilogue (specifically the return instruction) so that a routine
  is suitable for use as an interrupt-handler of the specified type without
  disrupting user-mode applications.

  rdar://problem/14207019
  llvm-svn: 191766
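  A usage sketch (the attribute is a string function attribute; "IRQ" is one
  of the supported handler kinds):

    define void @irq_handler() "interrupt"="IRQ" {
      ; for an IRQ handler the epilogue returns with SUBS pc, lr, #4
      ; instead of a plain BX lr
      ret void
    }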
* [ARM] Use the load-acquire/store-release instructions optimally in AArch32. (Amara Emerson, 2013-09-26, 1 file, -122/+161)

  Patch by Artyom Skrobov.

  llvm-svn: 191428
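  For example (a sketch in the IR syntax of the time), atomic operations with
  these orderings can now select the v8 instructions directly:

    %v = load atomic i32* %p acquire, align 4       ; can select LDA
    store atomic i32 %v, i32* %q release, align 4   ; can select STL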
* Fix PR 17368: disable vector mul distribution for square of add/sub for ARM (Weiming Zhao, 2013-09-25, 1 file, -0/+10)

  Generally, it is desirable to distribute (a + b) * c to a*c + b*c for ARM
  with VMLx forwarding, where a, b and c are vectors. However, for
  (a + b)*(a + b), distribution will result in one extra instruction.

  With distribution:
    x = a + b (add)
    y = a * x (mul)
    z = y + b * x (mla)

  Without distribution:
    x = a + b (add)
    z = x * x (mul)

  This patch checks if a mul is a square of add/sub. If yes, skip
  distribution.

  llvm-svn: 191410