path: root/llvm/lib/Target/ARM/ARMISelLowering.cpp
...
* ARM: remove special cases for Darwin dynamic-no-pic mode. (Tim Northover, 2013-11-25, 1 file, -5/+6)
  These are handled almost identically to static mode (and ELF's global address
  materialisation), except that a symbol may have "$non_lazy_ptr" appended. This
  can be handled by passing appropriate flags along with the instruction instead
  of using entirely separate pseudo-instructions.
  llvm-svn: 195655
* ARM: produce friendly error for invalid inline asm (Tim Northover, 2013-11-14, 1 file, -0/+4)
  We used to perform an invalid operation on an MVT and crash, which wasn't much
  fun. Patch by Oliver Stannard.
  llvm-svn: 194714
* Enable optimization of sin / cos pair into call to __sincos_stret for iOS 7+. (Bob Wilson, 2013-11-03, 1 file, -0/+77)
  rdar://12856873
  Patch by Evan Cheng, with a fix for rdar://13209539 by Tilmann Scheller.
  llvm-svn: 193942
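  For context, a minimal sketch of the IR shape this combine targets (function
  and value names are invented for illustration, not taken from the commit): a
  call to llvm.sin and llvm.cos on the same operand, which can now be folded
  into a single __sincos_stret runtime call on iOS 7+.

      define float @sin_cos_sum(float %x) nounwind {
        ; Both intrinsic calls take the same argument, so the backend can merge
        ; them into one call that returns sine and cosine together.
        %s = call float @llvm.sin.f32(float %x)
        %c = call float @llvm.cos.f32(float %x)
        %r = fadd float %s, %c
        ret float %r
      }

      declare float @llvm.sin.f32(float)
      declare float @llvm.cos.f32(float)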
* Legalize: Improve legalization of long vector extends. (Jim Grosbach, 2013-10-31, 1 file, -55/+0)
  When an extend more than doubles the size of the elements (e.g., a zext from
  v16i8 to v16i32), the normal legalization method of splitting the vectors will
  run into problems, as by the time the destination vector is legal, the source
  vector is illegal. The end result is the operation often becoming scalarized,
  with the typical horrible performance. For example, on x86_64, the simple
  input of:

      define void @bar(<16 x i8> %a, <16 x i32>* %p) nounwind {
        %tmp = zext <16 x i8> %a to <16 x i32>
        store <16 x i32> %tmp, <16 x i32>* %p
        ret void
      }

  generates:

              .section  __TEXT,__text,regular,pure_instructions
              .section  __TEXT,__const
              .align  5
      LCPI0_0:
              .long  255            ## 0xff
              .long  255            ## 0xff
              .long  255            ## 0xff
              .long  255            ## 0xff
              .long  255            ## 0xff
              .long  255            ## 0xff
              .long  255            ## 0xff
              .long  255            ## 0xff
              .section  __TEXT,__text,regular,pure_instructions
              .globl  _bar
              .align  4, 0x90
      _bar:
              vpunpckhbw  %xmm0, %xmm0, %xmm1
              vpunpckhwd  %xmm0, %xmm1, %xmm2
              vpmovzxwd   %xmm1, %xmm1
              vinsertf128 $1, %xmm2, %ymm1, %ymm1
              vmovaps     LCPI0_0(%rip), %ymm2
              vandps      %ymm2, %ymm1, %ymm1
              vpmovzxbw   %xmm0, %xmm3
              vpunpckhwd  %xmm0, %xmm3, %xmm3
              vpmovzxbd   %xmm0, %xmm0
              vinsertf128 $1, %xmm3, %ymm0, %ymm0
              vandps      %ymm2, %ymm0, %ymm0
              vmovaps     %ymm0, (%rdi)
              vmovaps     %ymm1, 32(%rdi)
              vzeroupper
              ret

  So instead we can check if there are legal types that enable us to split more
  cleverly when the input vector is already legal, such that we don't turn it
  into an illegal type. If the extend is such that it's more than doubling the
  size of the input, we check if
    - the number of vector elements is even,
    - the source type is legal,
    - the type of a split source is illegal,
    - the type of an extended (by doubling element size) source is legal, and
    - the type of that extended source when split is legal.
  If the conditions are met, instead of just splitting both the destination and
  the source types, we create an extend that only goes up one "step" (doubling
  the element width), and then continue legalizing the rest of the operation
  normally. The result is that this operates as a new, more efficient,
  termination condition for the loop of "split the operation until the
  destination type is legal."

  With this change, the above example now compiles to:

      _bar:
              vpxor       %xmm1, %xmm1, %xmm1
              vpunpcklbw  %xmm1, %xmm0, %xmm2
              vpunpckhwd  %xmm1, %xmm2, %xmm3
              vpunpcklwd  %xmm1, %xmm2, %xmm2
              vinsertf128 $1, %xmm3, %ymm2, %ymm2
              vpunpckhbw  %xmm1, %xmm0, %xmm0
              vpunpckhwd  %xmm1, %xmm0, %xmm3
              vpunpcklwd  %xmm1, %xmm0, %xmm0
              vinsertf128 $1, %xmm3, %ymm0, %ymm0
              vmovaps     %ymm0, 32(%rdi)
              vmovaps     %ymm2, (%rdi)
              vzeroupper
              ret

  This generalizes a custom lowering that was added a while back to the ARM
  backend. That lowering is no longer necessary, and is removed. The testcases
  for it, however, provide excellent ARM tests for this change and so remain.
  rdar://14735100
  llvm-svn: 193727
* Struct byval cleanup: add helper functions to reduce code duplication. (Manman Ren, 2013-10-29, 1 file, -180/+117)
  Helper functions are added:
    emitPostLd: emit a post-increment load operation with given size.
    emitPostSt: emit a post-increment store operation with given size.
  No functionality change.
  llvm-svn: 193656
* ARM: don't expand atomicrmw inline on Cortex-M0 (Tim Northover, 2013-10-25, 1 file, -9/+10)
  There's a barrier instruction, so that should still be used, but most actual
  atomic operations are going to need a platform decision on the correct
  behaviour (either a nop if single-threaded, or OS support otherwise).
  rdar://problem/15287210
  llvm-svn: 193399
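  As a rough illustration (not taken from the commit, names invented), an
  atomicrmw like the one below is the kind of operation that is no longer
  expanded to an inline instruction sequence on Cortex-M0; how it is
  implemented becomes a platform/OS decision instead.

      define i32 @increment(i32* %counter) {
        ; A read-modify-write with full ordering; on Cortex-M0 the backend no
        ; longer tries to open-code this itself.
        %old = atomicrmw add i32* %counter, i32 1 seq_cst
        ret i32 %old
      }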
* ARM: Tweak usage of '*vfp' compiler_rt functions. (Jim Grosbach, 2013-10-24, 1 file, -1/+2)
  Only use them if the subtarget has ARM mode, as these routines are implemented
  as ARM code.
  rdar://15302004
  llvm-svn: 193381
* Remove class abstraction from ARM struct byval lowering (David Peixotto, 2013-10-24, 1 file, -553/+262)
  This commit changes the struct byval lowering for arm to use inline checks for
  the subtarget instead of a class abstraction to represent the differences. The
  class abstraction was judged to be too much code for this task.
  No intended functionality change.
  llvm-svn: 193357
* ARM: Use non-VFP softcalls on embedded Darwinish targets (Tim Northover, 2013-10-24, 1 file, -1/+1)
  The compiler-rt functions __adddf3vfp and so on exist purely to allow Thumb1
  code to make use of VFP instructions by switching back to ARM mode; they make
  no sense for M-class processors, which don't even have an ARM mode. Given that
  justification, in practice this is a platform ABI decision, so the actual check
  is based on that rather than CPU features.
  rdar://problem/15302004
  llvm-svn: 193327
* PR17309: ARM backend incorrectly lowers COPY_STRUCT_BYVAL_I32 for thumb1 targets (David Peixotto, 2013-10-17, 1 file, -7/+108)
  This commit implements the correct lowering of the COPY_STRUCT_BYVAL_I32
  pseudo-instruction for thumb1 targets. Previously, the lowering of
  COPY_STRUCT_BYVAL_I32 generated the post-increment forms of ldr/ldrh/ldrb
  instructions. Thumb1 does not have the post-increment form of these
  instructions, so the generated assembly contained invalid instructions.
  Passing the generated assembly to gcc caused it to complain with an error like
  this:
    Error: cannot honor width suffix -- `ldrb r3,[r0],#1'
  and the integrated assembler would generate an object file with an invalid
  instruction encoding.
  This commit contains a small test case that demonstrates the problem with
  thumb1 targets as well as an expanded test case that more thoroughly tests the
  lowering of byval struct passing for arm, thumb1, and thumb2 targets.
  llvm-svn: 192916
* Refactor lowering for COPY_STRUCT_BYVAL_I32 (David Peixotto, 2013-10-17, 1 file, -170/+460)
  This commit refactors the lowering of the COPY_STRUCT_BYVAL_I32
  pseudo-instruction in the ARM backend. We introduce a new helper class that
  encapsulates all of the operations needed during the lowering. The operations
  are implemented for each subtarget in different subclasses. Currently only arm
  and thumb2 subtargets are supported.
  This refactoring was done to easily implement support for thumb1 subtargets.
  This initial patch does not add support for thumb1, but is only a refactoring.
  A follow-on patch will implement the support for thumb1 subtargets.
  No intended functionality change.
  llvm-svn: 192915
* Struct byval: fix a copy-paste error for thumb2. (Manman Ren, 2013-10-15, 1 file, -1/+1)
  PR17309
  llvm-svn: 192730
* Struct byval: use the correct alignment for loads generated to load from struct byval to registers. (Manman Ren, 2013-10-07, 1 file, -1/+2)
  We used to pass 0, which means the alignment of PtrVT. Even when the alignment
  of the struct is smaller than 4, the LOADs would have alignment of 4, and
  further optimizations could combine the LOADs into an ldm, which would cause a
  crash. The fix is to pass the alignment of the struct byval.
  rdar://problem/15144402
  llvm-svn: 192126
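  To make the scenario concrete, here is a hedged sketch (struct layout and
  names invented, written in the typed-pointer IR syntax of the period) of a
  byval argument whose natural alignment is below 4; the loads emitted to move
  it into registers must honour that smaller alignment rather than assume
  PtrVT's.

      %struct.packed3 = type { i8, i8, i8 }

      declare void @callee(%struct.packed3* byval align 1)

      define void @caller(%struct.packed3* %p) {
        ; Passing the struct byval makes the backend copy it into argument
        ; registers; those copy loads must use the struct's alignment (1), not 4.
        call void @callee(%struct.packed3* byval align 1 %p)
        ret void
      }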
* ARM: do not add a regmask for TAILJUMPs (Matthias Braun, 2013-10-04, 1 file, -16/+18)
  The jump doesn't really kill the registers; the following call does, but we
  never get back anyway. This avoids some verify-machineinstrs problems when
  TAILJUMPs are if-converted.
  llvm-svn: 191962
* ARM: support interrupt attribute (Tim Northover, 2013-10-01, 1 file, -1/+55)
  This function-attribute modifies the callee-saved register list and function
  epilogue (specifically the return instruction) so that a routine is suitable
  for use as an interrupt-handler of the specified type without disrupting
  user-mode applications.
  rdar://problem/14207019
  llvm-svn: 191766
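  A hedged sketch of how such an attribute is typically expressed at the IR
  level; the exact attribute string and value ("interrupt"="IRQ") are an
  assumption about the spelling, not something stated in the commit message.

      ; A routine intended to be used directly as an IRQ handler.
      define void @irq_handler() #0 {
      entry:
        ret void
      }

      attributes #0 = { "interrupt"="IRQ" }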
* [ARM] Use the load-acquire/store-release instructions optimally in AArch32. (Amara Emerson, 2013-09-26, 1 file, -122/+161)
  Patch by Artyom Skrobov.
  llvm-svn: 191428
* Fix PR 17368: disable vector mul distribution for square of add/sub for ARM (Weiming Zhao, 2013-09-25, 1 file, -0/+10)
  Generally, it is desirable to distribute (a + b) * c to a*c + b*c for ARM with
  VMLx forwarding, where a, b and c are vectors. However, for (a + b)*(a + b),
  distribution will result in one extra instruction.
  With distribution:
    x = a + b (add)
    y = a * x (mul)
    z = y + b * y (mla)
  Without distribution:
    x = a + b (add)
    z = x * x (mul)
  This patch checks if a mul is a square of add/sub. If yes, skip distribution.
  llvm-svn: 191410
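  A minimal IR sketch of the pattern the new check targets (names invented for
  illustration): a vector multiply whose two operands are the same add, which
  is now left undistributed.

      define <4 x i32> @square_of_sum(<4 x i32> %a, <4 x i32> %b) {
        ; Both multiply operands are the same add, so distributing would add an
        ; instruction; the combine now keeps this as one add plus one mul.
        %sum = add <4 x i32> %a, %b
        %sq  = mul <4 x i32> %sum, %sum
        ret <4 x i32> %sq
      }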
* [ARMv8] Change hasV8Fp to hasFPARMv8, and other command line options, to be more consistent. (Joey Gouly, 2013-09-13, 1 file, -2/+2)
  llvm-svn: 190692
* [ARMv8] Implement the new DMB/DSB operands. (Joey Gouly, 2013-09-05, 1 file, -2/+2)
  This removes the custom ISD Node: MEMBARRIER and replaces it with an intrinsic.
  llvm-svn: 190055
* Clean up some usage of Triple. The base class has methods for determining if the target is iOS and Linux. (Cameron Esfahani, 2013-08-29, 1 file, -1/+1)
  llvm-svn: 189604
* ARM: Use "dmb sy" for barriers on M-class CPUsTim Northover2013-08-281-1/+4
| | | | | | | | The usual default of "dmb ish" (inner-shareable) isn't even a valid instruction on v6M or v7M (well, it does the same thing but software is strongly discouraged from using it) so we should emit a full-system barrier there. llvm-svn: 189483
* [ARMv8] Add CodeGen for VMAXNM/VMINNM. (Joey Gouly, 2013-08-23, 1 file, -0/+16)
  llvm-svn: 189103
* [ARMv8] Add CodeGen support for VSEL. (Joey Gouly, 2013-08-22, 1 file, -1/+93)
  This uses the ARMcmov pattern that Tim cleaned up in r188995. Thanks to Simon
  Tatham for his floating point help!
  llvm-svn: 189024
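  For reference, the kind of source shape VSEL helps with is a floating-point
  compare feeding a select, as in this hedged sketch (names invented); on ARMv8
  this can become a single conditional vsel rather than a branchy sequence.

      define float @pick_larger(float %a, float %b) {
        ; An fcmp feeding a select on floating-point values is the shape that
        ; can map onto the ARMv8 VSEL instruction.
        %cmp = fcmp oge float %a, %b
        %res = select i1 %cmp, float %a, float %b
        ret float %res
      }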
* [ARM] Constrain some register classes in EmitAtomicBinary64 so that we pass these tests with -verify-machineinstrs. (Joey Gouly, 2013-08-22, 1 file, -0/+4)
  llvm-svn: 189006
* ARM: implement some simple f64 materializations. (Tim Northover, 2013-08-20, 1 file, -10/+40)
  Previously we used a const-pool load for virtually all 64-bit floating values.
  Actually, we can get quite a few common values (including 0.0, 1.0) via "vmov"
  instructions of one stripe or another.
  llvm-svn: 188773
* ARM: implement allowTruncateForTailCall (Tim Northover, 2013-08-06, 1 file, -0/+15)
  Now that it's in place, it seems silly not to let ARM make use of the extra
  tail call opportunities.
  llvm-svn: 187795
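  A small sketch of the extra opportunity this unlocks (function names
  invented): a call whose result is only truncated before being returned can
  still be emitted as a tail call, because the truncation costs nothing extra.

      declare i32 @compute(i32)

      define i16 @forward(i32 %x) {
        ; The only thing done to the call result is a truncate before returning,
        ; so the call can still be turned into a tail call.
        %full   = tail call i32 @compute(i32 %x)
        %narrow = trunc i32 %full to i16
        ret i16 %narrow
      }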
* [ARM] check bitwidth in PerformORCombine (Saleem Abdulrasool, 2013-07-30, 1 file, -14/+21)
  When simplifying a (or (and B A) (and C ~A)) to a (VBSL A B C) ensure that the
  bitwidth of the second operands to both ands match before comparing the
  negation of the values.
  Split the check of the value of the second operands to the ands. Move the cast
  and variable declaration slightly higher to make it slightly easier to follow.
  Bug-Id: 16700
  Signed-off-by: Saleem Abdulrasool <compnerd@compnerd.org>
  llvm-svn: 187404
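  The or/and/not shape referred to above looks like this in IR (a minimal
  sketch with invented names); when the operand widths line up, the combine
  folds it into a single VBSL.

      define <4 x i32> @bitselect(<4 x i32> %mask, <4 x i32> %b, <4 x i32> %c) {
        ; (B & A) | (C & ~A): bits of %b where the mask is set, bits of %c
        ; elsewhere -- the pattern PerformORCombine can turn into VBSL.
        %t1   = and <4 x i32> %b, %mask
        %nota = xor <4 x i32> %mask, <i32 -1, i32 -1, i32 -1, i32 -1>
        %t2   = and <4 x i32> %c, %nota
        %res  = or  <4 x i32> %t1, %t2
        ret <4 x i32> %res
      }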
* [ARM][ISel] Improve the lowering of vector loads. (Quentin Colombet, 2013-07-23, 1 file, -1/+3)
  When vectors are built from a single value, the ARM lowering issues a
  scalar_to_vector node. This node is then always morphed into a move from the
  general purpose unit to the vector unit.
  When the value comes from a load, this can be simplified into a vector load to
  the right lane.
  This patch changes the lowering of insert_vector_elt to expose a vector
  friendly pattern in this situation.
  This is a step toward fixing <rdar://problem/14170854>.
  llvm-svn: 186999
* ARM: allow printing of ARM atomic DAG nodes. (Tim Northover, 2013-07-16, 1 file, -0/+13)
  We'd forgotten to provide string representations for the special ARMISD atomic
  nodes; this adds them in. No effect on CodeGen, just makes the output of
  "-view-whatever-dags" slightly more readable.
  llvm-svn: 186406
* ARM: implement ldrex, strex and clrex intrinsics (Tim Northover, 2013-07-16, 1 file, -0/+24)
  Intrinsics already existed for the 64-bit variants, so these support operations
  of size at most 32 bits.
  llvm-svn: 186392
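  A hedged sketch of how the intrinsics appear in IR. The overload suffixes
  follow the typed-pointer naming used at the time and should be treated as an
  assumption; the function body and names are invented for illustration.

      declare i32  @llvm.arm.ldrex.p0i32(i32*)
      declare i32  @llvm.arm.strex.p0i32(i32, i32*)
      declare void @llvm.arm.clrex()

      define i32 @try_swap(i32* %addr, i32 %new) {
        ; Load-exclusive the current value, then attempt a store-exclusive of
        ; the new value; strex returns 0 on success, 1 if the reservation was
        ; lost. clrex is shown only to demonstrate the third intrinsic.
        %old    = call i32 @llvm.arm.ldrex.p0i32(i32* %addr)
        %status = call i32 @llvm.arm.strex.p0i32(i32 %new, i32* %addr)
        call void @llvm.arm.clrex()
        ret i32 %old
      }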
* ARM EABI divmod support (Renato Golin, 2013-07-16, 1 file, -2/+78)
  This patch enables calls to __aeabi_idivmod when in EABI mode, by using the
  remainder value returned on registers (R1), enabled by the ARM triple
  "none-eabi". Note that Darwin and GNUEABI triples will continue lowering on
  GNU style, that is, using the stack for the remainder. Still need to add
  SREM/UREM support fix for 64-bit lowering.
  llvm-svn: 186390
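  For illustration, a signed remainder like the one below is what can now lower
  to a single __aeabi_idivmod call on EABI targets, with the remainder taken
  from R1 (a minimal sketch, names invented).

      define i32 @remainder(i32 %num, i32 %den) {
        ; On an arm-none-eabi triple this srem becomes a call to
        ; __aeabi_idivmod, which returns quotient in R0 and remainder in R1.
        %rem = srem i32 %num, %den
        ret i32 %rem
      }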
* Use llvm::array_lengthof to replace sizeof(array)/sizeof(array[0]). (Craig Topper, 2013-07-15, 1 file, -1/+1)
  llvm-svn: 186301
* Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size. (Craig Topper, 2013-07-14, 1 file, -5/+5)
  llvm-svn: 186274
* ARM: Improve codegen for generic vselect. (Jim Grosbach, 2013-07-08, 1 file, -0/+18)
  Fall back to by-element insert rather than building it up on the stack.
  rdar://14351991
  llvm-svn: 185846
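  A small sketch of a generic vector select that exercises this fallback path
  (names invented): a per-element select with no specialized lowering, which is
  now assembled with by-element inserts instead of going through the stack.

      define <8 x i8> @blend(<8 x i8> %cond, <8 x i8> %a, <8 x i8> %b) {
        ; A generic per-element select between two vectors.
        %mask = icmp ne <8 x i8> %cond, zeroinitializer
        %res  = select <8 x i1> %mask, <8 x i8> %a, <8 x i8> %b
        ret <8 x i8> %res
      }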
* Remove the EXCEPTIONADDR, EHSELECTION, and LSDAADDR ISD opcodes. (Jakob Stoklund Olesen, 2013-07-04, 1 file, -2/+0)
  These exception-related opcodes are not used any longer.
  llvm-svn: 185625
* Revert r185595-185596 which broke buildbots. (Jakob Stoklund Olesen, 2013-07-04, 1 file, -0/+2)
  Revert "Simplify landing pad lowering."
  Revert "Remove the EXCEPTIONADDR, EHSELECTION, and LSDAADDR ISD opcodes."
  llvm-svn: 185600
* Remove the EXCEPTIONADDR, EHSELECTION, and LSDAADDR ISD opcodes. (Jakob Stoklund Olesen, 2013-07-03, 1 file, -2/+0)
  These exception-related opcodes are not used any longer.
  llvm-svn: 185596
* [ARM] Improve the instruction selection of vector loads. (Quentin Colombet, 2013-07-03, 1 file, -0/+94)
  In the ARM back-end, build_vector nodes are lowered to a target specific
  build_vector that uses floating point type. This works well, unless the
  inserted bitcasts survive until instruction selection. In that case, they
  incur moves between integer unit and floating point unit that may result in
  inefficient code.
  In other words, this conversion may introduce artificial dependencies when the
  code leading to the build vector cannot be completed with a floating point
  type. In particular, this happens when loads are not aligned.
  Before this patch, in that case, the compiler generates general purpose loads
  and creates the floating point vector from them, instead of directly using
  the vector unit.
  The patch uses a vector friendly sequence of code when the inserted bitcasts
  to floating point survived DAGCombine. This is done by a target specific
  DAGCombine that changes the target specific build_vector into a sequence of
  insert_vector_elt that get rid of the bitcasts.
  <rdar://problem/14170854>
  llvm-svn: 185587
* ARM: relax the atomic release barrier to "dmb ishst" on SwiftTim Northover2013-07-031-1/+11
| | | | | | | | | | | Swift cores implement store barriers that are stronger than the ARM specification but weaker than general barriers. They are, in fact, just about enough to provide the ordering needed for atomic operations with release semantics. This patch makes use of that quirk. llvm-svn: 185527
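  For context, a hedged sketch (names invented) of a release-semantic atomic at
  the IR level; the exact barrier placement is the backend's business, but on
  Swift cores the release ordering for a store like this can be satisfied with
  "dmb ishst" rather than a full "dmb ish".

      define void @publish(i32* %flag, i32 %value) {
        ; A release store: earlier memory operations must not be reordered past
        ; it, which is what the relaxed barrier still guarantees on Swift.
        store atomic i32 %value, i32* %flag release, align 4
        ret void
      }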
* Revert r185339 (ARM: relax the atomic release barrier to "dmb ishst") (Tim Northover, 2013-07-01, 1 file, -5/+1)
  Turns out I'd misread the architecture reference manual and thought that was a
  load/store-store barrier, when it's not. Thanks for pointing it out Eli!
  llvm-svn: 185356
* ARM: relax the atomic release barrier to "dmb ishst" (Tim Northover, 2013-07-01, 1 file, -1/+5)
  I believe the full "dmb ish" barrier is not required to guarantee release
  semantics for atomic operations. The weaker "dmb ishst" prevents previous
  operations being reordered with a store executed afterwards, which is enough.
  A key point to note (fortunately already correct) is that this barrier alone
  is *insufficient* for sequential consistency, no matter how liberally placed.
  llvm-svn: 185339
* ARM: ensure fixed-point conversions have sane types (Tim Northover, 2013-06-28, 1 file, -5/+36)
  We were generating intrinsics for NEON fixed-point conversions that didn't
  exist (e.g. float -> i16). There are two cases to consider:
    + iN is smaller than float. In this case we can do the conversion but need
      an extend or truncate as well.
    + iN is larger than float. In this case using the NEON conversion would be
      incorrect so we don't perform any combining.
  llvm-svn: 185158
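  A hedged sketch of the kind of source that reaches the fixed-point-conversion
  combine (constants and names invented, and the exact trigger is an assumption
  based on the commit text): a scale by a power of two feeding a float-to-int
  conversion with a result narrower than float, where an extra truncate is now
  required instead of a non-existent float -> i16 NEON intrinsic.

      define <4 x i16> @to_fixed_point(<4 x float> %in) {
        ; Multiply by 2^8 then convert to integer; the i16 result is narrower
        ; than float, so a truncate must follow the 32-bit conversion.
        %scaled = fmul <4 x float> %in, <float 256.0, float 256.0, float 256.0, float 256.0>
        %fixed  = fptosi <4 x float> %scaled to <4 x i16>
        ret <4 x i16> %fixed
      }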
* ARM: Proactively ensure that the LowerCallResult hack for 'this'-returns is not used for incompatible calling conventions. (Stephen Lin, 2013-06-26, 1 file, -3/+10)
  (Currently, ARM 'this'-returns are handled in the standard calling convention
  case by treating R0 as preserved and doing some extra magic in
  LowerCallResult; this may not apply to calling conventions added in the
  future so this patch provides and documents an interface for indicating such)
  llvm-svn: 185024
* The getRegForInlineAsmConstraint function should only accept MVT value types. (Chad Rosier, 2013-06-22, 1 file, -1/+1)
  llvm-svn: 184642
* [ARMTargetLowering] ARMISD::{SUB,ADD}{C,E} second result is a boolean implying that upper bits are always 0. (Michael Gottesman, 2013-06-18, 1 file, -1/+11)
  llvm-svn: 184231
* Converted an overly aggressive assert to a conditional check in AddCombineTo64bitMLAL. (Michael Gottesman, 2013-06-18, 1 file, -2/+5)
  Said assert assumes that ADDC will always have a glue node as its second
  argument and is checked before we even know that we are actually performing
  the relevant MLAL optimization. This is incorrect, since on ARM we *CAN*
  codegen ADDC with a use-list-based second argument. Thus, to have both
  effects, I converted the assert to a conditional check which, if it fails,
  means we do not perform the optimization.
  In terms of tests, I cannot produce an ADDC from the IR level until I get in
  my multiprecision optimization patch, which is forthcoming. The tests for said
  patch would cause this assert to fail, implying that said tests will provide
  the relevant coverage.
  llvm-svn: 184230
* Order CALLSEQ_START and CALLSEQ_END nodes. (Andrew Trick, 2013-05-29, 1 file, -2/+3)
  Fixes PR16146: gdb.base__call-ar-st.exp fails after pre-RA-sched=source fixes.
  Patch by Xiaoyi Guo!
  This also fixes an unsupported dbg.value test case. Codegen was previously
  incorrect but the test was passing by luck.
  llvm-svn: 182885
* Track IR ordering of SelectionDAG nodes 2/4. (Andrew Trick, 2013-05-25, 1 file, -115/+115)
  Change SelectionDAG::getXXXNode() interfaces as well as call sites of these
  functions to pass in SDLoc instead of DebugLoc.
  llvm-svn: 182703
* Replace Count{Leading,Trailing}Zeros_{32,64} with count{Leading,Trailing}Zeros. (Michael J. Spencer, 2013-05-24, 1 file, -7/+7)
  llvm-svn: 182680
* ARM: implement @llvm.readcyclecounter intrinsic (Tim Northover, 2013-05-23, 1 file, -1/+43)
  This implements the @llvm.readcyclecounter intrinsic as the specific MRC
  instruction specified in the ARM manuals for CPUs with the Power Management
  extensions. Older CPUs had slightly different methods which may also have to
  be implemented eventually, but this should cover all v7 cases.
  rdar://problem/13939186
  llvm-svn: 182603
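  A minimal usage sketch of the intrinsic as it appears in IR (function name
  invented); on v7 CPUs with the Power Management extensions each call now
  lowers to the cycle-counter MRC read described above.

      declare i64 @llvm.readcyclecounter()

      define i64 @cycles_elapsed() {
        ; Read the cycle counter twice; the difference approximates the cycle
        ; cost of whatever runs between the two reads.
        %start = call i64 @llvm.readcyclecounter()
        %end   = call i64 @llvm.readcyclecounter()
        %delta = sub i64 %end, %start
        ret i64 %delta
      }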