summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/ARM/ARMISelLowering.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* ARM: Creating a vector from a lane of another.Jim Grosbach2013-03-021-2/+5
| | | | | | | | | | | The VDUP instruction source register doesn't allow a non-constant lane index, so make sure we don't construct a ARM::VDUPLANE node asking it to do so. rdar://13328063 http://llvm.org/bugs/show_bug.cgi?id=13963 llvm-svn: 176413
* Clean up code format a bit.Jim Grosbach2013-03-021-4/+2
| | | | llvm-svn: 176412
* Tidy up. Trailing whitespace.Jim Grosbach2013-03-021-7/+7
| | | | llvm-svn: 176411
* ARM NEON: Fix v2f32 float intrinsicsArnold Schwaighofer2013-03-021-0/+18
| | | | | | Mark them as expand, they are not legal as our backend does not match them. llvm-svn: 176410
* Add support for using non-pic code for arm and thumb1 when emitting the sjljChad Rosier2013-03-011-10/+21
| | | | | | | | dispatch code. As far as I can tell the thumb2 code is behaving as expected. I was able to compile and run the associated test case for both arm and thumb1. rdar://13066352 llvm-svn: 176363
* Tidy up; no functional change.Chad Rosier2013-02-281-5/+3
| | | | llvm-svn: 176288
* Style; no functional change.Chad Rosier2013-02-281-7/+4
| | | | llvm-svn: 176285
* ARM: FMA is legal only if VFP4 is available.Jim Grosbach2013-02-271-0/+6
| | | | | | rdar://13306723 llvm-svn: 176212
* Remove this instance of dl as it's defined in a previous scope.Chad Rosier2013-02-271-1/+0
| | | | llvm-svn: 176208
* Update TargetLowering ivars for name policy.Jim Grosbach2013-02-201-8/+8
| | | | | | | | | | | http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly ivars should be camel-case and start with an upper-case letter. A few in TargetLowering were starting with a lower-case letter. No functional change intended. llvm-svn: 175667
* Test commit. Fixed typo.David Peixotto2013-02-131-1/+1
| | | | llvm-svn: 175020
* ARM NEON: Handle v16i8 and v8i16 reverse shufflesArnold Schwaighofer2013-02-121-1/+37
| | | | | | | | | | | | | | | Lower reverse shuffles to a vrev64 and a vext instruction instead of the default legalization of storing and loading to the stack. This is important because we generate reverse shuffles in the loop vectorizer when we reverse store to an array. uint8_t Arr[N]; for (i = 0; i < N; ++i) Arr[N - i - 1] = ... radar://13171760 llvm-svn: 174929
* Move MRI liveouts to ARM return instructions.Jakob Stoklund Olesen2013-02-051-13/+11
| | | | llvm-svn: 174406
* Add a special ARM trap encoding for NaCl.Eli Bendersky2013-01-301-2/+10
| | | | | | | | More details in this thread: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130128/163783.html Patch by JF Bastien llvm-svn: 173943
* Fix 64-bit atomic operations in Thumb mode.Tim Northover2013-01-291-74/+46
| | | | | | | | The ARM and Thumb variants of LDREXD and STREXD have different constraints and take different operands. Previously the code expanding atomic operations didn't take this into account and asserted in Thumb mode. llvm-svn: 173780
* Teach SDISel to combine fsin / fcos into a fsincos node if the followingEvan Cheng2013-01-291-0/+2
| | | | | | | | | | | | | | | | | | conditions are met: 1. They share the same operand and are in the same BB. 2. Both outputs are used. 3. The target has a native instruction that maps to ISD::FSINCOS node or the target provides a sincos library call. Implemented the generic optimization in sdisel and enabled it for Mac OSX. Also added an additional optimization for x86_64 Mac OSX by using an alternative entry point __sincos_stret which returns the two results in xmm0 / xmm1. rdar://13087969 PR13204 llvm-svn: 173755
* Fixed the condition codes for the atomic64 min/umin code generation on ARM. ↵Silviu Baranga2013-01-251-2/+2
| | | | | | If the sutraction of the higher 32 bit parts gives a 0 result, we need to do the store operation. llvm-svn: 173437
* Switch TargetTransformInfo from an immutable analysis pass that requiresChandler Carruth2013-01-071-32/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a TargetMachine to construct (and thus isn't always available), to an analysis group that supports layered implementations much like AliasAnalysis does. This is a pretty massive change, with a few parts that I was unable to easily separate (sorry), so I'll walk through it. The first step of this conversion was to make TargetTransformInfo an analysis group, and to sink the nonce implementations in ScalarTargetTransformInfo and VectorTargetTranformInfo into a NoTargetTransformInfo pass. This allows other passes to add a hard requirement on TTI, and assume they will always get at least on implementation. The TargetTransformInfo analysis group leverages the delegation chaining trick that AliasAnalysis uses, where the base class for the analysis group delegates to the previous analysis *pass*, allowing all but tho NoFoo analysis passes to only implement the parts of the interfaces they support. It also introduces a new trick where each pass in the group retains a pointer to the top-most pass that has been initialized. This allows passes to implement one API in terms of another API and benefit when some other pass above them in the stack has more precise results for the second API. The second step of this conversion is to create a pass that implements the TargetTransformInfo analysis using the target-independent abstractions in the code generator. This replaces the ScalarTargetTransformImpl and VectorTargetTransformImpl classes in lib/Target with a single pass in lib/CodeGen called BasicTargetTransformInfo. This class actually provides most of the TTI functionality, basing it upon the TargetLowering abstraction and other information in the target independent code generator. The third step of the conversion adds support to all TargetMachines to register custom analysis passes. This allows building those passes with access to TargetLowering or other target-specific classes, and it also allows each target to customize the set of analysis passes desired in the pass manager. The baseline LLVMTargetMachine implements this interface to add the BasicTTI pass to the pass manager, and all of the tools that want to support target-aware TTI passes call this routine on whatever target machine they end up with to add the appropriate passes. The fourth step of the conversion created target-specific TTI analysis passes for the X86 and ARM backends. These passes contain the custom logic that was previously in their extensions of the ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces. I separated them into their own file, as now all of the interface bits are private and they just expose a function to create the pass itself. Then I extended these target machines to set up a custom set of analysis passes, first adding BasicTTI as a fallback, and then adding their customized TTI implementations. The fourth step required logic that was shared between the target independent layer and the specific targets to move to a different interface, as they no longer derive from each other. As a consequence, a helper functions were added to TargetLowering representing the common logic needed both in the target implementation and the codegen implementation of the TTI pass. While technically this is the only change that could have been committed separately, it would have been a nightmare to extract. The final step of the conversion was just to delete all the old boilerplate. This got rid of the ScalarTargetTransformInfo and VectorTargetTransformInfo classes, all of the support in all of the targets for producing instances of them, and all of the support in the tools for manually constructing a pass based around them. Now that TTI is a relatively normal analysis group, two things become straightforward. First, we can sink it into lib/Analysis which is a more natural layer for it to live. Second, clients of this interface can depend on it *always* being available which will simplify their code and behavior. These (and other) simplifications will follow in subsequent commits, this one is clearly big enough. Finally, I'm very aware that much of the comments and documentation needs to be updated. As soon as I had this working, and plausibly well commented, I wanted to get it committed and in front of the build bots. I'll be doing a few passes over documentation later if it sticks. Commits to update DragonEgg and Clang will be made presently. llvm-svn: 171681
* Move all of the header files which are involved in modelling the LLVM IRChandler Carruth2013-01-021-8/+8
| | | | | | | | | | | | | | | | | | | | | into their new header subdirectory: include/llvm/IR. This matches the directory structure of lib, and begins to correct a long standing point of file layout clutter in LLVM. There are still more header files to move here, but I wanted to handle them in separate commits to make tracking what files make sense at each layer easier. The only really questionable files here are the target intrinsic tablegen files. But that's a battle I'd rather not fight today. I've updated both CMake and Makefile build systems (I think, and my tests think, but I may have missed something). I've also re-sorted the includes throughout the project. I'll be committing updates to Clang, DragonEgg, and Polly momentarily. llvm-svn: 171366
* Remove the Function::getFnAttributes method in favor of using the AttributeSetBill Wendling2012-12-301-5/+7
| | | | | | | | | directly. This is in preparation for removing the use of the 'Attribute' class as a collection of attributes. That will shift to the AttributeSet class instead. llvm-svn: 171253
* Revert "Adding support for llvm.arm.neon.vaddl[su].* and"Bob Wilson2012-12-201-18/+0
| | | | | | | This reverts r170694. The operations can be represented in IR without adding any new intrinsics. llvm-svn: 170765
* Adding support for llvm.arm.neon.vaddl[su].* andRenato Golin2012-12-201-0/+18
| | | | | | | | llvm.arm.neon.vsub[su].* intrinsics. Patch by Pete Couperus <pjcoup@gmail.com> llvm-svn: 170694
* Remove the explicit MachineInstrBuilder(MI) constructor.Jakob Stoklund Olesen2012-12-191-1/+1
| | | | | | | | | | | | | Use the version that also takes an MF reference instead. It would technically be possible to extract an MF reference from the MI as MI->getParent()->getParent(), but that would not work for MIs that are not inserted into any basic block. Given the reasonably small number of places this constructor was used at all, I preferred the compile time check to a run time assertion. llvm-svn: 170588
* Change TargetLowering::findRepresentativeClass to take an MVT, insteadPatrik Hagglund2012-12-191-2/+2
| | | | | | of EVT. llvm-svn: 170532
* Rename the 'Attributes' class to 'Attribute'. It's going to represent a ↵Bill Wendling2012-12-191-3/+3
| | | | | | single attribute in the future. llvm-svn: 170502
* Change TargetLowering::getRegClassFor to take an MVT, instead of EVT.Patrik Hagglund2012-12-131-1/+1
| | | | | | | | | | | | Accordingly, add helper funtions getSimpleValueType (in parallel to getValueType) in SDValue, SDNode, and TargetLowering. This is the first, in a series of patches. This is the second attempt. In the first attempt (r169837), a few getSimpleVT() were hoisted too far, detected by bootstrap failures. llvm-svn: 170104
* Sorry about the churn. One more change to getOptimalMemOpType() hook. Did IEvan Cheng2012-12-121-2/+2
| | | | | | | | | | | | mention the inline memcpy / memset expansion code is a mess? This patch split the ZeroOrLdSrc argument into two: IsMemset and ZeroMemset. The first indicates whether it is expanding a memset or a memcpy / memmove. The later is whether the memset is a memset of zero. It's totally possible (likely even) that targets may want to do different things for memcpy and memset of zero. llvm-svn: 169959
* - Rename isLegalMemOpType to isSafeMemOpType. "Legal" is a very overloade term.Evan Cheng2012-12-121-6/+2
| | | | | | | | | Also added more comments to explain why it is generally ok to return true. - Rename getOptimalMemOpType argument IsZeroVal to ZeroOrLdSrc. It's meant to be true for loaded source (memcpy) or zero constants (memset). The poor name choice is probably some kind of legacy issue. llvm-svn: 169954
* Avoid using lossy load / stores for memcpy / memset expansion. e.g.Evan Cheng2012-12-121-0/+4
| | | | | | f64 load / store on non-SSE2 x86 targets. llvm-svn: 169944
* Replace TargetLowering::isIntImmLegal() withEvan Cheng2012-12-111-18/+33
| | | | | | | | | ScalarTargetTransformInfo::getIntImmCost() instead. "Legal" is a poorly defined term for something like integer immediate materialization. It is always possible to materialize an integer immediate. Whether to use it for memcpy expansion is more a "cost" conceern. llvm-svn: 169929
* Revert EVT->MVT changes, r169836-169851, due to buildbot failures.Patrik Hagglund2012-12-111-3/+3
| | | | llvm-svn: 169854
* Change TargetLowering::findRepresentativeClass to take an MVT, insteadPatrik Hagglund2012-12-111-2/+2
| | | | | | of EVT. llvm-svn: 169845
* Change TargetLowering::getRegClassFor to take an MVT, instead of EVT.Patrik Hagglund2012-12-111-1/+1
| | | | | | | | | Accordingly, add helper funtions getSimpleValueType (in parallel to getValueType) in SDValue, SDNode, and TargetLowering. This is the first, in a series of patches. llvm-svn: 169837
* Stylistic tweak.Evan Cheng2012-12-111-9/+8
| | | | llvm-svn: 169811
* Some enhancements for memcpy / memset inline expansion.Evan Cheng2012-12-101-13/+51
| | | | | | | | | | | | | | | | | | | | | 1. Teach it to use overlapping unaligned load / store to copy / set the trailing bytes. e.g. On 86, use two pairs of movups / movaps for 17 - 31 byte copies. 2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g. x86 and ARM. 3. When memcpy from a constant string, do *not* replace the load with a constant if it's not possible to materialize an integer immediate with a single instruction (required a new target hook: TLI.isIntImmLegal()). 4. Use unaligned load / stores more aggressively if target hooks indicates they are "fast". 5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8. Also increase the threshold to something reasonable (8 for memset, 4 pairs for memcpy). This significantly improves Dhrystone, up to 50% on ARM iOS devices. rdar://12760078 llvm-svn: 169791
* Replace r169459 with something safer. Rather than having computeMaskedBits toEvan Cheng2012-12-061-30/+21
| | | | | | | | | | understand target implementation of any_extend / extload, just generate zero_extend in place of any_extend for liveouts when the target knows the zero_extend will be implicit (e.g. ARM ldrb / ldrh) or folded (e.g. x86 movz). rdar://12771555 llvm-svn: 169536
* Let targets provide hooks that compute known zero and ones for any_extendEvan Cheng2012-12-061-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | and extload's. If they are implemented as zero-extend, or implicitly zero-extend, then this can enable more demanded bits optimizations. e.g. define void @foo(i16* %ptr, i32 %a) nounwind { entry: %tmp1 = icmp ult i32 %a, 100 br i1 %tmp1, label %bb1, label %bb2 bb1: %tmp2 = load i16* %ptr, align 2 br label %bb2 bb2: %tmp3 = phi i16 [ 0, %entry ], [ %tmp2, %bb1 ] %cmp = icmp ult i16 %tmp3, 24 br i1 %cmp, label %bb3, label %exit bb3: call void @bar() nounwind br label %exit exit: ret void } This compiles to the followings before: push {lr} mov r2, #0 cmp r1, #99 bhi LBB0_2 @ BB#1: @ %bb1 ldrh r2, [r0] LBB0_2: @ %bb2 uxth r0, r2 cmp r0, #23 bhi LBB0_4 @ BB#3: @ %bb3 bl _bar LBB0_4: @ %exit pop {lr} bx lr The uxth is not needed since ldrh implicitly zero-extend the high bits. With this change it's eliminated. rdar://12771555 llvm-svn: 169459
* Appease GCC's -Wparentheses.Matt Beaumont-Gay2012-12-041-2/+2
| | | | | | (TIL that Clang's -Wparentheses ignores 'x || y && "foo"' on purpose. Neat.) llvm-svn: 169337
* ARM custom lower ctpop for vector types. Patch by Pete Couperus.Evan Cheng2012-12-041-0/+117
| | | | llvm-svn: 169325
* Use the new script to sort the includes of every file under lib.Chandler Carruth2012-12-031-10/+10
| | | | | | | | | | | | | | | | | Sooooo many of these had incorrect or strange main module includes. I have manually inspected all of these, and fixed the main module include to be the nearest plausible thing I could find. If you own or care about any of these source files, I encourage you to take some time and check that these edits were sensible. I can't have broken anything (I strictly added headers, and reordered them, never removed), but they may not be the headers you'd really like to identify as containing the API being implemented. Many forward declarations and missing includes were added to a header files to allow them to parse cleanly when included first. The main module rule does in fact have its merits. =] llvm-svn: 169131
* Codegen failure for vmull with small vectorsSebastian Pop2012-11-301-13/+74
| | | | | | | | | | | | | | | | | | | | | | | | | Codegen was failing with an assertion because of unexpected vector operands when legalizing the selection DAG for a MUL instruction. The asserting code was legalizing multiplies for vectors of size 128 bits. It uses a custom lowering to try and detect cases where it can use a VMULL instruction instead of a VMOVL + VMUL. The code was looking for input operands to the MUL that had been sign or zero extended. If it found the extended operands it would drop the sign/zero extension and use the original vector size as input to a VMULL instruction. The code assumed that the original input vector was 64 bits so that after dropping the extension it would fit directly into a D register and could be used as an operand of a VMULL instruction. The input code that trigger the failure used a vector of <4 x i8> that was sign extended to <4 x i32>. It was not safe to drop the sign extension in this case because the original vector is only 32 bits wide. The fix is to insert a sign extension for the vector to reach the required 64 bit size. In this particular example, the vector would need to be sign extented to a <4 x i16>. llvm-svn: 169024
* Added atomic 64 min/max/umin/umax instrinsics support in the ARM backend.Silviu Baranga2012-11-291-10/+73
| | | | llvm-svn: 168886
* ARM: Implement CanLowerReturn so large vectors get expanded into sret.Benjamin Kramer2012-11-281-0/+11
| | | | | | Fixes 14337. llvm-svn: 168809
* Mark FP_EXTEND form v2f32 to v2f64 as "expand" for ARM NEON. Patch by Pete ↵Eli Friedman2012-11-171-0/+1
| | | | | | Couperus. llvm-svn: 168240
* Remove hard coded registers in ARM ldrexd and strexd instructionsWeiming Zhao2012-11-161-11/+47
| | | | | | | | | This patch replaces the hard coded GPR pair [R0, R1] of Intrinsic:arm_ldrexd and [R2, R3] of Intrinsic:arm_strexd with even/odd GPRPair reg class. Similar to the lowering of atomic_64 operation. llvm-svn: 168207
* Make sure FABS on v2f32 and v4f32 is legal on ARM NEONAnton Korobeynikov2012-11-161-1/+0
| | | | | | This fixes PR14359 llvm-svn: 168200
* Mark FP_ROUND for converting NEON v2f64 to v2f32 as expand. Add a missingEli Friedman2012-11-151-0/+2
| | | | | | | | case to vector legalization so this actually works. Patch by Pete Couperus. Fixes PR12540. llvm-svn: 168107
* Revert changing FNEG of v4f32 to Expand. It's legal.Craig Topper2012-11-151-1/+0
| | | | llvm-svn: 168030
* Make FNEG and FABS of v4f32 Expand.Craig Topper2012-11-151-0/+2
| | | | llvm-svn: 168029
* Add llvm.ceil, llvm.trunc, llvm.rint, llvm.nearbyint intrinsics.Craig Topper2012-11-151-0/+4
| | | | llvm-svn: 168025
OpenPOWER on IntegriCloud