path: root/llvm/lib/Target/ARM/ARMISelLowering.cpp
Commit log for this file, newest first. Each entry: subject (author, date, files changed, lines -removed/+added).
...
* Revert r238473, "Thumb2: Modify codegen for memcpy intrinsic to prefer LDM/STM."  (Peter Collingbourne, 2015-06-05, 1 file, -52/+0)
  Reverted as it caused miscompilations and assertion failures (PR23768, http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150601/280380.html).
  llvm-svn: 239169
* Re-commit of r238201 with fix for building with shared libraries.  (Luke Cheeseman, 2015-06-01, 1 file, -1/+47)
  llvm-svn: 238739
* Add address space argument to isLegalAddressingMode  (Matt Arsenault, 2015-06-01, 1 file, -1/+2)
  This is important because of different addressing modes depending on the address space for GPU targets. This only adds the argument, and does not update any of the uses to provide the correct address space.
  llvm-svn: 238723
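  A toy model of why the new argument matters (illustrative only; this is not the LLVM API, and the address-space numbering is made up):

    #include <cstdio>

    struct AddrMode {
      long BaseOffs;
      bool HasBaseReg;
      long Scale;
    };

    // Pretend address space 3 ("local" memory on a GPU-like target) only
    // supports a plain base register, while the flat space also allows a
    // register+immediate form.
    bool isLegalAddressingMode(const AddrMode &AM, unsigned AddrSpace) {
      if (AddrSpace == 3)
        return AM.HasBaseReg && AM.BaseOffs == 0 && AM.Scale == 0;
      return AM.Scale == 0; // reg or reg+imm
    }

    int main() {
      AddrMode RegPlusImm{16, true, 0};
      std::printf("flat: %d  local: %d\n",
                  isLegalAddressingMode(RegPlusImm, 0),
                  isLegalAddressingMode(RegPlusImm, 3));
    }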
* Thumb2: Modify codegen for memcpy intrinsic to prefer LDM/STM.  (Peter Collingbourne, 2015-05-28, 1 file, -0/+52)
  We were previously codegen'ing these as regular load/store operations and hoping that the register allocator would allocate registers in ascending order so that we could apply an LDM/STM combine after register allocation. According to the commit that first introduced this code (r37179), we planned to teach the register allocator to allocate the registers in ascending order. This never got implemented, and up to now we've been stuck with very poor codegen.

  A much simpler approach for achieving better codegen is to create LDM/STM instructions with identical sets of virtual registers, let the register allocator pick arbitrary registers, and order the register lists when printing an MCInst. This approach also avoids the need to repeatedly calculate offsets, which ultimately ought to be eliminated pre-RA in order to decrease register pressure.

  This is implemented by lowering the memcpy intrinsic to a series of SD-only MCOPY pseudo-instructions, each of which performs a memory copy using a given number of registers. During SD->MI lowering, we lower MCOPY to LDM/STM. This is a little unusual, but it avoids the need to encode register lists in the SD, and we can take advantage of SD use lists to decide whether to use the _UPD variant of the instructions.

  Fixes PR9199.

  Differential Revision: http://reviews.llvm.org/D9508
  llvm-svn: 238473
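  A minimal model of the "order the list at print time" idea (hypothetical helper, not LLVM code): LDM/STM encode their register list as a bitmask, so the printed list is inherently ascending, and nothing needs to force the allocator's choices into that order ahead of time.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // Print an LDM with whatever registers the allocator picked, sorting
    // the list only at print time.
    void printLDM(unsigned Base, std::vector<unsigned> Regs) {
      std::sort(Regs.begin(), Regs.end());
      std::printf("ldm r%u!, {", Base);
      for (std::size_t I = 0; I != Regs.size(); ++I)
        std::printf("%sr%u", I ? ", " : "", Regs[I]);
      std::printf("}\n");
    }

    int main() { printLDM(4, {3, 0, 2, 1}); } // ldm r4!, {r0, r1, r2, r3}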
* Revert "Re-commit changes in r237579 with fix for bug breaking windows builds."Diego Novillo2015-05-261-47/+1
| | | | | | | This reverts commit r238201 to fix linking problems in x86 Linux http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150525/278413.html llvm-svn: 238223
* Re-commit changes in r237579 with fix for bug breaking windows builds.  (Luke Cheeseman, 2015-05-26, 1 file, -1/+47)
  llvm-svn: 238201
* Test Commit  (Luke Cheeseman, 2015-05-26, 1 file, -1/+0)
  llvm-svn: 238199
* ARM: Fix comment and make it slightly more readable  (Matthias Braun, 2015-05-20, 1 file, -7/+7)
  llvm-svn: 237820
* Simplify IRBuilder::CreateCall* by using ArrayRef+initializer_list/braced init only  (David Blaikie, 2015-05-18, 1 file, -5/+5)
  llvm-svn: 237624
* Revert r237579, as it broke windows buildbots  (Oliver Stannard, 2015-05-18, 1 file, -45/+2)
  llvm-svn: 237583
* [LLVM - ARM/AArch64] Add ACLE special register intrinsics  (Oliver Stannard, 2015-05-18, 1 file, -2/+45)
  This patch implements LLVM support for the ACLE special register intrinsics in section 10.1, __arm_{w,r}sr{,p,64}.

  This patch is intended to lower the read/write_register intrinsics, used to implement the special register intrinsics in the clang patch for special register intrinsics (see http://reviews.llvm.org/D9697), to ARM-specific instructions MRC, MCR, MSR, etc., to allow reading and writing of coprocessor registers in AArch32 and AArch64. This is done by inspecting the register string passed to the intrinsic and then lowering to the appropriate instruction.

  Patch by Luke Cheeseman.

  Differential Revision: http://reviews.llvm.org/D9699
  llvm-svn: 237579
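  A usage sketch of the source-level intrinsics these lower from (the register strings are illustrative; compile for an ARM target with an ACLE-aware compiler):

    #include <arm_acle.h>
    #include <stdint.h>

    uint32_t read_cpsr(void) {
      return __arm_rsr("cpsr");   // reads a special register (MRS)
    }

    void set_flags(uint32_t v) {
      __arm_wsr("apsr_nzcvq", v); // writes a special register (MSR)
    }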
* Re-apply r237247 - [AArch64] Codegen VMAX/VMIN for safe math cases  (Artyom Skrobov, 2015-05-14, 1 file, -3/+0)
  No longer breaks SPEC2000/2006.
  llvm-svn: 237361
* ARM: remove custom jump table UID  (Tim Northover, 2015-05-13, 1 file, -22/+11)
  We were creating and propagating two separate indices for each jump table (from back in the mists of time). However, the generic index used by other backends is sufficient to emit a unique symbol, so this was unneeded.
  llvm-svn: 237294
* Revert r237247 - [AArch64] Codegen VMAX/VMIN.. as it is causing failures in SPEC2000/2006  (Silviu Baranga, 2015-05-13, 1 file, -0/+3)
  llvm-svn: 237256
* [AArch64] Codegen VMAX/VMIN for safe math cases  (Artyom Skrobov, 2015-05-13, 1 file, -3/+0)
  llvm-svn: 237247
* Migrate existing backends that care about software floating point to use the information in the module rather than TargetOptions.  (Eric Christopher, 2015-05-12, 1 file, -6/+10)
  Clang has emitted the use-soft-float function attribute for some time now, so have the backends set a subtarget feature based on the particular function, now that subtargets are created based on functions and function attributes.

  For the one middle-end soft-float check, go ahead and create an overridable TargetLowering::useSoftFloat function that just checks the TargetSubtargetInfo in all cases.

  Also remove the command-line option that hard-codes whether or not soft-float is set, using the attribute instead in all of the target-specific test cases; for the generic ones, just add the attribute in the one case that showed up.
  llvm-svn: 237079
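  A minimal model of that pattern (class and member names assumed from the description above, not copied from LLVM): the generic query forwards to per-function subtarget info instead of a global flag.

    #include <cstdio>

    struct TargetSubtargetInfo {
      bool UseSoftFloat = false; // derived from the "use-soft-float" attribute
    };

    struct TargetLoweringBase {
      const TargetSubtargetInfo &STI;
      explicit TargetLoweringBase(const TargetSubtargetInfo &S) : STI(S) {}
      virtual bool useSoftFloat() const { return STI.UseSoftFloat; }
      virtual ~TargetLoweringBase() = default;
    };

    int main() {
      TargetSubtargetInfo HardFn, SoftFn;
      SoftFn.UseSoftFloat = true;
      std::printf("%d %d\n", TargetLoweringBase(HardFn).useSoftFloat(),
                  TargetLoweringBase(SoftFn).useSoftFloat());
    }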
* ScheduleDAGInstrs: In functions with tail calls PseudoSourceValues are not non-aliasing distinct objects  (Arnold Schwaighofer, 2015-05-08, 1 file, -1/+3)
  The code that builds the dependence graph assumes that two PseudoSourceValues don't alias. In a tail calling function, two FixedStackObjects might refer to the same location. Worse, 'immutable' fixed stack objects like function arguments are not immutable and will be clobbered.

  Change this so that a load from a FixedStackObject is not invariant in a tail calling function, and don't return a PseudoSourceValue for an instruction in tail calling functions when building the dependence graph, so that we handle function arguments conservatively.

  Fix for PR23459.

  rdar://20740035
  llvm-svn: 236916
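  An illustration of the hazard (AAPCS passes the first four integer arguments in r0-r3, so e and f live in fixed stack slots; assume the call is emitted as a tail call):

    int callee(int a, int b, int c, int d, int e, int f);

    int caller(int a, int b, int c, int d, int e, int f) {
      // The outgoing argument area overlaps the incoming one, so the
      // "immutable" slots holding e and f are overwritten before the jump;
      // swapping them makes the ordering of those loads and stores matter.
      return callee(a, b, c, d, f, e);
    }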
* Fix typo.  (Matthias Braun, 2015-05-07, 1 file, -1/+1)
  llvm-svn: 236785
* Change getTargetNodeName() to produce compiler warnings for missing cases, fix them  (Matthias Braun, 2015-05-07, 1 file, -2/+6)
  llvm-svn: 236775
* [ARM] generate VMAXNM/VMINNM for a compare followed by a select, in safe math mode too  (Artyom Skrobov, 2015-05-06, 1 file, -25/+100)
  llvm-svn: 236590
* Don't always apply kill flag in thumb2 ABS pseudo expansion.  (Pete Cooper, 2015-04-30, 1 file, -1/+2)
  The expansion for t2ABS was always setting the kill flag on the rsb instruction. It should instead only be set on rsb if it was set on the original ABS instruction.

  rdar://problem/20752113
  llvm-svn: 236272
* Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes"  (Sergey Dmitrouk, 2015-04-28, 1 file, -223/+251)
  This adds debug locations to constant nodes of the Selection DAG and updates all places that create constants to pass debug locations (see PR13269).

  Can't guarantee that all locations are correct, but in a lot of cases the choice is obvious, so most of them should be. At least all tests pass.

  Tests for these changes do not cover everything; instead they just check it for SDNodes, ARM and AArch64, where it's easy to get incorrect locations on constants.

  This is not a complete fix, as FastISel contains a workaround for wrong debug locations, which drops locations from instructions on processing constants, and there isn't currently a way to use debug locations from constants there, as llvm::Constant doesn't cache it (yet). That is a somewhat different issue, though, not directly related to these changes.

  Differential Revision: http://reviews.llvm.org/D9084
  llvm-svn: 235989
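  The call-site pattern this changes, abbreviated (the getConstant overload shapes are simplified here):

    // Before: constants were created without a location.
    //   SDValue Zero = DAG.getConstant(0, MVT::i32);
    // After: the debug location of the node being lowered is threaded in.
    //   SDValue Zero = DAG.getConstant(0, SDLoc(N), MVT::i32);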
* Revert "[DebugInfo] Add debug locations to constant SD nodes"Daniel Jasper2015-04-281-251/+223
| | | | | | | This breaks a test: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870 llvm-svn: 235987
* [DebugInfo] Add debug locations to constant SD nodes  (Sergey Dmitrouk, 2015-04-28, 1 file, -223/+251)
  This adds debug locations to constant nodes of the Selection DAG and updates all places that create constants to pass debug locations (see PR13269).

  Can't guarantee that all locations are correct, but in a lot of cases the choice is obvious, so most of them should be. At least all tests pass.

  Tests for these changes do not cover everything; instead they just check it for SDNodes, ARM and AArch64, where it's easy to get incorrect locations on constants.

  This is not a complete fix, as FastISel contains a workaround for wrong debug locations, which drops locations from instructions on processing constants, and there isn't currently a way to use debug locations from constants there, as llvm::Constant doesn't cache it (yet). That is a somewhat different issue, though, not directly related to these changes.

  Differential Revision: http://reviews.llvm.org/D9084
  llvm-svn: 235977
* Cleanup, remove unused return value  (Matthias Braun, 2015-04-28, 1 file, -4/+2)
  llvm-svn: 235952
* [ARM] make vminnm/vmaxnm work with ?le, ?ge and no-nans-fp-math  (Scott Douglass, 2015-04-08, 1 file, -9/+18)
  Because -menable-no-nans causes fcmp conditions to be rewritten without 'o' or 'u', the recognition code needs to cope. Also extended it to handle 'le' and 'ge'.

  Differential Revision: http://reviews.llvm.org/D8725
  llvm-svn: 234421
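  The source-level shape being matched is a compare feeding a select; with no-NaNs math the fcmp loses its ordered/unordered bit, and the 'le'/'ge' forms are now recognized too, so either of these can select VMINNM on an ARMv8 target (illustrative):

    float fmin_like_lt(float a, float b) { return a < b ? a : b; }
    float fmin_like_le(float a, float b) { return a <= b ? a : b; } // now handled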
* Use movw/movt instead of constant pool loads to lower byval parameter copies  (Derek Schuff, 2015-03-26, 1 file, -5/+9)
  Summary: The ARM backend can use a loop to implement copying byval parameters before a call. In non-thumb2 mode it uses a constant pool load to materialize the trip count. For targets that need movt instead (e.g. Native Client), use the same code as in thumb2 mode to materialize the trip count.

  Reviewers: jfb, t.p.northover

  Differential Revision: http://reviews.llvm.org/D8442
  llvm-svn: 233324
* Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used.  (Benjamin Kramer, 2015-03-23, 1 file, -1/+2)
  llvm-svn: 232998
* [ARM] Remove target-specific ITOFP/FPTOI nodes  (James Molloy, 2015-03-23, 1 file, -44/+8)
  Anton tried this 5 years ago but it was reverted due to extra VMOVs being emitted. This can be easily fixed with a liberal application of patterns - matching loads/stores and extractelts.
  llvm-svn: 232958
* [ARM] Align stack objects passed to memory intrinsics  (John Brawn, 2015-03-18, 1 file, -0/+15)
  Memcpy, like other memory intrinsics, typically tries to use LDM/STM if the source and target addresses are 4-byte aligned. In CodeGenPrepare, look for calls to memory intrinsics and, if the object is on the stack, 4-byte align it if it's large enough that we expect memcpy would want to use LDM/STM to copy it.

  Differential Revision: http://reviews.llvm.org/D7908
  llvm-svn: 232627
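  The situation this improves, sketched at the source level: a stack object whose natural alignment is 1 gets its alignment raised to 4 so the copy can use LDM/STM.

    #include <cstring>

    struct Bytes { char data[32]; }; // only 1-byte alignment required

    void consume(Bytes *);

    void f(const Bytes *src) {
      Bytes tmp;                          // candidate for 4-byte over-alignment
      std::memcpy(&tmp, src, sizeof tmp); // LDM/STM-friendly once aligned
      consume(&tmp);
    }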
* In preparation for moving ARM's TargetRegisterInfo to the TargetMachine, merge Thumb1RegisterInfo and Thumb2RegisterInfo.  (Eric Christopher, 2015-03-12, 1 file, -1/+1)
  This will enable us to match the TargetMachine for our TargetRegisterInfo classes.
  llvm-svn: 232117
* Silencing an "enumeral and non-enumeral type in conditional expression" warning; NFC.  (Aaron Ballman, 2015-03-12, 1 file, -1/+1)
  llvm-svn: 232035
* Have getCallPreservedMask and getThisCallPreservedMask take a MachineFunction argument so that we can grab subtarget-specific features off of it.  (Eric Christopher, 2015-03-11, 1 file, -3/+3)
  llvm-svn: 231979
* ARM: simplify and extend byval handling  (Tim Northover, 2015-03-11, 1 file, -216/+103)
  The main issue being fixed here is that APCS targets handling a "byval align N" parameter with N > 4 were miscounting what objects were where on the stack, leading to FrameLowering setting the frame pointer incorrectly and clobbering the stack.

  But byval handling had grown over many years, and had multiple layers of cruft trying to compensate for each other and calculate padding correctly. This only really needs to be done once, in the HandleByVal function. Elsewhere should just do what it's told by that call.

  I also stripped out unnecessary APCS/AAPCS distinctions (now that Clang emits byvals with the correct C ABI alignment), which simplified HandleByVal.

  rdar://20095672
  llvm-svn: 231959
* Remove the remaining uses of abs64 and nuke it.  (Benjamin Kramer, 2015-03-09, 1 file, -3/+3)
  std::abs works just fine and we're already using it in many places. NFC intended.
  llvm-svn: 231696
* Make constant arrays that are passed to functions as const.  (Benjamin Kramer, 2015-03-07, 1 file, -7/+5)
  In theory this allows the compiler to skip materializing the array on the stack. In practice clang often fails to do that, but that's a different story. NFC.
  llvm-svn: 231571
* [ARM] Enable vector extload combine for legal types.  (Ahmed Bougacha, 2015-03-05, 1 file, -0/+22)
  This commit enables forming vector extloads for ARM. It only does so for legal types, and when we can't fold the extension in a wide/long form of the user instruction.

  Enabling it for larger types isn't as good an idea on ARM as it is on X86, because:
  - we pretend that extloads are legal, but end up generating vld+vmov
  - we have instructions like vld {dN, dM}, which can't be generated when we "manually expand" extloads to vld+vmov.

  For legal types, the combine doesn't fire that often: in the integration tests only in a big endian testcase, where it removes a pointless AND.

  Related to rdar://19723053
  Differential Revision: http://reviews.llvm.org/D7423
  llvm-svn: 231396
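  The kind of access the combine targets, shown with NEON intrinsics (requires a NEON-enabled ARM target; illustrative):

    #include <arm_neon.h>

    uint16x8_t load_widen(const uint8_t *p) {
      uint8x8_t v = vld1_u8(p); // 64-bit vector load
      return vmovl_u8(v);       // widening use: the extension can fold here
    }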
* Mutate TargetLowering::shouldExpandAtomicRMWInIR to specifically dictate how AtomicRMWInsts are expanded.  (JF Bastien, 2015-03-04, 1 file, -2/+5)
  Summary: In PNaCl, most atomic instructions have their own @llvm.nacl.atomic.* function; each one, with a few exceptions, represents a consistent behaviour across all NaCl-supported targets. Unfortunately, the atomic RMW operations nand, [u]min, and [u]max aren't directly represented by any such @llvm.nacl.atomic.* function. This patch refines shouldExpandAtomicRMWInIR in TargetLowering so that a future `Le32TargetLowering` class can selectively inform the caller how the target desires the atomic RMW instruction to be expanded (i.e. via load-linked/store-conditional for ARM/AArch64, via cmpxchg for X86/others?, or not at all for Mips), if at all.

  This does not represent a behavioural change and as such no tests were added.

  Patch by: Richard Diamond.

  Reviewers: jfb

  Reviewed By: jfb

  Subscribers: jfb, aemerson, t.p.northover, llvm-commits

  Differential Revision: http://reviews.llvm.org/D7713
  llvm-svn: 231250
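  A sketch of the refined hook (the enum and enumerator names are assumed from the description above, not copied from LLVM): the target reports how an atomicrmw should be expanded rather than a yes/no answer.

    enum class AtomicRMWExpansionKind {
      None,    // leave the instruction alone (it maps to a native operation)
      LLSC,    // expand to load-linked/store-conditional (ARM/AArch64 style)
      CmpXChg, // expand to a compare-and-swap loop (X86 style)
    };

    // A target then overrides something like:
    //   AtomicRMWExpansionKind shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const;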
* Remove MCStreamer.h include from MCContext.h and explicitly include it where necessary. NFC  (Pete Cooper, 2015-03-04, 1 file, -0/+1)
  llvm-svn: 231193
* getRegForInlineAsmConstraint wants to use TargetRegisterInfo for a lookup; pass that in rather than use a naked call to getSubtargetImpl.  (Eric Christopher, 2015-02-26, 1 file, -2/+3)
  This involved passing down and around either a TargetMachine or TargetRegisterInfo. Update all callers/definitions around the targets and SelectionDAG.
  llvm-svn: 230699
* Remove an argument-less call to getSubtargetImpl from TargetLoweringBase.  (Eric Christopher, 2015-02-26, 1 file, -4/+5)
  This required plumbing a TargetRegisterInfo through computeRegisterProperties and into findRepresentativeClass, which uses it for register class iteration. This required passing a subtarget into a few target-specific initializations of TargetLowering.
  llvm-svn: 230583
* ARM: treat [N x i32] and [N x i64] as AAPCS composite types  (Tim Northover, 2015-02-24, 1 file, -4/+8)
  The logic is almost there already, with our special homogeneous aggregate handling. Tweaking it like this allows front-ends to emit AAPCS-compliant code without ever having to count registers or add discarded padding arguments.

  Only arrays of i32 and i64 are needed to model AAPCS rules, but I decided to apply the logic to all integer arrays for more consistency.
  llvm-svn: 230348
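  A front-end view of what this enables (sketch; the IR coercion shown is approximate):

    // A small aggregate like this is typically coerced to [2 x i32] in IR;
    // the backend can now split it across r0-r3 and the stack per AAPCS
    // without the front end counting registers or adding padding arguments.
    struct Pair { int x, y; };

    int sum(Pair p) { return p.x + p.y; }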
* Rewrite the global merge pass to be subprogram agnostic for now.  (Eric Christopher, 2015-02-23, 1 file, -6/+0)
  It was previously using the subtarget to get values for the global offset without actually checking each function as it was generating code. Go ahead and solidify the current behavior and make the existing FIXMEs more prominent.

  As a note, the ARM backend previously had a thumb1 and non-thumb1 set of defaults. Only the former was tested, so I've changed the behavior to only use that for now.
  llvm-svn: 230245
* CodeGen: convert CCState interface to using ArrayRefs  (Tim Northover, 2015-02-21, 1 file, -9/+6)
  Everyone except R600 was manually passing the length of a static array at each callsite, calculated in a variety of interesting ways. Far easier to let ArrayRef handle that.

  There should be no functional change, but out-of-tree targets may have to tweak their calls as with these examples.
  llvm-svn: 230118
* [ARM] Re-re-apply VLD1/VST1 base-update combine.  (Ahmed Bougacha, 2015-02-19, 1 file, -14/+117)
  This re-applies r223862, r224198, r224203, and r224754, which were reverted in r228129 because they exposed Clang misalignment problems when self-hosting.

  The combine caused the crashes because we turned ISD::LOAD/STORE nodes to ARMISD::VLD1/VST1_UPD nodes. When selecting addressing modes, we were very lax for the former, and only emitted the alignment operand (as in "[r1:128]") when it was larger than the standard alignment of the memory type. However, for ARMISD nodes, we just used the MMO alignment, no matter what. In our case, we turned ISD nodes to ARMISD nodes, and this caused the alignment operands to start being emitted. And that's how we exposed alignment problems that were ignored before (but I believe would have been caught with SCTLR.A==1?).

  To fix this, we can just mirror the hack done for ISD nodes: only take into account the MMO alignment when the access is overaligned.

  Original commit message:
  We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores.

  Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset).

  rdar://19717869, rdar://14062261.
  llvm-svn: 229932
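  The generic pattern the combine now handles, shown with NEON intrinsics (plain vector loads work too; requires a NEON-enabled ARM target): a load followed by a base-pointer increment can select the post-incrementing "vld1 ..., [rN]!" form.

    #include <arm_neon.h>

    float32x4_t sum_quads(const float *p, int n) {
      float32x4_t acc = vdupq_n_f32(0.0f);
      for (int i = 0; i < n; i += 4) {
        acc = vaddq_f32(acc, vld1q_f32(p)); // load...
        p += 4;                             // ...then bump the base pointer
      }
      return acc;
    }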
* [ARM] Minor cleanup to CombineBaseUpdate. NFC.  (Ahmed Bougacha, 2015-02-19, 1 file, -20/+22)
  In preparation for a future patch:
  - rename isLoad to isLoadOp: the former is confusing, and can be taken to refer to the fact that the node is an ISD::LOAD. (it isn't, yet.)
  - change formatting here and there.
  - add some comments.
  - const-ify bools.
  llvm-svn: 229929
* [CodeGen] Use ArrayRef instead of std::vector&. NFC.  (Ahmed Bougacha, 2015-02-19, 1 file, -1/+1)
  The former lets us use SmallVectors. Do so in ARM and AArch64.
  llvm-svn: 229925
* AArch64: Safely handle the incoming sret call argument.  (Andrew Trick, 2015-02-16, 1 file, -3/+8)
  This adds a safe interface to the machine-independent InputArg struct for accessing the index of the original (IR-level) argument. When a non-native return type is lowered, we generate the hidden machine-level sret argument on-the-fly. Before this fix, we were representing this argument as OrigArgIndex == 0, which is an outright lie. In particular, this crashed in the AArch64 backend, where we actually try to access the type of the original argument. Now we use a sentinel value for machine arguments that have no original argument index. AArch64, ARM, Mips, and PPC now check for this case before accessing the original argument.

  Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering
  llvm-svn: 229413
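  A minimal model of the safe interface (member and sentinel names assumed from the description above, not copied from LLVM):

    #include <climits>

    struct InputArg {
      static const unsigned NoArgIndex = UINT_MAX; // sentinel: no IR argument
      unsigned OrigArgIndex = NoArgIndex;

      // Backends check this before touching the original argument's type.
      bool isOrigArg() const { return OrigArgIndex != NoArgIndex; }
    };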
* ARM: Canonicalize access to function attributes, NFC  (Duncan P. N. Exon Smith, 2015-02-14, 1 file, -9/+4)
  Canonicalize access to function attributes to use the simpler API.

  getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
    => getFnAttribute(Kind)

  getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
    => hasFnAttribute(Kind)
  llvm-svn: 229220
* MathExtras: Bring Count(Trailing|Leading)Ones and CountPopulation in line with countTrailingZeros  (Benjamin Kramer, 2015-02-12, 1 file, -5/+1)
  Update all callers.
  llvm-svn: 228930