summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/ARM
Commit message (Collapse)AuthorAgeFilesLines
* Remove the use of the subtarget in MCCodeEmitter creation andEric Christopher2015-03-102-4/+0
| | | | | | | update all ports accordingly. Required a couple of small rewrites in handling subtarget features during creation in PPC. llvm-svn: 231861
* Remove the remaining uses of abs64 and nuke it.Benjamin Kramer2015-03-091-3/+3
| | | | | | std::abs works just fine and we're already using it in many places. NFC intended. llvm-svn: 231696
* Make constant arrays that are passed to functions as const.Benjamin Kramer2015-03-072-8/+6
| | | | | | | | In theory this allows the compiler to skip materializing the array on the stack. In practice clang often fails to do that, but that's a different story. NFC. llvm-svn: 231571
* Recommit r231324 with a fix to the ARM execution domain codeEric Christopher2015-03-072-16/+19
| | | | | | | | | | | | to disable lane switching if we don't actually have the instruction set we want to switch to. Models the earlier check above the conditional for the pass. The testcase is one that triggered with the assert that's added as part of the fix, use it to avoid adding a new testcase as it highlights the same problem. llvm-svn: 231539
* [ARM] Enable vector extload combine for legal types.Ahmed Bougacha2015-03-052-0/+24
| | | | | | | | | | | | | | | | | | | | | This commit enables forming vector extloads for ARM. It only does so for legal types, and when we can't fold the extension in a wide/long form of the user instruction. Enabling it for larger types isn't as good an idea on ARM as it is on X86, because: - we pretend that extloads are legal, but end up generating vld+vmov - we have instructions like vld {dN, dM}, which can't be generated when we "manually expand" extloads to vld+vmov. For legal types, the combine doesn't fire that often: in the integration tests only in a big endian testcase, where it removes a pointless AND. Related to rdar://19723053 Differential Revision: http://reviews.llvm.org/D7423 llvm-svn: 231396
* Revert r231324 "Remove the conditional addition of the execution dependency ↵Hans Wennborg2015-03-051-1/+3
| | | | | | | | fixing" See PR22799. llvm-svn: 231348
* Remove the conditional addition of the execution dependency fixingEric Christopher2015-03-051-3/+1
| | | | | | | pass from the ARM backend as the pass itself will detect any use of the appropriate register class. llvm-svn: 231324
* Cleanup and remove a chunk of getARMSubtarget calls in theEric Christopher2015-03-056-23/+33
| | | | | | | ARM TargetMachine pass pipeline construction by pushing them down into the appropriate pass. llvm-svn: 231323
* Mutate TargetLowering::shouldExpandAtomicRMWInIR to specifically dictate how ↵JF Bastien2015-03-042-3/+7
| | | | | | | | | | | | | | | | | | | | | AtomicRMWInsts are expanded. Summary: In PNaCl, most atomic instructions have their own @llvm.nacl.atomic.* function, each one, with a few exceptions, represents a consistent behaviour across all NaCl-supported targets. Unfortunately, the atomic RMW operations nand, [u]min, and [u]max aren't directly represented by any such @llvm.nacl.atomic.* function. This patch refines shouldExpandAtomicRMWInIR in TargetLowering so that a future `Le32TargetLowering` class can selectively inform the caller how the target desires the atomic RMW instruction to be expanded (ie via load-linked/store-conditional for ARM/AArch64, via cmpxchg for X86/others?, or not at all for Mips) if at all. This does not represent a behavioural change and as such no tests were added. Patch by: Richard Diamond. Reviewers: jfb Reviewed By: jfb Subscribers: jfb, aemerson, t.p.northover, llvm-commits Differential Revision: http://reviews.llvm.org/D7713 llvm-svn: 231250
* Remove MCStreamer.h include from MCContext.h and explictly include it where ↵Pete Cooper2015-03-042-0/+2
| | | | | | necessary. NFC llvm-svn: 231193
* Equally to NetBSD, Bitrig/ARM uses the Itanium-ABI.Renato Golin2015-02-271-0/+1
| | | | | | Patch by Patrick Wildt. llvm-svn: 230762
* getRegForInlineAsmConstraint wants to use TargetRegisterInfo forEric Christopher2015-02-262-5/+7
| | | | | | | | | a lookup, pass that in rather than use a naked call to getSubtargetImpl. This involved passing down and around either a TargetMachine or TargetRegisterInfo. Update all callers/definitions around the targets and SelectionDAG. llvm-svn: 230699
* Use ".arch_extension" ARM directive to support hwdiv on kraitSumanth Gundapaneni2015-02-261-3/+12
| | | | | | | | | | | In case of "krait" CPU, asm printer doesn't emit any ".cpu" so the features bits are not computed. This patch lets the asm printer emit ".cpu cortex-a9" directive for krait and the hwdiv feature is enabled through ".arch_extension". In short, krait is treated as "cortex-a9" with hwdiv. We can not emit ".krait" as CPU since it is not supported bu GNU GAS yet llvm-svn: 230651
* Use ".arch_extension" ARM directive to specify the additional CPU featuresSumanth Gundapaneni2015-02-264-0/+75
| | | | | | | | | | | This patch is in response to r223147 where the avaiable features are computed based on ".cpu" directive. This will work clean for the standard variants like cortex-a9. For custom variants which rely on standard cpu names for assembly, the additional features of a CPU should be propagated. This can be done via ".arch_extension" as long as the assembler supports it. The implementation for krait along with unit test will be submitted in next patch. llvm-svn: 230650
* Remove an argument-less call to getSubtargetImpl from TargetLoweringBase.Eric Christopher2015-02-262-6/+8
| | | | | | | | | This required plumbing a TargetRegisterInfo through computeRegisterProperties and into findRepresentativeClass which uses it for register class iteration. This required passing a subtarget into a few target specific initializations of TargetLowering. llvm-svn: 230583
* Improve handling of stack accesses in Thumb-1Renato Golin2015-02-254-12/+57
| | | | | | | | | | | | | | | | | Thumb-1 only allows SP-based LDR and STR to be word-sized, and SP-base LDR, STR, and ADD only allow offsets that are a multiple of 4. Make some changes to better make use of these instructions: * Use word loads for anyext byte and halfword loads from the stack. * Enforce 4-byte alignment on objects accessed in this way, to ensure that the offset is valid. * Do the same for objects whose frame index is used, in order to avoid having to use more than one ADD to generate the frame index. * Correct how many bits of offset we think AddrModeT1_s has. Patch by John Brawn. llvm-svn: 230496
* Rename UpdateRegAllocHint to match style guidelines.Eric Christopher2015-02-242-2/+2
| | | | llvm-svn: 230357
* ARM: treat [N x i32] and [N x i64] as AAPCS composite typesTim Northover2015-02-243-61/+100
| | | | | | | | | | | The logic is almost there already, with our special homogeneous aggregate handling. Tweaking it like this allows front-ends to emit AAPCS compliant code without ever having to count registers or add discarded padding arguments. Only arrays of i32 and i64 are needed to model AAPCS rules, but I decided to apply the logic to all integer arrays for more consistency. llvm-svn: 230348
* Fix handling of negative offsets for AddrModeT2_i8s4 in rewriteT2FrameIndex.Bob Wilson2015-02-241-5/+2
| | | | | | | | | | | | | | This is a follow up to r230233 to fix something that I noticed by inspection. The AddrModeT2_i8s4 addressing mode does not support negative offsets. I spent a good chunk of the day trying to come up with a testcase for this but was not successful. This addressing mode is used to spill and restore GPRPair registers in Thumb2 code and that does not happen often. We also make very limited used of negative offsets when lowering frame indexes. I am going ahead with the change anyway, because I am pretty confident that it is correct. I also added a missing assertion to check that the low bits of the scaled offset are zero. llvm-svn: 230297
* Rewrite the global merge pass to be subprogram agnostic for now.Eric Christopher2015-02-233-11/+6
| | | | | | | | | | | | | It was previously using the subtarget to get values for the global offset without actually checking each function as it was generating code. Go ahead and solidify the current behavior and make the existing FIXMEs more prominent. As a note the ARM backend previously had a thumb1 and non-thumb1 set of defaults. Only the former was tested so I've changed the behavior to only use that for now. llvm-svn: 230245
* Fix incorrect immediate size for AddrModeT2_i8s4 in rewriteT2FrameIndex.Bob Wilson2015-02-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | The natural way to handle this addressing mode would be to say that it has 8 bits and gets scaled by 4, but since the MC layer is expecting the scaling to be already reflected in the immediate value, we have been setting the Scale to 1. That's fine, but then NumBits needs to be adjusted to reflect the effective increase in the range of the immediate. That adjustment was missing. The consequence is that the register scavenger can fail. The estimateRSStackSizeLimit() function in ARMFrameLowering.cpp correctly assumes that the AddrModeT2_i8s4 address mode can handle scaled offsets up to 1020. Under just the right circumstances, we fail to reserve space for the scavenger because it thinks that nothing will be needed. However, the overly pessimistic behavior in rewriteT2FrameIndex causes some frame indexes to be out of range and require scavenged registers, and so the scavenger asserts. Unfortunately I have not been able to come up with a testcase for this. I can only reproduce it on an internal branch where the frame layout and register allocation is slightly different than trunk. We really need a way to serialize MachineInstr-level IR to write reasonable tests for things like this. rdar://problem/19909005 llvm-svn: 230233
* CodeGen: convert CCState interface to using ArrayRefsTim Northover2015-02-212-14/+11
| | | | | | | | | | | Everyone except R600 was manually passing the length of a static array at each callsite, calculated in a variety of interesting ways. Far easier to let ArrayRef handle that. There should be no functional change, but out of tree targets may have to tweak their calls as with these examples. llvm-svn: 230118
* Get the cached subtarget off the MachineFunction rather thanEric Christopher2015-02-208-16/+17
| | | | | | inquiring for a new one from the TargetMachine. llvm-svn: 229999
* Make the TargetMachine::getSubtarget that takes a Function argumentEric Christopher2015-02-201-1/+1
| | | | | | | take a reference to match the getSubtargetImpl that takes a Function argument. llvm-svn: 229994
* [ARM] Re-re-apply VLD1/VST1 base-update combine.Ahmed Bougacha2015-02-192-19/+128
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This re-applies r223862, r224198, r224203, and r224754, which were reverted in r228129 because they exposed Clang misalignment problems when self-hosting. The combine caused the crashes because we turned ISD::LOAD/STORE nodes to ARMISD::VLD1/VST1_UPD nodes. When selecting addressing modes, we were very lax for the former, and only emitted the alignment operand (as in "[r1:128]") when it was larger than the standard alignment of the memory type. However, for ARMISD nodes, we just used the MMO alignment, no matter what. In our case, we turned ISD nodes to ARMISD nodes, and this caused the alignment operands to start being emitted. And that's how we exposed alignment problems that were ignored before (but I believe would have been caught with SCTRL.A==1?). To fix this, we can just mirror the hack done for ISD nodes: only take into account the MMO alignment when the access is overaligned. Original commit message: We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores. Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset). rdar://19717869, rdar://14062261. llvm-svn: 229932
* [ARM] Minor cleanup to CombineBaseUpdate. NFC.Ahmed Bougacha2015-02-191-20/+22
| | | | | | | | | | | In preparation for a future patch: - rename isLoad to isLoadOp: the former is confusing, and can be taken to refer to the fact that the node is an ISD::LOAD. (it isn't, yet.) - change formatting here and there. - add some comments. - const-ify bools. llvm-svn: 229929
* [CodeGen] Use ArrayRef instead of std::vector&. NFC.Ahmed Bougacha2015-02-191-1/+1
| | | | | | The former lets us use SmallVectors. Do so in ARM and AArch64. llvm-svn: 229925
* Reverting r229831 due to multiple ARM/PPC/MIPS build-bot failures.Michael Kuperstein2015-02-199-136/+127
| | | | llvm-svn: 229841
* Use std::bitset for SubtargetFeaturesMichael Kuperstein2015-02-199-127/+136
| | | | | | | | | | | Previously, subtarget features were a bitfield with the underlying type being uint64_t. Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset. No functional change. Differential Revision: http://reviews.llvm.org/D7065 llvm-svn: 229831
* MC: Remove NullStreamer hook, as it is redundant with NullTargetStreamer.Peter Collingbourne2015-02-193-14/+0
| | | | llvm-svn: 229799
* Introduce Target::createNullTargetStreamer and use it from IRObjectFile.Peter Collingbourne2015-02-193-0/+17
| | | | | | | | | A null MCTargetStreamer allows IRObjectFile to ignore target-specific directives. Previously we were crashing. Differential Revision: http://reviews.llvm.org/D7711 llvm-svn: 229797
* [ARM] Add missing M/R class CPUsBradley Smith2015-02-181-0/+16
| | | | | | | | | | | | Add some of the missing M and R class Cortex CPUs, namely: Cortex-M0+ (called Cortex-M0plus for GCC compatibility) Cortex-M1 SC000 SC300 Cortex-R5 llvm-svn: 229660
* Make the ARM AsmPrinter independent of global subtargetEric Christopher2015-02-173-69/+92
| | | | | | | | | | | | | | | | | initialization. Initialize the subtarget once per function and migrate Emit{Start|End}OfAsmFile to either use attributes on the TargetMachine or get information from the subtarget we'd use for assembling. One bit (getISAEncoding) touched the general AsmPrinter and the debug output. Handle this one by passing the function for the subprogram down and updating all callers and users. The top-level-ness of the ARM attribute output for assembly is, by nature, contrary to how we'd want to do this for an LTO situation where we have multiple cpu architectures so this solution is good enough for now. llvm-svn: 229528
* [ARM] Remove unused declaration. NFC.Ahmed Bougacha2015-02-161-1/+0
| | | | | | | GlobalMerge was moved to lib/CodeGen a while ago, and is no longer called "ARMGlobalMerge". llvm-svn: 229448
* ARM: Transfer kill flag when lowering VSTMQIA to VSTMDIA.Matthias Braun2015-02-161-1/+2
| | | | llvm-svn: 229425
* AArch64: Safely handle the incoming sret call argument.Andrew Trick2015-02-161-3/+8
| | | | | | | | | | | | | | | | | | This adds a safe interface to the machine independent InputArg struct for accessing the index of the original (IR-level) argument. When a non-native return type is lowered, we generate the hidden machine-level sret argument on-the-fly. Before this fix, we were representing this argument as OrigArgIndex == 0, which is an outright lie. In particular this crashed in the AArch64 backend where we actually try to access the type of the original argument. Now we use a sentinel value for machine arguments that have no original argument index. AArch64, ARM, Mips, and PPC now check for this case before accessing the original argument. Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering llvm-svn: 229413
* Removing LLVM_DELETED_FUNCTION, as MSVC 2012 was the last reason for ↵Aaron Ballman2015-02-151-2/+2
| | | | | | requiring the macro. NFC; LLVM edition. llvm-svn: 229340
* ARM: Canonicalize access to function attributes, NFCDuncan P. N. Exon Smith2015-02-147-34/+15
| | | | | | | | | | | | Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229220
* [PM] Remove the old 'PassManager.h' header file at the top level ofChandler Carruth2015-02-131-1/+1
| | | | | | | | | | | | | | | | | | | | LLVM's include tree and the use of using declarations to hide the 'legacy' namespace for the old pass manager. This undoes the primary modules-hostile change I made to keep out-of-tree targets building. I sent an email inquiring about whether this would be reasonable to do at this phase and people seemed fine with it, so making it a reality. This should allow us to start bootstrapping with modules to a certain extent along with making it easier to mix and match headers in general. The updates to any code for users of LLVM are very mechanical. Switch from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h". Qualify the types which now produce compile errors with "legacy::". The most common ones are "PassManager", "PassManagerBase", and "FunctionPassManager". llvm-svn: 229094
* Re-sort #include lines using my handy dandy ./utils/sort_includes.pyChandler Carruth2015-02-131-1/+1
| | | | | | script. This is in preparation for changes to lots of include lines. llvm-svn: 229088
* MathExtras: Bring Count(Trailing|Leading)Ones and CountPopulation in line ↵Benjamin Kramer2015-02-122-6/+2
| | | | | | | | with countTrailingZeros Update all callers. llvm-svn: 228930
* ARM: Fix another regression introduced in r223113Asiri Rathnayake2015-02-121-5/+0
| | | | | | | | | | | | | | | | | | | | | The changes in r223113 (ARM modified-immediate syntax) have broken instructions like: mov r0, #~0xffffff00 The problem is that I've added a spurious range check on the immediate operand to ensure that it lies between INT32_MIN and UINT32_MAX. While this range check is correct in theory, it causes problems because the operand is stored in an int64_t (by MC). So valid 32-bit constants like \#~0xffffff00 become out of range. The solution is to simply remove this range check. It is not possible to validate the range of the immediate operand with the current setup because: 1) The operand is stored in an int64_t by MC, 2) The immediate can be of the forms #imm, #-imm, #~imm or even #((~imm)) etc. So we just chop the value to 32 bits and use it. Also noted that the original range check was note tested by any of the unit tests. I've added a new test to cover #~imm kind of operands. Change-Id: I411e90d84312a2eff01b732bb238af536c4a7599 llvm-svn: 228920
* ARM & AArch64: teach LowerVSETCC that output type size may differ from input.Tim Northover2015-02-081-13/+16
| | | | | | | | | | | | | | | | | | While various DAG combines try to guarantee that a vector SETCC operation will have the same output size as input, there's nothing intrinsic to either creation or LegalizeTypes that actually guarantees it, so the function needs to be ready to handle a mismatch. Fortunately this is easy enough, just extend or truncate the naturally compared result. I couldn't reproduce the failure in other backends that I know have SIMD, so it's probably only an issue for these two due to shared heritage. Should fix PR21645. llvm-svn: 228518
* Value soft float calls as more expensive in the inliner.Cameron Esfahani2015-02-053-1/+23
| | | | | | | | | | | | | | Summary: When evaluating floating point instructions in the inliner, ask the TTI whether it is an expensive operation. By default, it's not an expensive operation. This keeps the default behavior the same as before. The ARM TTI has been updated to return back TCC_Expensive for targets which don't have hardware floating point. Reviewers: chandlerc, echristo Reviewed By: echristo Subscribers: t.p.northover, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D6936 llvm-svn: 228263
* [ARM] Fix subtarget feature set truncation when using .cpu directiveBradley Smith2015-02-041-2/+1
| | | | | | | This is a bug that was caused due to storing the feature bitset in a 32-bit variable when it is a 64-bit mask, discarding the top half of the feature set. llvm-svn: 228151
* Adding support to LLVM for targeting Cortex-A72Renato Golin2015-02-041-0/+4
| | | | | | | | | | Currently, Cortex-A72 is modelled as an Cortex-A57 except the fp load balancing pass isn't enabled for Cortex-A72 as it's not profitable to have it enabled for this core. Patch by Ranjeet Singh. llvm-svn: 228140
* Reverting VLD1/VST1 base-updating/post-incrementing combiningRenato Golin2015-02-041-102/+14
| | | | | | | | | | | This reverts patches 223862, 224198, 224203, and 224754, which were all related to the vector load/store combining and were reverted/reaplied a few times due to the same alignment problems we're seeing now. Further tests, mainly self-hosting Clang, will be needed to reapply this patch in the future. llvm-svn: 228129
* Fix some unnoticed/unwanted behavior change from r222319.Frederic Riss2015-02-041-1/+1
| | | | | | | | | The ARM assembler allows register alias redefinitions as long as it targets the same register. r222319 broke that. In the AArch64 case it would just produce a new warning, but in the ARM case it would error out on previously accepted assembler. llvm-svn: 228109
* Fix ARM peephole optimizeCompare to avoid optimizing unsigned cmp to 0.Jan Wen Voung2015-02-021-11/+23
| | | | | | | | | | | | | | | | | | | Summary: Previously it only avoided optimizing signed comparisons to 0. Sometimes the DAGCombiner will optimize the unsigned comparisons to 0 before it gets to the peephole pass, but sometimes it doesn't. Fix for PR22373. Test Plan: test/CodeGen/ARM/sub-cmp-peephole.ll Reviewers: jfb, manmanren Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D7274 llvm-svn: 227809
* [multiversion] Switch the TTI queries from TargetMachine to SubtargetChandler Carruth2015-02-011-9/+5
| | | | | | | | | | | | | | | | | | | now that we have a correct and cached subtarget specific to the function. Also, finish providing a cached per-function subtarget in the core LLVMTargetMachine -- that layer hadn't switched over yet. The only use of the TargetMachine was to re-lookup a subtarget for a particular function to work around the fact that TTI was immutable. Now that it is per-function and we haved a cached subtarget, use it. This still leaves a few interfaces with real warts on them where we were passing Function objects through the TTI interface. I'll remove these and clean their usage up in subsequent commits now that this isn't necessary. llvm-svn: 227738
OpenPOWER on IntegriCloud