path: root/llvm/lib/Target/ARM/ARMISelLowering.cpp
Commit message [Author, Age, Files, Lines]
* Remove a bunch of unnecessary typecasts to 'const TargetRegisterClass *' [Craig Topper, 2014-11-21, 1 file, -19/+12]
  llvm-svn: 222509
* Fix more instances of -Wsentinel on Windows with s/NULL/nullptr/ [Reid Kleckner, 2014-11-20, 1 file, -2/+2]
  Follow up to r221940, where I must not have caught them all. NFC
  llvm-svn: 222481
* Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool> [David Blaikie, 2014-11-19, 1 file, -1/+1]
  This is to be consistent with StringSet and ultimately with the standard library's associative container insert function. This led to updating SmallSet::insert to return pair<iterator, bool>, and then to updating SmallPtrSet::insert to return pair<iterator, bool>, and then to updating all the existing users of those functions...
  llvm-svn: 222334
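The pair<iterator, bool> convention mentioned in this commit can be illustrated with a small sketch built on std::set. TinySetVector is a hypothetical miniature written for this note, not LLVM's SetVector; it only shows how a wrapper can rely on the bool half of the pair returned by the underlying set's insert.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical miniature of a set-backed vector; NOT LLVM's SetVector.
template <typename T>
class TinySetVector {
  std::set<T> Seen;     // membership test
  std::vector<T> Order; // insertion order

public:
  // Returns true if X was newly inserted, false if it was already present.
  bool insert(const T &X) {
    // std::set::insert returns pair<iterator, bool>; .second says whether
    // the element was actually added.
    bool Inserted = Seen.insert(X).second;
    if (Inserted)
      Order.push_back(X);
    // Both containers must always hold the same number of elements.
    assert(Seen.size() == Order.size());
    return Inserted;
  }

  const std::vector<T> &elements() const { return Order; }
};
```

Inserting 3, then 1, then 3 again leaves two elements in insertion order, and the third call reports that nothing was added.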
* We can get the TLOF from the TargetMachine - so constructor no longer requires TargetLoweringObjectFile to be passed. [Aditya Nandakumar, 2014-11-13, 1 file, -1/+1]
  llvm-svn: 221926
* This patch changes the ownership of TLOF from TargetLoweringBase to TargetMachine so that different subtargets could share the TLOF effectively [Aditya Nandakumar, 2014-11-13, 1 file, -9/+1]
  llvm-svn: 221878
* [ARM, inline-asm] Fix ARMTargetLowering::getRegForInlineAsmConstraint to return register class tGPRRegClass if the target is thumb1. [Akira Hatanaka, 2014-11-03, 1 file, -0/+2]
  This commit fixes a crash that occurs during register allocation which was triggered when a virtual register defined by an inline-asm instruction had to be spilled.
  rdar://problem/18740489
  llvm-svn: 221178
* Renamed CCState members that appear to misspell 'Processed' as 'Proceed'. NFC. [Daniel Sanders, 2014-11-01, 1 file, -3/+3]
  Reviewers: rnk
  Reviewed By: rnk
  Subscribers: rnk, llvm-commits
  Differential Revision: http://reviews.llvm.org/D5978
  llvm-svn: 221061
* [CodeGenPrepare] Move extractelement close to store if they can be combined. [Quentin Colombet, 2014-10-31, 1 file, -0/+29]
  This patch adds an optimization in CodeGenPrepare to move an extractelement right before a store when the target can combine them. The optimization may promote scalar operations to vector operations along the way to make that possible.

  ** Context **
  Some targets use different register files for vector and scalar operations. This means that transitioning from one domain to another may incur a copy from one register file to another. These copies are not coalescable and may be expensive. For example, according to the scheduling model, on cortex-A8 a vector to GPR move is 20 cycles.

  ** Motivating Example **
  Let us consider an example:
    define void @foo(<2 x i32>* %addr1, i32* %dest) {
      %in1 = load <2 x i32>* %addr1, align 8
      %extract = extractelement <2 x i32> %in1, i32 1
      %out = or i32 %extract, 1
      store i32 %out, i32* %dest, align 4
      ret void
    }

  As it is, this IR generates the following assembly on armv7:
    vldr d16, [r0]             @ vector load
    vmov.32 r0, d16[1]         @ cross-register-file copy: 20 cycles
    orr r0, r0, #1             @ scalar bitwise or
    str r0, [r1]               @ scalar store
    bx lr

  Whereas we could generate much faster code:
    vldr d16, [r0]             @ vector load
    vorr.i32 d16, #0x1         @ vector bitwise or
    vst1.32 {d16[1]}, [r1:32]  @ vector extract + store
    bx lr

  Half of the computation made in the vector is useless, but this gets rid of the expensive cross-register-file copy.

  ** Proposed Solution **
  To avoid this cross-register-copy penalty, we promote the scalar operations to vector operations. The penalty will be removed if we manage to promote the whole chain of computation in the vector domain. Currently, we do that only when the chain of computation ends with a store and the target is able to combine an extract with a store.

  Stores are the most likely candidates, because other instructions produce values that would need to be promoted and so, extracted at some point[1]. Moreover, it is customary for targets to feature stores that perform a vector extract (see AArch64 and X86 for instance).

  The proposed implementation relies on the TargetTransformInfo to decide whether or not it is beneficial to promote a chain of computation in the vector domain. Unfortunately, this interface is rather inaccurate for this level of detail and although this optimization may be beneficial for X86 and AArch64, the inaccuracy will lead to the optimization being too aggressive. Basically in TargetTransformInfo, everything that is legal has a cost of 1, whereas, even if a vector type is legal, usually a vector operation is slightly more expensive than its scalar counterpart. That will lead to too many promotions that may not be counterbalanced by the saving of the cross-register-file copy. For instance, on AArch64 this penalty is just 4 cycles.

  For now, the optimization is just enabled for ARM prior to v8, since those processors have a larger penalty on cross-register-file copies, and the scope is limited to basic blocks. Because of these two factors, we limit the effects of the inaccuracy. Indeed, I did not want to build up a fancy cost model with block frequency and everything on top of that.

  [1] We can imagine targets that can combine an extractelement with other instructions than just stores. If we want to go into that direction, the current interfaces must be augmented and, moreover, I think this becomes a global isel problem.

  Differential Revision: http://reviews.llvm.org/D5921
  <rdar://problem/14170854>
  llvm-svn: 220978
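The profitability trade-off described in this commit message can be pictured with a simplified sketch. The struct and function below are hypothetical names invented for this note, not part of CodeGenPrepare or TargetTransformInfo, and the per-operation costs used with them are assumed for illustration; only the 20-cycle and 4-cycle copy penalties come from the commit message.

```cpp
#include <cassert>

// Hypothetical cost summary for one "extract -> scalar ops -> store" chain.
struct ChainCosts {
  unsigned ScalarOps;     // summed cost of the operations done on scalars
  unsigned VectorOps;     // summed cost of the same operations done on vectors
  unsigned CrossFileCopy; // cost of the vector-to-GPR move that promotion avoids
};

// Promote only when the all-vector version is strictly cheaper than the
// scalar version plus the cross-register-file copy it requires.
bool shouldPromoteChain(const ChainCosts &C) {
  return C.VectorOps < C.ScalarOps + C.CrossFileCopy;
}
```

With a 20-cycle copy penalty, even a vector operation that costs twice its scalar counterpart is worth promoting; with a 4-cycle penalty, a somewhat more expensive vector chain no longer pays off, which mirrors the reason the commit restricts the optimization to ARM cores before v8.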
* [ARM] Select VMAXNM and VMINNM regardless of operand order [Oliver Stannard, 2014-10-27, 1 file, -6/+12]
  Currently, the ARM backend will select the VMAXNM and VMINNM for these C expressions:
    (a < b) ? a : b
    (a > b) ? a : b
  but not these expressions:
    (a > b) ? b : a
    (a < b) ? b : a
  This patch allows all of these expressions to be matched.
  llvm-svn: 220671
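For reference, the four source shapes named in this commit message can be written as small functions. The function names are made up for this note. The sketch only shows that the swapped-operand forms compute the same minimum or maximum for ordinary (non-NaN) inputs; whether a given compiler invocation emits VMINNM or VMAXNM for them depends on the target and flags and is not shown here.

```cpp
#include <cassert>

// Forms the commit message says were already matched.
float minDirect(float a, float b) { return (a < b) ? a : b; }
float maxDirect(float a, float b) { return (a > b) ? a : b; }

// Forms with the comparison and the selected operands swapped, which the
// patch additionally allows to be matched.
float minSwapped(float a, float b) { return (a > b) ? b : a; }
float maxSwapped(float a, float b) { return (a < b) ? b : a; }
```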
* Do not emit intermediate register for zero FP immediate [Renato Golin, 2014-10-23, 1 file, -0/+12]
  This updates check for double precision zero floating point constant to allow use of instruction with immediate value rather than temporary register. Currently "a == 0.0", where "a" is of "double" type generates:
    vmov.i32 d16, #0x0
    vcmpe.f64 d0, d16
  With this change it becomes:
    vcmpe.f64 d0, #0
  Patch by Sergey Dmitrouk.
  llvm-svn: 220486
* ARM: rework Thumb1 frame index rewriting [Tim Northover, 2014-10-20, 1 file, -3/+3]
  The previous code had a few problems, motivating the choices here.
  1. It could create instructions clobbering CPSR, but the incoming MachineInstr didn't reflect this. A potential source of corruption. This is why the patch has a new PseudoInst for before lowering.
  2. Similarly, there was some code to handle the incoming instruction not being ARMCC::AL, but this would have caused massive problems if it was actually invoked when a complex offset needing more than one instruction was requested.
  3. It wasn't designed to handle unaligned pointers (or offsets). These should probably be minimised anyway, but the code needs to deal with them properly regardless.
  4. It had some rather dubious ad-hoc code to avoid calling emitThumbRegPlusImmediate, a function which should be designed to do precisely this job.
  We seem to cover the common cases correctly now, and hopefully can enhance emitThumbRegPlusImmediate to handle any extra optimisations we need to add in future.
  llvm-svn: 220236
* Use triple's isiOS() and isOSDarwin() methods. [Bob Wilson, 2014-10-09, 1 file, -1/+1]
  These methods are already used in lots of places. This makes things more consistent. NFC.
  llvm-svn: 219386
* constify TargetMachine argument. [Eric Christopher, 2014-10-03, 1 file, -1/+1]
  llvm-svn: 218930
* [ARM] Allow selecting VRINT[APMXZR] and VCVT[BT] instructions for FPv5 [Oliver Stannard, 2014-10-01, 1 file, -12/+17]
  Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions when targeting ARMv8, but they are actually present on any target with FP-ARMv8. Note that FP-ARMv8 is called FPv5 when it is part of an M-profile core, but they have the same instructions so we model them both as FPARMv8 in the ARM backend.
  llvm-svn: 218763
* ARM: Remove unneeded check for MI->hasPostISelHook() [Tom Stellard, 2014-09-25, 1 file, -6/+0]
  llvm-svn: 218459
* Add AtomicExpandPass::bracketInstWithFences, and use it whenever getInsertFencesForAtomic would trigger in SelectionDAGBuilder [Robin Morisset, 2014-09-23, 1 file, -14/+14]
  Summary: The goal is to eventually remove all the code related to getInsertFencesForAtomic in SelectionDAGBuilder as it is wrong (designed for ARM, not really portable, works mostly by accident because the backends are overly conservative), and repeats the same logic that goes in emitLeading/TrailingFence. In this patch, I make AtomicExpandPass insert the fences as it knows better where to put them. Because this requires getting the fences and not just passing an IRBuilder around, I had to change the return type of emitLeading/TrailingFence. This code only triggers on ARM for now. Because it is earlier in the pipeline than SelectionDAGBuilder, it triggers and lowers atomic accesses to atomic so SelectionDAGBuilder does not add barriers anymore on ARM. If this patch is accepted I plan to implement emitLeading/TrailingFence for all backends that setInsertFencesForAtomic(true), which will allow both making them less conservative and simplifying SelectionDAGBuilder once they are all using this interface. This should not cause any functional change so the existing tests are used and not modified.
  Test Plan: make check-all, benefits from existing tests of atomics on ARM
  Reviewers: jfb, t.p.northover
  Subscribers: aemerson, llvm-commits
  Differential Revision: http://reviews.llvm.org/D5179
  llvm-svn: 218329
* Just add a fixme about a possibly faster implementation of some atomic loads on some ARM processors [Robin Morisset, 2014-09-23, 1 file, -0/+3]
  llvm-svn: 218326
* [ARM] Do not perform a tail call when the caller returns several values. [Quentin Colombet, 2014-09-18, 1 file, -1/+11]
  The fix is slightly different than x86 (see r216117) because the number of values attached to a return can vary even for a single returned value (e.g., f64 yields two returned values).
  <rdar://problem/18352998>
  llvm-svn: 218076
* Restore "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors" [Robin Morisset, 2014-09-18, 1 file, -4/+26]
  Summary: This patch was originally in D5304 (I could not find a way to reopen that revision). It was accepted, committed and broke the build bots because the overloading of the constructor of ArrayRef for braced initializer lists is not supported by all toolchains. I then reverted it, and propose this fixed version that uses a plain C array instead in makeDMB (that array is then converted implicitly to an ArrayRef, but that is not behind an ifdef). Could someone confirm whether initialization lists for plain C arrays are supported by every toolchain used to build llvm? Otherwise I can just initialize the array in the old way: args[0] = ...; .. ; args[5] = ...;
  Below is the description of the original patch:
  ```
  I had only tested this code for ARMv7 and ARMv8. This patch adds several fallback paths if the processor does not support dmb ish:
  - dmb sy if a cortex-M with support for dmb
  - mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB)
  These fallback paths were chosen based on the code for fence seq_cst. Thanks to luqmana for having noticed this bug.
  ```
  Test Plan: Added more cases to atomic-load-store.ll + make check-all
  Reviewers: jfb, t.p.northover, luqmana
  Subscribers: llvm-commits, aemerson
  Differential Revision: http://reviews.llvm.org/D5386
  llvm-svn: 218066
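The workaround described in this commit (fill a plain C array with an initializer list, then let it convert implicitly to an array view) can be sketched as follows. TinyArrayRef is a hypothetical minimal stand-in written for this note so the example is self-contained; it is not llvm::ArrayRef, and the six values are placeholders loosely modelled on the mcr operands quoted above, not the arguments the real makeDMB builds.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal stand-in for an array view; NOT llvm::ArrayRef.
template <typename T>
class TinyArrayRef {
  const T *Data;
  std::size_t Length;

public:
  // Implicit conversion from a plain C array of any fixed size N.
  template <std::size_t N>
  TinyArrayRef(const T (&Arr)[N]) : Data(Arr), Length(N) {}

  std::size_t size() const { return Length; }
  const T &operator[](std::size_t I) const {
    assert(I < Length && "index out of range");
    return Data[I];
  }
};

// A consumer that only sees the view, never the array type itself.
unsigned sumArgs(TinyArrayRef<unsigned> Args) {
  unsigned Sum = 0;
  for (std::size_t I = 0; I < Args.size(); ++I)
    Sum += Args[I];
  return Sum;
}

unsigned sumPlaceholderBarrierArgs() {
  // Plain C array initialized with a braced list (placeholder values).
  const unsigned Args[6] = {15, 0, 0, 7, 10, 5};
  // The array converts implicitly to TinyArrayRef<unsigned> at the call.
  return sumArgs(Args);
}
```

The design point is that the braced list initializes an ordinary array, which older toolchains handle, and no initializer-list constructor on the view type is needed.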
* Revert "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors" [Robin Morisset, 2014-09-17, 1 file, -26/+4]
  It is breaking the build on the buildbots but works fine on my machine, so I am reverting it while trying to understand what happens (it appears to depend on the compiler used to build, I probably used a C++11 feature that is not perfectly supported by some of the buildbots).
  This reverts commit feb3176c4d006f99af8b40373abd56215a90e7cc.
  llvm-svn: 217973
* [ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors [Robin Morisset, 2014-09-17, 1 file, -4/+26]
  Summary: I had only tested this code for ARMv7 and ARMv8. This patch adds several fallback paths if the processor does not support dmb ish:
  - dmb sy if a cortex-M with support for dmb
  - mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB)
  These fallback paths were chosen based on the code for fence seq_cst. Thanks to luqmana for having noticed this bug.
  Test Plan: Added more cases to atomic-load-store.ll + make check-all
  Reviewers: jfb, t.p.northover, luqmana
  Subscribers: aemerson, llvm-commits
  Differential Revision: http://reviews.llvm.org/D5304
  llvm-svn: 217965
* [X86] Use the generic AtomicExpandPass instead of X86AtomicExpandPass [Robin Morisset, 2014-09-17, 1 file, -0/+2]
  This required a new hook called hasLoadLinkedStoreConditional to know whether to expand atomics to LL/SC (ARM, AArch64, in a future patch Power) or to CmpXchg (X86). Apart from that, the new code in AtomicExpandPass is mostly moved from X86AtomicExpandPass. The main result of this patch is to get rid of that pass, which had lots of code duplicated with AtomicExpandPass.
  llvm-svn: 217928
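The role of a hook like hasLoadLinkedStoreConditional can be sketched with a small stand-alone example. Only the hook name comes from the commit message; the class names, the chooseExpansion function and its string results are hypothetical and do not reflect the real TargetLowering interface or AtomicExpandPass code.

```cpp
#include <cassert>
#include <string>

// Hypothetical target interface exposing a single hook.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  virtual bool hasLoadLinkedStoreConditional() const = 0;
};

// Stand-in for a target with LL/SC instructions (for example ARM-like).
struct LLSCTarget : TargetHooks {
  bool hasLoadLinkedStoreConditional() const override { return true; }
};

// Stand-in for a target that relies on compare-and-exchange (X86-like).
struct CmpXchgTarget : TargetHooks {
  bool hasLoadLinkedStoreConditional() const override { return false; }
};

// A generic pass asks the hook once and picks an expansion strategy,
// instead of each target carrying its own copy of the pass.
std::string chooseExpansion(const TargetHooks &Target) {
  return Target.hasLoadLinkedStoreConditional() ? "ll/sc loop"
                                                : "cmpxchg loop";
}
```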
* Silencing a usually-helpful-but-braindead-silly-in-this-case sign mismatch warning with MSVC. NFC. [Aaron Ballman, 2014-09-04, 1 file, -1/+1]
  llvm-svn: 217143
* Refactor AtomicExpandPass and add a generic isAtomic() method to Instruction [Robin Morisset, 2014-09-03, 1 file, -17/+23]
  Summary: Split shouldExpandAtomicInIR() into different versions for Stores/Loads/RMWs/CmpXchgs. Makes runOnFunction cleaner (no more redundant checking/casting), and will help moving the X86 backend to this pass. This requires a way of easily detecting which instructions are atomic. I followed the pattern of mayReadFromMemory, mayWriteOrReadMemory, etc. in making isAtomic() a method of Instruction implemented by a switch on the opcodes.
  Test Plan: make check
  Reviewers: jfb
  Subscribers: mcrosier, llvm-commits
  Differential Revision: http://reviews.llvm.org/D5035
  llvm-svn: 217080
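The "method implemented by a switch on the opcodes" pattern mentioned in this commit can be sketched with a toy instruction type. The Opcode enum and Inst struct are hypothetical simplifications made up for this note; they are not LLVM's Instruction class, and the real method inspects the actual ordering of loads and stores rather than a boolean flag.

```cpp
#include <cassert>

// Hypothetical, heavily reduced opcode set.
enum class Opcode { Add, Load, Store, Fence, AtomicCmpXchg, AtomicRMW };

// Toy instruction: an opcode plus a flag saying whether a load/store
// carries an atomic ordering.
struct Inst {
  Opcode Op;
  bool HasAtomicOrdering;

  bool isAtomic() const {
    switch (Op) {
    case Opcode::Fence:
    case Opcode::AtomicCmpXchg:
    case Opcode::AtomicRMW:
      return true; // always atomic operations
    case Opcode::Load:
    case Opcode::Store:
      return HasAtomicOrdering; // atomic only when an ordering is attached
    default:
      return false; // everything else is never atomic
    }
  }
};
```

A caller can then filter instructions with a single isAtomic() check before dispatching on the specific kind, which is the clean-up the commit message describes for runOnFunction.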
* Use target-dependent emitLeading/TrailingFence instead of the target-independent insertLeading/TrailingFence (in AtomicExpandPass) [Robin Morisset, 2014-09-03, 1 file, -1/+58]
  Fixes two latent bugs:
  - There was no fence inserted before expanded seq_cst load (unsound on Power)
  - There was only a fence release before seq_cst stores (again unsound, in particular on Power)
  It is not even clear if this is correct on ARM swift processors (where release fences are DMB ishst instead of DMB ish). This behaviour is currently preserved on ARM Swift as it is not clear whether it is incorrect. I would love to get documentation stating whether it is correct or not.
  These two bugs were not triggered because Power is not (yet) using this pass, and these behaviours happen to be (mostly?) working on ARM (although they completely butchered the semantics of the llvm IR). See: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075821.html for an example of the problems that can be caused by the second of these bugs.
  I couldn't see a way of fixing these in a completely target-independent way without adding lots of unnecessary fences on ARM, hence the target-dependent parts of this patch. This patch implements the new target-dependent parts only for ARM (the default of not doing anything is enough for AArch64), other architectures will use this infrastructure in later patches.
  llvm-svn: 217076
* Reinstate "Nuke the old JIT." [Eric Christopher, 2014-09-02, 1 file, -0/+1]
  Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
  This reinstates commits r215111, 215115, 215116, 215117, 215136.
  llvm-svn: 216982
* Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or just letting them be implicitly created. [Craig Topper, 2014-08-27, 1 file, -2/+2]
  llvm-svn: 216525
* Use range based for loops to avoid needing to re-mention SmallPtrSet size. [Craig Topper, 2014-08-24, 1 file, -3/+1]
  llvm-svn: 216351
* Revert "ARM: improve RTABI 4.2 conformance on Linux" [Chad Rosier, 2014-08-23, 1 file, -29/+38]
  This reverts commit r215862 due to nightly failures. Will work on getting a reduced test case, but I wanted to get our bots green in the meantime.
  llvm-svn: 216325
* Revert "ARM: mark missing functions from RTABI" [Chad Rosier, 2014-08-23, 1 file, -24/+0]
  This reverts commit r215863.
  llvm-svn: 216324
* ARM / x86_64 varargs: Don't save regparms in prologue without va_start [Reid Kleckner, 2014-08-22, 1 file, -2/+2]
  There's no need to do this if the user doesn't call va_start. In the future, we're going to have thunks that forward these register parameters with musttail calls, and they won't need these spills for handling va_start.
  Most of the test suite changes are adding va_start calls to existing tests to keep things working.
  llvm-svn: 216294
* Add a thread-model knob for lowering atomics on baremetal & single threaded systems [Jonathan Roelofs, 2014-08-21, 1 file, -2/+6]
  http://reviews.llvm.org/D4984
  llvm-svn: 216182
* [ARM] Enable DP copy, load and store instructions for FPv4-SP [Oliver Stannard, 2014-08-21, 1 file, -21/+158]
  The FPv4-SP floating-point unit is generally referred to as single-precision only, but it does have double-precision registers and load, store and GPR<->DPR move instructions which operate on them. This patch enables the use of these registers, the main advantage of which is that we now comply with the AAPCS-VFP calling convention. This partially reverts r209650, which added some AAPCS-VFP support, but did not handle return values or alignment of double arguments in registers.
  This patch also adds tests for Thumb2 code generation for floating-point instructions and intrinsics, which previously only existed for ARM.
  llvm-svn: 216172
* ARM: Fix codegen for rbit intrinsic [Yi Kong, 2014-08-20, 1 file, -2/+2]
  LLVM generates an illegal `rbit r0, #352` instruction for the rbit intrinsic. According to the ARM ARM, rbit only takes a register as argument, not an immediate. The correct instruction should be rbit <Rd>, <Rm>.
  The bug was originally introduced in r211057.
  Differential Revision: http://reviews.llvm.org/D4980
  llvm-svn: 216064
* Make use of isAtLeastRelease/Acquire in the ARM/AArch64 backends [Robin Morisset, 2014-08-18, 1 file, -4/+2]
  Summary: Make use of isAtLeastRelease/Acquire in the ARM/AArch64 backends. These helper functions are introduced in D4844. Depends D4844
  Test Plan: make check-all passes
  Reviewers: jfb
  Subscribers: aemerson, llvm-commits, mcrosier, reames
  Differential Revision: http://reviews.llvm.org/D4937
  llvm-svn: 215902
* [ARM,AArch64] Do not tail-call to an externally-defined function with weak linkage [Oliver Stannard, 2014-08-18, 1 file, -0/+13]
  Externally-defined functions with weak linkage should not be tail-called on ARM or AArch64, as the AAELF spec requires normal calls to undefined weak functions to be replaced with a NOP or jump to the next instruction. The behaviour of branch instructions in this situation (as used for tail calls) is implementation-defined, so we cannot rely on the linker replacing the tail call with a return.
  llvm-svn: 215890
* ARM: mark missing functions from RTABI [Saleem Abdulrasool, 2014-08-17, 1 file, -0/+24]
  Simply indicate the functions that are part of the runtime library that we do not set up libcalls for. This is merely for ease of identification. NFC.
  llvm-svn: 215863
* ARM: improve RTABI 4.2 conformance on Linux [Saleem Abdulrasool, 2014-08-17, 1 file, -38/+29]
  The set of functions defined in the RTABI was separated for no real reason. This brings us closer to proper utilisation of the functions defined by the RTABI. It also sets the ground for correctly emitting function calls to AEABI functions on all AEABI conforming platforms.
  The previously existing lie on the behaviour of __ldivmod and __uldivmod is propagated as it is beyond the scope of the change.
  The changes to the test are due to the fact that we now use the divmod functions which return both the quotient and remainder and thus we no longer need to invoke two functions on Linux (making it closer to EABI's behaviour).
  llvm-svn: 215862
* ARM: whitespace [Saleem Abdulrasool, 2014-08-17, 1 file, -5/+5]
  Whitespace fix, NFC.
  llvm-svn: 215861
* Fix typos in comments [Robin Morisset, 2014-08-15, 1 file, -1/+1]
  llvm-svn: 215777
* [AArch32] Add support for FP rounding operations for ARMv8/AArch32. [Chad Rosier, 2014-08-15, 1 file, -0/+12]
  Phabricator Revision: http://reviews.llvm.org/D4935
  llvm-svn: 215772
* IR: Print a newline when dumping Types [Justin Bogner, 2014-08-12, 1 file, -1/+1]
  Type::dump() doesn't print a newline, which makes for a poor experience in a debugger. This looks like it was an omission considering Value::dump() two lines above, so I've changed Type to add a newline as well.
  Of the two in-tree callers, one added a newline anyway, and I've updated the other one to use Type::print instead.
  llvm-svn: 215421
* ARM: __gnu_h2f_ieee and __gnu_f2h_ieee always use the soft-float calling convention [Oliver Stannard, 2014-08-11, 1 file, -0/+13]
  By default, LLVM uses the "C" calling convention for all runtime library functions. The half-precision FP conversion functions use the soft-float calling convention, and are needed for some targets which use the hard-float convention by default, so must have their calling convention explicitly set.
  llvm-svn: 215348
* Temporarily Revert "Nuke the old JIT." as it's not quite ready to be deleted. [Eric Christopher, 2014-08-07, 1 file, -1/+0]
  This will be reapplied as soon as possible and before the 3.6 branch date at any rate.
  Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
  This reverts commits r215111, 215115, 215116, 215117, 215136.
  llvm-svn: 215154
* Nuke the old JIT. [Rafael Espindola, 2014-08-07, 1 file, -0/+1]
  I am sure we will be finding bits and pieces of dead code for years to come, but this is a good start.
  Thanks to Lang Hames for making MCJIT a good replacement!
  llvm-svn: 215111
* Remove the target machine from CCState. [Eric Christopher, 2014-08-06, 1 file, -18/+18]
  Previously it was only used to get the subtarget and that's accessible from the MachineFunction now. This helps clear the way for smaller changes where getting a subtarget will require passing in a MachineFunction/Function as well.
  llvm-svn: 214988
* ARM: do not generate BLX instructions on Cortex-M CPUs. [Tim Northover, 2014-08-06, 1 file, -2/+2]
  Particularly on MachO, we were generating "blx _dest" instructions on M-class CPUs, which don't actually exist. They happen to get fixed up by the linker into valid "bl _dest" instructions (which is why such a massive issue has remained largely undetected), but we shouldn't rely on that.
  llvm-svn: 214959
* ARM-MachO: materialize callee address correctly on v4t. [Tim Northover, 2014-08-06, 1 file, -1/+4]
  llvm-svn: 214958
* Re-apply r214881: Fix return sequence on armv4 thumb [Jonathan Roelofs, 2014-08-05, 1 file, -0/+4]
  This reverts r214893, re-applying r214881 with the test case relaxed a bit to satiate the build bots.
  POP on armv4t cannot be used to change thumb state (unlike later non-m-class architectures), therefore we need a different return sequence that uses 'bx' instead:
    POP {r3}
    ADD sp, #offset
    BX r3
  This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead:
    MOV ip, r3
    POP {r3}
    ADD sp, #offset
    MOV lr, r3
    MOV r3, ip
    BX lr
  http://reviews.llvm.org/D4748
  llvm-svn: 214928
* Revert r214881 because it broke lots of build-bots [Jonathan Roelofs, 2014-08-05, 1 file, -4/+0]
  llvm-svn: 214893