path: root/llvm/lib/Target/ARM
Commit message | Author | Age | Files | Lines
* ARM: correctly calculate the offset of FP in its push.Tim Northover2014-11-141-2/+7
  When we folded the DPR alignment gap into a push, we weren't noting the extra distance from the beginning of the push to the FP, and so FP ended up pointing at an incorrect offset. The .cfi_def_cfa_offset directives are still wrong in this case, but I think that can be improved by refactoring. llvm-svn: 222056
* We can get the TLOF from the TargetMachine - so constructor no longer ↵Aditya Nandakumar2014-11-131-1/+1
  requires TargetLoweringObjectFile to be passed. llvm-svn: 221926
* ARM: allow constpool entry to be moved to the user's block in all cases.Tim Northover2014-11-131-1/+7
  Normally entries can only move to a lower address, but when that wasn't viable, the user's block was considered anyway. Unfortunately, it went via createNewWater which wasn't designed to handle the case where there's already an island after the block.

  Unfortunately, the test we have is slow and fragile, and I couldn't reduce it to anything sane even with the @llvm.arm.space intrinsic. The test change here is recreating the previous one after the change.

  rdar://problem/18545506 llvm-svn: 221905
* ARM: avoid duplicating branches during constant islands.Tim Northover2014-11-131-6/+10
  We were using a naive heuristic to determine whether a basic block already had an unconditional branch at the end. This mostly corresponded to reality (assuming branches got optimised) because there's not much point in a branch to the next block, but could go wrong. llvm-svn: 221904
* ARM: add @llvm.arm.space intrinsic for testing ConstantIslands.Tim Northover2014-11-133-0/+10
  Creating tests for the ConstantIslands pass is very difficult, since it depends on precise layout details. Having the ability to precisely inject a number of bytes into the stream helps greatly. llvm-svn: 221903
* This patch changes the ownership of TLOF from TargetLoweringBase to ↵Aditya Nandakumar2014-11-133-9/+16
  TargetMachine so that different subtargets could share the TLOF effectively llvm-svn: 221878
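  As a rough sketch of what a call site looks like once the TargetMachine owns the TLOF (the accessor name below is an assumption for illustration, not quoted from the patch):

      // Look the lowering object up from the TargetMachine instead of having
      // it passed into (and stored by) every TargetLowering constructor.
      const TargetLoweringObjectFile *TLOF = TM.getObjFileLowering();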
* Pass an ArrayRef to MCDisassembler::getInstruction.Rafael Espindola2014-11-121-12/+7
  With this patch MCDisassembler::getInstruction takes an ArrayRef<uint8_t> instead of a MemoryObject.

  Even on X86 there is a maximum size an instruction can have. Given that, it seems way simpler and more efficient to just pass an ArrayRef to the disassembler instead of a MemoryObject and have it do a virtual call every time it wants some extra bytes. llvm-svn: 221751
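  A hypothetical caller under the new interface might look like the sketch below; the stream arguments and exact parameter order are assumptions based on the description above, not quoted from the commit.

      #include "llvm/ADT/ArrayRef.h"
      #include "llvm/MC/MCDisassembler.h"
      #include "llvm/MC/MCInst.h"
      #include "llvm/Support/raw_ostream.h"
      using namespace llvm;

      // Decode a single instruction from an in-memory buffer: the bytes are
      // handed over as a contiguous ArrayRef rather than wrapped in a
      // MemoryObject that would be queried through virtual calls.
      MCDisassembler::DecodeStatus decodeOne(const MCDisassembler &DisAsm,
                                             ArrayRef<uint8_t> Bytes,
                                             uint64_t Address, MCInst &Inst,
                                             uint64_t &Size) {
        return DisAsm.getInstruction(Inst, Size, Bytes, Address, nulls(),
                                     nulls());
      }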
* Add Forward Control-Flow Integrity.Tom Roeder2014-11-112-29/+0
  This commit adds a new pass that can inject checks before indirect calls to make sure that these calls target known locations. It supports three types of checks and, at compile time, it can take the name of a custom function to call when an indirect call check fails. The default failure function ignores the error and continues.

  This pass incidentally moves the function JumpInstrTables::transformType from private to public and makes it static (with a new argument that specifies the table type to use); this is so that the CFI code can transform function types at call sites to determine which jump-instruction table to use for the check at that site.

  Also, this removes support for jumptables in ARM, pending further performance analysis and discussion.

  Review: http://reviews.llvm.org/D4167 llvm-svn: 221708
* MCAsmParserExtension has a copy of the MCAsmParser. Use it.Rafael Espindola2014-11-111-14/+60
  Base classes were storing a second copy. llvm-svn: 221667
* Misc style fixes. NFC.Rafael Espindola2014-11-101-130/+120
  This fixes a few cases of:
  * Wrong variable name style.
  * Lines longer than 80 columns.
  * Repeated names in comments.
  * clang-format of the above.
  This makes the next patch a lot easier to read. llvm-svn: 221615
* [ARM] Remove more dead code.Tilmann Scheller2014-11-051-4/+0
  Dead code identified by the Clang static analyzer. llvm-svn: 221372
* [ARM] Remove another redundant assignment.Tilmann Scheller2014-11-051-1/+0
  Found by the Clang static analyzer. llvm-svn: 221368
* [ARM] Remove redundant assignment.Tilmann Scheller2014-11-051-1/+0
  Found by the Clang static analyzer. llvm-svn: 221366
* [ARM] Remove dead code identified by the Clang static analyzer.Tilmann Scheller2014-11-051-2/+0
  llvm-svn: 221358
* [ARM] Honor FeatureD16 in the assembler and disassemblerOliver Stannard2014-11-052-1/+12
  Some ARM FPUs only have 16 double-precision registers, rather than the normal 32. LLVM represents this with the D16 target feature. This is currently used by CodeGen to avoid using high registers when they are not available, but the assembler and disassembler do not.

  I fix this in the assembler and disassembler rather than the InstrInfo.td files, as the latter would require a large number of changes everywhere one of the floating-point instructions is referenced in the backend. This solution is similar to the one used for co-processor numbers and MSR masks. llvm-svn: 221341
* ARM: try to add extra CS-register whenever stack alignment >= 8.Tim Northover2014-11-051-1/+1
  We currently try to push an even number of registers to preserve 8-byte alignment during a function's prologue, but only when the stack alignment is precisely 8. Many of the reasons for doing it apply also when that alignment > 8 (the extra store is often free, and can save another stack adjustment, though less frequently for 16-byte stack alignment). llvm-svn: 221321
* ARM/Dwarf: correctly align stack before callee-saved VPRsTim Northover2014-11-052-5/+26
  We were making an attempt to do this by adding an extra callee-saved GPR (so that there was an even number in the list), but when that failed we went ahead and pushed anyway. This had a couple of potential issues:
  + The .cfi directives we emit misplaced dN because they were based on PrologEpilogInserter's calculation.
  + Unaligned stores can be less efficient.
  + Unaligned stores can actually fault (likely only an issue in niche cases, but possible).
  This adds a final explicit stack adjustment if all other options fail, so that the actual locations of the registers match up with where they should be. llvm-svn: 221320
* [ARM, inline-asm] Fix ARMTargetLowering::getRegForInlineAsmConstraint to returnAkira Hatanaka2014-11-031-0/+2
  register class tGPRRegClass if the target is thumb1. This commit fixes a crash that occurs during register allocation which was triggered when a virtual register defined by an inline-asm instruction had to be spilled. rdar://problem/18740489 llvm-svn: 221178
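  The shape of the fix is roughly the sketch below; the surrounding switch over constraint letters is omitted, and the exact names may not match the patch.

      // For an 'r' operand, Thumb1 can only address the low registers, so hand
      // the register allocator tGPR rather than the full GPR class.
      if (Subtarget->isThumb1Only())
        return RCPair(0U, &ARM::tGPRRegClass);
      return RCPair(0U, &ARM::GPRRegClass);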
* Remove the cortex-a9-mp CPU.Charlie Turner2014-11-031-5/+1
  This CPU definition is redundant. The Cortex-A9 is defined as supporting multiprocessing extensions. Remove its definition and update appropriate tests.

  LLVM defines both a cortex-a9 CPU and a cortex-a9-mp CPU. The only difference between the two CPU definitions in ARM.td is that cortex-a9-mp contains the feature FeatureMP for multiprocessing extensions. This is redundant since the Cortex-A9 is defined as having multiprocessing extensions in the TRMs. armcc also defines the Cortex-A9 as having multiprocessing extensions by default.

  Change-Id: Ifcadaa6c322be0a33d9d2a39cfdd7da1d75981a7
  llvm-svn: 221166
* Renamed CCState members that appear to misspell 'Processed' as 'Proceed'. NFC.Daniel Sanders2014-11-011-3/+3
  Reviewers: rnk
  Reviewed By: rnk
  Subscribers: rnk, llvm-commits
  Differential Revision: http://reviews.llvm.org/D5978
  llvm-svn: 221061
* Remove redundant calls to isMaterializable.Rafael Espindola2014-11-011-5/+1
  This removes calls to isMaterializable in the following cases:
  * It was redundant with a call to isDeclaration now that isDeclaration returns the correct answer for materializable functions.
  * It was followed by a call to Materialize. Just call Materialize and check EC.
  llvm-svn: 221050
* Work around bugs in MSVC "14" CTP 3's conversion logicReid Kleckner2014-10-311-1/+1
  It appears to ignore or find ambiguous MachineInstrBuilder's conversion operators that allow conversion to MachineInstr* and MachineBasicBlock::bundle_iterator. As a workaround, add an explicit way to get the MachineInstr. llvm-svn: 221017
* [CodeGenPrepare] Move extractelement close to store if they can be combined.Quentin Colombet2014-10-312-0/+32
  This patch adds an optimization in CodeGenPrepare to move an extractelement right before a store when the target can combine them. The optimization may promote any scalar operations to vector operations in the way to make that possible.

  ** Context **

  Some targets use different register files for both vector and scalar operations. This means that transitioning from one domain to another may incur copy from one register file to another. These copies are not coalescable and may be expensive. For example, according to the scheduling model, on cortex-A8 a vector to GPR move is 20 cycles.

  ** Motivating Example **

  Let us consider an example:
      define void @foo(<2 x i32>* %addr1, i32* %dest) {
        %in1 = load <2 x i32>* %addr1, align 8
        %extract = extractelement <2 x i32> %in1, i32 1
        %out = or i32 %extract, 1
        store i32 %out, i32* %dest, align 4
        ret void
      }

  As it is, this IR generates the following assembly on armv7:
      vldr d16, [r0]            @ vector load
      vmov.32 r0, d16[1]        @ cross-register-file copy: 20 cycles
      orr r0, r0, #1            @ scalar bitwise or
      str r0, [r1]              @ scalar store
      bx lr

  Whereas we could generate much faster code:
      vldr d16, [r0]            @ vector load
      vorr.i32 d16, #0x1        @ vector bitwise or
      vst1.32 {d16[1]}, [r1:32] @ vector extract + store
      bx lr

  Half of the computation made in the vector is useless, but this allows to get rid of the expensive cross-register-file copy.

  ** Proposed Solution **

  To avoid this cross-register-copy penalty, we promote the scalar operations to vector operations. The penalty will be removed if we manage to promote the whole chain of computation in the vector domain. Currently, we do that only when the chain of computation ends by a store and the target is able to combine an extract with a store.

  Stores are the most likely candidates, because other instructions produce values that would need to be promoted and so, extracted as some point[1]. Moreover, this is customary that targets feature stores that perform a vector extract (see AArch64 and X86 for instance).

  The proposed implementation relies on the TargetTransformInfo to decide whether or not it is beneficial to promote a chain of computation in the vector domain. Unfortunately, this interface is rather inaccurate for this level of details and although this optimization may be beneficial for X86 and AArch64, the inaccuracy will lead to the optimization being too aggressive. Basically in TargetTransformInfo, everything that is legal has a cost of 1, whereas, even if a vector type is legal, usually a vector operation is slightly more expensive than its scalar counterpart. That will lead to too many promotions that may not be counter balanced by the saving of the cross-register-file copy. For instance, on AArch64 this penalty is just 4 cycles.

  For now, the optimization is just enabled for ARM prior than v8, since those processors have a larger penalty on cross-register-file copies, and the scope is limited to basic blocks. Because of these two factors, we limit the effects of the inaccuracy. Indeed, I did not want to build up a fancy cost model with block frequency and everything on top of that.

  [1] We can imagine targets that can combine an extractelement with other instructions than just stores. If we want to go into that direction, the current interfaces must be augmented and, moreover, I think this becomes a global isel problem.

  Differential Revision: http://reviews.llvm.org/D5921 <rdar://problem/14170854> llvm-svn: 220978
* [ARM] Select VMAXNM and VMINNM regardless of operand orderOliver Stannard2014-10-271-6/+12
  Currently, the ARM backend will select the VMAXNM and VMINNM for these C expressions:
      (a < b) ? a : b
      (a > b) ? a : b
  but not these expressions:
      (a > b) ? b : a
      (a < b) ? b : a
  This patch allows all of these expressions to be matched. llvm-svn: 220671
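  In source form, the four variants that should now all be matched look roughly like this (an illustrative example, assuming a target and FP settings under which the backend is allowed to pick VMAXNM/VMINNM):

      // Each of these should lower to a single VMAXNM or VMINNM rather than a
      // compare followed by a select.
      float max_a(float a, float b) { return (a < b) ? b : a; }
      float max_b(float a, float b) { return (a > b) ? a : b; }
      float min_a(float a, float b) { return (a < b) ? a : b; }
      float min_b(float a, float b) { return (a > b) ? b : a; }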
* Do not emit intermediate register for zero FP immediateRenato Golin2014-10-231-0/+12
  This updates the check for a double precision zero floating point constant to allow use of an instruction with an immediate value rather than a temporary register. Currently "a == 0.0", where "a" is of "double" type, generates:
      vmov.i32 d16, #0x0
      vcmpe.f64 d0, d16
  With this change it becomes:
      vcmpe.f64 d0, #0
  Patch by Sergey Dmitrouk. llvm-svn: 220486
* [Thumb2] Improve disassembly of memory hintsOliver Stannard2014-10-231-7/+57
  Currently, the ARM disassembler will disassemble the Thumb2 memory hint instructions (PLD, PLDW and PLI), even for targets which do not have these instructions. This patch adds the required checks to the disassembler. llvm-svn: 220472
* [ARM, stack protector] If supported, use armv7 instructions.Akira Hatanaka2014-10-231-4/+39
  This commit enables using movt/movw to load the stack guard address:
      movw r0, :lower16:(L_g3$non_lazy_ptr-(LPC0_0+8))
      movt r0, :upper16:(L_g3$non_lazy_ptr-(LPC0_0+8))
      ldr r0, [pc, r0]
  Previously a pc-relative load was emitted:
      ldr r0, LCPI0_0
      ldr r0, [pc, r0]
  rdar://problem/18740489 llvm-svn: 220470
* [Thumb/Thumb2] Implement restrictions on SP in register list on LDM, STM ↵Jyoti Allur2014-10-221-2/+23
  variants in thumb mode llvm-svn: 220379
* [ARM] NEON 32-bit scalar moves are also available in VFPv2Oliver Stannard2014-10-211-2/+3
  The 32-bit variants of the NEON scalar<->GPR move instructions are also available in VFPv2. The 8- and 16-bit variants do require NEON. Note that the checks in the test file are all -DAG because they are checking a mixture of stdout and stderr, and the ordering is not guaranteed. llvm-svn: 220288
* [Thumb2] LDRS?[BH] cannot load to the PCOliver Stannard2014-10-211-4/+4
  The Thumb2 LDRS?[BH] instructions are not valid when the destination register is the PC (these encodings are used for preload hints). llvm-svn: 220278
* ARM: rework Thumb1 frame index rewritingTim Northover2014-10-205-112/+28
  The previous code had a few problems, motivating the choices here.
  1. It could create instructions clobbering CPSR, but the incoming MachineInstr didn't reflect this. A potential source of corruption. This is why the patch has a new PseudoInst for before lowering.
  2. Similarly, there was some code to handle the incoming instruction not being ARMCC::AL, but this would have caused massive problems if it was actually invoked when a complex offset needing more than one instruction was requested.
  3. It wasn't designed to handle unaligned pointers (or offsets). These should probably be minimised anyway, but the code needs to deal with them properly regardless.
  4. It had some rather dubious ad-hoc code to avoid calling emitThumbRegPlusImmediate, a function which should be designed to do precisely this job.
  We seem to cover the common cases correctly now, and hopefully can enhance emitThumbRegPlusImmediate to handle any extra optimisations we need to add in future. llvm-svn: 220236
* [Thumb2] RFE, SRS and "SUBS pc, lr" are undefined on v7MOliver Stannard2014-10-201-3/+5
  These instructions are related to the v7[AR] exception model, and are not defined on v7M. llvm-svn: 220204
* [ARM] Do not select SMULW[BT] or SMLAW[BT]Oliver Stannard2014-10-202-30/+8
  The current instruction selection patterns for SMULW[BT] and SMLAW[BT] are incorrect. These instructions multiply a 32-bit and a 16-bit value (both signed) and return the top 32 bits of the 48-bit result. This preserves the 16 bits of overflow, whereas the patterns they currently match truncate the result to 16 bits then sign extend.

  To select these instructions, we would need to match an ISD::SMUL_LOHI, a sign extend, two shifts and an or. There is no way to match SMUL_LOHI in an instruction pattern as it defines multiple values, so this would have to be done in C++. I have raised http://llvm.org/bugs/show_bug.cgi?id=21297 to cover allowing correct selection of these instructions.

  This fixes http://llvm.org/bugs/show_bug.cgi?id=19396 llvm-svn: 220196
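  To make the mismatch concrete, here is a sketch in plain C++, paraphrasing the semantics described above (not code from the patch):

      #include <cstdint>

      // What SMULWB computes: 32 x 16 signed multiply, keeping the top 32 bits
      // of the 48-bit product, so the 16 overflow bits survive.
      int32_t smulwb_result(int32_t a, int32_t b) {
        int64_t product = (int64_t)a * (int16_t)b;  // signed bottom halfword of b
        return (int32_t)(product >> 16);
      }

      // What the removed patterns actually described: the same product, but
      // truncated to 16 bits and then sign-extended, losing the overflow bits.
      int32_t removed_pattern(int32_t a, int32_t b) {
        int64_t product = (int64_t)a * (int16_t)b;
        return (int32_t)(int16_t)(product >> 16);
      }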
* [Thumb] Fix crash in Thumb1RegisterInfo::rewriteFrameIndexOliver Stannard2014-10-201-0/+1
  This function can, for some offsets from the SP, split one instruction into two. Since it re-uses the original instruction as the first instruction of the result, we need to ensure its result register is not marked as dead before we use it in the second instruction. llvm-svn: 220194
* Use triple predicate functions instead of checking values directly. NFC.Bob Wilson2014-10-191-24/+7
  llvm-svn: 220155
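  The kind of cleanup involved, as a small sketch (the specific predicates touched by this commit are not reproduced here):

      #include "llvm/ADT/Triple.h"

      // Before: compare the OS enum values directly.
      bool isDarwinOld(const llvm::Triple &TT) {
        return TT.getOS() == llvm::Triple::Darwin ||
               TT.getOS() == llvm::Triple::MacOSX ||
               TT.getOS() == llvm::Triple::IOS;
      }

      // After: ask the triple through its existing predicate helpers.
      bool isDarwinNew(const llvm::Triple &TT) {
        return TT.isOSDarwin();
      }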
* ARM: Fix a bug which was causing convergence failure in constant-island pass.Akira Hatanaka2014-10-171-1/+6
  The bug is in ARMConstantIslands::createNewWater where the upper bound of the new water split point is computed:
      // This could point off the end of the block if we've already got constant
      // pool entries following this block; only the last one is in the water list.
      // Back past any possible branches (allow for a conditional and a maximally
      // long unconditional).
      if (BaseInsertOffset + 8 >= UserBBI.postOffset()) {
        BaseInsertOffset = UserBBI.postOffset() - UPad - 8;
        DEBUG(dbgs() << format("Move inside block: %#x\n", BaseInsertOffset));
      }
  The split point is supposed to be somewhere between the machine instruction that loads from the constant pool entry and the end of the basic block, before branch instructions. The code above is fine if the basic block is large enough and there are a sufficient number of instructions following the machine instruction. However, if the machine instruction is near the end of the basic block, BaseInsertOffset can point to the machine instruction or another instruction that precedes it, and this can lead to convergence failure.

  This commit fixes this bug by ensuring BaseInsertOffset is larger than the offset of the instruction following the constant-loading instruction.

  rdar://problem/18581150 llvm-svn: 220015
* Simplify handling of --noexecstack by using getNonexecutableStackSection.Rafael Espindola2014-10-153-13/+7
  llvm-svn: 219799
* ARM: drop check for triple that's no longer used.Tim Northover2014-10-151-3/+2
  Early attempts to support AAPCS bare metal MachO targets based the decision on the CPU being compiled for. This was not a particularly great idea and we've got a better option now, but this check remained. No functional change for any target we care about. llvm-svn: 219767
* ARM: remove ARM/Thumb distinction for preferred alignment.Tim Northover2014-10-141-5/+0
  Thumb1 has legitimate reasons for preferring 32-bit alignment of types i1/i8/i16, since the 16-bit encoding of "add rD, sp, #imm" requires #imm to be a multiple of 4. However, this is a trade-off between code size and RAM usage; the DataLayout string is not the best place to represent it even if desired. So this patch removes the extra Thumb requirements, hopefully making ARM and Thumb completely compatible in this respect. llvm-svn: 219734
* ARM: allow misaligned local variables in Thumb1 mode.Tim Northover2014-10-141-3/+1
  There's no hard requirement on LLVM to align local variable to 32-bits, so the Thumb1 frame handling needs to be able to deal with variables that are only naturally aligned without falling over. llvm-svn: 219733
* ARM: set preferred aggregate alignment to 32 universally.Tim Northover2014-10-141-4/+3
  Before, ARM and Thumb mode code had different preferred alignments, which could lead to some rather unexpected results. There's justification for reducing it from the default 64-bits (wasted space), but I don't think there is for going below 32-bits. There's no actual ABI change here, just to reassure people. llvm-svn: 219719
* Grab the subtarget info off of the MachineFunction rather thanEric Christopher2014-10-141-1/+1
  indirecting through the TargetMachine. llvm-svn: 219674
* Include map into the A15SDOptimizer rather than pick it upEric Christopher2014-10-141-0/+1
  transitively from the DFAPacketizer via TargetInstrInfo.h. llvm-svn: 219652
* Adds support for the Cortex-A17 to the ARM backendRenato Golin2014-10-132-1/+16
  Patch by Matthew Wahab. llvm-svn: 219606
* MC: Bit pack MCSymbolData.Benjamin Kramer2014-10-111-1/+1
  On x86_64 this brings it from 80 bytes to 64 bytes. Also make any member variables private and clean up uses to go through the existing accessors. NFC. llvm-svn: 219573
* Remove a compiler bug workaround from 2007. The affected versions of gcc are ↵Benjamin Kramer2014-10-091-11/+1
  long gone. NFC. llvm-svn: 219433
* Use triple's isiOS() and isOSDarwin() methods.Bob Wilson2014-10-092-3/+2
  These methods are already used in lots of places. This makes things more consistent. NFC. llvm-svn: 219386
* Emit unaligned access build attribute for ARMRenato Golin2014-10-081-0/+7
  Patch by Charlie Turner. llvm-svn: 219301
* Refactor isThumb1Only() && isMClass() into a predicate called isV6M()Renato Golin2014-10-082-5/+8
  This must be enforced for all v6M cores, not just the cortex-m0, regardless of the user-specified alignment. Patch by Charlie Turner. llvm-svn: 219300
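  The refactoring amounts to giving the compound check a name on the subtarget, roughly:

      // ARMSubtarget sketch: call sites can now ask isV6M() instead of
      // repeating the isThumb1Only() && isMClass() combination.
      bool isV6M() const { return isThumb1Only() && isMClass(); }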
* Simplify switch statement in ARM subtarget align accessRenato Golin2014-10-081-30/+24
  This switch can be reduced to a simpler if/else statement. Patch by Charlie Turner. llvm-svn: 219299