Found by the Clang static analyzer.
llvm-svn: 221366
llvm-svn: 221358
Some ARM FPUs only have 16 double-precision registers, rather than the
normal 32. LLVM represents this with the D16 target feature. This is
currently used by CodeGen to avoid using high registers when they are
not available, but the assembler and disassembler do not use it.
I fix this in the assembler and disassembler rather than the
InstrInfo.td files, as the latter would require a large number of
changes everywhere one of the floating-point instructions is referenced
in the backend. This solution is similar to the one used for
co-processor numbers and MSR masks.
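As an illustration (my own example, not taken from the commit), with only 16
double-precision registers available the assembler should now behave roughly as
follows:
  vadd.f64 d0, d1, d2     @ accepted: d0-d15 exist on every VFP implementation
  vadd.f64 d16, d17, d18  @ rejected: d16-d31 are not present on a D16 FPU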
llvm-svn: 221341
We currently try to push an even number of registers to preserve 8-byte
alignment during a function's prologue, but only when the stack alignment is
precisely 8. Many of the reasons for doing this also apply when that alignment
is greater than 8 (the extra store is often free, and can save another stack
adjustment, though less frequently for 16-byte stack alignment).
llvm-svn: 221321
We were making an attempt to do this by adding an extra callee-saved GPR (so
that there was an even number in the list), but when that failed we went ahead
and pushed anyway.
This had a couple of potential issues:
  + The .cfi directives we emit misplaced the dN registers because they were
    based on PrologEpilogInserter's calculation.
  + Unaligned stores can be less efficient.
  + Unaligned stores can actually fault (likely only an issue in niche cases,
    but possible).
This adds a final explicit stack adjustment if all other options fail, so that
the actual locations of the registers match up with where they should be.
llvm-svn: 221320
register class tGPRRegClass if the target is thumb1.
This commit fixes a crash that occurs during register allocation which was
triggered when a virtual register defined by an inline-asm instruction had to
be spilled.
 
rdar://problem/18740489
llvm-svn: 221178
This CPU definition is redundant. The Cortex-A9 is defined as
supporting multiprocessing extensions. Remove its definition and
update appropriate tests.
LLVM defines both a cortex-a9 CPU and a cortex-a9-mp CPU. The only
difference between the two CPU definitions in ARM.td is that
cortex-a9-mp contains the feature FeatureMP for multiprocessing
extensions.
This is redundant since the Cortex-A9 is defined as having
multiprocessing extensions in the TRMs. armcc also defines the
Cortex-A9 as having multiprocessing extensions by default.
Change-Id: Ifcadaa6c322be0a33d9d2a39cfdd7da1d75981a7
llvm-svn: 221166
Reviewers: rnk
Reviewed By: rnk
Subscribers: rnk, llvm-commits
Differential Revision: http://reviews.llvm.org/D5978
llvm-svn: 221061
This removes calls to isMaterializable in the following cases:
* It was redundant with a call to isDeclaration now that isDeclaration returns
  the correct answer for materializable functions.
* It was followed by a call to Materialize. Just call Materialize and check EC.
llvm-svn: 221050
It appears to ignore, or find ambiguous, MachineInstrBuilder's conversion
operators that allow conversion to MachineInstr* and
MachineBasicBlock::bundle_iterator.
As a workaround, add an explicit way to get the MachineInstr.
llvm-svn: 221017
This patch adds an optimization in CodeGenPrepare to move an extractelement
right before a store when the target can combine them.
The optimization may promote scalar operations to vector operations along the
way to make that possible.
** Context **
Some targets use different register files for vector and scalar operations.
This means that transitioning from one domain to the other may incur a copy from
one register file to another. These copies are not coalescable and may be
expensive. For example, according to the scheduling model, a vector-to-GPR move
on Cortex-A8 takes 20 cycles.
** Motivating Example **
Let us consider an example:
define void @foo(<2 x i32>* %addr1, i32* %dest) {
 %in1 = load <2 x i32>* %addr1, align 8
 %extract = extractelement <2 x i32> %in1, i32 1
 %out = or i32 %extract, 1
 store i32 %out, i32* %dest, align 4
 ret void
}
As it is, this IR generates the following assembly on armv7:
  vldr d16, [r0]      @ vector load
  vmov.32 r0, d16[1]  @ cross-register-file copy: 20 cycles
  orr r0, r0, #1      @ scalar bitwise or
  str r0, [r1]        @ scalar store
  bx lr
Whereas we could generate much faster code:
  vldr d16, [r0]             @ vector load
  vorr.i32 d16, #0x1         @ vector bitwise or
  vst1.32 {d16[1]}, [r1:32]  @ vector extract + store
  bx lr
Half of the computation done in the vector domain is useless, but this allows us
to get rid of the expensive cross-register-file copy.
** Proposed Solution **
To avoid this cross-register-copy penalty, we promote the scalar operations to
vector operations. The penalty will be removed if we manage to promote the whole
chain of computation in the vector domain.
Currently, we do that only when the chain of computation ends with a store and
the target is able to combine an extract with a store.
Stores are the most likely candidates, because other instructions produce values
that would need to be promoted and so extracted at some point [1]. Moreover, it
is common for targets to feature stores that perform a vector extract (see
AArch64 and X86 for instance).
The proposed implementation relies on the TargetTransformInfo to decide whether
or not it is beneficial to promote a chain of computation in the vector domain.
Unfortunately, this interface is rather inaccurate for this level of detail, and
although this optimization may be beneficial for X86 and AArch64, the inaccuracy
will lead to the optimization being too aggressive.
Basically, in TargetTransformInfo everything that is legal has a cost of 1,
whereas, even if a vector type is legal, a vector operation is usually slightly
more expensive than its scalar counterpart. That will lead to too many
promotions that may not be counterbalanced by the saving of the
cross-register-file copy. For instance, on AArch64 this penalty is just 4
cycles.
For now, the optimization is enabled only for ARM prior to v8, since those
processors have a larger penalty on cross-register-file copies, and the scope is
limited to basic blocks. Because of these two factors, we limit the effects of
the inaccuracy. Indeed, I did not want to build up a fancy cost model with block
frequency and everything on top of that.
[1] We can imagine targets that can combine an extractelement with instructions
other than just stores. If we want to go in that direction, the current
interfaces must be augmented and, moreover, I think this becomes a global isel
problem.
Differential Revision: http://reviews.llvm.org/D5921
<rdar://problem/14170854>
llvm-svn: 220978
Currently, the ARM backend will select the VMAXNM and VMINNM instructions for
these C expressions:
  (a < b) ? a : b
  (a > b) ? a : b
but not these expressions:
  (a > b) ? b : a
  (a < b) ? b : a
This patch allows all of these expressions to be matched.
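For example (illustrative only, and assuming NaNs can be ignored), for
single-precision operands the expression (a > b) ? b : a can now lower to:
  vminnm.f32 s0, s0, s1   @ a in s0, b in s1; result is the smaller value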
llvm-svn: 220671
This updates the check for a double-precision zero floating-point constant to
allow use of an instruction with an immediate value rather than a temporary
register.
Currently "a == 0.0", where "a" is of type "double", generates:
vmov.i32        d16, #0x0
vcmpe.f64       d0, d16
With this change it becomes:
vcmpe.f64        d0, #0
Patch by Sergey Dmitrouk.
llvm-svn: 220486
Currently, the ARM disassembler will disassemble the Thumb2 memory hint
instructions (PLD, PLDW and PLI), even for targets which do not have
these instructions. This patch adds the required checks to the
disassembler.
llvm-svn: 220472
This commit enables using movt/movw to load the stack guard address:
movw r0, :lower16:(L_g3$non_lazy_ptr-(LPC0_0+8))
movt r0, :upper16:(L_g3$non_lazy_ptr-(LPC0_0+8))
ldr r0, [pc, r0]
Previously a pc-relative load was emitted:
ldr r0, LCPI0_0
ldr r0, [pc, r0]
rdar://problem/18740489
llvm-svn: 220470
variants in thumb mode
llvm-svn: 220379
The 32-bit variants of the NEON scalar<->GPR move instructions are
also available in VFPv2. The 8- and 16-bit variants do require NEON.
Note that the checks in the test file are all -DAG because they are
checking a mixture of stdout and stderr, and the ordering is not
guaranteed.
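For illustration (my own example, not from the commit), the distinction is
roughly:
  vmov r0, s0        @ 32-bit scalar<->GPR move: available from VFPv2 onwards
  vmov.u8 r0, d0[1]  @ 8-bit lane move: still requires NEON
  vmov.u16 r0, d0[2] @ 16-bit lane move: still requires NEON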
llvm-svn: 220288
The Thumb2 LDRS?[BH] instructions are not valid when the destination
register is the PC (these encodings are used for preload hints).
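For example (my own illustration), the same encoding with the PC in the
destination register field is a preload hint rather than a load:
  ldrsb.w r0, [r1, #4]  @ a normal signed-byte load
  pli [r1, #4]          @ what the encoding means when the Rt field is the PC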
llvm-svn: 220278
The previous code had a few problems, motivating the choices here.
1. It could create instructions clobbering CPSR, but the incoming MachineInstr
   didn't reflect this. A potential source of corruption. This is why the patch
   has a new PseudoInst for before lowering.
2. Similarly, there was some code to handle the incoming instruction not being
   ARMCC::AL, but this would have caused massive problems if it was actually
   invoked when a complex offset needing more than one instruction was requested.
3. It wasn't designed to handle unaligned pointers (or offsets). These should
   probably be minimised anyway, but the code needs to deal with them properly
   regardless.
4. It had some rather dubious ad-hoc code to avoid calling
   emitThumbRegPlusImmediate, a function which should be designed to do precisely
   this job.
We seem to cover the common cases correctly now, and hopefully can enhance
emitThumbRegPlusImmediate to handle any extra optimisations we need to add in
future.
llvm-svn: 220236
These instructions are related to the v7[AR] exception model, and are
not defined on v7M.
llvm-svn: 220204
The current instruction selection patterns for SMULW[BT] and SMLAW[BT]
are incorrect. These instructions multiply a 32-bit and a 16-bit value
(both signed) and return the top 32 bits of the 48-bit result. This
preserves the 16 bits of overflow, whereas the patterns they currently
match truncate the result to 16 bits then sign extend.
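As a sketch of the intended semantics (my own illustration, not part of the
commit message):
  smulwb r0, r1, r2   @ r0 = (r1 * sext(r2[15:0])) >> 16, i.e. the top 32 bits
                      @ of the 48-bit product, so the overflow bits survive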
To select these instructions, we would need to match an ISD::SMUL_LOHI,
a sign extend, two shifts and an or. There is no way to match SMUL_LOHI
in an instruction pattern as it defines multiple values, so this would
have to be done in C++. I have raised
http://llvm.org/bugs/show_bug.cgi?id=21297 to cover allowing correct
selection of these instructions.
This fixes http://llvm.org/bugs/show_bug.cgi?id=19396
llvm-svn: 220196
This function can, for some offsets from the SP, split one instruction
into two. Since it re-uses the original instruction as the first
instruction of the result, we need to ensure its result register is not
marked as dead before we use it in the second instruction.
llvm-svn: 220194
llvm-svn: 220155
The bug is in ARMConstantIslands::createNewWater where the upper bound of the
new water split point is computed:
// This could point off the end of the block if we've already got constant
// pool entries following this block; only the last one is in the water list.
// Back past any possible branches (allow for a conditional and a maximally
// long unconditional).
if (BaseInsertOffset + 8 >= UserBBI.postOffset()) {
  BaseInsertOffset = UserBBI.postOffset() - UPad - 8;
  DEBUG(dbgs() << format("Move inside block: %#x\n", BaseInsertOffset));
}
The split point is supposed to be somewhere between the machine instruction that
loads from the constant pool entry and the end of the basic block, before branch
instructions. The code above is fine if the basic block is large enough and
there are a sufficient number of instructions following the machine instruction.
However, if the machine instruction is near the end of the basic block,
BaseInsertOffset can point to the machine instruction or another instruction
that precedes it, and this can lead to convergence failure.
This commit fixes this bug by ensuring BaseInsertOffset is larger than the
offset of the instruction following the constant-loading instruction.
rdar://problem/18581150
llvm-svn: 220015
llvm-svn: 219799
Early attempts to support AAPCS bare metal MachO targets based the decision on
the CPU being compiled for. This was not a particularly great idea and we've
got a better option now, but this check remained.
No functional change for any target we care about.
llvm-svn: 219767
Thumb1 has legitimate reasons for preferring 32-bit alignment of types
i1/i8/i16, since the 16-bit encoding of "add rD, sp, #imm" requires #imm to be
a multiple of 4. However, this is a trade-off between code size and RAM usage;
the DataLayout string is not the best place to represent it even if desired.
So this patch removes the extra Thumb requirements, hopefully making ARM and
Thumb completely compatible in this respect.
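For reference (my own example), the 16-bit encoding constraint mentioned above
looks like this:
  add r0, sp, #8    @ encodable in 16 bits: the offset is a multiple of 4
  add r0, sp, #6    @ not encodable in the 16-bit form, which scales an 8-bit
                    @ immediate by 4; Thumb1 needs extra instructions instead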
llvm-svn: 219734
There's no hard requirement on LLVM to align local variables to 32 bits, so the
Thumb1 frame handling needs to be able to deal with variables that are only
naturally aligned without falling over.
llvm-svn: 219733
Before, ARM and Thumb mode code had different preferred alignments, which could
lead to some rather unexpected results. There's justification for reducing it
from the default 64 bits (wasted space), but I don't think there is for going
below 32 bits.
There's no actual ABI change here, just to reassure people.
llvm-svn: 219719
indirecting through the TargetMachine.
llvm-svn: 219674
transitively from the DFAPacketizer via TargetInstrInfo.h.
llvm-svn: 219652
Patch by Matthew Wahab.
llvm-svn: 219606
On x86_64 this brings it from 80 bytes to 64 bytes. Also make any member
variables private and clean up uses to go through the existing accessors.
NFC.
llvm-svn: 219573
long gone.
NFC.
llvm-svn: 219433
These methods are already used in lots of places. This makes things more
consistent. NFC.
llvm-svn: 219386
Patch by Charlie Turner.
llvm-svn: 219301
This must be enforced for all v6M cores, not just the Cortex-M0, regardless
of the user-specified alignment.
Patch by Charlie Turner.
llvm-svn: 219300
This switch can be reduced to a simpler if/else statement.
Patch by Charlie Turner.
llvm-svn: 219299
calls to getTargetLowering() with the cached variable.
llvm-svn: 219284
llvm-svn: 219172
llvm-svn: 219128
This instruction form is handled by different AsmOperands now, so the code is
completely dead (and wrong anyway).
llvm-svn: 219127
These will make it easier to test further changes to the
code generation and optimization pipelines as those are
moved to subtargets initialized with target feature and
target cpu.
llvm-svn: 219106
llvm-svn: 218999
That commit was introduced in order to help investigate a problem in ARM
codegen breaking from commit 202304 (Add a limit to the heuristic that register
allocates instructions in local order). Recent analysis indicated that the
problem no longer exists, so I'm reverting this change.
See PR18996.
llvm-svn: 218981
llvm-svn: 218930
pass it down in the constructor.
llvm-svn: 218929
As with x86 and AArch64, certain situations can arise where we need to spill
CPSR in the middle of a calculation. These should be avoided where possible
(MRS/MSR is rather expensive), which ARM is actually better at than the other
two since it tries to Glue defs to uses, but as a last-ditch effort, copying is
better than crashing.
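The copy itself would look something like this (illustrative only):
  mrs r4, apsr          @ save the flags into a spare GPR
  @ ... flag-clobbering code here ...
  msr apsr_nzcvq, r4    @ restore the flags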
rdar://problem/18011155
llvm-svn: 218789
Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions
when targeting ARMv8, but they are actually present on any target with
FP-ARMv8. Note that FP-ARMv8 is called FPv5 when it is part of an
M-profile core, but they have the same instructions so we model them
both as FPARMv8 in the ARM backend.
llvm-svn: 218763
The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and
FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be
modelled using the same target feature, and all double-precision
operations are already disabled by the fp-only-sp target feature.
llvm-svn: 218747