| Commit message | Author | Age | Files | Lines |
| |
If the subtraction of the higher 32-bit parts gives a 0 result, we need to do the store operation.
llvm-svn: 173437
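As background, a minimal C++ sketch of the split-word pattern this describes (names invented, not the patched code): on a 32-bit target a 64-bit compare-and-store is decomposed into 32-bit halves, and the store must still run when the subtraction of the high 32-bit parts yields 0, i.e. when the high halves match.

#include <cstdint>

// Illustrative only: a 64-bit compare-and-store split into 32-bit halves.
static void storeIfEqual64(uint32_t *lo, uint32_t *hi,
                           uint64_t expected, uint64_t value) {
  uint32_t expLo = static_cast<uint32_t>(expected);
  uint32_t expHi = static_cast<uint32_t>(expected >> 32);
  // A high-part subtraction of 0 means the high halves are equal; the
  // store must then still happen (given the low halves also match).
  if ((*hi - expHi) == 0 && (*lo - expLo) == 0) {
    *lo = static_cast<uint32_t>(value);
    *hi = static_cast<uint32_t>(value >> 32);
  }
}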
|
| |
llvm-svn: 172874
|
| |
This removes previous special cases for each floating-point type in favour of a
shared codepath.
llvm-svn: 172189
|
| |
requirement when creating stack objects in MachineFrameInfo.
Add CreateStackObjectWithMinAlign to throw an error when the minimal alignment
can't be achieved and to clamp the alignment when the preferred alignment
can't be achieved. The same is true for CreateVariableSizedObject.
No error is emitted in CreateSpillStackObject or CreateStackObject.
As long as callers of CreateStackObject do not assume the object will be
aligned at the requested alignment, we should not have miscompile since
later optimizations which look at the object's alignment will have the correct
information.
rdar://12713765
llvm-svn: 172027
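A minimal C++ sketch of the clamping policy described above (hypothetical helper, not the actual MachineFrameInfo code): a stated minimum may fail with an error, while a merely preferred alignment is silently clamped.

#include <algorithm>
#include <cstdio>

// Hypothetical illustration of the alignment policy.
unsigned chooseAlignment(unsigned Preferred, unsigned MinRequired,
                         unsigned StackAlign, bool CanRealignStack) {
  if (CanRealignStack)
    return Preferred;                 // realignment satisfies any request
  if (MinRequired > StackAlign) {
    std::fprintf(stderr, "error: minimal alignment not achievable\n");
    return StackAlign;                // clamp after reporting the error
  }
  return std::min(Preferred, StackAlign);  // clamp the preferred alignment
}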
|
| |
This was an experimental option, but needs to be defined
per-target. e.g. PPC A2 needs to aggressively hide latency.
I converted some in-order scheduling tests to A2. Hal is working on
more test cases.
llvm-svn: 171946
|
| |
This avoids FileCheck failing over different comment characters in
assembly (notably powerpc64 on Linux vs Darwin) and should fix David's
build-bot.
llvm-svn: 171886
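Illustrative FileCheck lines (invented, not the actual test change): matching the comment character with a regex keeps a test portable across assemblers that use different comment characters.

; Brittle: hard-codes '#', the Linux powerpc64 comment character.
; CHECK: # some comment text
; Portable: a FileCheck regex accepts '#' (Linux) or ';' (Darwin).
; CHECK: {{[#;]}} some comment text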
|
| |
llvm-svn: 171866
|
| |
the global variables. We partition the set of globals by their address space, and apply the same transformation as before to merge them.
llvm-svn: 171730
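A schematic C++ sketch of the partitioning step (invented types and names, not the actual GlobalMerge code): bucket the globals by address space, then run the pre-existing merge on each bucket.

#include <map>
#include <vector>

struct GlobalVar { unsigned AddrSpace; /* ... */ };

// The pre-existing merge transformation, applied per bucket.
static void mergeBucket(std::vector<GlobalVar*> &Bucket) { /* ... */ }

static void mergeAllGlobals(std::vector<GlobalVar*> &Globals) {
  std::map<unsigned, std::vector<GlobalVar*>> Buckets;
  for (GlobalVar *G : Globals)
    Buckets[G->AddrSpace].push_back(G);   // partition by address space
  for (auto &Entry : Buckets)
    mergeBucket(Entry.second);            // same transformation as before
}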
|
| |
This reverts r170694. The operations can be represented in IR without
adding any new intrinsics.
llvm-svn: 170765
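For example (illustrative IR), a plain NEON-width vector subtract is already expressible as a first-class IR instruction, so no target intrinsic is needed:

; Instead of a call to a target intrinsic, the operation is just:
%r = sub <4 x i32> %a, %b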
|
| |
llvm.arm.neon.vsub[su].* intrinsics.
Patch by Pete Couperus <pjcoup@gmail.com>
llvm-svn: 170694
|
| |
((x & 0xff00) >> 8) << 2
to
(x >> 6) & 0x3fc
This is general goodness since it folds a left shift into the mask. However,
the trailing zeros in the mask prevent the ARM backend from using the bit
extraction instructions, and worse, the mask materialization may require an
additional instruction. This comes up fairly frequently when the result of
the bit twiddling is used as a memory address. e.g.
= ptr[(x & 0xFF0000) >> 16]
We want to generate:
ubfx r3, r1, #16, #8
ldr.w r3, [r0, r3, lsl #2]
vs.
mov.w r9, #1020
and.w r2, r9, r1, lsr #14
ldr r2, [r0, r2]
Add a late ARM-specific isel optimization to
ARMDAGToDAGISel::PreprocessISelDAG(). It folds the left shift into the
'base + offset' address computation and changes the mask to one without
trailing zeros, enabling the use of ubfx.
Note the optimization has to be done late since it's target specific and we
don't want to change the DAG normalization. It's also fairly restrictive,
as shifter operands are not always free. It's only done for lsl #1 / #2,
which are known to be free on some CPUs and are the most common for address
computation.
This is a slight win for blowfish, rijndael, etc.
rdar://12870177
llvm-svn: 170581
|
| |
To avoid over-constraining the scheduler for ARM in Thumb mode, some code-size optimizations specific to ARM Thumb are blocked when they add a dependency (such as a write-after-read dependency).
This patch disables that check when code size is the priority, i.e., when code is compiled with -Oz.
llvm-svn: 170462
|
| |
llvm-svn: 170454
|
| |
1. Teach it to use overlapping unaligned load / store to copy / set the trailing
bytes. e.g. on x86, use two pairs of movups / movaps for 17 - 31 byte copies
(see the sketch below).
2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is, e.g.
x86 and ARM.
3. When expanding memcpy from a constant string, do *not* replace the load with a
constant if it's not possible to materialize an integer immediate with a single
instruction (this required a new target hook: TLI.isIntImmLegal()).
4. Use unaligned load / stores more aggressively if target hooks indicate they
are "fast".
5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8.
Also increase the threshold to something reasonable (8 for memset, 4 pairs
for memcpy).
This significantly improves Dhrystone, up to 50% on ARM iOS devices.
rdar://12760078
llvm-svn: 169791
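A C++ sketch of the overlapping trick in item 1 above (names invented, illustrative only): two 16-byte moves whose ranges overlap in the middle cover any 17-31 byte copy without a scalar tail loop.

#include <cstddef>
#include <cstdint>
#include <cstring>

// Illustrative only: copy n bytes (17 <= n <= 31) with two 16-byte moves
// that overlap in the middle, the way two movups pairs would on x86.
static void copyOverlapping(uint8_t *dst, const uint8_t *src, std::size_t n) {
  std::memcpy(dst, src, 16);                    // head: bytes [0, 16)
  std::memcpy(dst + n - 16, src + n - 16, 16);  // tail: bytes [n-16, n)
}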
|
| |
Before this patch, when you ran objdump on an LLVM-compiled file, objdump tried to
decode data-in-code sections as if they were code. This patch adds the missing
Mapping Symbols, as defined by "ELF for the ARM Architecture" (ARM IHI 0044D).
Patch based on work by Greg Fitzgerald.
llvm-svn: 169609
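Conceptually (illustrative listing, not from the patch), mapping symbols annotate where code and data begin so a disassembler stops decoding embedded constants as instructions:

$a:                         @ ARM code starts here ($t would mark Thumb)
        ldr     r0, .Lval
        bx      lr
$d:                         @ data-in-code starts here
.Lval:
        .word   0x12345678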
|
| |
Patch by Alexander Zinenko.
llvm-svn: 169547
|
| |
llvm-svn: 169464
|
| |
guess Chad expects fastisel here.
llvm-svn: 169463
|
| |
rdar://12821569
llvm-svn: 169460
|
| |
and extloads. If they are implemented as zero-extend, or implicitly
zero-extend, then this can enable more demanded-bits optimizations. e.g.
define void @foo(i16* %ptr, i32 %a) nounwind {
entry:
%tmp1 = icmp ult i32 %a, 100
br i1 %tmp1, label %bb1, label %bb2
bb1:
%tmp2 = load i16* %ptr, align 2
br label %bb2
bb2:
%tmp3 = phi i16 [ 0, %entry ], [ %tmp2, %bb1 ]
%cmp = icmp ult i16 %tmp3, 24
br i1 %cmp, label %bb3, label %exit
bb3:
call void @bar() nounwind
br label %exit
exit:
ret void
}
This compiles to the following before:
push {lr}
mov r2, #0
cmp r1, #99
bhi LBB0_2
@ BB#1: @ %bb1
ldrh r2, [r0]
LBB0_2: @ %bb2
uxth r0, r2
cmp r0, #23
bhi LBB0_4
@ BB#3: @ %bb3
bl _bar
LBB0_4: @ %exit
pop {lr}
bx lr
The uxth is not needed since ldrh implicitly zero-extends the high bits. With
this change it is eliminated.
rdar://12771555
llvm-svn: 169459
|
| |
llvm-svn: 169325
|
| |
The count attribute is more accurate with regard to the size of an array. It
also obviates the upper bound attribute in the subrange. We can also better
handle an unbounded array by setting the count to -1 instead of the lower bound
to 1 and the upper bound to 0.
llvm-svn: 169312
|
| |
The count field is necessary because there isn't a difference between the 'lo'
and 'hi' attributes for a one-element array and a zero-element array. When the
count is '0', we know that this is a zero-element array. When it's >=1, then
it's a normal constant sized array. When it's -1, then the array is unbounded.
llvm-svn: 169218
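Illustrative C declarations (hypothetical) mapped to the count values just described:

int a[4];   /* count =  4: normal constant-sized array      */
int b[1];   /* count =  1: one element, distinct from zero  */
int c[0];   /* count =  0: zero-element array               */
int d[];    /* count = -1: unbounded array                  */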
|
| |
the alignment is clamped to TargetFrameLowering.getStackAlignment if the target
does not support stack realignment or the option "realign-stack" is off.
This could cause a miscompile if the address is treated as aligned and an add is
replaced with an or in DAGCombine (see the worked example below).
Added a bool StackRealignable to TargetFrameLowering to check whether stack
realignment is implemented for the target. Also added a bool RealignOption
to MachineFrameInfo to check whether the option "realign-stack" is on.
rdar://12713765
llvm-svn: 169197
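A worked instance of the add-to-or hazard (invented address): the rewrite of add into or is only sound when the claimed alignment guarantees the low bits are clear.

claimed: ptr is 8-byte aligned, so ptr + 4 == ptr | 4
reality: ptr == 0x100c (only 4-byte aligned)
         ptr + 4 == 0x1010, but ptr | 4 == 0x100c  -- a miscompile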
|
| |
The TwoAddressInstructionPass takes the machine code out of SSA form by
expanding REG_SEQUENCE instructions into copies. It is no longer
necessary to rewrite the registers used by a REG_SEQUENCE instruction
because the new coalescer algorithm can do it now.
REG_SEQUENCE is just converted to a sequence of sub-register copies now.
llvm-svn: 169067
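Schematically (register and sub-register names invented), the expansion is now just:

%vreg5<def> = REG_SEQUENCE %vreg1, ssub_0, %vreg2, ssub_1
is rewritten to
%vreg5:ssub_0<def> = COPY %vreg1
%vreg5:ssub_1<def> = COPY %vreg2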
|
| |
Codegen was failing with an assertion because of unexpected vector
operands when legalizing the selection DAG for a MUL instruction.
The asserting code was legalizing multiplies for vectors of size 128
bits. It uses a custom lowering to try and detect cases where it can
use a VMULL instruction instead of a VMOVL + VMUL. The code was
looking for input operands to the MUL that had been sign or zero
extended. If it found the extended operands it would drop the
sign/zero extension and use the original vector size as input to a
VMULL instruction.
The code assumed that the original input vector was 64 bits so that
after dropping the extension it would fit directly into a D register
and could be used as an operand of a VMULL instruction. The input
code that triggered the failure used a vector of <4 x i8> that was
sign extended to <4 x i32>. It was not safe to drop the sign
extension in this case because the original vector is only 32 bits
wide. The fix is to insert a sign extension for the vector to reach
the required 64 bit size. In this particular example, the vector would
need to be sign extended to a <4 x i16>.
llvm-svn: 169024
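Illustrative IR for the failing shape (function name invented): the multiply's operands are <4 x i8> values sign extended to <4 x i32>, so dropping the extension outright would leave only a 32-bit-wide vector rather than the 64-bit D-register operand VMULL needs; with the fix they are first sign extended to <4 x i16>.

define <4 x i32> @vmull_sketch(<4 x i8> %a, <4 x i8> %b) {
  %sa = sext <4 x i8> %a to <4 x i32>
  %sb = sext <4 x i8> %b to <4 x i32>
  %m  = mul <4 x i32> %sa, %sb      ; lowered via VMULL after the fix
  ret <4 x i32> %m
}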
|
| |
the last invoke instruction in the function. This also removes the last landing
pad in a function. This is fine, but with SjLj EH code, we've already placed a
bunch of code in the 'entry' block, which expects the landing pad to stick
around.
When we get to the situation where CGP has removed the last landing pad, go
ahead and nuke the SjLj instructions from the 'entry' block.
<rdar://problem/12721258>
llvm-svn: 168930
|
| |
llvm-svn: 168886
|
| |
This could cause miscompilations in targets where sub-register
composition is not always idempotent (ARM).
<rdar://problem/12758887>
llvm-svn: 168837
|
| |
Fixes PR14337.
llvm-svn: 168809
|
| |
llvm-svn: 168723
|
| |
boundaries.
Given the following case:
BB0
%vreg1<def> = SUBrr %vreg0, %vreg7
%vreg2<def> = COPY %vreg7
BB1
%vreg10<def> = SUBrr %vreg0, %vreg2
We should be able to CSE between SUBrr in BB0 and SUBrr in BB1.
rdar://12462006
llvm-svn: 168717
|
| |
argument. Instead, use a pair of .local and .comm directives.
This avoids spurious differences between binaries built by the
integrated assembler vs. those built by the external assembler,
since the external assembler may impose alignment requirements
on .lcomm symbols where the integrated assembler does not.
llvm-svn: 168704
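For illustration (symbol name invented), the replacement emits:

# before: alignment argument handled differently across assemblers
        .lcomm  buf, 64, 4
# after: explicit pair with the same effect
        .local  buf
        .comm   buf, 64, 4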
|
| |
llvm-svn: 168658
|
| |
+ Take account of clobbers
+ Give outputs priority over inputs since they happen later.
llvm-svn: 168360
|
| |
It turned out that ARM wants a different layout of type infos.
This is yet another patch in an attempt to fix PR7187.
llvm-svn: 168325
|
| |
Couperus.
llvm-svn: 168240
|
| |
test cases require fixes to fast-isel before the verifier can be enabled.
Part of rdar://12594152
llvm-svn: 168233
|
| |
This patch replaces the hard-coded GPR pair [R0, R1] of
Intrinsic:arm_ldrexd and [R2, R3] of Intrinsic:arm_strexd with the
even/odd GPRPair reg class, similar to the lowering of the atomic_64
operation.
llvm-svn: 168207
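For context (illustrative), the instructions in question require an even/odd consecutive register pair, which the register class now expresses instead of pinning fixed registers:

        ldrexd  r0, r1, [r2]        @ Rt must be even, Rt2 must be Rt+1
        strexd  r4, r2, r3, [r5]    @ same pairing constraint for the source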
|
| |
This fixes PR14359
llvm-svn: 168200
|
| |
case to vector legalization so this actually works.
Patch by Pete Couperus. Fixes PR12540.
llvm-svn: 168107
|
| |
is a small negative number.
This patch changes the definition of negative from -0..-255 to -1..-255. I am changing this because of
a bug that we had in some of the patterns that assumed that "subs" of zero does not set the carry flag.
rdar://12028498
llvm-svn: 167963
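A one-line worked example of the flag behaviour behind the fix: on ARM the carry flag after a subtraction is the inverted borrow, so subtracting zero always sets it.

        subs    r0, r1, #0      @ r1 - 0 never borrows, so C = 1;
                                @ patterns assuming C = 0 here were wrong,
                                @ hence "negative" now excludes -0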
|
| |
eh table and handler data if there are no landing pads in the function.
Patch by Logan Chien with some cleanups from me.
llvm-svn: 167945
|
| |
Do some cleanup of the code while here.
Inspired by a patch by Logan Chien!
llvm-svn: 167904
|
| |
Block priorities still apply outside loops.
llvm-svn: 167793
|
| |
This adds support for weak DAG edges to the general scheduling
infrastructure in preparation for MachineScheduler support for
heuristics based on weak edges.
llvm-svn: 167738
|
| |
mov lr, pc
b.w _foo
The "mov" instruction doesn't set bit zero to one, it's putting incorrect
value in lr. It messes up backtraces.
rdar://12663632
llvm-svn: 167657
|
| |
Improve ARM build attribute emission for architecture types.
This also changes the default architecture emitted for a generic CPU to "v7".
llvm-svn: 167574
|
| |
llvm-svn: 167318
|
| |
llvm-svn: 167020
|