bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Stylistic tweak.	Evan Cheng	2012-12-11	1	-9/+8
\| \| \| \|	llvm-svn: 169811
*	Some enhancements for memcpy / memset inline expansion.	Evan Cheng	2012-12-10	1	-13/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. Teach it to use overlapping unaligned load / store to copy / set the trailing bytes. e.g. On 86, use two pairs of movups / movaps for 17 - 31 byte copies. 2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g. x86 and ARM. 3. When memcpy from a constant string, do not replace the load with a constant if it's not possible to materialize an integer immediate with a single instruction (required a new target hook: TLI.isIntImmLegal()). 4. Use unaligned load / stores more aggressively if target hooks indicates they are "fast". 5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8. Also increase the threshold to something reasonable (8 for memset, 4 pairs for memcpy). This significantly improves Dhrystone, up to 50% on ARM iOS devices. rdar://12760078 llvm-svn: 169791
*	Replace r169459 with something safer. Rather than having computeMaskedBits to	Evan Cheng	2012-12-06	1	-30/+21
\| \| \| \| \| \| \| \| \| \|	understand target implementation of any_extend / extload, just generate zero_extend in place of any_extend for liveouts when the target knows the zero_extend will be implicit (e.g. ARM ldrb / ldrh) or folded (e.g. x86 movz). rdar://12771555 llvm-svn: 169536
*	Let targets provide hooks that compute known zero and ones for any_extend	Evan Cheng	2012-12-06	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	and extload's. If they are implemented as zero-extend, or implicitly zero-extend, then this can enable more demanded bits optimizations. e.g. define void @foo(i16* %ptr, i32 %a) nounwind { entry: %tmp1 = icmp ult i32 %a, 100 br i1 %tmp1, label %bb1, label %bb2 bb1: %tmp2 = load i16* %ptr, align 2 br label %bb2 bb2: %tmp3 = phi i16 [ 0, %entry ], [ %tmp2, %bb1 ] %cmp = icmp ult i16 %tmp3, 24 br i1 %cmp, label %bb3, label %exit bb3: call void @bar() nounwind br label %exit exit: ret void } This compiles to the followings before: push {lr} mov r2, #0 cmp r1, #99 bhi LBB0_2 @ BB#1: @ %bb1 ldrh r2, [r0] LBB0_2: @ %bb2 uxth r0, r2 cmp r0, #23 bhi LBB0_4 @ BB#3: @ %bb3 bl _bar LBB0_4: @ %exit pop {lr} bx lr The uxth is not needed since ldrh implicitly zero-extend the high bits. With this change it's eliminated. rdar://12771555 llvm-svn: 169459
*	Appease GCC's -Wparentheses.	Matt Beaumont-Gay	2012-12-04	1	-2/+2
\| \| \| \| \| \|	(TIL that Clang's -Wparentheses ignores 'x \|\| y && "foo"' on purpose. Neat.) llvm-svn: 169337
*	ARM custom lower ctpop for vector types. Patch by Pete Couperus.	Evan Cheng	2012-12-04	1	-0/+117
\| \| \| \|	llvm-svn: 169325
*	Use the new script to sort the includes of every file under lib.	Chandler Carruth	2012-12-03	1	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sooooo many of these had incorrect or strange main module includes. I have manually inspected all of these, and fixed the main module include to be the nearest plausible thing I could find. If you own or care about any of these source files, I encourage you to take some time and check that these edits were sensible. I can't have broken anything (I strictly added headers, and reordered them, never removed), but they may not be the headers you'd really like to identify as containing the API being implemented. Many forward declarations and missing includes were added to a header files to allow them to parse cleanly when included first. The main module rule does in fact have its merits. =] llvm-svn: 169131
*	Codegen failure for vmull with small vectors	Sebastian Pop	2012-11-30	1	-13/+74
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Codegen was failing with an assertion because of unexpected vector operands when legalizing the selection DAG for a MUL instruction. The asserting code was legalizing multiplies for vectors of size 128 bits. It uses a custom lowering to try and detect cases where it can use a VMULL instruction instead of a VMOVL + VMUL. The code was looking for input operands to the MUL that had been sign or zero extended. If it found the extended operands it would drop the sign/zero extension and use the original vector size as input to a VMULL instruction. The code assumed that the original input vector was 64 bits so that after dropping the extension it would fit directly into a D register and could be used as an operand of a VMULL instruction. The input code that trigger the failure used a vector of <4 x i8> that was sign extended to <4 x i32>. It was not safe to drop the sign extension in this case because the original vector is only 32 bits wide. The fix is to insert a sign extension for the vector to reach the required 64 bit size. In this particular example, the vector would need to be sign extented to a <4 x i16>. llvm-svn: 169024
*	Added atomic 64 min/max/umin/umax instrinsics support in the ARM backend.	Silviu Baranga	2012-11-29	1	-10/+73
\| \| \| \|	llvm-svn: 168886
*	ARM: Implement CanLowerReturn so large vectors get expanded into sret.	Benjamin Kramer	2012-11-28	1	-0/+11
\| \| \| \| \| \|	Fixes 14337. llvm-svn: 168809
*	Mark FP_EXTEND form v2f32 to v2f64 as "expand" for ARM NEON. Patch by Pete ↵	Eli Friedman	2012-11-17	1	-0/+1
\| \| \| \| \| \|	Couperus. llvm-svn: 168240
*	Remove hard coded registers in ARM ldrexd and strexd instructions	Weiming Zhao	2012-11-16	1	-11/+47
\| \| \| \| \| \| \| \| \|	This patch replaces the hard coded GPR pair [R0, R1] of Intrinsic:arm_ldrexd and [R2, R3] of Intrinsic:arm_strexd with even/odd GPRPair reg class. Similar to the lowering of atomic_64 operation. llvm-svn: 168207
*	Make sure FABS on v2f32 and v4f32 is legal on ARM NEON	Anton Korobeynikov	2012-11-16	1	-1/+0
\| \| \| \| \| \|	This fixes PR14359 llvm-svn: 168200
*	Mark FP_ROUND for converting NEON v2f64 to v2f32 as expand. Add a missing	Eli Friedman	2012-11-15	1	-0/+2
\| \| \| \| \| \| \| \|	case to vector legalization so this actually works. Patch by Pete Couperus. Fixes PR12540. llvm-svn: 168107
*	Revert changing FNEG of v4f32 to Expand. It's legal.	Craig Topper	2012-11-15	1	-1/+0
\| \| \| \|	llvm-svn: 168030
*	Make FNEG and FABS of v4f32 Expand.	Craig Topper	2012-11-15	1	-0/+2
\| \| \| \|	llvm-svn: 168029
*	Add llvm.ceil, llvm.trunc, llvm.rint, llvm.nearbyint intrinsics.	Craig Topper	2012-11-15	1	-0/+4
\| \| \| \|	llvm-svn: 168025
*	Disable the Thumb no-return call optimization:	Evan Cheng	2012-11-10	1	-8/+2
\| \| \| \| \| \| \| \| \| \| \| \|	mov lr, pc b.w _foo The "mov" instruction doesn't set bit zero to one, it's putting incorrect value in lr. It messes up backtraces. rdar://12663632 llvm-svn: 167657
*	Revert r167620; this can be implemented using an existing CL option.	Chad Rosier	2012-11-09	1	-2/+1
\| \| \| \|	llvm-svn: 167622
*	Add support for -mstrict-align compiler option for ARM targets.	Chad Rosier	2012-11-09	1	-1/+2
\| \| \| \| \| \|	rdar://12340498 llvm-svn: 167620
*	Mark the Int_eh_sjlj_dispatchsetup pseudo instruction as clobbering all	Chad Rosier	2012-11-06	1	-8/+9
\| \| \| \| \| \| \| \| \| \| \| \|	registers. Previously, the register we being marked as implicitly defined, but not killed. In some cases this would cause the register scavenger to spill a dead register. Also, use an empty register mask to simplify the logic and to reduce the memory footprint. rdar://12592448 llvm-svn: 167499
*	Vext Lowering was missing opportunities	Quentin Colombet	2012-11-02	1	-4/+40
\| \| \| \|	llvm-svn: 167318
*	Change ForceSizeOpt attribute into MinSize attribute	Quentin Colombet	2012-10-30	1	-4/+4
\| \| \| \|	llvm-svn: 167020
*	[code size][ARM] Emit regular call instructions instead of the move, branch ↵	Quentin Colombet	2012-10-27	1	-2/+8
\| \| \| \| \| \|	sequence llvm-svn: 166854
*	ARM:	Stepan Dyatkovskiy	2012-10-19	1	-11/+18
\| \| \| \| \| \| \| \|	Removed extra stack frame object for fixed byval arguments, VarArgsStyleRegisters invocation was reworked due to some improper usage in past. PR14099 also demonstrates it. llvm-svn: 166273
*	Issue:	Stepan Dyatkovskiy	2012-10-16	1	-10/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Stack is formed improperly for long structures passed as byval arguments for EABI mode. If we took AAPCS reference, we can found the next statements: A: "If the argument requires double-word alignment (8-byte), the NCRN (Next Core Register Number) is rounded up to the next even register number." (5.5 Parameter Passing, Stage C, C.3). B: "The alignment of an aggregate shall be the alignment of its most-aligned component." (4.3 Composite Types, 4.3.1 Aggregates). So if we have structure with doubles (9 double fields) and 3 Core unused registers (r1, r2, r3): caller should use r2 and r3 registers only. Currently r1,r2,r3 set is used, but it is invalid. Callee VA routine should also use r2 and r3 regs only. All is ok here. This behaviour is guessed by rounding up SP address with ADD+BFC operations. Fix: Main fix is in ARMTargetLowering::HandleByVal. If we detected AAPCS mode and 8 byte alignment, we waste odd registers then. P.S.: I also improved LDRB_POST_IMM regression test. Since ldrb instruction will not generated by current regression test after this patch. llvm-svn: 166018
*	Fixed PR13938: the ARM backend was crashing because it couldn't select a ↵	Silviu Baranga	2012-10-15	1	-2/+19
\| \| \| \| \| \|	VDUPLANE node with the vector input size different from the output size. This was bacause the BUILD_VECTOR lowering code didn't check that the size of the input vector was correct for using VDUPLANE. llvm-svn: 165929
*	ARM: tail-call inside a function where part of a byval argument is on caller's	Manman Ren	2012-10-12	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	local frame causes problem. For example: void f(StructToPass s) { g(&s, sizeof(s)); } will cause problem with tail-call since part of s is passed via registers and saved in f's local frame. When g tries to access s, part of s may be corrupted since f's local frame is popped out before the tail-call. The current fix is to disable tail-call if getVarArgsRegSaveSize is not 0 for the caller. This is a conservative approach, if we can prove the address of s or part of s is not taken and passed to g, it should be okay to perform tail-call. rdar://12442472 llvm-svn: 165853
*	ARM: Mark VSELECT as 'expand'.	Jim Grosbach	2012-10-12	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	The backend already pattern matches to form VBSL when it can. We may want to teach it to use the vbsl intrinsics at some point to prevent machine licm from mucking with this, but using the Expand is completely correct. http://llvm.org/bugs/show_bug.cgi?id=13831 http://llvm.org/bugs/show_bug.cgi?id=13961 Patch by Peter Couperus <peter.couperus@st.com>. llvm-svn: 165845
*	Fix for LDRB instruction:	Stepan Dyatkovskiy	2012-10-10	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SDNode for LDRB_POST_IMM is invalid: number of registers added to SDNode fewer that described in .td. 7 ops is needed, but SDNode with only 6 is created. In more details: In ARMInstrInfo.td, in multiclass AI2_ldridx, in definition _POST_IMM, offset operand is defined as am2offset_imm. am2offset_imm is complex parameter type, and actually it consists from dummy register and imm itself. As I understood trick with dummy reg was made for AsmParser. In ARMISelLowering.cpp, this dummy register was not added to SDNode, and it cause crash in Peephole Optimizer pass. The problem fixed by setting up additional dummy reg when emitting LDRB_POST_IMM instruction. llvm-svn: 165617
*	Issue description:	Stepan Dyatkovskiy	2012-10-10	1	-7/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SchedulerDAGInstrs::buildSchedGraph ignores dependencies between FixedStack objects and byval parameters. So loading byval parameters from stack may be inserted before it will be stored, since these operations are treated as independent. Fix: Currently ARMTargetLowering::LowerFormalArguments saves byval registers with FixedStack MachinePointerInfo. To fix the problem we need to store byval registers with MachinePointerInfo referenced to first the "byval" parameter. Also commit adds two new fields to the InputArg structure: Function's argument index and InputArg's part offset in bytes relative to the start position of Function's argument. E.g.: If function's argument is 128 bit width and it was splitted onto 32 bit regs, then we got 4 InputArg structs with same arg index, but different offset values. llvm-svn: 165616
*	Create enums for the different attributes.	Bill Wendling	2012-10-09	1	-2/+3
\| \| \| \| \| \| \|	We use the enums to query whether an Attributes object has that attribute. The opaque layer is responsible for knowing where that specific attribute is stored. llvm-svn: 165488
*	Move TargetData to DataLayout.	Micah Villmow	2012-10-08	1	-8/+8
\| \| \| \|	llvm-svn: 165402
*	Add LLVM support for Swift.	Bob Wilson	2012-09-29	1	-3/+3
\| \| \| \|	llvm-svn: 164899
*	Revert 'Fix a typo 'iff' => 'if''. iff is an abreviation of if and only if. ↵	Sylvestre Ledru	2012-09-27	1	-4/+4
\| \| \| \| \| \|	See: http://en.wikipedia.org/wiki/If_and_only_if Commit 164767 llvm-svn: 164768
*	Fix a typo 'iff' => 'if'	Sylvestre Ledru	2012-09-27	1	-4/+4
\| \| \| \|	llvm-svn: 164767
*	Remove the `hasFnAttr' method from Function.	Bill Wendling	2012-09-26	1	-2/+2
\| \| \| \| \| \| \|	The hasFnAttr method has been replaced by querying the Attributes explicitly. No intended functionality change. llvm-svn: 164725
*	Fix ordering of operands on lowering of atomicrmw min/max nodes on ARM.	James Molloy	2012-09-26	1	-2/+2
\| \| \| \|	llvm-svn: 164685
*	Use vld1 / vst2 for unaligned v2f64 load / store. e.g. Use vld1.16 for 2-byte	Evan Cheng	2012-09-18	1	-5/+9
\| \| \| \| \| \| \| \| \|	aligned address. Based on patch by David Peixotto. Also use vld1.64 / vst1.64 with 128-bit alignment to take advantage of alignment hints. rdar://12090772, rdar://12238782 llvm-svn: 164089
*	This patch introduces A15 as a target in LLVM.	Silviu Baranga	2012-09-13	1	-1/+1
\| \| \| \|	llvm-svn: 163803
*	Set operation action for FFLOOR to Expand for all vector types for X86. Set ↵	Craig Topper	2012-09-08	1	-0/+1
\| \| \| \| \| \|	FFLOOR of v4f32 to Expand for ARM. v2f64 was already correct. llvm-svn: 163458
*	Custom DAGCombine for and/or/xor are for all ARMs.	Jakob Stoklund Olesen	2012-09-07	1	-6/+3
\| \| \| \| \| \| \|	The 'select' transformations apply to all ARM architectures and don't require hasV6T2Ops. llvm-svn: 163396
*	Fix self-host; ensure signedness is consistent.	James Molloy	2012-09-06	1	-2/+2
\| \| \| \|	llvm-svn: 163306
*	Improve codegen for BUILD_VECTORs on ARM.	James Molloy	2012-09-06	1	-10/+56
\| \| \| \| \| \|	If we have a BUILD_VECTOR that is mostly a constant splat, it is often better to splat that constant then insertelement the non-constant lanes instead of insertelementing every lane from an undef base. llvm-svn: 163304
*	Patch to implement UMLAL/SMLAL instructions for the ARM architecture	Arnold Schwaighofer	2012-09-04	1	-0/+156
\| \| \| \| \| \| \| \| \| \| \|	This patch corrects the definition of umlal/smlal instructions and adds support for matching them to the ARM dag combiner. Bug 12213 Patch by Yin Ma! llvm-svn: 163136
*	Fix a couple of typos in EmitAtomic.	Jakob Stoklund Olesen	2012-08-31	1	-2/+2
\| \| \| \| \| \| \| \| \|	Thumb2 instructions are mostly constrained to rGPR, not tGPR which is for Thumb1. rdar://problem/12203728 llvm-svn: 162968
*	Use a SmallPtrSet to dedup successors in EmitSjLjDispatchBlock.	Jakob Stoklund Olesen	2012-08-20	1	-3/+2
\| \| \| \| \| \| \|	The test case ARM/2011-05-04-MultipleLandingPadSuccs.ll was creating duplicate successor list entries. llvm-svn: 162222
*	Remove the CAND/COR/CXOR custom ISD nodes and their select code.	Jakob Stoklund Olesen	2012-08-18	1	-51/+0
\| \| \| \| \| \| \|	These nodes are no longer needed because the peephole pass can fold CMOV+AND into ANDCC etc. llvm-svn: 162179
*	Also combine zext/sext into selects for ARM.	Jakob Stoklund Olesen	2012-08-18	1	-47/+84
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This turns common i1 patterns into predicated instructions: (add (zext cc), x) -> (select cc (add x, 1), x) (add (sext cc), x) -> (select cc (add x, -1), x) For a function like: unsigned f(unsigned s, int x) { return s + (x>0); } We now produce: cmp r1, #0 it gt addgt.w r0, r0, #1 Instead of: movs r2, #0 cmp r1, #0 it gt movgt r2, #1 add r0, r2 llvm-svn: 162177
*	Also pass logical ops to combineSelectAndUse.	Jakob Stoklund Olesen	2012-08-18	1	-9/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add these transformations to the existing add/sub ones: (and (select cc, -1, c), x) -> (select cc, x, (and, x, c)) (or (select cc, 0, c), x) -> (select cc, x, (or, x, c)) (xor (select cc, 0, c), x) -> (select cc, x, (xor, x, c)) The selects can then be transformed to a single predicated instruction by peephole. This transformation will make it possible to eliminate the ISD::CAND, COR, and CXOR custom DAG nodes. llvm-svn: 162176