...
| 
allowing us to distinguish the encodings that use shifted registers from those that use shifted immediates. This is necessary to allow the fixed-length decoder to distinguish things like BICS vs. LDRH.
llvm-svn: 135693
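To illustrate the decoding problem this describes: once two instructions share a fixed-length bit pattern, the decoder has to branch on the discriminating field before committing to an opcode. Everything in this sketch (the field position, the helper name) is invented for the example and is not the actual ARM encoding or LLVM disassembler code.

```cpp
// Invented illustration, not the real ARM encoding: BICS and LDRH can only
// be told apart by whether the operand is a shifted register or a shifted
// immediate, so the decoder must inspect that field explicitly.
#include <cstdint>

enum class Opcode { BICS, LDRH };

Opcode decodeOverlappingPattern(uint32_t Insn) {
  // Hypothetical discriminator: assume bit 4 selects the register-shifted
  // operand form (field position chosen only for the sketch).
  bool ShiftedRegisterForm = (Insn >> 4) & 1;
  return ShiftedRegisterForm ? Opcode::BICS : Opcode::LDRH;
}
```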
| 
ARM MC code from target.
llvm-svn: 135636
| 
sink them into MC layer.
- Added MCInstrInfo, which captures the tablegen generated static data. Changed
TargetInstrInfo so it's based off MCInstrInfo.
llvm-svn: 134021
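A minimal sketch of the layering this change introduces, using LLVM's class names but heavily simplified (the real InitMCInstrInfo takes additional tables): the tablegen-generated per-opcode static data lives in MCInstrInfo, and TargetInstrInfo inherits from it to add codegen-level hooks.

```cpp
// Simplified sketch of the MCInstrInfo / TargetInstrInfo split.
class MCInstrDesc; // tablegen-generated static data for one opcode

class MCInstrInfo {
  const MCInstrDesc *Desc = nullptr; // opcode-indexed table
  unsigned NumOpcodes = 0;

public:
  // Called with the tablegen-generated table (real signature has more args).
  void InitMCInstrInfo(const MCInstrDesc *D, unsigned N) {
    Desc = D;
    NumOpcodes = N;
  }
  const MCInstrDesc &get(unsigned Opcode) const { return Desc[Opcode]; }
};

// Codegen-level queries now sit on top of the MC-level static data, so the
// MC layer no longer depends on codegen.
class TargetInstrInfo : public MCInstrInfo {
  // ...virtual hooks that need MachineFunction-level context...
};
```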
| 
first operand. This operand is lowered away by the time we reach MachineInstrs, so the actual register-allocation handling of them doesn't need to change.
This is intended to support using REG_SEQUENCE SDNodes with type MVT::untyped, and is part of the long road to eliminating some of the hacks we currently use to support register pairs and other strange constraints, particularly on ARM NEON.
llvm-svn: 133178
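For concreteness, a sketch of building such a node, modeled on how the ARM backend pairs D registers (ARM::QPRRegClassID, ARM::dsub_0/dsub_1, and TargetOpcode::REG_SEQUENCE are real LLVM identifiers; the helper itself is illustrative):

```cpp
// Sketch: a REG_SEQUENCE machine node whose first operand is the target
// register class, producing an MVT::untyped value for a register pair.
SDNode *createDRegPair(SelectionDAG &DAG, const SDLoc &DL, SDValue V0,
                       SDValue V1) {
  SDValue RC = DAG.getTargetConstant(ARM::QPRRegClassID, DL, MVT::i32);
  SDValue Sub0 = DAG.getTargetConstant(ARM::dsub_0, DL, MVT::i32);
  SDValue Sub1 = DAG.getTargetConstant(ARM::dsub_1, DL, MVT::i32);
  const SDValue Ops[] = {RC, V0, Sub0, V1, Sub1};
  // MVT::untyped: the pair has no single legal value type of its own.
  return DAG.getMachineNode(TargetOpcode::REG_SEQUENCE, DL, MVT::untyped, Ops);
}
```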
| 
to load/store i64 values. Since there's no current support to explicitly
declare such restrictions, implement it by using specific hardcoded register
pairs during isel.
llvm-svn: 132248
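A hypothetical sketch of what "hardcoded register pairs during isel" can look like with SelectionDAG APIs; the pair choice (R2/R3) and the helper are illustrative, not the registers this commit actually picked:

```cpp
// Pin the two halves of an i64 into a fixed register pair; the glue edge
// keeps the copies adjacent so the consuming instruction sees the pair
// intact. R2/R3 are an illustrative choice.
std::pair<SDValue, SDValue> pinI64ToRegPair(SelectionDAG &DAG, const SDLoc &DL,
                                            SDValue Chain, SDValue Lo,
                                            SDValue Hi) {
  Chain = DAG.getCopyToReg(Chain, DL, ARM::R2, Lo, SDValue());
  SDValue Glue = Chain.getValue(1);
  Chain = DAG.getCopyToReg(Chain, DL, ARM::R3, Hi, Glue);
  return {Chain, Chain.getValue(1)}; // chain and glue for the user node
}
```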
| 
llvm-svn: 130557
| 
Making use of VFP / NEON floating point multiply-accumulate / subtraction is
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
   of vmul + vadd, a RAW hazard during the first few cycles (4 on Cortex-A8?)
   can cause an additional pipeline stall. So it's frequently better to simply
   codegen vmul + vadd.
2. A vmla followed by a vmul, vmadd, or vsub causes the second fp instruction
   to stall for 4 cycles. We need to schedule them apart.
3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back
   RAW vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
   faster.
Up to now, isel has simply avoided codegen'ing fp vmla / vmls. This works well
enough, but it isn't the optimal solution. This patch attempts to make it
possible to use vmla / vmls in cases where it is profitable:
A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute both an fmul and an fmla.
C. Add additional isel checks for vmla, avoiding cases where vmla is feeding
   into fp instructions (except for the #3 exceptional case).
D. Add an ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc expansion of vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.
Enable these fp vmlx codegen changes for Cortex-A9.
llvm-svn: 129775
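A toy sketch of the hazard-recognizer idea in item D, assuming nothing about the real ARMHazardRecognizer beyond what the message says: remember whether the last issued instruction was a vmla/vmls, and report a hazard when a dependent fp instruction tries to issue right behind it.

```cpp
// Toy model of the vmla/vmls issue hazard described above.
enum HazardType { NoHazard, Hazard };

struct VMLxHazardModel {
  bool LastWasVMLx = false; // previous issued instruction was vmla/vmls

  // A dependent fp instruction immediately after a vmla/vmls stalls, so ask
  // the scheduler to put something else in between.
  HazardType getHazardType(bool IsFPInstr, bool ReadsVMLxResult) const {
    return (LastWasVMLx && IsFPInstr && ReadsVMLxResult) ? Hazard : NoHazard;
  }

  void emitInstruction(bool IsVMLx) { LastWasVMLx = IsVMLx; }
};
```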
| 
llvm-svn: 129738
| 
llvm-svn: 127899
| 
can. As Nate pointed out, VTBL isn't super performant, but it *has* to be better
than this:
_shuf:
@ BB#0:       @ %entry
  push        {r4, r7, lr}
  add         r7, sp, #4
  sub         sp, #12
  mov         r4, sp
  bic         r4, r4, #7
  mov         sp, r4
  mov         r2, sp
  vmov        d16, r0, r1
  orr         r0, r2, #6
  orr         r3, r2, #7
  vst1.8      {d16[0]}, [r3]
  vst1.8      {d16[5]}, [r0]
  subs        r4, r7, #4
  orr         r0, r2, #5
  vst1.8      {d16[4]}, [r0]
  orr         r0, r2, #4
  vst1.8      {d16[4]}, [r0]
  orr         r0, r2, #3
  vst1.8      {d16[0]}, [r0]
  orr         r0, r2, #2
  vst1.8      {d16[2]}, [r0]
  orr         r0, r2, #1
  vst1.8      {d16[1]}, [r0]
  vst1.8      {d16[3]}, [r2]
  vldr.64     d16, [sp]
  vmov        r0, r1, d16
  mov         sp, r4
  pop         {r4, r7, pc}
The "illegal" testcase in vext.ll is no longer illegal.
<rdar://problem/9078775>
llvm-svn: 127630
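For contrast, the same 8-byte shuffle written with the AArch32 NEON table-lookup intrinsic; the index vector is read off the lane stores in the sequence above, and the function name is only for the example:

```cpp
#include <arm_neon.h>

// One vtbl1 replaces the whole stack-spill sequence: each result byte picks
// a source byte by index.
uint8x8_t shuf(uint8x8_t v) {
  static const uint8_t Indices[8] = {3, 1, 2, 0, 4, 4, 5, 0};
  return vtbl1_u8(v, vld1_u8(Indices));
}
```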
| 
llvm-svn: 127509
| 
llvm-svn: 127090
| 
llvm-svn: 126477
| 
have their low bits set to zero. This allows us to optimize
out explicit stack alignment code like in stack-align.ll:test4 when
it is redundant.
Doing this causes the code generator to start turning FI+cst into
FI|cst all over the place, which is general goodness (that is the
canonical form) except that various pieces of the code generator
don't handle OR aggressively. Fix this by introducing a new
SelectionDAG::isBaseWithConstantOffset predicate, and using it
in places that are looking for ADD(X,CST). The ARM backend in
particular was missing a lot of addressing mode folding opportunities
around OR.
llvm-svn: 125470
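The predicate's logic, sketched close to (but not verbatim) the LLVM implementation: an OR only behaves like an ADD when the constant's set bits are known to be zero in the base, so the "addition" cannot carry.

```cpp
// Sketch of SelectionDAG::isBaseWithConstantOffset's logic.
bool isBaseWithConstantOffset(const SelectionDAG &DAG, SDValue Op) {
  if (Op.getOpcode() != ISD::ADD && Op.getOpcode() != ISD::OR)
    return false;
  auto *Cst = dyn_cast<ConstantSDNode>(Op.getOperand(1));
  if (!Cst)
    return false;
  // base | cst == base + cst only if base has zeros in all of cst's bits.
  if (Op.getOpcode() == ISD::OR &&
      !DAG.MaskedValueIsZero(Op.getOperand(0), Cst->getAPIntValue()))
    return false;
  return true;
}
```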
| 
The vld1-lane, vld1-dup and vst1-lane instructions do not yet support using
post-increment versions, but all the rest of the NEON load/store instructions
should be handled now.
llvm-svn: 125014
| 
These operations are expanded to pairs of loads or stores, and the first one
uses the address register update to produce the address for the second one.
So far, the second load/store has also updated the address register, just
for convenience, since that output has never been used. In anticipation of
actually supporting post-increment updates for these operations, this changes
the non-updating operations to use a non-updating load/store for the second
instruction.
llvm-svn: 125013
| 
TargetInstrInfo:
Change produceSameValue() to take MachineRegisterInfo as an optional argument.
When in SSA form, targets can use it to make more aggressive equality analysis.
Machine LICM:
1. Eliminate isLoadFromConstantMemory; use MI.isInvariantLoad instead.
2. Fix a bug which prevented CSE of instructions which are not re-materializable.
3. Use the improved form of produceSameValue.
ARM:
1. Teach ARM produceSameValue to look past some PIC labels.
2. Look for operands from different loads of different constant pool entries
   which have the same values.
3. Re-implement PIC GA materialization using movw + movt. Combine the pair with
   an "add pc" or "ldr [pc]" to form pseudo instructions. This makes it possible
   to re-materialize the instruction, allowing machine LICM to hoist the set of
   instructions out of the loop and making it possible to CSE them. It's a bit
   hacky, but it significantly improves code quality.
4. Some minor bug fixes as well.
With the fixes, using movw + movt to materialize GAs significantly outperforms
the load-from-constant-pool method. 186.crafty and 255.vortex improved > 20%,
254.gap and 176.gcc ~10%.
llvm-svn: 123905
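The hook change, sketched with that era's pointer-based signature (simplified; MachineInstr and MachineRegisterInfo are the real LLVM classes, the wrapper class name is not):

```cpp
class MachineInstr;
class MachineRegisterInfo;

class TargetInstrInfoSketch {
public:
  // When MRI is non-null (i.e., the function is still in SSA form), an
  // implementation may chase virtual-register defs, e.g. to prove that two
  // loads of different constant-pool entries produce identical values.
  virtual bool produceSameValue(const MachineInstr *MI0,
                                const MachineInstr *MI1,
                                const MachineRegisterInfo *MRI = nullptr) const;
  virtual ~TargetInstrInfoSketch() = default;
};
```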
| 
llvm-svn: 123823
| 
movw    r0, :lower16:(L_foo$non_lazy_ptr-(LPC0_0+4))
        movt    r0, :upper16:(L_foo$non_lazy_ptr-(LPC0_0+4))
LPC0_0:
        add     r0, pc, r0
It's not yet enabled by default, as some tests are failing. I suspect bugs in
downstream tools.
llvm-svn: 123619
| 
earlyclobber stuff. This should fix PRs 2313 and 8157.
Unfortunately, no testcase, since it'd be dependent on register
assignments.
llvm-svn: 122663
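For reference, what an earlyclobber operand looks like at the source level in GNU inline asm (a generic ARM example, not the PR testcases): the '&' tells the register allocator the output is written before the inputs are dead, so it must not share their registers.

```cpp
int scale_add(int a, int b) {
  int out;
  // "=&r": earlyclobber output; without '&', out could be assigned the same
  // register as b, and the first mov would clobber it.
  asm("mov %0, %1\n\t"
      "add %0, %0, %2"
      : "=&r"(out)
      : "r"(a), "r"(b));
  return out;
}
```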
| 
llvm-svn: 122539
| 
something that just glues two nodes together, even if it is
sometimes used for flags.
llvm-svn: 122310
| 
llvm-svn: 122017
| 
Canonicalize on tLDRpci and remove tLDRcp.
llvm-svn: 121920
| 
llvm-svn: 121820
| 
llvm-svn: 121815
| 
particular, we want
   ldr r2, [r3]
to be equivalent to
   ldr r2, [r3, #0]
and not
   ldr r2, [r3, r0]
llvm-svn: 121808
| 
instruction based on the t_addrmode_s# mode and what it returned. There is some
obvious badness to this. In particular, it's hard to do MC encoding when the
instruction may change out from underneath you after the t_addrmode_s# variable
is finally resolved.
The solution is to revert a long-ago change that merged the reg/reg and reg/imm
versions, and to add several new addressing modes. They no longer have
extraneous operands associated with them; i.e., if it's reg/reg, we don't have
to have a dummy zero immediate tacked onto the SDNode.
There are some obvious cleanups here, which will happen shortly.
llvm-svn: 121747
| 
Alignments smaller than the total size of the memory being loaded or stored,
unless the alignment is 8 bytes, are not allowed. Add tests for this, too.
llvm-svn: 121506
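Restated as a predicate (a sketch of the rule as described, not the actual LLVM code): an alignment annotation is accepted only if it covers the whole access, or is exactly 8 bytes.

```cpp
// Sketch of the NEON load/store alignment rule described above.
bool isAcceptableNEONAlignment(unsigned AlignBytes, unsigned AccessBytes) {
  return AlignBytes >= AccessBytes || AlignBytes == 8;
}
```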
| 
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
   of vmul + vadd, a RAW hazard during the first few cycles (4 on Cortex-A8?)
   can cause an additional pipeline stall. So it's frequently better to simply
   codegen vmul + vadd.
2. A vmla followed by a vmul, vmadd, or vsub causes the second fp instruction
   to stall for 4 cycles. We need to schedule them apart.
3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back
   RAW vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
   faster.
Up to now, isel has simply avoided codegen'ing fp vmla / vmls. This works well
enough, but it isn't the optimal solution. This patch attempts to make it
possible to use vmla / vmls in cases where it is profitable:
A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute both an fmul and an fmla.
C. Add additional isel checks for vmla, avoiding cases where vmla is feeding
   into fp instructions (except for the #3 exceptional case).
D. Add an ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc expansion of vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.
Work in progress; only A+B are enabled.
llvm-svn: 120960
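Item B boils down to a single-use check during isel. A sketch assuming SelectionDAG types (operand order simplified; the commuted (fadd x, (fmul ...)) form would be handled analogously, and the helper name is only for the example):

```cpp
// Only fold (fadd (fmul a, b), c) into vmla when the fmul has no other
// users; otherwise we'd end up computing both an fmul and an fmla.
bool canFoldIntoVMLA(SDValue FAdd) {
  if (FAdd.getOpcode() != ISD::FADD)
    return false;
  SDValue Mul = FAdd.getOperand(0);
  return Mul.getOpcode() == ISD::FMUL && Mul.hasOneUse();
}
```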
| 
The encoding for alignment in VLD4-dup instructions is still a work in progress.
llvm-svn: 120356
| 
llvm-svn: 120312
| 
llvm-svn: 120236
| 
llvm-svn: 119866
| 
in a register. These immediates aren't free.
llvm-svn: 119558
| 
llvm-svn: 118968
| 
llvm-svn: 118951
| 
llvm-svn: 118935
| 
with a SimpleValueType, while an EVT supports equality and
inequality comparisons with SimpleValueType.
llvm-svn: 118169
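The comparisons the message refers to look like this in backend code (assuming LLVM's EVT/MVT types; the helper names are only for the example):

```cpp
#include "llvm/CodeGen/ValueTypes.h"
using namespace llvm;

// An EVT compares directly against an MVT::SimpleValueType.
bool isWord(EVT VT) { return VT == MVT::i32; }
bool isNotDouble(EVT VT) { return VT != MVT::f64; }
```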
| 
parts. Represent the operation mode as an optional operand instead.
rdar://8614429
llvm-svn: 118137
| 
This is another part of the fix for Radar 8599955.
llvm-svn: 117976
| 
complex load / store addressing mode) when they have higher cost and
when they have more than one use.
llvm-svn: 117509
| 
explicit about the operands. Split out the different variants into separate
instructions. This gives us the ability to, among other things, assign
different scheduling itineraries to the variants. rdar://8477752.
llvm-svn: 117409
| 
llvm-svn: 117050
| 
llvm-svn: 116776
| 
llvm-svn: 115890
| 
llvm-svn: 115884
| 
which require the use of the shifter operand. This will be used to split
the ldr/str instructions such that the versions needing the shifter operand
can get a different scheduling itinerary, since in some cases the use of the
shifter can cause different scheduling than the simpler forms.
llvm-svn: 115066
| 
llvm-svn: 115043
| 
llvm-svn: 114709