path: root/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
Commit message | Author | Date | Files | Lines
...
* whitespace | Andrew Trick | 2010-12-24 | 1 | -1/+1
| | | | llvm-svn: 122539
* Remove the rest of the *_sfp Neon instruction patterns. | Bob Wilson | 2010-12-13 | 1 | -2/+0
| | | | | | | | | | | | | Use the same COPY_TO_REGCLASS approach as for the 2-register *_sfp instructions. This change made a big difference in the code generated for the CodeGen/Thumb2/cross-rc-coalescing-2.ll test: The coalescer is still doing a fine job, but some instructions that were previously moved outside the loop are not moved now. It's using fewer VFP registers now, which is generally a good thing, so I think the estimates for register pressure changed and that affected the LICM behavior. Since that isn't obviously wrong, I've just changed the test file. This completes the work for Radar 8711675. llvm-svn: 121730
* Refactor the ARM CMPz* patterns to just use the normal CMP instructions when | Jim Grosbach | 2010-12-07 | 1 | -2/+0
| | | | possible. They were duplicates for everything except the source pattern before. llvm-svn: 121179
* Making use of VFP / NEON floating point multiply-accumulate / subtraction is | Evan Cheng | 2010-12-05 | 1 | -1/+66
| | | | difficult on current ARM implementations for a few reasons.
| | | | 1. Even though a single vmla has latency that is one cycle shorter than a pair of vmul + vadd, a RAW hazard during the first (4? on Cortex-A8) can cause an additional pipeline stall. So it's frequently better to simply codegen vmul + vadd.
| | | | 2. A vmla followed by a vmul, vmadd, or vsub causes the second fp instruction to stall for 4 cycles. We need to schedule them apart.
| | | | 3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back RAW vmla + vmla is very bad. But this isn't ideal either:
| | | |   vmul
| | | |   vadd
| | | |   vmla
| | | | Instead, we want to expand the second vmla:
| | | |   vmla
| | | |   vmul
| | | |   vadd
| | | | Even with the 4 cycle vmul stall, the second sequence is still 2 cycles faster.
| | | | Up to now, isel simply avoids codegen'ing fp vmla / vmls. This works well enough but it isn't the optimal solution. This patch attempts to make it possible to use vmla / vmls in cases where it is profitable.
| | | | A. Add missing isel predicates which cause vmla to be codegen'ed.
| | | | B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to compute a fmul and a fmla.
| | | | C. Add additional isel checks for vmla, avoiding cases where vmla is feeding into fp instructions (except for the #3 exceptional case).
| | | | D. Add an ARM hazard recognizer to model the vmla / vmls hazards.
| | | | E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the vmla / vmls will trigger one of the special hazards.
| | | | Work in progress, only A+B are enabled.
| | | | llvm-svn: 120960
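A minimal standalone C++ sketch of the hazard rule described in points 2 and D above; the enum, the function name, and the way the vmla-vmla case is treated are illustrative assumptions, not the actual ARM hazard recognizer code:

  // Sketch only: a scoreboard-style check for the "fp op right after vmla/vmls
  // stalls" rule from the commit message. Names and structure are invented.
  #include <cassert>

  enum class Op { VMLA, VMLS, VMUL, VADD, VSUB, OTHER };

  // True when issuing Next immediately after Prev would hit the multi-cycle
  // accumulator-forwarding stall described above. The vmla -> vmla case also
  // returns true here; the commit handles it separately by expanding the
  // second vmla into vmul + vadd before register allocation.
  bool isVMLAHazard(Op Prev, Op Next) {
    bool PrevIsMLA = (Prev == Op::VMLA || Prev == Op::VMLS);
    bool NextIsFP  = (Next == Op::VMUL || Next == Op::VADD || Next == Op::VSUB ||
                      Next == Op::VMLA || Next == Op::VMLS);
    return PrevIsMLA && NextIsFP;
  }

  int main() {
    assert(isVMLAHazard(Op::VMLA, Op::VMUL));   // vmla -> vmul: stall
    assert(!isVMLAHazard(Op::VMUL, Op::VADD));  // vmul -> vadd: no special hazard
    assert(!isVMLAHazard(Op::OTHER, Op::VMLA)); // non-fp producer: fine
    return 0;
  }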
* Rename t2 TBB and TBH instructions to reference that they encode the jump table | Jim Grosbach | 2010-11-29 | 1 | -5/+5
| | | | | | data. Next up, pseudo-izing them. llvm-svn: 120320
* Move callee-saved regs spills / reloads to TFI | Anton Korobeynikov | 2010-11-27 | 1 | -122/+0
| | | | llvm-svn: 120228
* Rewrite stack callee saved spills and restores to use push/pop instructions. | Eric Christopher | 2010-11-18 | 1 | -19/+105
| | | | | | | | | Remove movePastCSLoadStoreOps and associated code for simple pointer increments. Update routines that depended upon other opcodes for save/restore. Adjust all testcases accordingly. llvm-svn: 119725
* Silence compiler warnings. | Evan Cheng | 2010-11-18 | 1 | -2/+2
| | | | llvm-svn: 119610
* Remove ARM isel hacks that fold large immediates into a pair of add, sub, and, | Evan Cheng | 2010-11-17 | 1 | -0/+97
| | | | and xor. The 32-bit move immediates can be hoisted out of loops by machine LICM but the isel hacks were preventing them. Instead, let the peephole optimization pass recognize registers that are defined by immediates and the ARM target hook will fold the immediates in. Other changes include:
| | | | 1) Do not fold and / xor into cmp to isel TST / TEQ instructions if there are multiple uses. This happens when the 'and' is live out; machine sink would have sunk the computation, and that ends up pessimizing code. The peephole pass would recognize situations where the 'and' can be toggled to define CPSR and eliminate the comparison anyway.
| | | | 2) Move the peephole pass to after machine LICM, sink, and CSE to avoid blocking important optimizations.
| | | | rdar://8663787, rdar://8241368
| | | | llvm-svn: 119548
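As an illustration of rule 1) above, a small self-contained sketch (hypothetical types and names, not LLVM's peephole code) of the single-use condition for folding a logical op into a flag-setting TST/TEQ:

  #include <cassert>
  #include <cstddef>

  struct LogicalOp {
    std::size_t NumUses; // instructions besides the compare that read the result
    bool LiveOut;        // result is used outside the current block
  };

  // Fold "and/xor + cmp ..., 0" into a single tst/teq only when the logical
  // result feeds nothing but the compare; otherwise the value still has to be
  // materialized and folding just pessimizes the code, as described above.
  bool shouldFoldIntoFlagSetting(const LogicalOp &Op) {
    return !Op.LiveOut && Op.NumUses == 1;
  }

  int main() {
    assert(shouldFoldIntoFlagSetting({1, false}));   // compare is the only user
    assert(!shouldFoldIntoFlagSetting({3, true}));   // live-out 'and': keep the cmp
    return 0;
  }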
* Simplify code that toggles the optional operand to ARM::CPSR. | Evan Cheng | 2010-11-17 | 1 | -3/+3
| | | | llvm-svn: 119484
* Encode the multi-load/store instructions with their respective modes ('ia', | Bill Wendling | 2010-11-16 | 1 | -80/+135
| | | | | | | | | 'db', 'ib', 'da') instead of having that mode as a separate field in the instruction. It's more convenient for the asm parser and much more readable for humans. <rdar://problem/8654088> llvm-svn: 119310
* Code clean up. The peephole pass should be the one updating the instruction | Evan Cheng | 2010-11-15 | 1 | -5/+2
| | | | | | iterator, not TII->OptimizeCompareInstr. llvm-svn: 119186
* Revert this temporarily. | Eric Christopher | 2010-11-11 | 1 | -53/+8
| | | | llvm-svn: 118827
* Change the prologue and epilogue to use push/pop for the low ARM registers. | Eric Christopher | 2010-11-11 | 1 | -8/+53
| | | | llvm-svn: 118823
* Two sets of changes. Sorry they are intermingled. | Evan Cheng | 2010-11-03 | 1 | -38/+62
| | | | 1. Fix the pre-ra scheduler so it doesn't try to push instructions above calls to "optimize for latency". Call instructions don't have the right latency and this is more likely to introduce spills.
| | | | 2. Fix the if-converter cost function. For ARM, it should use instruction latencies, not # of micro-ops, since a multi-latency instruction is completely executed even when the predicate is false. Also, some instructions will be "slower" when they are predicated due to the register def becoming an implicit input.
| | | | rdar://8598427
| | | | llvm-svn: 118135
* When we look at instructions to convert to setting the 's' flag, we need to look | Bill Wendling | 2010-11-01 | 1 | -4/+4
| | | | at more than those which define CPSR. You can have this situation:
| | | |   (1) subs ...
| | | |   (2) sub r6, r5, r4
| | | |   (3) movge ...
| | | |   (4) cmp r6, 0
| | | |   (5) movge ...
| | | | We cannot convert (2) to "subs" because (3) is using the CPSR set by (1). There's an analogous situation here:
| | | |   (1) sub r1, r2, r3
| | | |   (2) sub r4, r5, r6
| | | |   (3) cmp r4, ...
| | | |   (5) movge ...
| | | |   (6) cmp r1, ...
| | | |   (7) movge ...
| | | | We cannot convert (1) to "subs" because of the intervening use of CPSR.
| | | | llvm-svn: 117950
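The examples above boil down to a liveness scan; here is a standalone illustrative sketch (invented names, not the actual OptimizeCompareInstr logic) of the check: reject the conversion if anything between the candidate and the compare reads or writes CPSR.

  #include <cassert>
  #include <cstddef>
  #include <vector>

  struct Instr {
    bool ReadsCPSR;   // e.g. a predicated movge
    bool WritesCPSR;  // e.g. subs, cmp
  };

  // Candidate is the index of the sub we would like to turn into subs; CmpIdx
  // is the compare it would make redundant. Bail out conservatively if CPSR is
  // read or clobbered anywhere in between.
  bool canUseFlagSettingForm(const std::vector<Instr> &Block,
                             std::size_t Candidate, std::size_t CmpIdx) {
    for (std::size_t I = Candidate + 1; I < CmpIdx; ++I)
      if (Block[I].ReadsCPSR || Block[I].WritesCPSR)
        return false;
    return true;
  }

  int main() {
    // Mirrors the first example: subs, sub, movge (reads CPSR), cmp, movge.
    std::vector<Instr> B = {{false, true}, {false, false}, {true, false},
                            {false, true}, {true, false}};
    assert(!canUseFlagSettingForm(B, 1, 3)); // the movge between sub and cmp blocks it
    return 0;
  }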
* Fix fpscr <-> GPR latency info. | Evan Cheng | 2010-10-29 | 1 | -2/+9
| | | | llvm-svn: 117737
* Avoiding overly aggressive latency scheduling. If the two nodes share an | Evan Cheng | 2010-10-29 | 1 | -2/+7
| | | | operand and one of them has a single use that is a live out copy, favor the one that is live out. Otherwise it will be difficult to eliminate the copy if the instruction is a loop induction variable update. e.g.
| | | | BB:
| | | |   sub r1, r3, #1
| | | |   str r0, [r2, r3]
| | | |   mov r3, r1
| | | |   cmp
| | | |   bne BB
| | | | =>
| | | | BB:
| | | |   str r0, [r2, r3]
| | | |   sub r3, r3, #1
| | | |   cmp
| | | |   bne BB
| | | | This fixed the recent 256.bzip2 regression.
| | | | llvm-svn: 117675
* Re-commit 117518 and 117519 now that ARM MC test failures are out of the way. | Evan Cheng | 2010-10-28 | 1 | -5/+67
| | | | llvm-svn: 117531
* Revert 117518 and 117519 for now. They changed scheduling and cause MC tests | Evan Cheng | 2010-10-28 | 1 | -67/+5
| | | | | | to fail. Ugh. llvm-svn: 117520
* - Assign load / store with shifter op address modes the right itinerary classes. | Evan Cheng | 2010-10-28 | 1 | -5/+67
| | | | - For now, loads with the [r, r] addressing mode are treated the same as the [r, r lsl/lsr/asr #] variants. ARMBaseInstrInfo::getOperandLatency() should identify the former case and reduce the output latency by 1.
| | | | - Also identify the [r, r << 2] case. This special form of shifter addressing mode is "free".
| | | | llvm-svn: 117519
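A standalone sketch of the latency adjustment described in this commit; the one-cycle figure and the "free" lsl #2 form come from the text above, while the names and structure are assumptions rather than the real getOperandLatency() code:

  // Illustrative only: the itinerary prices every register-register load as if
  // it used a shifted index register; the plain [r, r] form and the "free"
  // [r, r, lsl #2] form give one cycle back.
  enum class AddrMode2 { RegReg, RegRegLsl2, RegRegShifted };

  unsigned adjustedLoadLatency(unsigned ItineraryLatency, AddrMode2 Mode) {
    if (Mode == AddrMode2::RegReg || Mode == AddrMode2::RegRegLsl2)
      return ItineraryLatency > 0 ? ItineraryLatency - 1 : 0;
    return ItineraryLatency;
  }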
* Refactor ARM STR/STRB instruction patterns into STR{B}i12 and STR{B}rs, like | Jim Grosbach | 2010-10-27 | 1 | -3/+4
| | | | | | | | the LDR instructions have. This makes the literal/register forms of the instructions explicit and allows us to assign scheduling itineraries appropriately. rdar://8477752 llvm-svn: 117505
* The immediate operand of an LDRi12 instruction doesn't need the addrmode2 | Jim Grosbach | 2010-10-27 | 1 | -2/+6
| | | | | | encoding tricks. Handle the 'imm doesn't fit in the insn' case. llvm-svn: 117454
* LDRi12 machine instructions handle negative offset operands normally (simple | Jim Grosbach | 2010-10-27 | 1 | -2/+9
| | | | | | integer values), not with the addrmode2 encoding. llvm-svn: 117429
* Split ARM::LDRB into LDRBi12 and LDRBrs. Adjust accordingly. Continuing on | Jim Grosbach | 2010-10-27 | 1 | -2/+2
| | | | | | rdar://8477752. llvm-svn: 117419
* First part of refactoring ARM addrmode2 (load/store) instructions to be more | Jim Grosbach | 2010-10-26 | 1 | -7/+14
| | | | | | | | explicit about the operands. Split out the different variants into separate instructions. This gives us the ability to, among other things, assign different scheduling itineraries to the variants. rdar://8477752. llvm-svn: 117409
* Use instruction itinerary to determine what instructions are 'cheap'. | Evan Cheng | 2010-10-26 | 1 | -0/+15
| | | | llvm-svn: 117348
* Move the remaining attribute macros to systematic names based on the attribute | Chandler Carruth | 2010-10-23 | 1 | -1/+1
| | | | | | name and prefixed with 'LLVM_'. llvm-svn: 117203
* Latency between CPSR def and branch is zero. | Evan Cheng | 2010-10-23 | 1 | -0/+6
| | | | llvm-svn: 117192
* Re-enable register pressure aware machine LICM with fixes. Hoist() may have | Evan Cheng | 2010-10-19 | 1 | -0/+20
| | | | | | | erased the instruction during LICM so UpdateRegPressureAfter() should not reference it afterwards. llvm-svn: 116845
* Revert r116781 "- Add a hook for target to determine whether an instruction def | Daniel Dunbar | 2010-10-19 | 1 | -20/+0
| | | | | | is", which breaks some nightly tests. llvm-svn: 116816
* - Add a hook for target to determine whether an instruction def is | Evan Cheng | 2010-10-19 | 1 | -0/+20
| | | | | | | | | | | "long latency" enough to hoist even if it may increase spilling. Reloading a value from spill slot is often cheaper than performing an expensive computation in the loop. For X86, that means machine LICM will hoist SQRT, DIV, etc. ARM will be somewhat aggressive with VFP and NEON instructions. - Enable register pressure aware machine LICM by default. llvm-svn: 116781
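A toy illustration of the trade-off this hook encodes; the reload-cost constant below is a placeholder, not a real ARM or x86 number:

  // Hoist a loop-invariant def out of the loop even if that may force a spill,
  // whenever recomputing it each iteration would cost more than reloading it
  // from a stack slot (e.g. sqrt, integer divide, long VFP/NEON ops).
  bool worthHoistingDespitePressure(unsigned DefLatencyCycles) {
    const unsigned ApproxReloadCycles = 4; // assumed cost of a spill-slot reload
    return DefLatencyCycles > ApproxReloadCycles;
  }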
* Don't recompute MachineRegisterInfo in the Optimize* method. | Bill Wendling | 2010-10-18 | 1 | -6/+6
| | | | llvm-svn: 116750
* Check to make sure that the iterator isn't at the beginning of the basic block | Bill Wendling | 2010-10-09 | 1 | -0/+4
| | | | | | before decrementing. <rdar://problem/8529919> llvm-svn: 116126
* Code refactoring. | Evan Cheng | 2010-10-07 | 1 | -104/+144
| | | | llvm-svn: 116002
* Model operand cycles of vldm / vstm; also fixes scheduling itineraries of | Evan Cheng | 2010-10-07 | 1 | -8/+83
| | | | | | vldr / vstr, etc. llvm-svn: 115898
* Clean up MOVi32imm and t2MOVi32imm pseudo instruction definitions. | Jim Grosbach | 2010-10-06 | 1 | -0/+3
| | | | llvm-svn: 115853
* - Add TargetInstrInfo::getOperandLatency() to compute operand latencies. This | Evan Cheng | 2010-10-06 | 1 | -0/+161
| | | | allows the target to correctly compute latency for cases where static scheduling itineraries aren't sufficient, e.g. variable_ops instructions such as ARM::ldm. This also allows targets without scheduling itineraries to compute operand latencies, e.g. X86 can return (approximated) latencies for high latency instructions such as division.
| | | | - Compute operand latencies for those defined by load multiple instructions, e.g. ldm, and those used by store multiple instructions, e.g. stm.
| | | | llvm-svn: 115755
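A minimal sketch of the idea behind per-operand latencies for load-multiple instructions; the one-cycle-per-register model is an assumption for illustration, not the measured ARM behaviour:

  // With a single static itinerary entry, every def of an ldm would get the
  // same latency. Computing it per operand lets the register that comes out of
  // the list last be priced accordingly.
  unsigned ldmOperandLatency(unsigned BaseLatency, unsigned RegListIndex) {
    // e.g. for "ldm r0, {r1, r2, r3}", r3 (index 2) is available last.
    return BaseLatency + RegListIndex;
  }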
* fix MSVC 2010 build. | Michael J. Spencer | 2010-10-05 | 1 | -1/+2
| | | | llvm-svn: 115594
* Cleanup Whitespace. | Michael J. Spencer | 2010-10-05 | 1 | -11/+11
| | | | llvm-svn: 115593
* Thread the determination of branch prediction hit rates back through the | Owen Anderson | 2010-10-01 | 1 | -4/+5
| | | | | | | | | if-conversion heuristic APIs. For now, stick with a constant estimate of 90% (branch predictors are good!), but we might find that we want to provide more nuanced estimates in the future. llvm-svn: 115364
* Make the spelling of the flags for old-style if-conversion heuristics | Owen Anderson | 2010-10-01 | 1 | -4/+4
| | | | | | consistent between ARM and Thumb2. llvm-svn: 115341
* Temporarily add a flag to make it easier to compare the new-style ARM if | Owen Anderson | 2010-09-30 | 1 | -0/+19
| | | | | | conversion heuristics to the old-style ones. llvm-svn: 115239
* improve heuristics to find the 'and' corresponding to 'tst' to also catch | Gabor Greif | 2010-09-29 | 1 | -8/+20
| | | | opportunities on thumb2; added some doxygen on the way. llvm-svn: 115033
* Add a subtarget hook for reporting the misprediction penalty. Use this to | Owen Anderson | 2010-09-28 | 1 | -2/+4
| | | | provide more precise cost modeling for if-conversion. Now if only we had a way to estimate the misprediction probability. Adjust CodeGen/ARM/ifcvt10.ll. The pipeline on Cortex-A8 is long enough that it is still profitable to predicate an ldm, but the shorter pipeline on Cortex-A9 makes it unprofitable. llvm-svn: 114995
* Part one of switching to using a more sane heuristic for determining | Owen Anderson | 2010-09-28 | 1 | -10/+24
| | | | | | | | | | | if-conversion profitability. Rather than having arbitrary cutoffs, actually try to cost model the conversion. For now, the constants are tuned to more or less match our existing behavior, but these will be changed to reflect realistic values as this work proceeds. llvm-svn: 114973
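Putting the last three entries together, a toy expected-cycles comparison; all constants here are illustrative placeholders, not the tuned values the commits refer to. Predication always pays for both sides, while the branchy version pays for one side plus the subtarget's misprediction penalty whenever the 90% predictor misses.

  #include <cstdio>

  int main() {
    const double TrueCycles = 4.0, FalseCycles = 3.0; // latencies of the two sides
    const double PredictRate = 0.90;                  // constant from the commit above
    const double MispredictPenalty = 13.0;            // assumed pipeline-depth-like value
    const double ProbTrue = 0.5;                      // assumed branch probability

    double ifConverted = TrueCycles + FalseCycles;    // both sides always execute
    double branched = ProbTrue * TrueCycles + (1.0 - ProbTrue) * FalseCycles +
                      (1.0 - PredictRate) * MispredictPenalty;
    std::printf("if-converted: %.2f cycles, branched: %.2f cycles\n",
                ifConverted, branched);
    // If-conversion is profitable only when ifConverted <= branched.
    return 0;
  }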
* 80-col fixups. | Eric Christopher | 2010-09-28 | 1 | -1/+2
| | | | llvm-svn: 114943
* Fix r114632. Return if the only terminator is an unconditional branch after | Evan Cheng | 2010-09-23 | 1 | -3/+5
| | | | | | the redundant ones are deleted. llvm-svn: 114688
* If there are multiple unconditional branches terminating a block, eliminate all | Evan Cheng | 2010-09-23 | 1 | -1/+17
| | | | | | | but the first one. Those will never be executed. There was logic to do this but it was faulty. llvm-svn: 114632
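A self-contained sketch of that cleanup (simplified data types, not the actual AnalyzeBranch/terminator API): once a block ends in an unconditional branch, every terminator after it is unreachable and can be dropped.

  #include <cassert>
  #include <cstddef>
  #include <string>
  #include <vector>

  struct Term {
    bool Unconditional;
    std::string Target;
  };

  // Erase every terminator that follows the first unconditional branch; those
  // can never execute.
  void dropDeadTerminators(std::vector<Term> &Terms) {
    for (std::size_t I = 0; I < Terms.size(); ++I) {
      if (Terms[I].Unconditional) {
        Terms.erase(Terms.begin() + I + 1, Terms.end());
        return;
      }
    }
  }

  int main() {
    std::vector<Term> T = {{false, "bb1"}, {true, "bb2"}, {true, "bb3"}};
    dropDeadTerminators(T);
    assert(T.size() == 2); // the trailing unconditional branch was removed
    return 0;
  }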
* OptimizeCompareInstr should avoid iterating past the beginning of the MBB | Evan Cheng | 2010-09-21 | 1 | -1/+6
| | | | | | when the 'and' instruction is after the comparison. llvm-svn: 114506