bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	ARM: fold prologue/epilogue sp updates into push/pop for code size	Tim Northover	2013-11-08	1	-0/+18
	ARM prologues usually look like:
	    push {r7, lr}
	    sub sp, sp, #4
	If code size is extremely important, this can be optimised to the single instruction:
	    push {r6, r7, lr}
	where we don't actually care about the contents of r6, but pushing it subtracts 4 from sp as a side effect.
	This should implement such a conversion, predicated on the "minsize" function attribute (-Oz) since I've yet to find any code it actually makes faster.
	llvm-svn: 194264
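	The fold described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `RegList` and `foldSpUpdateIntoPush` are hypothetical names, a push is modelled as a sorted list of register numbers (lr = 14), and the scratch registers are assumed to be taken from just below the lowest register already pushed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model: a push is its sorted list of register numbers
// (r0 = 0 ... r7 = 7, lr = 14). This is not LLVM's data structure.
using RegList = std::vector<unsigned>;

// Try to absorb a following "sub sp, sp, #bytes" into the push by also
// pushing scratch registers numbered just below the lowest register in
// the list. On success the list is updated in place and true is returned;
// on failure the list is left untouched.
bool foldSpUpdateIntoPush(RegList &regs, unsigned bytes) {
    assert(!regs.empty() && std::is_sorted(regs.begin(), regs.end()));
    if (bytes % 4 != 0)
        return false;                 // each pushed register moves sp by 4
    unsigned extra = bytes / 4;       // number of scratch registers needed
    unsigned lowest = regs.front();
    if (extra > lowest)
        return false;                 // not enough registers below the list
    RegList folded;
    for (unsigned r = lowest - extra; r < lowest; ++r)
        folded.push_back(r);          // pushed contents are don't-care
    folded.insert(folded.end(), regs.begin(), regs.end());
    regs = folded;
    return true;
}
```

	With `{7, 14}` and an adjustment of 4 bytes, the list becomes `{6, 7, 14}`, matching the `push {r6, r7, lr}` example in the message. A real implementation would also have to handle the matching pop, where the extra registers are overwritten rather than merely stored.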
*	IfConverter: Use TargetSchedule for instruction latencies	Arnold Schwaighofer	2013-09-30	1	-0/+2
	For targets that have instruction itineraries this means no change. Targets that move over to the new schedule model will be able to use the new schedule model for instruction latencies in the if-converter (the logic is such that if there is no itinerary we will use the new sched model for the latencies).
	Before, we queried "TTI->getInstructionLatency()" for the instruction latency and the extra predication cost. Now, we query the TargetSchedule abstraction for the instruction latency and TargetInstrInfo for the extra predication cost. The TargetSchedule abstraction will internally call "TTI->getInstructionLatency" if an itinerary exists, otherwise it will use the new schedule model.
	ATTENTION: Out of tree targets! (I will also send out an email later to LLVMDev)
	This means, if your target implements
	    unsigned getInstrLatency(const InstrItineraryData *ItinData, const MachineInstr *MI, unsigned *PredCost);
	and returns a value for "PredCost", you now also need to implement
	    unsigned getPredictationCost(const MachineInstr *MI);
	(if your target uses the IfConversion.cpp pass)
	radar://15077010
	llvm-svn: 191671
*	DebugInfo: remove target-specific Frame Index handling for DBG_VALUE MachineInstrs	David Blaikie	2013-06-16	1	-6/+0
	Frame index handling is now target-agnostic, so delete the target hooks for creation & asm printing of target-specific addressing in DBG_VALUEs and any related functions. llvm-svn: 184067
*	Don't cache the instruction and register info from the TargetMachine, because	Bill Wendling	2013-06-07	1	-1/+1
	the internals of TargetMachine could change. llvm-svn: 183488
*	ARM: Use ldrd/strd to spill 64-bit pairs when available.	Tim Northover	2013-04-21	1	-0/+4
	This allows common sp-offsets to be part of the instruction and is probably faster on modern CPUs too. llvm-svn: 179977
*	ARM scheduler model: Swift has varying latencies, uops for simple ALU ops	Arnold Schwaighofer	2013-04-05	1	-0/+4
	llvm-svn: 178842
*	Sort includes for all of the .h files under the 'lib' tree. These were	Chandler Carruth	2012-12-04	1	-2/+2
	missed in the first pass because the script didn't yet handle include guards. Note that the script is now able to handle all of these headers without manual edits. =] llvm-svn: 169224
*	misched: Use the TargetSchedModel interface wherever possible.	Andrew Trick	2012-10-10	1	-4/+0
	Allows the new machine model to be used for NumMicroOps and OutputLatency. Allows the HazardRecognizer to be disabled along with itineraries. llvm-svn: 165603
*	Add LLVM support for Swift.	Bob Wilson	2012-09-29	1	-0/+7
	llvm-svn: 164899
*	Whitespace.	Bob Wilson	2012-09-29	1	-1/+1
	llvm-svn: 164898
*	Implement getNumLDMAddresses and expose through ARMBaseInstrInfo.	Andrew Trick	2012-09-14	1	-0/+3
	llvm-svn: 163922
*	Handle ARM MOVCC optimization in PeepholeOptimizer.	Jakob Stoklund Olesen	2012-08-16	1	-0/+7
	Use the target independent select analysis hooks. llvm-svn: 162060
*	Fold predicable instructions into MOVCC / t2MOVCC.	Jakob Stoklund Olesen	2012-08-15	1	-0/+5
	The ARM select instructions are just predicated moves. If the select is the only use of an operand, the instruction defining the operand can be predicated instead, saving one instruction and decreasing register pressure. This implementation can turn AND/ORR/EOR instructions into their corresponding ANDCC/ORRCC/EORCC variants. Ideally, we should be able to predicate any instruction, but we don't yet support predicated instructions in SSA form. llvm-svn: 161994
*	Add SrcReg2 to analyzeCompare and optimizeCompareInstr to handle Compare	Manman Ren	2012-06-29	1	-10/+14
	instructions with two register operands. llvm-svn: 159465
*	misched: API for minimum vs. expected latency.	Andrew Trick	2012-06-05	1	-2/+3
	Minimum latency determines per-cycle scheduling groups. Expected latency determines critical path and cost. llvm-svn: 158021
*	Implement ARMBaseInstrInfo::commuteInstruction() for MOVCCr.	Jakob Stoklund Olesen	2012-04-04	1	-0/+2
	A MOVCCr instruction can be commuted by inverting the condition. This can help reduce register pressure and remove unnecessary copies in some cases. <rdar://problem/11182914> llvm-svn: 154033
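	As a rough illustration of commuting by inverting the condition, here is a small sketch. `MovCC`, `invert` and `commuteMovCC` are hypothetical names rather than LLVM's API; the only fact relied on is that, in the ARM condition-code encoding, a condition and its inverse differ in bit 0.

```cpp
#include <utility>

// ARM condition codes with their architectural 4-bit encodings.
enum CondCode : unsigned {
    EQ = 0, NE = 1, HS = 2, LO = 3, MI = 4, PL = 5, VS = 6, VC = 7,
    HI = 8, LS = 9, GE = 10, LT = 11, GT = 12, LE = 13, AL = 14
};

// Hypothetical stand-in for a MOVCCr: dst = cond ? trueReg : falseReg.
struct MovCC {
    unsigned falseReg;
    unsigned trueReg;
    CondCode cond;
};

// Each condition and its inverse differ only in bit 0 of the encoding.
// AL ("always") has no usable inverse, so callers must reject it first.
inline CondCode invert(CondCode cc) {
    return static_cast<CondCode>(cc ^ 1u);
}

// Swap the two source operands and invert the condition. The selected
// value is unchanged, but the register allocator now sees the operands
// in the other order. Returns false if the instruction cannot be commuted.
bool commuteMovCC(MovCC &mi) {
    if (mi.cond == AL)
        return false;
    std::swap(mi.falseReg, mi.trueReg);
    mi.cond = invert(mi.cond);
    return true;
}
```

	Commuting twice restores the original instruction, which is a quick sanity check that the transformation preserves meaning.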
*	ARM implement TargetInstrInfo::getNoopForMachoTarget()	Jim Grosbach	2012-02-28	1	-0/+3
	Without this hook, functions w/ a completely empty body (including no epilogue) will cause an MCEmitter assertion failure. For example,
	    define internal fastcc void @empty_function() {
	      unreachable
	    }
	rdar://10947471
	llvm-svn: 151673
*	Emacs-tag and some comment fix for all ARM, CellSPU, Hexagon, MBlaze, MSP430, PPC, PTX, Sparc, X86, XCore.	Jia Liu	2012-02-18	1	-1/+1
	llvm-svn: 150878
*	Model ARM predicated write as read-mod-write. e.g.	Evan Cheng	2011-12-14	1	-0/+4
	    r0 = mov #0
	    r0 = moveq #1
	Then the second instruction has an implicit data dependency on the first instruction.
	Sadly I have yet to come up with a small test case that demonstrates the post-ra scheduler taking advantage of this.
	llvm-svn: 146583
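	The read-modify-write view of a predicated write can be sketched like this. The `Instr` record and both functions are hypothetical simplifications (not LLVM's MachineInstr): a predicated instruction keeps the old value of its destination when the predicate is false, so each of its defs is also counted as a use.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified instruction record.
struct Instr {
    std::vector<unsigned> defs;   // registers written
    std::vector<unsigned> uses;   // registers read explicitly
    bool predicated;              // true for e.g. moveq
};

// Registers the instruction effectively reads. For a predicated
// instruction the previous value of every def flows through when the
// predicate is false, so each def is added as an implicit use.
std::vector<unsigned> effectiveUses(const Instr &mi) {
    std::vector<unsigned> r = mi.uses;
    if (mi.predicated)
        r.insert(r.end(), mi.defs.begin(), mi.defs.end());
    return r;
}

// True if `second` reads a register that `first` writes (RAW dependency).
bool dependsOn(const Instr &first, const Instr &second) {
    for (unsigned reg : effectiveUses(second))
        if (std::find(first.defs.begin(), first.defs.end(), reg) != first.defs.end())
            return true;
    return false;
}
```

	For the pair in the message, `r0 = mov #0` followed by `r0 = moveq #1`, `dependsOn` reports a dependency only because the second instruction is predicated; an unpredicated second write to r0 would not read the first result.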
*	- Add MachineInstrBundle.h and MachineInstrBundle.cpp. This includes a function	Evan Cheng	2011-12-14	1	-4/+3
	to finalize MI bundles (i.e. add BUNDLE instruction and computing register def and use lists of the BUNDLE instruction) and a pass to unpack bundles.
	- Teach more of MachineBasic and MachineInstr methods to be bundle aware.
	- Switch Thumb2 IT block to MI bundles and delete the hazard recognizer hack to prevent IT blocks from being broken apart.
	llvm-svn: 146542
*	Move -widen-vmovs to ARMBaseInstrInfo::expandPostRAPseudo().	Jakob Stoklund Olesen	2011-10-11	1	-0/+2
	The VMOVS widening needs to look at the implicit COPY operands. Trying to dig out the COPY instruction from an iterator in copyPhysReg() is the wrong approach. The expandPostRAPseudo() hook gets to look at COPY instructions before they are converted to copyPhysReg() calls. llvm-svn: 141619
*	Implement TII::get/setExecutionDomain() for ARM.	Jakob Stoklund Olesen	2011-09-27	1	-0/+6
	llvm-svn: 140653
*	Lower ARM adds/subs to add/sub after adding optional CPSR operand.	Andrew Trick	2011-09-21	1	-0/+9
	This is still a hack until we can teach tblgen to generate the optional CPSR operand rather than an implicit CPSR def. But the strangeness is now limited to the selection DAG. ADD/SUB MI's no longer have implicit CPSR defs, nor do we allow flag setting variants of these opcodes in machine code. There are several corner cases to consider, and getting one wrong would previously lead to nasty miscompilation. It's not the first time I've debugged one, so this time I added enough verification to ensure it won't happen again. llvm-svn: 140228
*	Implement isLoadFromStackSlotPostFE and isStoreToStackSlotPostFE for ARM.	Jakob Stoklund Olesen	2011-08-08	1	-0/+4
	They improve the verbose assembly. llvm-svn: 137069
*	Sink ARMMCExpr and ARMAddressingModes into MC layer. First step to separate ARM MC code from target.	Evan Cheng	2011-07-20	1	-140/+0
	llvm-svn: 135636
*	Add a target-independent entry to MCInstrDesc to describe the encoded size of an opcode.	Owen Anderson	2011-07-13	1	-13/+5
	Switch ARM over to using that rather than its own special MCInstrDesc bits. llvm-svn: 135106
*	Use BranchProbability instead of floating points in IfConverter.	Jakub Staszak	2011-07-10	1	-4/+4
	llvm-svn: 134858
*	Hide the call to InitMCInstrInfo into tblgen generated ctor.	Evan Cheng	2011-07-01	1	-1/+4
	llvm-svn: 134244
*	- Rename TargetInstrDesc, TargetOperandInfo to MCInstrDesc and MCOperandInfo and	Evan Cheng	2011-06-28	1	-6/+6
	sink them into MC layer.
	- Added MCInstrInfo, which captures the tablegen generated static data. Change TargetInstrInfo so it's based off MCInstrInfo.
	llvm-svn: 134021
*	Clean up a few 80 column violations.	Jim Grosbach	2011-06-13	1	-2/+2
	llvm-svn: 132946
*	Fix a ton of comment typos found by codespell. Patch by	Chris Lattner	2011-04-15	1	-1/+1
	Luis Felipe Strano Moraes! llvm-svn: 129558
*	Fix a typo.	Cameron Zwarich	2011-04-13	1	-3/+3
	llvm-svn: 129429
*	Apply again changes to support ARM memory asm parsing. I removed	Bruno Cardoso Lopes	2011-03-31	1	-22/+2
	all LDR/STR changes and left them to a future patch. Passing all checks now.
	- Implement asm parsing support for LDRT, LDRBT, STRT, STRBT and fix the encoding wherever possible.
	- Add a new encoding bit to describe the index mode used and teach printAddrMode2Operand to check by the addressing mode which index mode to print.
	- Testcases
	llvm-svn: 128689
*	Revert r128632 again, until I figure out what breaks the tests	Bruno Cardoso Lopes	2011-03-31	1	-2/+22
	llvm-svn: 128635
*	Reapply r128585 without generating a lib dependency cycle. An updated log:	Bruno Cardoso Lopes	2011-03-31	1	-22/+2
	- Implement asm parsing support for LDRT, LDRBT, STRT, STRBT and {STR,LDC}{2}_{PRE,POST}, fixing the encoding wherever possible.
	- Move all instructions which use am2offset without a pattern to use addrmode2.
	- Add a new encoding bit to describe the index mode used and teach printAddrMode2Operand to check by the addressing mode which index mode to print.
	- Testcases
	llvm-svn: 128632
*	Revert "- Implement asm parsing support for LDRT, LDRBT, STRT, STRBT and"	Matt Beaumont-Gay	2011-03-31	1	-1/+19
	This revision introduced a dependency cycle, as nlewycky mentioned by email. llvm-svn: 128597
*	- Implement asm parsing support for LDRT, LDRBT, STRT, STRBT and	Bruno Cardoso Lopes	2011-03-30	1	-19/+1
	{STR,LDC}{2}_PRE.
	- Fixed the encoding in some places.
	- Some of those instructions were using am2offset and now use addrmode2. Codegen isn't affected, instructions which use SelectAddrMode2Offset were not touched.
	- Teach printAddrMode2Operand to check by the addressing mode which index mode to print.
	- This is a work in progress, more work to come. The idea is to change places which use am2offset to use addrmode2 instead, so as to unify the assembly parser.
	- Add testcases for assembly parser
	llvm-svn: 128585
*	Preliminary support for ARM frame save directives emission via MI flags.	Anton Korobeynikov	2011-03-05	1	-4/+4
	This is just a very first approximation of how the stuff should be done (e.g. ARM-only for now). More to follow. llvm-svn: 127101
*	VFP single precision arith instructions can go down to NEON pipeline, but on Cortex-A8 only.	Evan Cheng	2011-02-22	1	-1/+2
	llvm-svn: 126238
*	Sorry, several patches in one.	Evan Cheng	2011-01-20	1	-1/+2
	TargetInstrInfo: Change produceSameValue() to take MachineRegisterInfo as an optional argument. When in SSA form, targets can use it to make more aggressive equality analysis.
	Machine LICM:
	1. Eliminate isLoadFromConstantMemory, use MI.isInvariantLoad instead.
	2. Fix a bug which prevented CSE of instructions which are not re-materializable.
	3. Use improved form of produceSameValue.
	ARM:
	1. Teach ARM produceSameValue to look past some PIC labels.
	2. Look for operands from different loads of different constant pool entries which have same values.
	3. Re-implement PIC GA materialization using movw + movt. Combine the pair with a "add pc" or "ldr [pc]" to form pseudo instructions. This makes it possible to re-materialize the instruction, allows machine LICM to hoist the set of instructions out of the loop and makes it possible to CSE them. It's a bit hacky, but it significantly improves code quality.
	4. Some minor bug fixes as well.
	With the fixes, using movw + movt to materialize GAs significantly outperforms the load from constantpool method. 186.crafty and 255.vortex improved > 20%, 254.gap and 176.gcc ~10%.
	llvm-svn: 123905
*	Various bits of framework needed for precise machine-level selection	Andrew Trick	2010-12-24	1	-1/+6
	DAG scheduling during isel. Most new functionality is currently guarded by -enable-sched-cycles and -enable-sched-hazard.
	Added InstrItineraryData::IssueWidth field, currently derived from ARM itineraries, but could be initialized differently on other targets.
	Added ScheduleHazardRecognizer::MaxLookAhead to indicate whether it is active, and if so how many cycles of state it holds.
	Added SchedulingPriorityQueue::HasReadyFilter to allow gating entry into the scheduler's available queue.
	ScoreboardHazardRecognizer now accesses the ScheduleDAG in order to get information about its SUnits, provides RecedeCycle for bottom-up scheduling, correctly computes scoreboard depth, tracks IssueCount, and considers potential stall cycles when checking for hazards.
	ScheduleDAGRRList now models machine cycles and hazards (under flags). It tracks MinAvailableCycle, drives the hazard recognizer and priority queue's ready filter, manages a new PendingQueue, properly accounts for stall cycles, etc.
	llvm-svn: 122541
*	Making use of VFP / NEON floating point multiply-accumulate / subtraction is	Evan Cheng	2010-12-05	1	-1/+39
	difficult on current ARM implementations for a few reasons.
	1. Even though a single vmla has latency that is one cycle shorter than a pair of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause additional pipeline stall. So it's frequently better to single codegen vmul + vadd.
	2. A vmla followed by a vmul, vmadd, or vsub causes the second fp instruction to stall for 4 cycles. We need to schedule them apart.
	3. A vmla followed by a vmla is a special case. Obviously issuing back to back RAW vmla + vmla is very bad. But this isn't ideal either:
	    vmul
	    vadd
	    vmla
	   Instead, we want to expand the second vmla:
	    vmla
	    vmul
	    vadd
	   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles faster.
	Up to now, isel simply avoids codegen'ing fp vmla / vmls. This works well enough but it isn't the optimal solution. This patch attempts to make it possible to use vmla / vmls in cases where it is profitable.
	A. Add missing isel predicates which cause vmla to be codegen'ed.
	B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to compute a fmul and a fmla.
	C. Add additional isel checks for vmla, avoid cases where vmla is feeding into fp instructions (except for the #3 exceptional case).
	D. Add ARM hazard recognizer to model the vmla / vmls hazards.
	E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the vmla / vmls will trigger one of the special hazards.
	Work in progress, only A+B are enabled.
	llvm-svn: 120960
*	s/ARM::BRIND/ARM::BX/g to coincide with r120366.	Bill Wendling	2010-11-30	1	-1/+1
	llvm-svn: 120371
*	Move callee-saved regs spills / reloads to TFI	Anton Korobeynikov	2010-11-27	1	-19/+0
	llvm-svn: 120228
*	Rewrite stack callee saved spills and restores to use push/pop instructions.	Eric Christopher	2010-11-18	1	-0/+15
	Remove movePastCSLoadStoreOps and associated code for simple pointer increments. Update routines that depended upon other opcodes for save/restore. Adjust all testcases accordingly. llvm-svn: 119725
*	Remove ARM isel hacks that fold large immediates into a pair of add, sub, and,	Evan Cheng	2010-11-17	1	-0/+5
	and xor. The 32-bit move immediates can be hoisted out of loops by machine LICM but the isel hacks were preventing that. Instead, let peephole optimization pass recognize registers that are defined by immediates and the ARM target hook will fold the immediates in.
	Other changes include
	1) do not fold and / xor into cmp to isel TST / TEQ instructions if there are multiple uses. This happens when the 'and' is live out, machine sink would have sunk the computation and that ends up pessimizing code. The peephole pass would recognize situations where the 'and' can be toggled to define CPSR and eliminate the comparison anyway.
	2) Move peephole pass to after machine LICM, sink, and CSE to avoid blocking important optimizations.
	rdar://8663787, rdar://8241368
	llvm-svn: 119548
*	Code clean up. The peephole pass should be the one updating the instruction	Evan Cheng	2010-11-15	1	-2/+1
	iterator, not TII->OptimizeCompareInstr. llvm-svn: 119186
*	Revert this temporarily.	Eric Christopher	2010-11-11	1	-5/+0
	llvm-svn: 118827
*	Change the prologue and epilogue to use push/pop for the low ARM registers.	Eric Christopher	2010-11-11	1	-0/+5
	llvm-svn: 118823
*	Two sets of changes. Sorry they are intermingled.	Evan Cheng	2010-11-03	1	-7/+15
	1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to "optimize for latency". Call instructions don't have the right latency and this is more likely to introduce spills.
	2. Fix if-converter cost function. For ARM, it should use instruction latencies, not # of micro-ops since multi-latency instructions are completely executed even when the predicate is false. Also, some instructions will be "slower" when they are predicated due to the register def becoming implicit input.
	rdar://8598427
	llvm-svn: 118135
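	The second point can be illustrated with a small cost sketch. Everything here is hypothetical (the `InstrCost` fields, the function names, and the `branchCost` threshold are assumptions for illustration, not the if-converter's actual interface); it only shows why summing latencies plus a predication penalty differs from counting micro-ops.

```cpp
#include <vector>

// Hypothetical per-instruction numbers; real values would come from the
// target's itineraries or scheduling model.
struct InstrCost {
    unsigned latency;    // cycles the instruction takes to complete
    unsigned microOps;   // number of micro-ops it decodes to
    unsigned predCost;   // extra cycles when the instruction is predicated
};

// Cost of a predicated block. Every instruction runs to completion even
// when its predicate is false, so latencies (plus the predication
// penalty) are summed instead of counting micro-ops.
unsigned predicatedBlockCost(const std::vector<InstrCost> &block) {
    unsigned cycles = 0;
    for (const InstrCost &i : block)
        cycles += i.latency + i.predCost;
    return cycles;
}

// If-convert only when the predicated block costs no more than the
// assumed cost of branching around it.
bool shouldIfConvert(const std::vector<InstrCost> &block, unsigned branchCost) {
    return predicatedBlockCost(block) <= branchCost;
}
```

	A block holding one single-cycle instruction and one three-cycle instruction with a one-cycle predication penalty costs 5 under this model, although it contains only two micro-ops.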