summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/ARM/Thumb2InstrInfo.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* Preliminary support for ARM frame save directives emission via MI flags.Anton Korobeynikov2011-03-051-8/+11
| | | | | | | This is just very first approximation how the stuff should be done (e.g. ARM-only for now). More to follow. llvm-svn: 127101
* Guard against de-referencing MBB.end().Evan Cheng2011-02-221-1/+4
| | | | llvm-svn: 126192
* Skipping over debugvalue instructions to determine whether the split spot is ↵Evan Cheng2011-02-211-0/+3
| | | | | | in a IT block. rdar://9030770 llvm-svn: 126159
* Making use of VFP / NEON floating point multiply-accumulate / subtraction isEvan Cheng2010-12-051-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | difficult on current ARM implementations for a few reasons. 1. Even though a single vmla has latency that is one cycle shorter than a pair of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause additional pipeline stall. So it's frequently better to single codegen vmul + vadd. 2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to stall for 4 cycles. We need to schedule them apart. 3. A vmla followed vmla is a special case. Obvious issuing back to back RAW vmla + vmla is very bad. But this isn't ideal either: vmul vadd vmla Instead, we want to expand the second vmla: vmla vmul vadd Even with the 4 cycle vmul stall, the second sequence is still 2 cycles faster. Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough but it isn't the optimial solution. This patch attempts to make it possible to use vmla / vmls in cases where it is profitable. A. Add missing isel predicates which cause vmla to be codegen'ed. B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to compute a fmul and a fmla. C. Add additional isel checks for vmla, avoid cases where vmla is feeding into fp instructions (except for the #3 exceptional case). D. Add ARM hazard recognizer to model the vmla / vmls hazards. E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the vmla / vmls will trigger one of the special hazards. Work in progress, only A+B are enabled. llvm-svn: 120960
* Two sets of changes. Sorry they are intermingled.Evan Cheng2010-11-031-27/+0
| | | | | | | | | | | | | 1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to "optimize for latency". Call instructions don't have the right latency and this is more likely to use introduce spills. 2. Fix if-converter cost function. For ARM, it should use instruction latencies, not # of micro-ops since multi-latency instructions is completely executed even when the predicate is false. Also, some instruction will be "slower" when they are predicated due to the register def becoming implicit input. rdar://8598427 llvm-svn: 118135
* Thread the determination of branch prediction hit rates back through the ↵Owen Anderson2010-10-011-4/+7
| | | | | | | | | if-conversion heuristic APIs. For now, stick with a constant estimate of 90% (branch predictors are good!), but we might find that we want to provide more nuanced estimates in the future. llvm-svn: 115364
* Provide an option to restore old-style if-conversion heuristics for Thumb2.Owen Anderson2010-10-011-0/+29
| | | | llvm-svn: 115339
* Part one of switching to using a more sane heuristic for determining ↵Owen Anderson2010-09-281-25/+0
| | | | | | | | | | | if-conversion profitability. Rather than having arbitrary cutoffs, actually try to cost model the conversion. For now, the constants are tuned to more or less match our existing behavior, but these will be changed to reflect realistic values as this work proceeds. llvm-svn: 114973
* convert targets to the new MF.getMachineMemOperand interface.Chris Lattner2010-09-211-4/+6
| | | | llvm-svn: 114391
* Teach if-converter to be more careful with predicating instructions that wouldEvan Cheng2010-09-101-1/+1
| | | | | | | | | | | take multiple cycles to decode. For the current if-converter clients (actually only ARM), the instructions that are predicated on false are not nops. They would still take machine cycles to decode. Micro-coded instructions such as LDM / STM can potentially take multiple cycles to decode. If-converter should take treat them as non-micro-coded simple instructions. llvm-svn: 113570
* Many Thumb2 instructions can reference the full ARM register set (i.e.,Jim Grosbach2010-07-301-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | have 4 bits per register in the operand encoding), but have undefined behavior when the operand value is 13 or 15 (SP and PC, respectively). The trivial coalescer in linear scan sometimes will merge a copy from SP into a subsequent instruction which uses the copy, and if that instruction cannot legally reference SP, we get bad code such as: mls r0,r9,r0,sp instead of: mov r2, sp mls r0, r9, r0, r2 This patch adds a new register class for use by Thumb2 that excludes the problematic registers (SP and PC) and is used instead of GPR for those operands which cannot legally reference PC or SP. The trivial coalescer explicitly requires that the register class of the destination for the COPY instruction contain the source register for the COPY to be considered for coalescing. This prevents errant instructions like that above. PR7499 llvm-svn: 109842
* Replace copyRegToReg with copyPhysReg for ARM.Jakob Stoklund Olesen2010-07-111-27/+19
| | | | llvm-svn: 108078
* The t2MOVi16 and t2MOVTi16 instructions do not set CPSR. Trying to addBob Wilson2010-06-291-2/+2
| | | | | | | a CPSR operand to them causes an assertion failure, so apparently these instructions haven't been getting a lot of use. llvm-svn: 107147
* Change if-cvt options to something that actually as useable.Evan Cheng2010-06-291-4/+6
| | | | llvm-svn: 107121
* Change if-conversion block size limit checks to add some flexibility.Evan Cheng2010-06-251-0/+23
| | | | llvm-svn: 106901
* Tail merging pass shall not break up IT blocks. rdar://8115404Evan Cheng2010-06-221-0/+16
| | | | llvm-svn: 106517
* Allow ARM if-converter to be run after post allocation scheduling.Evan Cheng2010-06-181-1/+58
| | | | | | | | | | | | | | | | - This fixed a number of bugs in if-converter, tail merging, and post-allocation scheduler. If-converter now runs branch folding / tail merging first to maximize if-conversion opportunities. - Also changed the t2IT instruction slightly. It now defines the ITSTATE register which is read by instructions in the IT block. - Added Thumb2 specific hazard recognizer to ensure the scheduler doesn't change the instruction ordering in the IT block (since IT mask has been finalized). It also ensures no other instructions can be scheduled between instructions in the IT block. This is not yet enabled. llvm-svn: 106344
* Next round of tail call changes. Register used in a tailDale Johannesen2010-06-151-5/+7
| | | | | | | | call must not be callee-saved; following x86, add a new regclass to represent this. Also fixes a couple of bugs. Still disabled by default; Thumb doesn't work yet. llvm-svn: 106053
* Allow target to place 2-address pass inserted copies in better spots. Thumb2 ↵Evan Cheng2010-06-091-0/+43
| | | | | | will use this to try to avoid breaking up IT blocks. llvm-svn: 105745
* Clean up 80 column violations. No functional change.Jim Grosbach2010-06-021-1/+2
| | | | llvm-svn: 105350
* Add a DebugLoc argument to TargetInstrInfo::copyRegToReg, so that itDan Gohman2010-05-061-5/+3
| | | | | | doesn't have to guess. llvm-svn: 103194
* Add argument TargetRegisterInfo to loadRegFromStackSlot and storeRegToStackSlot.Evan Cheng2010-05-061-10/+12
| | | | llvm-svn: 103193
* Handle register-to-register copies within the tGPR class.Bob Wilson2010-04-261-12/+16
| | | | | | Radar 7896289 llvm-svn: 102396
* use DebugLoc default ctor instead of DebugLoc::getUnknownLoc()Chris Lattner2010-04-021-3/+3
| | | | llvm-svn: 100214
* Thumb2 storeFrom/LoadToStackSlot() need to handle tGPR regs directly, not passJim Grosbach2010-03-271-2/+2
| | | | | | | through to the generic version. The generic functions use STR/LDR, but T2 needs the t2STR/t2LDR instead so we get the addressing mode correct. llvm-svn: 99678
* Fix a crash compiling 254.gap for Thumb2. The Thumb2 add/sub with 12-bitBob Wilson2010-03-081-5/+23
| | | | | | | | immediate instructions cannot set the condition codes, so they do not have the extra cc_out operand. We hit an assertion during tail duplication because the instruction being duplicated had more operands that expected. llvm-svn: 98001
* Handle AddrMode6 (for NEON load/stores) in Thumb2's rewriteT2FrameIndex.Bob Wilson2010-02-061-11/+10
| | | | | | Radar 7614112. llvm-svn: 95456
* Remove predicates when changing an add into an unpredicable mov.Jakob Stoklund Olesen2010-01-191-3/+7
| | | | | | | Since the mov is executed unconditionally, make sure that the add didn't have any predicate. llvm-svn: 93909
* Remove the target hook TargetInstrInfo::BlockHasNoFallThrough in favor ofDan Gohman2009-12-051-24/+0
| | | | | | | MachineBasicBlock::canFallThrough(), which is target-independent and more thorough. llvm-svn: 90634
* Refactor code.Evan Cheng2009-11-081-53/+0
| | | | llvm-svn: 86423
* 80-column cleanup of file header commentsJim Grosbach2009-11-071-1/+1
| | | | llvm-svn: 86408
* t2ldrpci_pic can be used for blockaddress as well.Evan Cheng2009-11-071-3/+14
| | | | llvm-svn: 86400
* Refactor code. Fix a potential missing check. Teach isIdentical() about ↵Evan Cheng2009-11-071-26/+0
| | | | | | tLDRpci_pic. llvm-svn: 86330
* - Add TargetInstrInfo::isIdentical(). It's similar to MachineInstr::isIdenticalEvan Cheng2009-11-071-0/+26
| | | | | | | | | | except it doesn't care if the definitions' virtual registers differ. This is used by machine LICM and other MI passes to perform CSE. - Teach Thumb2InstrInfo::isIdentical() to check two t2LDRpci_pic are identical. Since pc relative constantpool entries are always different, this requires it it check if the values can actually the same. llvm-svn: 86328
* - Add pseudo instructions tLDRpci_pic and t2LDRpci_pic which does a pc-relativeEvan Cheng2009-11-061-1/+43
| | | | | | | | | | | | load of a GV from constantpool and then add pc. It allows the code sequence to be rematerializable so it would be hoisted by machine licm. - Add a late pass to break these pseudo instructions into a number of real instructions. Also move the code in Thumb2 IT pass that breaks up t2MOVi32imm to this pass. This is done before post regalloc scheduling to allow the scheduler to proper schedule these instructions. It also allow them to be if-converted and shrunk by later passes. llvm-svn: 86304
* Use NEON reg-reg moves, where profitable. This reduces "domain-cross" ↵Anton Korobeynikov2009-11-021-1/+2
| | | | | | stalls, when we used to mix vfp and neon code (the former were used for reg-reg moves) llvm-svn: 85764
* Fix a couple more places where we are creating ld / st instructions without ↵Evan Cheng2009-11-011-2/+18
| | | | | | memoperands. llvm-svn: 85746
* Add a Thumb BRIND pattern. Change the ARM BRIND assembly to separate theBob Wilson2009-10-281-0/+1
| | | | | | | opcode and operand with a tab. Check for these instructions in the usual places. llvm-svn: 85411
* Handle AddrMode4 for Thumb2 in rewriteT2FrameIndex. This occurs forBob Wilson2009-09-151-0/+5
| | | | | | | | | | | VLDM/VSTM instructions, and without this check, the code assumes that an offset is allowed, as it would be with VLDR/VSTR. The asm printer, however, silently drops the offset, producing incorrect code. Since the address register in this case is either the stack or frame pointer, the spill location ends up conflicting with some other stack slot or with outgoing arguments on the stack. llvm-svn: 81879
* Fix PR4789. Teach eliminateFrameIndex how to handle VLDRQ and VSTRQ which ↵Evan Cheng2009-08-271-14/+22
| | | | | | cannot fold any immediate offset. llvm-svn: 80191
* Whitespace cleanup. Remove trailing whitespace.Jim Grosbach2009-08-111-4/+4
| | | | llvm-svn: 78666
* Always use the 16-bit tMOVgpr2gpr instead of the 32-bit t2MOVr.Evan Cheng2009-08-101-2/+1
| | | | llvm-svn: 78549
* Use 16-bit tMOVgpr2gpr instead of tMOVr to copy GPR registers in Thumb2 mode.Evan Cheng2009-08-071-6/+1
| | | | llvm-svn: 78398
* It turns out most of the thumb2 instructions are not allowed to touch SP. ↵Evan Cheng2009-08-071-28/+70
| | | | | | | | | | The semantics of such instructions are unpredictable. We have just been lucky that tests have been passing. This patch takes pain to ensure all the PEI lowering code does the right thing when lowering frame indices, insert code to manipulate stack pointers, etc. It's also custom lowering dynamic stack alloc into pseudo instructions so we can insert the right instructions at scheduling time. This fixes PR4659 and PR4682. llvm-svn: 78361
* Use the i12 variant of load / store opcodes if offset is zero. Now we pass ↵Evan Cheng2009-08-031-1/+5
| | | | | | all of multisource as well. llvm-svn: 77939
* Move the getInlineAsmLength virtual method from TAI to TII, whereChris Lattner2009-08-021-2/+1
| | | | | | | | | | the only real caller (GetFunctionSizeInBytes) uses it. The custom ARM implementation of this is basically reimplementing an assembler poorly for negligible gain. It should be removed IMNSHO, but I'll leave that to ARMish folks to decide. llvm-svn: 77877
* Optimize Thumb2 jumptable to use tbb / tbh when all the offsets fit in byte ↵Evan Cheng2009-07-291-0/+2
| | | | | | / halfword. llvm-svn: 77422
* Thumb-2: fix typo that caused incorrect stack elimination for VFP operations ↵David Goodwin2009-07-281-1/+1
| | | | | | and very large stack frames. llvm-svn: 77401
* - More refactoring. This gets rid of all of the getOpcode calls.Evan Cheng2009-07-281-12/+317
| | | | | | | | | | | - This change also makes it possible to switch between ARM / Thumb on a per-function basis. - Fixed thumb2 routine which expand reg + arbitrary immediate. It was using using ARM so_imm logic. - Use movw and movt to do reg + imm when profitable. - Other code clean ups and minor optimizations. llvm-svn: 77300
* More DCE.Evan Cheng2009-07-271-4/+0
| | | | llvm-svn: 77231
OpenPOWER on IntegriCloud