summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Fix an ambiguously nested if.Owen Anderson2011-09-091-2/+2
| | | | llvm-svn: 139431
* Thumb unconditional branches are allowed in IT blocks, and therefore should ↵Owen Anderson2011-09-091-3/+10
| | | | | | have a predicate operand, unlike conditional branches. llvm-svn: 139415
* Put VMOVS widening under a command line option, off by default.Jakob Stoklund Olesen2011-08-311-1/+6
| | | | | | | | | | | It appears that our use of the imp-use and imp-def flags with sub-registers is not yet robust enough to support this. The failing test case is complicated, I am working on a reduction. <rdar://problem/10044201> llvm-svn: 138861
* Clean up Thumb load/store multiple definitions.Jim Grosbach2011-08-231-2/+0
| | | | | | | | There is no non-writeback store multiple instruction in Thumb1, so don't define one. As a result load multiple is the only instantiation of the multiclass, so refactor that away entirely. llvm-svn: 138338
* Remove the VMOVQQ pseudo instruction.Chad Rosier2011-08-201-8/+8
| | | | llvm-svn: 138177
* Add <imp-def> operands to QQ and QQQQ stack loads.Jakob Stoklund Olesen2011-08-201-2/+4
| | | | | | | | This pleases the register scavenger and brings test/CodeGen/ARM/2011-08-12-vmovqqqq-pseudo.ll a little closer to working with -verify-machineinstrs. llvm-svn: 138164
* VMOVQQQQs pseudo instructions are only created by ARMBaseInstrInfo::copyPhysReg.Chad Rosier2011-08-201-10/+31
| | | | | | | Therefore, rather then generate a pseudo instruction, which is later expanded, generate the necessary instructions in place. llvm-svn: 138163
* Rewrite some ARM InstrInfo functions to be most accepting of arbitrary ↵Owen Anderson2011-08-101-110/+115
| | | | | | register subclasses. Hopefully this fixes some buildbots. llvm-svn: 137223
* Promote VMOVS to VMOVD when possible.Jakob Stoklund Olesen2011-08-091-2/+29
| | | | | | | | | | | | | | | | | | | | | | | | | On Cortex-A8, we use the NEON v2f32 instructions for f32 arithmetic. For better latency, we also send D-register copies down the NEON pipeline by translating them to vorr instructions. This patch promotes even S-register copies to D-register copies when possible so they can also go down the NEON pipeline. Example: vldr.32 s0, LCPI0_0 loop: vorr d1, d0, d0 loop2: ... vadd.f32 d1, d1, d16 The vorr instruction looked like this after regalloc: %S2<def> = COPY %S0, %D1<imp-def> Copies involving odd S-registers, and copies that don't define the full D-register are left alone. llvm-svn: 137182
* Implement isLoadFromStackSlotPostFE and isStoreToStackSlotPostFE for ARM.Jakob Stoklund Olesen2011-08-081-0/+12
| | | | | | They improve the verbose assembly. llvm-svn: 137069
* Split up the ARM so_reg ComplexPattern into so_reg_reg and so_reg_imm, ↵Owen Anderson2011-07-211-1/+1
| | | | | | allowing us to distinguish the encodings that use shifted registers from those that use shifted immediates. This is necessary to allow the fixed-length decoder to distinguish things like BICS vs LDRH. llvm-svn: 135693
* Sink ARMMCExpr and ARMAddressingModes into MC layer. First step to separate ↵Evan Cheng2011-07-201-1/+2
| | | | | | ARM MC code from target. llvm-svn: 135636
* Remove VMOVDneon and VMOVQ, which are just aliases for VORR. This continues ↵Owen Anderson2011-07-151-1/+3
| | | | | | to simplify the path towards an auto-generated disassembler. llvm-svn: 135290
* Next round of MC refactoring. This patch factor MC table instantiations, MCEvan Cheng2011-07-141-1/+0
| | | | | | registeration and creation code into XXXMCDesc libraries. llvm-svn: 135184
* Add a target-indepedent entry to MCInstrDesc to describe the encoded size of ↵Owen Anderson2011-07-131-17/+3
| | | | | | an opcode. Switch ARM over to using that rather than its own special MCInstrDesc bits. llvm-svn: 135106
* Use BranchProbability instead of floating points in IfConverter.Jakub Staszak2011-07-101-15/+23
| | | | llvm-svn: 134858
* Hide the call to InitMCInstrInfo into tblgen generated ctor.Evan Cheng2011-07-011-2/+2
| | | | llvm-svn: 134244
* Refactor away tSpill and tRestore pseudos in ARM backend.Jim Grosbach2011-06-291-2/+2
| | | | | | | The tSpill and tRestore instructions are just copies of the tSTRspi and tLDRspi instructions, respectively. Just use those directly instead. llvm-svn: 134092
* Move CallFrameSetupOpcode and CallFrameDestroyOpcode to TargetInstrInfo.Evan Cheng2011-06-281-1/+2
| | | | llvm-svn: 134030
* Merge XXXGenRegisterNames.inc into XXXGenRegisterInfo.incEvan Cheng2011-06-281-1/+4
| | | | llvm-svn: 134024
* - Rename TargetInstrDesc, TargetOperandInfo to MCInstrDesc and MCOperandInfo andEvan Cheng2011-06-281-56/+56
| | | | | | | | sink them into MC layer. - Added MCInstrInfo, which captures the tablegen generated static data. Chang TargetInstrInfo so it's based off MCInstrInfo. llvm-svn: 134021
* use the MachineInstrBuilder operator-> to simplify some code.Chris Lattner2011-04-291-1/+1
| | | | | | There are probably more instances of this floating around. llvm-svn: 130474
* Change A9 scheduling itineraries VLD* / VST* entries default to "aligned". ThatEvan Cheng2011-04-191-0/+202
| | | | | | | | | is, it assumes addresses are 64-bit aligned (which should be the more common case). If the alignment is found not to be aligned, then getOperandLatency() would adjust the operand latency computation by one to compensate for it. rdar://9294833 llvm-svn: 129742
* Add ORR and EOR to the CMP peephole optimizer. It's hard to get isel to generateCameron Zwarich2011-04-151-1/+9
| | | | | | a case involving EOR, so I only added a test for ORR. llvm-svn: 129610
* The AND instruction leaves the V flag unmodified, so it falls victim to the sameCameron Zwarich2011-04-151-7/+6
| | | | | | problem as all of the other instructions we fold with CMPs. llvm-svn: 129602
* Add missing register forms of instructions to the ARM CMP-folding code. ThisCameron Zwarich2011-04-151-0/+12
| | | | | | fixes <rdar://problem/9287901>. llvm-svn: 129599
* Fix a ton of comment typos found by codespell. Patch byChris Lattner2011-04-151-1/+1
| | | | | | Luis Felipe Strano Moraes! llvm-svn: 129558
* Fix a typo.Cameron Zwarich2011-04-131-4/+4
| | | | llvm-svn: 129429
* Teach the ARM peephole optimizer that RSB, RSC, ADC, and SBC can be used for ↵Owen Anderson2011-04-061-1/+8
| | | | | | folded comparisons, just like ADD and SUB. llvm-svn: 129038
* Get rid of the non-writeback versions VLDMDB and VSTMDB, which don't ↵Owen Anderson2011-03-291-14/+0
| | | | | | actually exist. llvm-svn: 128461
* Nasty bug in ARMBaseInstrInfo::produceSameValue(). The MachineConstantPoolEntryEvan Cheng2011-03-241-5/+12
| | | | | | | | | | | | entries being compared may not be ARMConstantPoolValue. Without checking whether they are ARMConstantPoolValue first, and if the stars and moons are aligned properly, the equality test may return true (when the first few words of two Constants' values happen to be identical) and very bad things can happen. rdar://9125354 llvm-svn: 128203
* Cmp peephole optimization isn't always safe for signed arithmetics.Evan Cheng2011-03-231-3/+43
| | | | | | | | | | | | | | | | | | | | | int tries = INT_MAX; while (tries > 0) { tries--; } The check should be: subs r4, #1 cmp r4, #0 bgt LBB0_1 The subs can set the overflow V bit when r4 is INT_MAX+1 (which loop canonicalization apparently does in this case). cmp #0 would have cleared it while not changing the N and Z bits. Since BGT is dependent on the V bit, i.e. (N == V) && !Z, it is not safe to eliminate the cmp #0. rdar://9172742 llvm-svn: 128179
* Preliminary support for ARM frame save directives emission via MI flags.Anton Korobeynikov2011-03-051-2/+3
| | | | | | | This is just very first approximation how the stuff should be done (e.g. ARM-only for now). More to follow. llvm-svn: 127101
* Last round of fixes for movw + movt global address codegen.Evan Cheng2011-01-211-8/+14
| | | | | | | | | | 1. Fixed ARM pc adjustment. 2. Fixed dynamic-no-pic codegen 3. CSE of pc-relative load of global addresses. It's now enabled by default for Darwin. llvm-svn: 123991
* Convert -enable-sched-cycles and -enable-sched-hazard to -disableAndrew Trick2011-01-211-9/+5
| | | | | | | | | | | flags. They are still not enable in this revision. Added TargetInstrInfo::isZeroCost() to fix a fundamental problem with the scheduler's model of operand latency in the selection DAG. Generalized unit tests to work with sched-cycles. llvm-svn: 123969
* Don't be overly aggressive with CSE of "ldr constantpool". If it's a pc-relativeEvan Cheng2011-01-201-5/+1
| | | | | | | | | value, the "add pc" must be CSE'ed at the same time. We could follow the same approach as T2 by adding pseudo instructions that combine the ldr + "add pc". But the better approach is to use movw + movt (which I will enable soon), so I'll leave this as a TODO. llvm-svn: 123949
* Sorry, several patches in one.Evan Cheng2011-01-201-4/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | TargetInstrInfo: Change produceSameValue() to take MachineRegisterInfo as an optional argument. When in SSA form, targets can use it to make more aggressive equality analysis. Machine LICM: 1. Eliminate isLoadFromConstantMemory, use MI.isInvariantLoad instead. 2. Fix a bug which prevent CSE of instructions which are not re-materializable. 3. Use improved form of produceSameValue. ARM: 1. Teach ARM produceSameValue to look pass some PIC labels. 2. Look for operands from different loads of different constant pool entries which have same values. 3. Re-implement PIC GA materialization using movw + movt. Combine the pair with a "add pc" or "ldr [pc]" to form pseudo instructions. This makes it possible to re-materialize the instruction, allow machine LICM to hoist the set of instructions out of the loop and make it possible to CSE them. It's a bit hacky, but it significantly improve code quality. 4. Some minor bug fixes as well. With the fixes, using movw + movt to materialize GAs significantly outperform the load from constantpool method. 186.crafty and 255.vortex improved > 20%, 254.gap and 176.gcc ~10%. llvm-svn: 123905
* Materialize GA addresses with movw + movt pairs for Darwin in PIC mode. e.g.Evan Cheng2011-01-171-1/+7
| | | | | | | | | | | | movw r0, :lower16:(L_foo$non_lazy_ptr-(LPC0_0+4)) movt r0, :upper16:(L_foo$non_lazy_ptr-(LPC0_0+4)) LPC0_0: add r0, pc, r0 It's not yet enabled by default as some tests are failing. I suspect bugs in down stream tools. llvm-svn: 123619
* Simplify a bunch of isVirtualRegister() and isPhysicalRegister() logic.Jakob Stoklund Olesen2011-01-101-2/+1
| | | | | | | | These functions not longer assert when passed 0, but simply return false instead. No functional change intended. llvm-svn: 123155
* Recognize inline asm 'rev /bin/bash, ' as a bswap intrinsic call.Evan Cheng2011-01-081-1/+0
| | | | llvm-svn: 123048
* Various bits of framework needed for precise machine-level selectionAndrew Trick2010-12-241-3/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | DAG scheduling during isel. Most new functionality is currently guarded by -enable-sched-cycles and -enable-sched-hazard. Added InstrItineraryData::IssueWidth field, currently derived from ARM itineraries, but could be initialized differently on other targets. Added ScheduleHazardRecognizer::MaxLookAhead to indicate whether it is active, and if so how many cycles of state it holds. Added SchedulingPriorityQueue::HasReadyFilter to allowing gating entry into the scheduler's available queue. ScoreboardHazardRecognizer now accesses the ScheduleDAG in order to get information about it's SUnits, provides RecedeCycle for bottom-up scheduling, correctly computes scoreboard depth, tracks IssueCount, and considers potential stall cycles when checking for hazards. ScheduleDAGRRList now models machine cycles and hazards (under flags). It tracks MinAvailableCycle, drives the hazard recognizer and priority queue's ready filter, manages a new PendingQueue, properly accounts for stall cycles, etc. llvm-svn: 122541
* whitespaceAndrew Trick2010-12-241-1/+1
| | | | llvm-svn: 122539
* Remove the rest of the *_sfp Neon instruction patterns.Bob Wilson2010-12-131-2/+0
| | | | | | | | | | | | | Use the same COPY_TO_REGCLASS approach as for the 2-register *_sfp instructions. This change made a big difference in the code generated for the CodeGen/Thumb2/cross-rc-coalescing-2.ll test: The coalescer is still doing a fine job, but some instructions that were previously moved outside the loop are not moved now. It's using fewer VFP registers now, which is generally a good thing, so I think the estimates for register pressure changed and that affected the LICM behavior. Since that isn't obviously wrong, I've just changed the test file. This completes the work for Radar 8711675. llvm-svn: 121730
* Refactor the ARM CMPz* patterns to just use the normal CMP instructions whenJim Grosbach2010-12-071-2/+0
| | | | | | | possible. They were duplicates for everything exception the source pattern before. llvm-svn: 121179
* Making use of VFP / NEON floating point multiply-accumulate / subtraction isEvan Cheng2010-12-051-1/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | difficult on current ARM implementations for a few reasons. 1. Even though a single vmla has latency that is one cycle shorter than a pair of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause additional pipeline stall. So it's frequently better to single codegen vmul + vadd. 2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to stall for 4 cycles. We need to schedule them apart. 3. A vmla followed vmla is a special case. Obvious issuing back to back RAW vmla + vmla is very bad. But this isn't ideal either: vmul vadd vmla Instead, we want to expand the second vmla: vmla vmul vadd Even with the 4 cycle vmul stall, the second sequence is still 2 cycles faster. Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough but it isn't the optimial solution. This patch attempts to make it possible to use vmla / vmls in cases where it is profitable. A. Add missing isel predicates which cause vmla to be codegen'ed. B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to compute a fmul and a fmla. C. Add additional isel checks for vmla, avoid cases where vmla is feeding into fp instructions (except for the #3 exceptional case). D. Add ARM hazard recognizer to model the vmla / vmls hazards. E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the vmla / vmls will trigger one of the special hazards. Work in progress, only A+B are enabled. llvm-svn: 120960
* Rename t2 TBB and TBH instructions to reference that they encode the jump tableJim Grosbach2010-11-291-5/+5
| | | | | | data. Next up, pseudo-izing them. llvm-svn: 120320
* Move callee-saved regs spills / reloads to TFIAnton Korobeynikov2010-11-271-122/+0
| | | | llvm-svn: 120228
* Rewrite stack callee saved spills and restores to use push/pop instructions.Eric Christopher2010-11-181-19/+105
| | | | | | | | | Remove movePastCSLoadStoreOps and associated code for simple pointer increments. Update routines that depended upon other opcodes for save/restore. Adjust all testcases accordingly. llvm-svn: 119725
* Silence compiler warnings.Evan Cheng2010-11-181-2/+2
| | | | llvm-svn: 119610
* Remove ARM isel hacks that fold large immediates into a pair of add, sub, and,Evan Cheng2010-11-171-0/+97
| | | | | | | | | | | | | | | | | | | | | and xor. The 32-bit move immediates can be hoisted out of loops by machine LICM but the isel hacks were preventing them. Instead, let peephole optimization pass recognize registers that are defined by immediates and the ARM target hook will fold the immediates in. Other changes include 1) do not fold and / xor into cmp to isel TST / TEQ instructions if there are multiple uses. This happens when the 'and' is live out, machine sink would have sinked the computation and that ends up pessimizing code. The peephole pass would recognize situations where the 'and' can be toggled to define CPSR and eliminate the comparison anyway. 2) Move peephole pass to after machine LICM, sink, and CSE to avoid blocking important optimizations. rdar://8663787, rdar://8241368 llvm-svn: 119548
OpenPOWER on IntegriCloud