path: root/llvm/test/CodeGen/Thumb2
Commit message | Author | Age | Files | Lines
...
* Teach Thumb2 isel to fold and->rotr ==> ROR. | Andrew Trick | 2011-04-29 | 1 | -2/+4
  Generalization of Nate Begeman's patch!
  llvm-svn: 130502
* Combine thumb2-ror tests. | Andrew Trick | 2011-04-29 | 2 | -13/+13
  llvm-svn: 130498
* Be careful about scheduling nodes above previous calls. | Evan Cheng | 2011-04-26 | 1 | -1/+1
  It increases usage of callee-saved registers and introduces copies. Only allow it if scheduling a node above calls would end up lessening register pressure. Call operands also have added ABI restrictions for register allocation, so be extra careful with hoisting them above calls. rdar://9329627
  llvm-svn: 130245
* Make tests more useful. | Benjamin Kramer | 2011-04-25 | 2 | -4/+4
  lit needs a linter ...
  llvm-svn: 130126
* Accidental function name mangling. | Andrew Trick | 2011-04-23 | 1 | -1/+1
  llvm-svn: 130050
* Thumb2 and ARM add/subtract with carry fixes. | Andrew Trick | 2011-04-23 | 3 | -8/+24
  Fixes Thumb2 ADCS and SBCS lowering: <rdar://problem/9275821>. t2ADCS/t2SBCS are now pseudo instructions, consistent with ARM, so the assembly printer correctly prints the 's' suffix.
  Fixes Thumb2 adde -> SBC matching to check for live/dead carry flags.
  Fixes the internal ARM machine opcode mnemonic for ADCS/SBCS.
  Fixes ARM SBC lowering to check for live carry (potential bug).
  llvm-svn: 130048
* whitespace | Andrew Trick | 2011-04-23 | 1 | -2/+2
  llvm-svn: 130046
* In Thumb2 mode, lower frame index references to: | Evan Cheng | 2011-04-22 | 1 | -0/+23
      add <rd>, sp, #<imm8>
      ldr <rd>, [sp, #<imm8>]
  when the offset from sp is a multiple of 4 and in the range 0-1020. This saves code size by utilizing 16-bit instructions. rdar://9321541
  llvm-svn: 129971
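  A minimal Python sketch of the offset check this entry describes, assuming the 16-bit SP-relative `add`/`ldr` encodings store the byte offset divided by 4 in an 8-bit field (so word-aligned offsets up to 255 * 4 = 1020) and accept only the low registers r0-r7 as destination (the register restriction is not stated in the entry). The helper `thumb_sp_relative` is hypothetical, not LLVM code; the printed assembly shows the byte offset, while the encoded field would be `offset // 4`.

```python
# Hypothetical helper, not LLVM code: decide whether an SP-relative frame
# access fits the 16-bit Thumb encodings described in the entry above.
LOW_REGS = {f"r{i}" for i in range(8)}


def thumb_sp_relative(rd: str, offset: int, op: str):
    """Return a 16-bit 'add'/'ldr' for sp + offset, or None if it won't fit."""
    if op not in ("add", "ldr"):
        raise ValueError(f"unsupported op: {op}")
    # Assumption: the 16-bit forms encode only a 3-bit destination (r0-r7).
    if rd not in LOW_REGS:
        return None
    # imm8 is scaled by 4, so the byte offset must be word aligned and <= 1020.
    if offset % 4 != 0 or not 0 <= offset <= 1020:
        return None
    if op == "add":
        return f"add {rd}, sp, #{offset}"
    return f"ldr {rd}, [sp, #{offset}]"


print(thumb_sp_relative("r0", 8, "add"))     # add r0, sp, #8
print(thumb_sp_relative("r2", 1020, "ldr"))  # ldr r2, [sp, #1020]
print(thumb_sp_relative("r2", 1022, "ldr"))  # None
```

  Anything that returns None here would need a 32-bit encoding instead.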
* Recommit r129383. PreRA scheduler heuristic fixes: VRegCycle, TokenFactor latency. | Andrew Trick | 2011-04-13 | 1 | -2/+2
  Additional fixes: Do something reasonable for subtargets with generic itineraries by handling node latency the same as for an empty itinerary. Now nodes default to unit latency unless an itinerary explicitly specifies a zero-cycle stage or it is a TokenFactor chain.
  Original fixes: UnitsSharePred was a source of randomness in the scheduler: node priority depended on the queue data structure. I rewrote the recent VRegCycle heuristics to completely replace the old heuristic without any randomness. To make the node latency adjustments work, I also needed to do something a little more reasonable with TokenFactor. I gave it zero latency to its consumers and always schedule it as low as possible.
  llvm-svn: 129421
* fix two completely broken tests, which were matching due to PR9629. | Chris Lattner | 2011-04-09 | 2 | -4/+4
  llvm-svn: 129195
* Fix Thumb and Thumb2 tests to be register allocator independent. | Jakob Stoklund Olesen | 2011-03-31 | 4 | -11/+12
  llvm-svn: 128690
* Fix the bfi handling for or (and a mask) (and b mask). | Eric Christopher | 2011-03-26 | 1 | -0/+11
  We need the two masks to match inversely for the code as is to work. For the example given we actually want:
      bfi r0, r2, #1, #1
  not #0; however, given the way the pattern is written, it's not possible at the moment. Fixes rdar://9177502
  llvm-svn: 128320
* Roll r127459 back in: | Cameron Zwarich | 2011-03-11 | 4 | -5/+3
  Optimize trivial branches in CodeGenPrepare, which often get created from the lowering of objectsize intrinsics. Unfortunately, a number of tests were relying on llc not optimizing trivial branches, so I had to add an option to allow them to continue to test what they originally tested. This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.
  llvm-svn: 127498
* Revert r127459, "Optimize trivial branches in CodeGenPrepare, which often get created from the", it broke some GCC test suite tests. | Daniel Dunbar | 2011-03-11 | 4 | -3/+5
  llvm-svn: 127477
* Optimize trivial branches in CodeGenPrepare, which often get created from the lowering of objectsize intrinsics. | Cameron Zwarich | 2011-03-11 | 4 | -5/+3
  Unfortunately, a number of tests were relying on llc not optimizing trivial branches, so I had to add an option to allow them to continue to test what they originally tested. This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.
  llvm-svn: 127459
* Move a test that ended up in the wrong place. | Bob Wilson | 2011-02-05 | 1 | -0/+18
  llvm-svn: 124933
* Last round of fixes for movw + movt global address codegen. | Evan Cheng | 2011-01-21 | 3 | -33/+6
  1. Fixed ARM pc adjustment.
  2. Fixed dynamic-no-pic codegen.
  3. CSE of pc-relative load of global addresses.
  It's now enabled by default for Darwin.
  llvm-svn: 123991
* Enable support for precise scheduling of the instruction selection DAG. | Andrew Trick | 2011-01-21 | 1 | -4/+9
  Disable it with "-disable-sched-cycles". For ARM, this enables a framework for modeling the cpu pipeline and counting stalls. It also activates several heuristics to drive scheduling based on the model. Scheduling is inherently imprecise at this stage, and until spilling is improved it may defeat attempts to schedule. However, this framework provides greater control over tuning codegen.
  Although the flag is not target-specific, it should have very little effect on the default scheduler used by x86. The only two changes that affect x86 are:
  - Scheduling a high-latency operation bumps the current cycle so independent operations can have their latency covered, i.e. two independent 4-cycle operations can produce results in 4 cycles, not 8 cycles.
  - Two operations with equal register pressure impact and no latency-based stalls on their uses will be prioritized by depth before height (height is irrelevant if no stalls occur in the schedule below this point).
  llvm-svn: 123971
* Add ARM patterns to match EXTRACT_SUBVECTOR nodes. | Bob Wilson | 2011-01-07 | 1 | -1/+4
  Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle vectors from being translated to EXTRACT_SUBVECTOR. Patch by Tim Northover.
  The test changes are needed to keep those spill-q tests from testing aligned spills and restores. If the only aligned stack objects are spill slots, we no longer realign the stack frame. Prior to this patch, an EXTRACT_SUBVECTOR was legalized by loading from the stack, which created an aligned frame index. Now, however, there is nothing except the spill slot in the stack frame, so I added an aligned alloca.
  llvm-svn: 122995
* Remove the rest of the *_sfp Neon instruction patterns. | Bob Wilson | 2010-12-13 | 1 | -5/+0
  Use the same COPY_TO_REGCLASS approach as for the 2-register *_sfp instructions. This change made a big difference in the code generated for the CodeGen/Thumb2/cross-rc-coalescing-2.ll test: The coalescer is still doing a fine job, but some instructions that were previously moved outside the loop are not moved now. It's using fewer VFP registers now, which is generally a good thing, so I think the estimates for register pressure changed and that affected the LICM behavior. Since that isn't obviously wrong, I've just changed the test file. This completes the work for Radar 8711675.
  llvm-svn: 121730
* (or (and (shl A, #shamt), mask), B) => ARMbfi B, A, ~mask where lsb(mask) == #shamt. | Evan Cheng | 2010-12-11 | 1 | -0/+11
  rdar://8752056
  llvm-svn: 121606
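  A Python sketch of why the rewrite in the entry above is sound, modelling BFI as "copy the low `width` bits of the source into the destination at bit `lsb`". It passes the field as (lsb, width) like the assembly instruction rather than as the inverted-mask operand the entry shows. Two preconditions are added by this sketch and are not spelled out in the entry: the mask must be one contiguous run of bits (BFI inserts a single field), and B must be zero under the mask, since only then does OR give the same result as BFI. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def bfi(dst: int, src: int, lsb: int, width: int) -> int:
    """Model of ARM BFI: copy the low `width` bits of src into dst at bit `lsb`."""
    field = ((1 << width) - 1) << lsb
    return (dst & ~field & MASK32) | ((src << lsb) & field)


def or_shl_and_via_bfi(a: int, b: int, shamt: int, mask: int):
    """Evaluate (or (and (shl a, shamt), mask), b) as a BFI, or return None
    when this sketch's preconditions for the rewrite do not hold."""
    if mask == 0:
        return None
    lsb = (mask & -mask).bit_length() - 1     # index of the lowest set bit
    width = (mask >> lsb).bit_length()
    if mask != ((1 << width) - 1) << lsb:      # BFI needs one contiguous field
        return None
    if lsb != shamt:                           # the lsb(mask) == #shamt rule
        return None
    if b & mask:                               # OR == BFI only if b is 0 there
        return None
    return bfi(b, a, lsb, width)


a, b, shamt, mask = 0b101, 0xF0000001, 4, 0xF0
direct = (((a << shamt) & MASK32) & mask) | b
print(hex(direct), hex(or_shl_and_via_bfi(a, b, shamt, mask)))
# 0xf0000051 0xf0000051
```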
* ARM stm/ldm instructions require more than one register in the register list. | Jim Grosbach | 2010-12-09 | 1 | -1/+1
  Otherwise, a plain str/ldr should be used instead. Make sure we account for that in prologue/epilogue code generation. rdar://8745460
  llvm-svn: 121391
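  A small Python sketch of the selection rule in the entry above: a single saved register becomes a writeback `str`/`ldr` on sp, while two or more use `stmdb`/`ldmia`. The function names are hypothetical and this is not the actual LLVM prologue/epilogue code; it only emits assembly text to show the rule.

```python
def save_regs(regs):
    """Return the prologue instruction that stores `regs` below sp."""
    if not regs:
        return None
    if len(regs) == 1:
        # Single register: pre-indexed store with writeback instead of stmdb.
        return f"str {regs[0]}, [sp, #-4]!"
    reglist = ", ".join(regs)
    return f"stmdb sp!, {{{reglist}}}"


def restore_regs(regs):
    """Return the epilogue instruction that reloads `regs` from sp."""
    if not regs:
        return None
    if len(regs) == 1:
        # Single register: post-indexed load with writeback instead of ldmia.
        return f"ldr {regs[0]}, [sp], #4"
    reglist = ", ".join(regs)
    return f"ldmia sp!, {{{reglist}}}"


print(save_regs(["r8"]))                  # str r8, [sp, #-4]!
print(save_regs(["r8", "r10", "r11"]))    # stmdb sp!, {r8, r10, r11}
print(restore_regs(["r8"]))               # ldr r8, [sp], #4
```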
* The Thumb tADDrSPi instruction is not valid when the destination is SP. | Bob Wilson | 2010-12-04 | 1 | -0/+11
  Check for that and try narrowing it to tADDspi instead. Radar 8724703.
  llvm-svn: 120892
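  A Python sketch of the narrowing choice in the entry above, under these assumptions (not taken from the entry): the `add rd, sp, #imm` form that the name tADDrSPi refers to takes a low destination register and an offset of imm8 * 4 (0-1020), while the `add sp, #imm` form that tADDspi refers to takes imm7 * 4 (0-508). The helper `narrow_add_sp` is hypothetical.

```python
LOW_REGS = {f"r{i}" for i in range(8)}


def narrow_add_sp(rd: str, imm: int):
    """Pick a 16-bit Thumb form for 'rd = sp + imm', or None if none applies."""
    if imm < 0 or imm % 4 != 0:
        return None
    if rd == "sp":
        # tADDrSPi cannot target sp; use the 'add sp, #imm' form (imm7 * 4).
        return f"add sp, #{imm}" if imm <= 508 else None
    if rd in LOW_REGS and imm <= 1020:
        # 'add rd, sp, #imm' form (imm8 * 4), low destination registers only.
        return f"add {rd}, sp, #{imm}"
    return None


print(narrow_add_sp("sp", 16))    # add sp, #16
print(narrow_add_sp("r3", 1020))  # add r3, sp, #1020
print(narrow_add_sp("sp", 512))   # None
```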
* When using the 'push' mnemonic for Thumb2 stmdb, be explicit when it's the 32-bit wide version by adding the .w suffix. | Jim Grosbach | 2010-12-03 | 1 | -1/+1
  llvm-svn: 120838
* Add correct encodings for STRD and LDRD, including fixup support. | Owen Anderson | 2010-12-01 | 1 | -1/+1
  Additionally, update these to unified syntax.
  llvm-svn: 120589
* Fix epilogue codegen to avoid leaving the stack pointer in an invalid state. | Evan Cheng | 2010-11-22 | 3 | -5/+50
  Previously Thumb2 would restore sp from fp like this:
      mov sp, r7
      sub sp, #4
  If an interrupt is taken after the 'mov' but before the 'sub', callee-saved registers might be clobbered by the interrupt handler. Instead, try restoring directly from sp:
      add sp, #4
  Or, if necessary (with VLA, etc.), use a scratch register to compute sp and then restore it:
      sub.w r4, r7, #8
      mov sp, r7
  rdar://8465407
  llvm-svn: 119977
* Rewrite stack callee saved spills and restores to use push/pop instructions. | Eric Christopher | 2010-11-18 | 1 | -1/+1
  Remove movePastCSLoadStoreOps and associated code for simple pointer increments. Update routines that depended upon other opcodes for save/restore. Adjust all testcases accordingly.
  llvm-svn: 119725
* These tests are looking for library function names that appear to differ on Linux. | Dale Johannesen | 2010-11-17 | 1 | -1/+1
  Try to make them pass on Linux. Would be good for a Linux person to review this.
  llvm-svn: 119572
* Remove ARM isel hacks that fold large immediates into a pair of add, sub, and, and xor. | Evan Cheng | 2010-11-17 | 3 | -6/+45
  The 32-bit move immediates can be hoisted out of loops by machine LICM, but the isel hacks were preventing that. Instead, let the peephole optimization pass recognize registers that are defined by immediates, and the ARM target hook will fold the immediates in.
  Other changes include:
  1) Do not fold and / xor into cmp to isel TST / TEQ instructions if there are multiple uses. This happens when the 'and' is live out; machine sink would have sunk the computation and that ends up pessimizing code. The peephole pass would recognize situations where the 'and' can be toggled to define CPSR and eliminate the comparison anyway.
  2) Move the peephole pass to after machine LICM, sink, and CSE to avoid blocking important optimizations.
  rdar://8663787, rdar://8241368
  llvm-svn: 119548
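  For context on what "folding a large immediate into a pair of add instructions" means, here is a Python sketch assuming ARM-mode modified immediates (an 8-bit value rotated right by an even amount). It covers only the disjoint two-part split usable by add-style instructions, not the sub/and variants the entry also mentions, and the helper names are hypothetical rather than LLVM's.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def is_arm_mod_imm(v: int) -> bool:
    """ARM-mode modified immediate: an 8-bit value rotated right by an even amount."""
    return any(rotl32(v, r) <= 0xFF for r in range(0, 32, 2))


def split_into_two_mod_imms(v: int):
    """Split v into two disjoint encodable parts (p1 | p2 == v), or None."""
    if is_arm_mod_imm(v):
        return None  # a single instruction already suffices
    for r in range(0, 32, 2):
        window = rotl32(0xFF, 32 - r)  # 0xFF rotated right by r
        p1 = v & window                # encodable by construction
        p2 = v & ~window & MASK32
        if p1 and p2 and is_arm_mod_imm(p2):
            return p1, p2
    return None


print([hex(p) for p in split_into_two_mod_imms(0x00FF00FF)])
# ['0xff', '0xff0000']
```

  Because the two parts share no bits, `add r0, r0, #0xff` followed by `add r0, r0, #0xff0000` adds the same amount as the original constant.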
* Two sets of changes. Sorry they are intermingled. | Evan Cheng | 2010-11-03 | 1 | -2/+0
  1. Fix the pre-RA scheduler so it doesn't try to push instructions above calls to "optimize for latency". Call instructions don't have the right latency and this is more likely to introduce spills.
  2. Fix the if-converter cost function. For ARM, it should use instruction latencies, not # of micro-ops, since multi-latency instructions are completely executed even when the predicate is false. Also, some instructions will be "slower" when they are predicated due to the register def becoming an implicit input.
  rdar://8598427
  llvm-svn: 118135
* Revert r114340 (improvements in Darwin function prologue/epilogue), as it broke assumptions about stack layout. | Jim Grosbach | 2010-11-02 | 1 | -6/+7
  Specifically, LR must be saved next to FP.
  llvm-svn: 118026
* Overhaul memory barriers in the ARM backend. Radar 8601999. | Bob Wilson | 2010-10-30 | 1 | -9/+23
  There were a number of issues to fix up here:
    * The "device" argument of the llvm.memory.barrier intrinsic should be used to distinguish the "Full System" domain from the "Inner Shareable" domain. It has nothing to do with using DMB vs. DSB instructions.
    * The compiler should never need to emit DSB instructions. Remove the ARMISD::SYNCBARRIER node and also remove the instruction patterns for DSB.
    * Merge the separate DMB/DSB instructions for options only used for the disassembler with the default DMB/DSB instructions. Add the default "full system" option ARM_MB::SY to the ARM_MB::MemBOpt enum.
    * Add a separate ARMISD::MEMBARRIER_MCR node for subtargets that implement a data memory barrier using the MCR instruction.
    * Fix up encodings for these instructions (except MCR).
  I also updated the tests and added a few new ones to check for DMB options that were not currently being exercised.
  llvm-svn: 117756
* Avoiding overly aggressive latency scheduling. | Evan Cheng | 2010-10-29 | 2 | -1/+4
  If the two nodes share an operand and one of them has a single use that is a live out copy, favor the one that is live out. Otherwise it will be difficult to eliminate the copy if the instruction is a loop induction variable update. e.g.
  BB:
      sub r1, r3, #1
      str r0, [r2, r3]
      mov r3, r1
      cmp
      bne BB
  =>
  BB:
      str r0, [r2, r3]
      sub r3, r3, #1
      cmp
      bne BB
  This fixed the recent 256.bzip2 regression.
  llvm-svn: 117675
* More accurate estimate / tracking of register pressure. | Evan Cheng | 2010-10-20 | 1 | -1/+14
  - Initial register pressure in the loop should be all the live defs into the loop, not just those from the loop preheader, which is often empty.
  - When an instruction is hoisted, update register pressure from the loop preheader to the original BB.
  - Treat only use of a virtual register as kill since the code is still SSA.
  llvm-svn: 116956
* Fix crash introduced in 116852. 8573915. | Dale Johannesen | 2010-10-20 | 1 | -0/+17
  llvm-svn: 116955
* Enable using vdup for vector constants which are splats of integers by default, and remove the controlling flag, now that LICM will hoist such vdups. | Dale Johannesen | 2010-10-19 | 2 | -41/+4
  8003375.
  llvm-svn: 116852
* Re-enable register pressure aware machine licm with fixes. | Evan Cheng | 2010-10-19 | 1 | -2/+1
  Hoist() may have erased the instruction during LICM, so UpdateRegPressureAfter() should not reference it afterwards.
  llvm-svn: 116845
* Revert r116781 "- Add a hook for target to determine whether an instruction def is", which breaks some nightly tests. | Daniel Dunbar | 2010-10-19 | 1 | -1/+2
  llvm-svn: 116816
* - Add a hook for target to determine whether an instruction def is "long latency" enough to hoist even if it may increase spilling. | Evan Cheng | 2010-10-19 | 1 | -2/+1
  Reloading a value from a spill slot is often cheaper than performing an expensive computation in the loop. For X86, that means machine LICM will hoist SQRT, DIV, etc. ARM will be somewhat aggressive with VFP and NEON instructions.
  - Enable register pressure aware machine LICM by default.
  llvm-svn: 116781
* Change register allocation order for ARM VFP and NEON registers to put the callee-saved registers at the end of the lists. | Bob Wilson | 2010-10-08 | 4 | -5/+34
  Also prefer to avoid using the low registers that are in register subclasses required by certain instructions, so that those registers will more likely be available when needed. This change makes a huge improvement in spilling in some cases. Thanks to Jakob for helping me realize the problem.
  Most of this patch is fixing the testsuite. There are quite a few places where we're checking for specific registers. I changed those to wildcards in places where that doesn't weaken the tests. The spill-q.ll and thumb2-spill-q.ll tests stopped spilling with this change, so I added a bunch of live values to force spills on those tests.
  llvm-svn: 116055
* Enable target-specific mul-lowering on ARM, even at -Os. | Owen Anderson | 2010-09-21 | 1 | -15/+0
  Remove a test that this makes irrelevant, but add a new test for the new, improved functionality.
  llvm-svn: 114494
* Simplify ARM callee-saved register handling by removing the distinction between the high and low registers for prologue/epilogue code. | Jim Grosbach | 2010-09-20 | 1 | -7/+6
  This was a Darwin-only thing that wasn't providing a realistic benefit anymore. Combining the save areas simplifies the compiler code and results in better ARM/Thumb2 codegen. For example, previously we would generate code like:
      push {r4, r5, r6, r7, lr}
      add r7, sp, #12
      stmdb sp!, {r8, r10, r11}
  With this change, we combine the register saves and generate:
      push {r4, r5, r6, r7, r8, r10, r11, lr}
      add r7, sp, #12
  rdar://8445635
  llvm-svn: 114340
* Teach the (non-MC) instruction printer to use the canonical names for push/pop, and shift instructions on ARM. | Jim Grosbach | 2010-09-17 | 1 | -1/+1
  Update the tests to match.
  llvm-svn: 114230
* Move thumb2 tests to the thumb2 directory | Jim Grosbach | 2010-09-17 | 2 | -0/+132
  llvm-svn: 114206
* Teach if-converter to be more careful with predicating instructions that would take multiple cycles to decode. | Evan Cheng | 2010-09-10 | 1 | -1/+1
  For the current if-converter clients (actually only ARM), the instructions that are predicated on false are not nops. They would still take machine cycles to decode. Micro-coded instructions such as LDM / STM can potentially take multiple cycles to decode. If-converter should take treat them as non-micro-coded simple instructions.
  llvm-svn: 113570
* Fix NEON VLD pseudo instruction itineraries that were incorrectly copied from the VST pseudos. | Bob Wilson | 2010-09-09 | 1 | -1/+1
  The VLD/VST scheduling still needs work (see pr6722), but at least we shouldn't confuse the loads with the stores.
  llvm-svn: 113473
* Re-apply r112883: | Jim Grosbach | 2010-09-03 | 1 | -9/+2
  "For ARM stack frames that utilize variable sized objects and have either large local stack areas or require dynamic stack realignment, allocate a base register via which to access the local frame. This allows efficient access to frame indices not accessible via the FP (either due to being out of range or due to dynamic realignment) or the SP (due to variable sized object allocation). In particular, this greatly improves efficiency of access to spill slots in Thumb functions which contain VLAs."
  r112986 fixed a latent bug exposed by the above.
  llvm-svn: 112989
* Revert "For ARM stack frames that utilize variable sized objects and have either", it is breaking oggenc with Clang for ARMv6. | Daniel Dunbar | 2010-09-03 | 1 | -2/+9
  This reverts commit 8d6e29cfda270be483abf638850311670829ee65.
  llvm-svn: 112962
* For ARM stack frames that utilize variable sized objects and have either large local stack areas or require dynamic stack realignment, allocate a base register via which to access the local frame. | Jim Grosbach | 2010-09-02 | 1 | -9/+2
  This allows efficient access to frame indices not accessible via the FP (either due to being out of range or due to dynamic realignment) or the SP (due to variable sized object allocation). In particular, this greatly improves efficiency of access to spill slots in Thumb functions which contain VLAs.
  rdar://7352504 rdar://8374540 rdar://8355680
  llvm-svn: 112883
* Now that register allocation properly considers reserved regs, simplify the ARM register class allocation order functions to take advantage of that. | Jim Grosbach | 2010-09-02 | 3 | -8/+8
  llvm-svn: 112841