path: root/llvm/test/CodeGen
Commit message | Author | Age | Files | Lines
...
* Fix up testcase for previous commit. | Eric Christopher | 2011-04-05 | 1 | -1/+1
  llvm-svn: 128870
* Fix register-dependent X86 tests. | Jakob Stoklund Olesen | 2011-04-05 | 30 | -76/+96
  llvm-svn: 128867
* Allow coalescing with reserved physregs in certain cases: | Jakob Stoklund Olesen | 2011-04-04 | 2 | -112/+0
  When a virtual register has a single value that is defined as a copy of a reserved register, permit that copy to be joined. These virtual registers are usually copies of the stack pointer:

    %vreg75<def> = COPY %ESP; GR32:%vreg75
    MOV32mr %vreg75, 1, %noreg, 0, %noreg, %vreg74<kill>
    MOV32mi %vreg75, 1, %noreg, 8, %noreg, 0
    MOV32mi %vreg75<kill>, 1, %noreg, 4, %noreg, 0
    CALLpcrel32 ...

  Coalescing these virtual registers early decreases register pressure. Previously, they were coalesced by RALinScan::attemptTrivialCoalescing after register allocation was completed.

  The lower register pressure causes the mcinst-lowering-cmp0.ll test case to fail because it depends on linear scan spilling a particular register.

  I am deleting 2008-08-05-SpillerBug.ll because it is counting the number of instructions emitted, and its revision history shows the 'correct' count being edited many times.
  llvm-svn: 128845
* Disable the PowerPC/Atomics-64 test. | Jakob Stoklund Olesen | 2011-04-04 | 1 | -2/+8
  The code inserted by PPCTargetLowering::EmitInstrWithCustomInserter for ppc64 is wrong, and I don't know how to fix it. It seems to be using the correct register classes for pointers, but it inserts all 32-bit instructions.
  llvm-svn: 128835
* Fix PowerPC tests to be register allocator independent. | Jakob Stoklund Olesen | 2011-04-04 | 2 | -8/+8
  llvm-svn: 128827
* ptx: support setp's 4-operand format | Che-Liang Chiou | 2011-04-02 | 1 | -0/+25
  llvm-svn: 128767
* Do some peephole optimizations to remove pointless VMOVs from Neon to integer | Cameron Zwarich | 2011-04-02 | 1 | -0/+11
  registers that arise from argument shuffling with the soft float ABI. These instructions are particularly slow on Cortex A8. This fixes one half of <rdar://problem/8674845>.
  llvm-svn: 128759
* LDRD/STRD instructions should print both Rt and Rt2 in the asm string. | Jim Grosbach | 2011-04-01 | 2 | -2/+2
  llvm-svn: 128736
* Add code for analyzing FP branches. Clean up branch Analysis functions. | Akira Hatanaka | 2011-04-01 | 2 | -2/+48
  llvm-svn: 128718
* Add test case. | Evan Cheng | 2011-04-01 | 1 | -0/+27
  llvm-svn: 128707
* FileCheck'ify test. | Evan Cheng | 2011-04-01 | 1 | -8/+8
  llvm-svn: 128706
* Fix Thumb and Thumb2 tests to be register allocator independent. | Jakob Stoklund Olesen | 2011-03-31 | 5 | -23/+27
  llvm-svn: 128690
* Provide a legal pointer register class when targeting thumb1. | Jakob Stoklund Olesen | 2011-03-31 | 1 | -1/+1
  The LocalStackSlotAllocation pass was creating illegal registers.
  llvm-svn: 128687
* Fix SystemZ tests | Jakob Stoklund Olesen | 2011-03-31 | 1 | -1/+2
  llvm-svn: 128686
* Fix ARM tests to be register allocator independent. | Jakob Stoklund Olesen | 2011-03-31 | 17 | -56/+83
  llvm-svn: 128680
* Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier | Evan Cheng | 2011-03-31 | 1 | -1/+27
  accumulator forwarding:

    vadd d3, d0, d1
    vmul d3, d3, d2
  =>
    vmul d3, d0, d2
    vmla d3, d1, d2

  llvm-svn: 128665
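The rewrite in this commit relies on multiplication distributing over addition. The following is a minimal scalar sketch of that identity in C, not the actual LLVM combine. The function names `mul_of_sum` and `sum_of_muls` are hypothetical, and unsigned 32-bit integers are assumed so that the two forms agree exactly modulo 2^32 (floating-point lanes would not in general be bit-identical).

```c
#include <stdint.h>

/* Original form: an add followed by a dependent multiply (vadd + vmul). */
static uint32_t mul_of_sum(uint32_t a, uint32_t b, uint32_t c) {
    return (a + b) * c;
}

/* Distributed form: a multiply followed by a multiply-accumulate
 * (vmul + vmla). Unsigned arithmetic wraps, so both forms give the
 * same result modulo 2^32. */
static uint32_t sum_of_muls(uint32_t a, uint32_t b, uint32_t c) {
    uint32_t acc = a * c; /* vmul */
    acc += b * c;         /* vmla */
    return acc;
}
```

Calling both functions with the same arguments, including values that overflow 32 bits, should return equal results.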
* Fix Mips, Sparc, and XCore tests that were dependent on register allocation. | Jakob Stoklund Olesen | 2011-03-31 | 6 | -49/+60
  Add an extra run with -regalloc=basic to keep them honest.
  llvm-svn: 128654
* Added support for FP conditional move instructions and fixed bugs in handling of FP comparisons. | Akira Hatanaka | 2011-03-31 | 6 | -3/+349
  llvm-svn: 128650
* Don't completely eliminate identity copies that also modify super register liveness. | Jakob Stoklund Olesen | 2011-03-31 | 1 | -0/+1
  Turn them into noop KILL instructions instead. This lets the scavenger know when super-registers are killed and defined.
  llvm-svn: 128645
* Mark all uses as <undef> when joining a copy. | Jakob Stoklund Olesen | 2011-03-31 | 1 | -1/+1
  This way, shrinkToUses() will ignore the instruction that is about to be deleted, and we avoid leaving invalid live ranges that SplitKit doesn't like.

  Fix a misunderstanding in MachineVerifier about <def,undef> operands. The <undef> flag is valid on def operands where it has the same meaning as <undef> on a use operand. It only applies to sub-register defines which also read the full register.
  llvm-svn: 128642
* Add XCore intrinsics for initializing / starting / synchronizing threads. | Richard Osborne | 2011-03-31 | 1 | -0/+67
  llvm-svn: 128633
* Pick a conservative register class when creating a small live range for remat. | Jakob Stoklund Olesen | 2011-03-31 | 1 | -0/+61
  The rematerialized instruction may require a more constrained register class than the register being spilled. In the test case, the spilled register has been inflated to the DPR register class, but we are rematerializing a load of the ssub_0 sub-register which only exists for DPR_VFP2 registers.

  The register class is reinflated after spilling, so the conservative choice is only temporary.
  llvm-svn: 128610
* Don't try to create zero-sized stack objects. | Evan Cheng | 2011-03-30 | 1 | -0/+10
  llvm-svn: 128586
* Add an ARM-specific SD node for VBSL so that forms with a constant first operand | Cameron Zwarich | 2011-03-30 | 1 | -0/+97
  can be recognized. This fixes <rdar://problem/9183078>.
  llvm-svn: 128584
* Add intrinsics @llvm.arm.neon.vmulls and @llvm.arm.neon.vmullu.* back. Frontends | Evan Cheng | 2011-03-29 | 1 | -0/+98
  were lowering them to sext / uxt + mul instructions. Unfortunately the optimization passes may hoist the extensions out of the loop and separate them. When that happens, the long multiplication instructions can be broken into several scalar instructions, causing a significant performance issue.

  Note the vmla and vmls intrinsics are not added back. Frontend will codegen them as intrinsics vmull* + add / sub. Also note the isel optimizations for catching mul + sext / zext are not changed either.

  First part of rdar://8832507, rdar://9203134
  llvm-svn: 128502
* Add Neon SINT_TO_FP and UINT_TO_FP lowering from v4i16 to v4f32. Fixes | Cameron Zwarich | 2011-03-29 | 1 | -0/+19
  <rdar://problem/8875309> and <rdar://problem/9057191>.
  llvm-svn: 128492
* Reduce test case. | Rafael Espindola | 2011-03-29 | 1 | -53/+3
  llvm-svn: 128445
* Optimizing (zext A + zext B) * C, to (VMULL A, C) + (VMULL B, C) during isel lowering to fold the zero-extend's and take advantage of no-stall back to back vmul + vmla: | Evan Cheng | 2011-03-29 | 1 | -0/+29

    vmull q0, d4, d6
    vmlal q0, d5, d6
  is faster than
    vaddl q0, d4, d5
    vmovl q1, d6
    vmul q0, q0, q1

  This allows us to use vmull + vmlal for:

    f = vmull_u8( vget_high_u8(s), c);
    f = vmlal_u8(f, vget_low_u8(s), c);

  rdar://9197392
  llvm-svn: 128444
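As a rough illustration of why the two instruction sequences in this commit compute the same value, here is a scalar model of a single lane in C. It is a sketch under stated assumptions: 8-bit unsigned inputs widened to 16-bit lanes, as in the `vmull_u8` example, with 16-bit results wrapping modulo 2^16. The function names are hypothetical and do not come from LLVM or the NEON headers.

```c
#include <stdint.h>

/* One lane of the slower sequence: vaddl (widening add),
 * vmovl (widen c), then vmul on 16-bit lanes. */
static uint16_t lane_add_then_mul(uint8_t a, uint8_t b, uint8_t c) {
    uint16_t sum = (uint16_t)(a + b);  /* vaddl.u8: the sum fits in 9 bits */
    uint16_t wide_c = c;               /* vmovl.u8 */
    return (uint16_t)(sum * wide_c);   /* vmul.i16: result wraps to 16 bits */
}

/* One lane of the faster sequence: vmull (widening multiply)
 * followed by vmlal (widening multiply-accumulate). */
static uint16_t lane_mull_then_mlal(uint8_t a, uint8_t b, uint8_t c) {
    uint16_t acc = (uint16_t)(a * c);  /* vmull.u8 */
    acc = (uint16_t)(acc + b * c);     /* vmlal.u8: accumulate wraps to 16 bits */
    return acc;
}
```

Both functions should agree for every combination of 8-bit inputs, because the only difference is where the modulo-2^16 wrap is applied.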
* In some cases, the "fail BB dominator" may be null after the BB was split (and | Bill Wendling | 2011-03-28 | 1 | -1/+17
  becomes reachable when before it wasn't). Check to make sure that it's not null before trying to use it.
  llvm-svn: 128434
* Collect and coalesce DBG_VALUE instructions before emitting the function. | Jakob Stoklund Olesen | 2011-03-26 | 1 | -1/+1
  Correctly terminate the range of register DBG_VALUEs when the register is clobbered or when the basic block ends.

  The code is now ready to deal with variables that are sometimes in a register and sometimes on the stack. We just need to teach emitDebugLoc to say 'stack slot'.
  llvm-svn: 128327
* Fix the bfi handling for or (and a mask) (and b mask). We need the two | Eric Christopher | 2011-03-26 | 1 | -0/+11
  masks to match inversely for the code as is to work. For the example given we actually want:

    bfi r0, r2, #1, #1

  not #0, however, given the way the pattern is written it's not possible at the moment.

  Fixes rdar://9177502
  llvm-svn: 128320
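To make the "masks must match inversely" condition concrete, here is a small C model of a bitfield insert. It is only a sketch: `bfi` models the semantics of the ARM instruction as I understand them (copy the low `width` bits of `rn` into `rd` starting at bit `lsb`), and `masks_are_inverse` is a hypothetical helper, not the check used in the LLVM pattern.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of "bfi rd, rn, #lsb, #width".
 * Assumes width >= 1 and lsb + width <= 32. */
static uint32_t bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width) {
    uint32_t field = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    uint32_t mask = field << lsb;
    return (rd & ~mask) | ((rn << lsb) & mask);
}

/* (a & mask_a) | (b & mask_b) takes every bit from exactly one of
 * a or b only when the two masks are exact inverses of each other. */
static bool masks_are_inverse(uint32_t mask_a, uint32_t mask_b) {
    return mask_a == ~mask_b;
}
```

With a single-bit field at bit 1, `(a & ~2u) | (b & 2u)` matches `bfi(a, b >> 1, 1, 1)`, which corresponds to the `#1, #1` operands the message says are wanted.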
* Emit fewer labels for debug info and stop emitting .loc directives for DBG_VALUEs. | Jakob Stoklund Olesen | 2011-03-25 | 3 | -16/+20
  The .loc directives don't need labels; that is a leftover from when we created line number info manually.

  Instructions following a DBG_VALUE can share its label since the DBG_VALUE doesn't produce any code.
  llvm-svn: 128284
* Move test in x86 specific area. | Devang Patel | 2011-03-24 | 1 | -0/+69
  llvm-svn: 128245
* Keep track of directory name and fix regression caused by Rafael's patch r119613. | Devang Patel | 2011-03-24 | 1 | -1/+1
  A better approach would be to move source id handling inside MC.
  llvm-svn: 128233
* Target/X86: [PR8777][PR8778] Tweak alloca/chkstk for Windows targets. | NAKAMURA Takumi | 2011-03-24 | 2 | -1/+76
  FIXME: Some cleanups would be needed.
  llvm-svn: 128206
* Do early taildup of ret in CodeGenPrepare for potential tail calls that have a | Cameron Zwarich | 2011-03-24 | 1 | -0/+37
  void return type. This fixes PR9487.
  llvm-svn: 128197
* Enable GlobalMerge on darwin. | Devang Patel | 2011-03-23 | 2 | -2/+1
  llvm-svn: 128183
* Revert r128175. | Andrew Trick | 2011-03-23 | 2 | -30/+12
  I'm backing this out for the second time. It was supposed to be fixed by r128164, but the mingw self-host must be defeating the fix.
  llvm-svn: 128181
* Cmp peephole optimization isn't always safe for signed arithmetic. | Evan Cheng | 2011-03-23 | 2 | -2/+45

    int tries = INT_MAX;
    while (tries > 0) {
      tries--;
    }

  The check should be:

    subs r4, #1
    cmp r4, #0
    bgt LBB0_1

  The subs can set the overflow V bit when r4 is INT_MAX+1 (which loop canonicalization apparently does in this case). cmp #0 would have cleared it while not changing the N and Z bits. Since BGT is dependent on the V bit, i.e. (N == V) && !Z, it is not safe to eliminate the cmp #0.

  rdar://9172742
  llvm-svn: 128179
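The flag reasoning in this commit can be checked with a small simulation. The sketch below models only the N, Z and V flags of a 32-bit subtraction, which is what both `subs` and `cmp` compute, and evaluates the BGT condition quoted in the message. The `Flags` type and all function names are hypothetical; this is a model for illustration, not ARM or LLVM code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the N, Z and V flags (the carry flag is omitted). */
typedef struct { bool n, z, v; } Flags;

/* Flags produced by computing a - b, as SUBS and CMP both do. */
static Flags flags_of_sub(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    Flags f;
    f.n = (r >> 31) != 0;
    f.z = (r == 0);
    /* Signed overflow: a and b differ in sign, and the result's sign
     * differs from a's. */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;
    return f;
}

/* BGT is taken when Z is clear and N equals V. */
static bool bgt_taken(Flags f) { return !f.z && f.n == f.v; }

/* Branch decision when "cmp r4, #0" is kept after "subs r4, #1". */
static bool taken_with_cmp(uint32_t r4) {
    uint32_t dec = r4 - 1u;
    return bgt_taken(flags_of_sub(dec, 0u));
}

/* Branch decision if the cmp is removed and BGT reads the SUBS flags. */
static bool taken_without_cmp(uint32_t r4) {
    return bgt_taken(flags_of_sub(r4, 1u));
}
```

For ordinary values the two decisions agree, but for `r4 == 0x80000000` (INT_MAX+1 as a bit pattern) the subtraction sets V, so the branch is taken with the cmp and not taken without it.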
* PR9535: add support for splitting and scalarizing vector ISD::FP_ROUND. | Eli Friedman | 2011-03-23 | 1 | -0/+35
  Also cleaning up some duplicated code while I'm here.
  llvm-svn: 128176
* Reapply Eli's r127852 now that the pre-RA scheduler can spill EFLAGS. | Andrew Trick | 2011-03-23 | 2 | -12/+30
  (target-specific branchless method for double-width relational comparisons on x86)
  llvm-svn: 128175
* Reapply r128045 and r128051 with fixes. | Jakob Stoklund Olesen | 2011-03-22 | 1 | -4/+6
  This will extend the ranges of debug info variables in registers until they are clobbered.

  Fix 1: Don't mistake DBG_VALUE instructions referring to incoming arguments on the stack with DBG_VALUE instructions referring to variables in the frame pointer. This fixes the gdb test-suite failure.

  Fix 2: Don't trace through copies to physical registers setting up call arguments. These registers are call clobbered, and the source register is more likely to be a callee-saved register that can be extended through the call instruction.
  llvm-svn: 128114
* Revert r128045 and r128051, debug info enhancements. | Andrew Trick | 2011-03-22 | 1 | -6/+4
  Temporarily reverting these to see if we can get llvm-objdump to link. Hopefully this is not the problem.
  llvm-svn: 128097
* ptx: add analyze/insert/remove branch | Che-Liang Chiou | 2011-03-22 | 1 | -0/+3
  llvm-svn: 128084
* Don't emit 'DBG_VALUE %noreg, ...' to terminate user variable ranges. | Jakob Stoklund Olesen | 2011-03-22 | 1 | -4/+6
  These ranges get completely jumbled by the post-ra scheduler, and it is not really reasonable to expect it to make sense of them.

  Instead, teach DwarfDebug to notice when user variables in registers are clobbered, and terminate the ranges there.
  llvm-svn: 128045
* Fix fast-isel address mode folding to avoid folding instructions | Dan Gohman | 2011-03-22 | 1 | -0/+19
  outside of the current basic block. This fixes PR9500, rdar://9156159.
  llvm-svn: 128041
* Write the section table and the section data in the same order that | Rafael Espindola | 2011-03-20 | 1 | -2/+2
  gnu as does. This makes it a lot easier to compare the output of both as the addresses are now a lot closer.
  llvm-svn: 127972
* Revert r127953, "SimplifyCFG has stopped duplicating returns into predecessors | Daniel Dunbar | 2011-03-19 | 1 | -63/+0
  to canonicalize IR", it broke a lot of things.
  llvm-svn: 127954
* SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR | Evan Cheng | 2011-03-19 | 1 | -0/+63
  to have single return block (at least getting there) for optimizations. This is general goodness but it would prevent some tailcall optimizations. One specific case is code like this:

    int f1(void);
    int f2(void);
    int f3(void);
    int f4(void);
    int f5(void);
    int f6(void);
    int foo(int x) {
      switch(x) {
      case 1: return f1();
      case 2: return f2();
      case 3: return f3();
      case 4: return f4();
      case 5: return f5();
      case 6: return f6();
      }
    }

  =>

    LBB0_2: ## %sw.bb
      callq _f1
      popq %rbp
      ret
    LBB0_3: ## %sw.bb1
      callq _f2
      popq %rbp
      ret
    LBB0_4: ## %sw.bb3
      callq _f3
      popq %rbp
      ret

  This patch teaches codegenprep to duplicate returns when the return value is a phi and where the phi operands are produced by tail calls followed by an unconditional branch:

    sw.bb7: ; preds = %entry
      %call8 = tail call i32 @f5() nounwind
      br label %return
    sw.bb9: ; preds = %entry
      %call10 = tail call i32 @f6() nounwind
      br label %return
    return:
      %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ]
      ret i32 %retval.0

  This allows codegen to generate better code like this:

    LBB0_2: ## %sw.bb
      jmp _f1 ## TAILCALL
    LBB0_3: ## %sw.bb1
      jmp _f2 ## TAILCALL
    LBB0_4: ## %sw.bb3
      jmp _f3 ## TAILCALL

  rdar://9147433
  llvm-svn: 127953
* Add support for legalizing UINT_TO_FP of vectors on platforms which do | Nadav Rotem | 2011-03-19 | 1 | -0/+11
  not have native support for this operation (such as X86). The legalized code uses two vector INT_TO_FP operations and is faster than scalarizing.
  llvm-svn: 127951
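One common way to build an unsigned-to-float conversion out of two signed conversions is to split the value into 16-bit halves, convert each half, and recombine. The scalar C sketch below shows that idea for a single 32-bit lane. Whether this commit uses exactly this split is an assumption on my part, and the function name `uint_to_float_via_signed` is hypothetical.

```c
#include <stdint.h>

/* Convert an unsigned 32-bit value to float using only signed
 * int-to-float conversions. Each half is below 2^16, so it is
 * non-negative as a signed integer and converts exactly. */
static float uint_to_float_via_signed(uint32_t x) {
    int32_t hi = (int32_t)(x >> 16);      /* upper 16 bits */
    int32_t lo = (int32_t)(x & 0xFFFFu);  /* lower 16 bits */
    float fhi = (float)hi * 65536.0f;     /* exact: scale by 2^16 */
    float flo = (float)lo;                /* exact */
    return fhi + flo;                     /* the single rounding step */
}
```

Because both partial values are exact, the only rounding happens in the final addition, so the result should match a direct `(float)x` conversion under round-to-nearest.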