path: root/llvm/lib/Target/ARM/ARMISelLowering.cpp
Commit log (each entry: commit message, author, date, files changed, lines removed/added)
* Issue libcalls __udivmod*i4 / __divmod*i4 for div / rem pairs. (Evan Cheng, 2011-04-01, 1 file, -0/+10)
  rdar://8911343. llvm-svn: 128696
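  For illustration (not part of the commit), a hedged C sketch of the div / rem pattern this change targets; the function name is made up:
    /* Both operations use the same operands, so with this change the backend
       can emit one __udivmodsi4 / __divmodsi4 libcall instead of separate
       __divsi3 and __modsi3 calls (assuming a target without hardware divide). */
    int div_rem(int a, int b, int *rem) {
      int q = a / b;   /* quotient */
      *rem = a % b;    /* remainder of the same a, b */
      return q;
    }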
* Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier accumulator forwarding. (Evan Cheng, 2011-03-31, 1 file, -0/+38)
    vadd d3, d0, d1
    vmul d3, d3, d2
  =>
    vmul d3, d0, d2
    vmla d3, d1, d2
  llvm-svn: 128665
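  For illustration (not from the commit), a hedged C intrinsics sketch of the kind of source pattern the combine targets; the function name and element type are made up:
    #include <arm_neon.h>
    /* (a + b) * c: after distribution this can become a vmul followed by a
       vmla, letting the multiply-accumulate forward from the multiplier. */
    int32x2_t mul_of_sum(int32x2_t a, int32x2_t b, int32x2_t c) {
      return vmul_s32(vadd_s32(a, b), c);
    }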
* Don't try to create zero-sized stack objects. (Evan Cheng, 2011-03-30, 1 file, -2/+3)
  llvm-svn: 128586
* Add an ARM-specific SD node for VBSL so that forms with a constant first operand can be recognized. (Cameron Zwarich, 2011-03-30, 1 file, -3/+32)
  This fixes <rdar://problem/9183078>. llvm-svn: 128584
* Add intrinsics @llvm.arm.neon.vmulls and @llvm.arm.neon.vmullu.* back. (Evan Cheng, 2011-03-29, 1 file, -0/+7)
  Frontends were lowering them to sext / zext + mul instructions. Unfortunately the optimization passes may hoist the extensions out of the loop and separate them. When that happens, the long multiplication instructions can be broken into several scalar instructions, causing a significant performance issue. Note the vmla and vmls intrinsics are not added back; frontends will codegen them as vmull* intrinsics + add / sub. Also note the isel optimizations for catching mul + sext / zext are not changed either. First part of rdar://8832507, rdar://9203134. llvm-svn: 128502
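  For illustration (not from the commit), a hedged sketch of a widening multiply written with the NEON intrinsic; the function name is made up:
    #include <arm_neon.h>
    /* vmull_u8 can map to the @llvm.arm.neon.vmullu.* intrinsic, so the
       widening multiply stays a single vmull even if surrounding extensions
       would otherwise be hoisted or separated by the optimizer. */
    uint16x8_t widen_mul(uint8x8_t a, uint8x8_t b) {
      return vmull_u8(a, b);
    }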
* Add Neon SINT_TO_FP and UINT_TO_FP lowering from v4i16 to v4f32. (Cameron Zwarich, 2011-03-29, 1 file, -0/+35)
  Fixes <rdar://problem/8875309> and <rdar://problem/9057191>. llvm-svn: 128492
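  For illustration (not from the commit), a hedged sketch using clang vector extensions; the typedefs and function name are made up:
    /* Converting a v4i16 vector to v4f32 produces a SINT_TO_FP node, which
       this change lowers with NEON instead of scalarizing the conversion. */
    typedef short v4i16 __attribute__((vector_size(8)));
    typedef float v4f32 __attribute__((vector_size(16)));
    v4f32 to_f32(v4i16 x) {
      return __builtin_convertvector(x, v4f32);
    }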
* Optimizing (zext A + zext B) * C, to (VMULL A, C) + (VMULL B, C) during isel lowering to fold the zero-extends and take advantage of no-stall back to back vmul + vmla. (Evan Cheng, 2011-03-29, 1 file, -15/+81)
    vmull q0, d4, d6
    vmlal q0, d5, d6
  is faster than
    vaddl q0, d4, d5
    vmovl q1, d6
    vmul  q0, q0, q1
  This allows us to use vmull + vmlal for:
    f = vmull_u8(vget_high_u8(s), c);
    f = vmlal_u8(f, vget_low_u8(s), c);
  rdar://9197392. llvm-svn: 128444
* Fix the bfi handling for or (and a mask) (and b mask). (Eric Christopher, 2011-03-26, 1 file, -9/+10)
  We need the two masks to match inversely for the code as is to work. For the example given we actually want:
    bfi r0, r2, #1, #1
  not #0; however, given the way the pattern is written it's not possible at the moment. Fixes rdar://9177502. llvm-svn: 128320
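  For illustration (not from the commit), a hedged C sketch of an or of two masked ands with exactly inverse masks; the function name is made up:
    /* Keep the top 24 bits of 'a' and take the low 8 bits from 'b'. The two
       AND masks are inverses of each other, which is the shape the bfi
       (bitfield insert) combine looks for on ARMv6T2 and later. */
    unsigned merge_low_byte(unsigned a, unsigned b) {
      return (a & 0xFFFFFF00u) | (b & 0x000000FFu);
    }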
* Re-apply r127953 with fixes: eliminate empty return block if it has no predecessors; update dominator tree if cfg is modified. (Evan Cheng, 2011-03-21, 1 file, -0/+10)
  llvm-svn: 127981
* Revert r127953, "SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR"; it broke a lot of things. (Daniel Dunbar, 2011-03-19, 1 file, -10/+0)
  llvm-svn: 127954
* SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR to have single return block (at least getting there) for optimizations. (Evan Cheng, 2011-03-19, 1 file, -0/+10)
  This is general goodness but it would prevent some tailcall optimizations. One specific case is code like this:
    int f1(void);
    int f2(void);
    int f3(void);
    int f4(void);
    int f5(void);
    int f6(void);
    int foo(int x) {
      switch(x) {
      case 1: return f1();
      case 2: return f2();
      case 3: return f3();
      case 4: return f4();
      case 5: return f5();
      case 6: return f6();
      }
    }
  =>
    LBB0_2:   ## %sw.bb
      callq _f1
      popq  %rbp
      ret
    LBB0_3:   ## %sw.bb1
      callq _f2
      popq  %rbp
      ret
    LBB0_4:   ## %sw.bb3
      callq _f3
      popq  %rbp
      ret
  This patch teaches codegenprep to duplicate returns when the return value is a phi and where the phi operands are produced by tail calls followed by an unconditional branch:
    sw.bb7:       ; preds = %entry
      %call8 = tail call i32 @f5() nounwind
      br label %return
    sw.bb9:       ; preds = %entry
      %call10 = tail call i32 @f6() nounwind
      br label %return
    return:
      %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ]
      ret i32 %retval.0
  This allows codegen to generate better code like this:
    LBB0_2:   ## %sw.bb
      jmp _f1  ## TAILCALL
    LBB0_3:   ## %sw.bb1
      jmp _f2  ## TAILCALL
    LBB0_4:   ## %sw.bb3
      jmp _f3  ## TAILCALL
  rdar://9147433. llvm-svn: 127953
* The VTBL (and VTBX) instructions are rather permissive concerning the masks they accept. (Bill Wendling, 2011-03-15, 1 file, -0/+8)
  If a value in the mask is out of range, it uses the value 0, for VTBL, or leaves the value unchanged, for VTBX. llvm-svn: 127700
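  For illustration (not from the commit), a hedged C intrinsics sketch of the table lookup this refers to; the function name is made up:
    #include <arm_neon.h>
    /* Each byte of 'idx' selects a byte of 'table'. Indices >= 8 are out of
       range for the one-register table: vtbl writes 0 for those lanes, while
       the vtbx variants would leave the destination lane unchanged. */
    uint8x8_t lookup(uint8x8_t table, uint8x8_t idx) {
      return vtbl1_u8(table, idx);
    }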
* Some minor cleanups based on feedback. (Bill Wendling, 2011-03-15, 1 file, -6/+4)
  llvm-svn: 127694
* Generate a VTBL instruction instead of a series of loads and stores when we can. (Bill Wendling, 2011-03-14, 1 file, -1/+34)
  As Nate pointed out, VTBL isn't super performant, but it *has* to be better than this:
    _shuf:
    @ BB#0:       @ %entry
      push    {r4, r7, lr}
      add     r7, sp, #4
      sub     sp, #12
      mov     r4, sp
      bic     r4, r4, #7
      mov     sp, r4
      mov     r2, sp
      vmov    d16, r0, r1
      orr     r0, r2, #6
      orr     r3, r2, #7
      vst1.8  {d16[0]}, [r3]
      vst1.8  {d16[5]}, [r0]
      subs    r4, r7, #4
      orr     r0, r2, #5
      vst1.8  {d16[4]}, [r0]
      orr     r0, r2, #4
      vst1.8  {d16[4]}, [r0]
      orr     r0, r2, #3
      vst1.8  {d16[0]}, [r0]
      orr     r0, r2, #2
      vst1.8  {d16[2]}, [r0]
      orr     r0, r2, #1
      vst1.8  {d16[1]}, [r0]
      vst1.8  {d16[3]}, [r2]
      vldr.64 d16, [sp]
      vmov    r0, r1, d16
      mov     sp, r4
      pop     {r4, r7, pc}
  The "illegal" testcase in vext.ll is no longer illegal. <rdar://problem/9078775> llvm-svn: 127630
* Indentation. (Evan Cheng, 2011-03-14, 1 file, -1/+1)
  llvm-svn: 127595
* Fix a compiler crash where a Glue value had multiple uses. Radar 9049552. (Bob Wilson, 2011-03-08, 1 file, -1/+22)
  llvm-svn: 127198
* Fix comment typos. (Bob Wilson, 2011-03-08, 1 file, -2/+2)
  llvm-svn: 127197
* Move getRegPressureLimit() from TargetLoweringInfo to TargetRegisterInfo. (Cameron Zwarich, 2011-03-07, 1 file, -21/+0)
  llvm-svn: 127175
* Remove unused conditional negate operations. (Bob Wilson, 2011-03-05, 1 file, -1/+0)
  llvm-svn: 127090
* Fix a typo which caused a dag combine crash. rdar://9059537. (Evan Cheng, 2011-02-28, 1 file, -1/+1)
  llvm-svn: 126661
* Support for byval parameters on ARM. (Stuart Hastings, 2011-02-28, 1 file, -9/+41)
  Will be enabled by a forthcoming patch to the front-end. Radar 7662569. llvm-svn: 126655
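  For illustration (not from the commit), a hedged C sketch of an aggregate that would be passed byval; the struct and function names are made up:
    /* A struct passed by value. Frontends can mark such arguments 'byval' in
       the IR, and this change adds the ARM lowering for those parameters. */
    struct Payload { int data[8]; };
    int first_two(struct Payload p) {
      return p.data[0] + p.data[1];
    }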
* More fcopysign correctness and performance fixes. (Evan Cheng, 2011-02-23, 1 file, -33/+63)
  The previous codegen for the slow path (when values are in VFP / NEON registers) was incorrect if the source is NaN. The new codegen uses the NEON vbsl instruction to copy the sign bit, e.g.
    vmov.i32 d1, #0x80000000
    vbsl d1, d2, d0
  If NEON is not available, it uses integer instructions to copy the sign bit. rdar://9034702. llvm-svn: 126295
* Revert r124611 - "Keep track of incoming argument's location while emitting LiveIns." (Devang Patel, 2011-02-21, 1 file, -5/+5)
  In other words, do not keep track of the argument's location. The debugger (gdb) is not prepared to see line table entries for arguments. For the debugger, the "second" line table entry marks the beginning of the function body. This requires some coordination with the debugger to get this working.
  - The debugger needs to be aware of the prolog_end attribute attached to line table entries.
  - The compiler needs to accurately mark prolog_end in line table entries (at -O0 and at -O1+).
  llvm-svn: 126155
* Implement sdiv & udiv for <4 x i16> and <8 x i8> NEON vector types. (Nate Begeman, 2011-02-11, 1 file, -0/+182)
  This avoids moving each element to the integer register file and calling __divsi3 etc. on it. llvm-svn: 125402
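  For illustration (not from the commit), a hedged sketch using GCC/clang vector extensions; the typedef and function name are made up:
    /* Element-wise division of an <8 x i8> vector. Without this change each
       lane would be moved to a core register and divided via a __divsi3-style
       libcall; now the whole vector is lowered with NEON code. */
    typedef unsigned char v8u8 __attribute__((vector_size(8)));
    v8u8 div_bytes(v8u8 a, v8u8 b) {
      return a / b;
    }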
* Fix buggy fcopysign lowering. (Evan Cheng, 2011-02-11, 1 file, -5/+39)
  This
    define float @foo(float %x, float %y) nounwind readnone {
    entry:
      %0 = tail call float @copysignf(float %x, float %y) nounwind readnone
      ret float %0
    }
  was compiled to:
    vmov s0, r1
    bic r0, r0, #-2147483648
    vmov s1, r0
    vcmpe.f32 s0, #0
    vmrs apsr_nzcv, fpscr
    it lt
    vneglt.f32 s1, s1
    vmov r0, s1
    bx lr
  This fails to copy the sign of -0.0f because it's lost during the float to int conversion. Also, it's sub-optimal when the inputs are in GPR registers. Now it uses integer and + or operations when it's profitable. And it's correct!
    lsrs r1, r1, #31
    bfi r0, r1, #31, #1
    bx lr
  rdar://8984306. llvm-svn: 125357
* Fix an obvious typo which caused an isel assertion. rdar://8964854. (Evan Cheng, 2011-02-07, 1 file, -1/+1)
  llvm-svn: 125023
* Add codegen support for using post-increment NEON load/store instructions. (Bob Wilson, 2011-02-07, 1 file, -0/+176)
  The vld1-lane, vld1-dup and vst1-lane instructions do not yet support using post-increment versions, but all the rest of the NEON load/store instructions should be handled now. llvm-svn: 125014
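  For illustration (not from the commit), a hedged C intrinsics sketch of a loop where the address updates can fold into post-increment forms; names are made up:
    #include <arm_neon.h>
    /* The pointers advance by 16 bytes after each vld1q/vst1q; with this
       change those increments can fold into post-increment addressing such
       as vld1.8 {...}, [r1]! rather than separate add instructions. */
    void copy_blocks(uint8_t *dst, const uint8_t *src, int n) {
      for (int i = 0; i < n; i += 16) {
        uint8x16_t v = vld1q_u8(src + i);
        vst1q_u8(dst + i, v);
      }
    }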
* Given a pair of floating point load and store, if there are no other uses of the load, then it may be legal to transform the load and store to integer load and store of the same width. (Evan Cheng, 2011-02-02, 1 file, -0/+5)
  This is done if the target specified the transformation as profitable. e.g. on ARM, this can transform:
    vldr.32 s0, []
    vstr.32 s0, []
  to
    ldr r12, []
    str r12, []
  rdar://8944252. llvm-svn: 124708
* Keep track of incoming argument's location while emitting LiveIns. (Devang Patel, 2011-01-31, 1 file, -5/+5)
  llvm-svn: 124611
* Provide correct registers for EH stuff on ARM. (Anton Korobeynikov, 2011-01-24, 1 file, -3/+4)
  llvm-svn: 124151
* Last round of fixes for movw + movt global address codegen. (Evan Cheng, 2011-01-21, 1 file, -2/+5)
  1. Fixed ARM pc adjustment.
  2. Fixed dynamic-no-pic codegen.
  3. CSE of pc-relative load of global addresses.
  It's now enabled by default for Darwin. llvm-svn: 123991
* Sorry, several patches in one. (Evan Cheng, 2011-01-20, 1 file, -6/+1)
  TargetInstrInfo: Change produceSameValue() to take MachineRegisterInfo as an optional argument. When in SSA form, targets can use it to make more aggressive equality analysis.
  Machine LICM:
  1. Eliminate isLoadFromConstantMemory, use MI.isInvariantLoad instead.
  2. Fix a bug which prevented CSE of instructions which are not re-materializable.
  3. Use the improved form of produceSameValue.
  ARM:
  1. Teach ARM produceSameValue to look past some PIC labels.
  2. Look for operands from different loads of different constant pool entries which have the same values.
  3. Re-implement PIC GA materialization using movw + movt. Combine the pair with an "add pc" or "ldr [pc]" to form pseudo instructions. This makes it possible to re-materialize the instruction, allows machine LICM to hoist the set of instructions out of the loop, and makes it possible to CSE them. It's a bit hacky, but it significantly improves code quality.
  4. Some minor bug fixes as well.
  With the fixes, using movw + movt to materialize GAs significantly outperforms the load-from-constantpool method. 186.crafty and 255.vortex improved > 20%, 254.gap and 176.gcc ~10%. llvm-svn: 123905
* For ARM subtargets with useNEONForSinglePrecisionFP, double count uses of the floating point types less than 64-bits. (Andrew Trick, 2011-01-19, 1 file, -0/+16)
  It's somewhat of a temporary hack but forces more accurate modeling of register pressure and results in fewer spills. llvm-svn: 123811
* whitespace (Andrew Trick, 2011-01-19, 1 file, -16/+16)
  llvm-svn: 123810
* Don't forget to emit the load from indirect symbol when using movw + movt to materialize GA indirect symbols. (Evan Cheng, 2011-01-19, 1 file, -1/+8)
  llvm-svn: 123809
* Materialize GA addresses with movw + movt pairs for Darwin in PIC mode. (Evan Cheng, 2011-01-17, 1 file, -27/+47)
  e.g.
    movw r0, :lower16:(L_foo$non_lazy_ptr-(LPC0_0+4))
    movt r0, :upper16:(L_foo$non_lazy_ptr-(LPC0_0+4))
  LPC0_0:
    add r0, pc, r0
  It's not yet enabled by default as some tests are failing. I suspect bugs in downstream tools. llvm-svn: 123619
* Fix 80-cols. (Eric Christopher, 2011-01-14, 1 file, -7/+14)
  llvm-svn: 123494
* Rename TargetFrameInfo into TargetFrameLowering. Also, put a couple of FIXMEs and fixes here and there. (Anton Korobeynikov, 2011-01-10, 1 file, -2/+3)
  llvm-svn: 123170
* Simplify a bunch of isVirtualRegister() and isPhysicalRegister() logic. (Jakob Stoklund Olesen, 2011-01-10, 1 file, -1/+1)
  These functions no longer assert when passed 0, but simply return false instead. No functional change intended. llvm-svn: 123155
* Recognize inline asm 'rev $0, $1' as a bswap intrinsic call. (Evan Cheng, 2011-01-08, 1 file, -0/+33)
  llvm-svn: 123048
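  For illustration (not from the commit), a hedged C sketch of the inline asm form being recognized; the function name is made up:
    #include <stdint.h>
    /* A byte swap written as ARM inline asm. With this change the backend can
       recognize the 'rev' asm string and treat it like a bswap, instead of
       keeping it as an opaque inline-asm block. */
    uint32_t byte_swap(uint32_t x) {
      uint32_t r;
      __asm__("rev %0, %1" : "=r"(r) : "r"(x));
      return r;
    }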
* Add an explanatory message for an assertion. (Bob Wilson, 2011-01-07, 1 file, -1/+2)
  llvm-svn: 123042
* Eliminate variable only used in debug builds. (Matt Beaumont-Gay, 2011-01-07, 1 file, -3/+1)
  llvm-svn: 123040
* Lower some BUILD_VECTORS using VEXT+shuffle. (Bob Wilson, 2011-01-07, 1 file, -2/+133)
  Patch by Tim Northover. llvm-svn: 123035
* Add ARM patterns to match EXTRACT_SUBVECTOR nodes. (Bob Wilson, 2011-01-07, 1 file, -1/+1)
  Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle vectors from being translated to EXTRACT_SUBVECTOR. Patch by Tim Northover.
  The test changes are needed to keep those spill-q tests from testing aligned spills and restores. If the only aligned stack objects are spill slots, we no longer realign the stack frame. Prior to this patch, an EXTRACT_SUBVECTOR was legalized by loading from the stack, which created an aligned frame index. Now, however, there is nothing except the spill slot in the stack frame, so I added an aligned alloca. llvm-svn: 122995
* Re-implement r122936 with proper target hooks. (Evan Cheng, 2011-01-06, 1 file, -1/+2)
  Now getMaxStoresPerMemcpy etc. take an OptSize option. If OptSize is true, they return the inline limit for functions with the OptSize attribute. llvm-svn: 122952
* Radar 8803471: Fix expansion of ARM BCCi64 pseudo instructions. (Bob Wilson, 2010-12-23, 1 file, -0/+3)
  If the basic block containing the BCCi64 (or BCCZi64) instruction ends with an unconditional branch, that branch needs to be deleted before appending the expansion of the BCCi64 to the end of the block. llvm-svn: 122521
* Add ARM-specific DAG combining to cast i64 vector element load/stores to f64. (Bob Wilson, 2010-12-21, 1 file, -5/+103)
  Type legalization splits up i64 values into pairs of i32 values, which leads to poor quality code when inserting or extracting i64 vector elements. If the vector element is loaded or stored, it can be treated as an f64 value and loaded or stored directly from a VPR register. Use the pre-legalization DAG combiner to cast those vector elements to f64 types so that the type legalizer won't mess them up. Radar 8755338. llvm-svn: 122319
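  For illustration (not from the commit), a hedged C intrinsics sketch of an i64 vector element store; the function name is made up:
    #include <arm_neon.h>
    #include <stdint.h>
    /* Extract lane 0 of an i64x2 vector and store it. Without the combine,
       type legalization splits the i64 into two i32 halves; treating the
       lane as f64 lets it go straight from a NEON register to memory. */
    void store_lane0(int64_t *dst, int64x2_t v) {
      *dst = vgetq_lane_s64(v, 0);
    }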
* Rename MVT::Flag to MVT::Glue. (Chris Lattner, 2010-12-21, 1 file, -9/+9)
  "Flag" is a terrible name for something that just glues two nodes together, even if it is sometimes used for flags. llvm-svn: 122310
* Add some missing entries in ARMTargetLowering::getTargetNodeName. (Bob Wilson, 2010-12-18, 1 file, -0/+5)
  llvm-svn: 122111
* Don't handle -arm-long-calls in fast isel for now. (Eric Christopher, 2010-12-15, 1 file, -1/+1)
  llvm-svn: 121919