path: root/llvm/test/CodeGen/ARM
* Make ARMAsmPrinter generate the correct alignment specifier syntax in instructions. (Kristof Beyls, 2013-02-22, 16 files, -96/+96)
  The printer will now print instructions with the correct alignment specifier syntax, like vld1.8 {d16}, [r0:64].
  llvm-svn: 175884
* DAGCombiner: Fold pointless truncate, bitcast, buildvector series (Arnold Schwaighofer, 2013-02-20, 1 file, -0/+15)
  (2xi32) (truncate ((2xi64) bitcast (buildvector i32 a, i32 x, i32 b, i32 y))) can be folded into a (2xi32) (buildvector i32 a, i32 b). Such a DAG would cause unnecessary vdup instructions followed by vmovn instructions. We generate this code on ARM NEON for a setcc olt, 2xf64, 2xf64, for example in the vectorized version of the code below.
    double A[N];
    double B[N];
    void test_double_compare_to_double() {
      int i;
      for (i = 0; i < N; i++)
        A[i] = (double)(A[i] < B[i]);
    }
  radar://13191881
  Fixes bug 15283.
  llvm-svn: 175670
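  A minimal IR sketch of source that should reach this fold after vectorization and legalization (function and value names are illustrative, not from the commit):
    define <2 x i32> @cmp_lt(<2 x double> %a, <2 x double> %b) {
      ; The fcmp on <2 x double> becomes the setcc that previously
      ; produced the vdup/vmovn sequence described above.
      %c = fcmp olt <2 x double> %a, %b
      %e = sext <2 x i1> %c to <2 x i32>
      ret <2 x i32> %e
    }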
* ARM NEON: Merge a f32 bitcast of a v2i32 extractelt (Arnold Schwaighofer, 2013-02-19, 1 file, -0/+25)
  A vectorized sitofp on doubles will get scalarized to a sequence of an extract_element of <2 x i32>, a bitcast to f32 and a sitofp. Due to the extract_element and the bitcast we would unnecessarily generate moves between scalar and vector registers. The patch fixes this by using a COPY_TO_REGCLASS and an EXTRACT_SUBREG to extract the element from the vector instead.
  radar://13191881
  llvm-svn: 175520
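  A short IR sketch of the element-plus-bitcast pattern in question (names are illustrative):
    define float @extract_bitcast(<2 x i32> %v) {
      ; extractelement + bitcast: previously lowered via a round trip
      ; through a core register; now kept on the vector register side.
      %e = extractelement <2 x i32> %v, i32 0
      %f = bitcast i32 %e to float
      ret float %f
    }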
* Comment out the rdar number. (Chad Rosier, 2013-02-18, 1 file, -1/+1)
  llvm-svn: 175460
* [fast-isel] Remove an invalid assert. (Chad Rosier, 2013-02-18, 1 file, -0/+7)
  If the memcpy has an odd length with an alignment of 2, this would incorrectly assert on the last 1-byte copy.
  rdar://13202135
  llvm-svn: 175459
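  A hedged sketch of an input that should have hit the removed assert when compiled with fast-isel (e.g. at -O0); it uses the memcpy intrinsic signature of this era, with alignment as the fourth argument, and the odd length 17 is just an example:
    declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32, i1)

    define void @odd_copy(i8* %dst, i8* %src) {
      ; 17 bytes at alignment 2: the expansion ends in a single 1-byte copy.
      call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst, i8* %src, i32 17, i32 2, i1 false)
      ret void
    }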
* Re-apply r175088 for bug fix 13622: Add paired register support for inline asm with 64-bit data on ARM (Weiming Zhao, 2013-02-14, 2 files, -2/+55)
  Update test case to use -mtriple=arm-linux-gnueabi.
  llvm-svn: 175186
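  A rough sketch of the kind of inline asm this enables; the ${0:H} operand modifier names the high register of the pair, and the constraint string and instruction choice here are illustrative assumptions, not taken from the commit:
    define i64 @load_pair(i64* %p) {
      ; The i64 result is bound to a GPR pair so the asm can name both
      ; halves, the low one as $0 and the high one as ${0:H}.
      %v = call i64 asm "ldrd $0, ${0:H}, [$1]", "=&r,r"(i64* %p)
      ret i64 %v
    }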
* Make ARMAsmParser accept the correct alignment specifier syntax in instructions. (Kristof Beyls, 2013-02-14, 1 file, -1/+1)
  The parser will now accept instructions with alignment specifiers written like vld1.8 {d16}, [r0:64], while also still accepting the incorrect syntax vld1.8 {d16}, [r0, :64].
  llvm-svn: 175164
* Temporarily revert the patch due to some conflicts. (Weiming Zhao, 2013-02-13, 2 files, -55/+2)
  llvm-svn: 175107
* Bug fix 13622: Add paired register support for inline asm with 64-bit data on ARM (Weiming Zhao, 2013-02-13, 2 files, -2/+55)
  llvm-svn: 175088
* PR14992 - Tablegen incorrectly converts ARM tLDMIA_UPD pseudo to tLDMIA (David Peixotto, 2013-02-13, 1 file, -0/+28)
  Fixed a bug in the tablegen conversion when the source pseudo instruction has a different number of arguments than the destination instruction.
  llvm-svn: 175066
* ARM NEON: Handle v16i8 and v8i16 reverse shuffles (Arnold Schwaighofer, 2013-02-12, 1 file, -0/+27)
  Lower reverse shuffles to a vrev64 and a vext instruction instead of the default legalization of storing and loading to the stack. This is important because we generate reverse shuffles in the loop vectorizer when we reverse-store to an array:
    uint8_t Arr[N];
    for (i = 0; i < N; ++i)
      Arr[N - i - 1] = ...
  radar://13171760
  llvm-svn: 174929
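  An illustrative IR sketch of one of the shuffles this covers, a full reversal of an <8 x i16> vector (names not from the commit):
    define <8 x i16> @reverse(<8 x i16> %v) {
      ; Reverse all eight lanes: per the message, vrev64.16 reverses
      ; within each 64-bit half, then vext swaps the two halves.
      %r = shufflevector <8 x i16> %v, <8 x i16> undef,
           <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
      ret <8 x i16> %r
    }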
* Revert r172027 and r174336. Remove diagnostics about over-aligned stack objects. (Bob Wilson, 2013-02-08, 2 files, -20/+1)
  Aside from the question of whether we report a warning or an error when we can't satisfy a requested stack object alignment, the current implementation of this is not good: we're not providing any source location in the diagnostics, and the current warning is not connected to any warning group, so you can't control it. We could improve the source location somewhat, but we can do a much better job if this check is implemented in the front-end, so let's do that instead.
  <rdar://problem/13127907>
  llvm-svn: 174741
* Attempt to recover gdb bot after r174445. (Manman Ren, 2013-02-06, 2 files, -2/+2)
  Failure: undefined symbol 'Lline_table_start0'.
  Root cause: we use a symbol subtraction to calculate at_stmt_list, but the line table entries are not dumped in the assembly.
  Fix: use zero instead of a symbol subtraction for Compile Unit 0.
  llvm-svn: 174479
* Dwarf: support for LTO where a single object file can have multiple line tables (Manman Ren, 2013-02-05, 2 files, -2/+2)
  We generate one line table for each compilation unit in the object file.
  Reviewed by Eric and Kevin.
  rdar://problem/13067005
  llvm-svn: 174445
* [SjLj Prepare] When demoting an invoke instruction to the stack, if the normal edge is critical, then split it so we can insert the store. (Chad Rosier, 2013-02-05, 1 file, -0/+67)
  rdar://13126179
  llvm-svn: 174418
* Link .ARM.exidx with the corresponding text section. (Logan Chien, 2013-02-05, 1 file, -0/+47)
  The sh_link in the ELF section header of .ARM.exidx should be filled with the section index of the corresponding text section.
  llvm-svn: 174372
* [Stack Alignment] Emit a warning instead of a hard error. (Manman Ren, 2013-02-04, 1 file, -2/+2)
  Per discussion in rdar://13127907, we should emit a hard error only if people write code where the requested alignment is larger than achievable and the code assumes the low bits are zeros. A warning should be good enough when we are not sure whether the source code assumes the low bits are zeros.
  rdar://13127907
  llvm-svn: 174336
* Add a special ARM trap encoding for NaCl. (Eli Bendersky, 2013-01-30, 1 file, -0/+28)
  More details in this thread: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130128/163783.html
  Patch by JF Bastien.
  llvm-svn: 173943
* Add missing header and test cases for r173939. (Logan Chien, 2013-01-30, 4 files, -0/+211)
  llvm-svn: 173941
* Fix 64-bit atomic operations in Thumb mode. (Tim Northover, 2013-01-29, 1 file, -0/+147)
  The ARM and Thumb variants of LDREXD and STREXD have different constraints and take different operands. Previously the code expanding atomic operations didn't take this into account and asserted in Thumb mode.
  llvm-svn: 173780
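  A minimal IR sketch that should trigger this expansion when compiled for a Thumb2 triple such as thumbv7-linux-gnueabi (triple and names are illustrative):
    define i64 @fetch_add(i64* %p, i64 %v) {
      ; A 64-bit atomic RMW expands to a ldrexd/strexd loop; the Thumb
      ; encodings of those instructions have their own operand constraints.
      %old = atomicrmw add i64* %p, i64 %v seq_cst
      ret i64 %old
    }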
* Fixed the condition codes for the atomic64 min/umin code generation on ARM. (Silviu Baranga, 2013-01-25, 1 file, -2/+2)
  If the subtraction of the higher 32-bit parts gives a 0 result, we need to do the store operation.
  llvm-svn: 173437
* Remove some register allocation order dependencies. (Jakob Stoklund Olesen, 2013-01-19, 3 files, -11/+11)
  llvm-svn: 172874
* Simplify writing floating types to assembly. (Tim Northover, 2013-01-11, 1 file, -10/+0)
  This removes previous special cases for each floating-point type in favour of a shared codepath.
  llvm-svn: 172189
* Stack Alignment: throw an error if we can't satisfy the minimal alignment requirement when creating stack objects in MachineFrameInfo. (Manman Ren, 2013-01-10, 2 files, -1/+20)
  Add CreateStackObjectWithMinAlign to throw an error when the minimal alignment can't be achieved and to clamp the alignment when the preferred alignment can't be achieved. The same applies to CreateVariableSizedObject. No error is emitted in CreateSpillStackObject or CreateStackObject. As long as callers of CreateStackObject do not assume the object will be aligned at the requested alignment, we should not get a miscompile, since later optimizations which look at the object's alignment will have the correct information.
  rdar://12713765
  llvm-svn: 172027
* MIsched: add an ILP window property to the machine model. (Andrew Trick, 2013-01-09, 1 file, -48/+0)
  This was an experimental option, but it needs to be defined per-target; e.g. PPC A2 needs to aggressively hide latency. I converted some in-order scheduling tests to A2. Hal is working on more test cases.
  llvm-svn: 171946
* Specify complete triple for fp128 tests. (Tim Northover, 2013-01-08, 1 file, -1/+1)
  This avoids FileCheck failing over different comment characters in assembly (notably powerpc64 on Linux vs Darwin) and should fix David's build-bot.
  llvm-svn: 171886
* Allow the asm printer to print fp128 values properly. (Tim Northover, 2013-01-08, 1 file, -0/+10)
  llvm-svn: 171866
* Make the MergeGlobals pass correctly handle the address space qualifiers of the global variables. (Silviu Baranga, 2013-01-07, 1 file, -0/+12)
  We partition the set of globals by their address space, and apply the same transformation as before to merge them.
  llvm-svn: 171730
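  A small illustrative IR sketch of the partitioning (names and address space numbers are made up for the example):
    @a = internal global i32 0
    @b = internal global i32 0
    @x = internal addrspace(1) global i32 0
    @y = internal addrspace(1) global i32 0
    ; @a/@b may be merged into one combined global in the default address
    ; space and @x/@y into another in addrspace(1), but never across the two.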
* Revert "Adding support for llvm.arm.neon.vaddl[su].* and"Bob Wilson2012-12-202-128/+0
| | | | | | | This reverts r170694. The operations can be represented in IR without adding any new intrinsics. llvm-svn: 170765
* Adding support for llvm.arm.neon.vaddl[su].* and llvm.arm.neon.vsub[su].* intrinsics. (Renato Golin, 2012-12-20, 2 files, -0/+128)
  Patch by Pete Couperus <pjcoup@gmail.com>.
  llvm-svn: 170694
* LLVM sdisel normalizes bit extraction of the form ((x & 0xff00) >> 8) << 2 to (x >> 6) & 0x3fc. (Evan Cheng, 2012-12-19, 1 file, -0/+25)
  This is general goodness since it folds a left shift into the mask. However, the trailing zeros in the mask prevent the ARM backend from using the bit extraction instructions, and worse, the mask materialization may require an extra instruction. This comes up fairly frequently when the result of the bit twiddling is used as a memory address, e.g.
      = ptr[(x & 0xFF0000) >> 16]
  We want to generate:
    ubfx  r3, r1, #16, #8
    ldr.w r3, [r0, r3, lsl #2]
  vs.
    mov.w r9, #1020
    and.w r2, r9, r1, lsr #14
    ldr   r2, [r0, r2]
  Add a late ARM-specific isel optimization to ARMDAGToDAGISel::PreprocessISelDAG(). It folds the left shift into the 'base + offset' address computation and changes the mask to one without trailing zeros, enabling the use of ubfx. Note the optimization has to be done late, since it's target-specific and we don't want to change the DAG normalization. It's also fairly restrictive, as shifter operands are not always free: it's only done for lsl by 1 or 2, which is known to be free on some CPUs, and those are the most common shifts for address computation. This is a slight win for blowfish, rijndael, etc.
  rdar://12870177
  llvm-svn: 170581
* Disable ARM partial flag dependency optimization at -Oz (Quentin Colombet, 2012-12-18, 1 file, -0/+34)
  To avoid over-constraining the scheduler for ARM in Thumb mode, some code size optimizations specific to ARM Thumb are blocked when they would add a dependency (such as a write-after-read dependency). This check is now disabled when code size is the priority, i.e. when code is compiled with -Oz.
  llvm-svn: 170462
* MISched: add dependence to ExitSU to model live-out latency. (Andrew Trick, 2012-12-18, 1 file, -0/+48)
  llvm-svn: 170454
* Some enhancements for memcpy / memset inline expansion. (Evan Cheng, 2012-12-10, 4 files, -30/+133)
  1. Teach it to use overlapping unaligned load / store to copy / set the trailing bytes, e.g. on x86, use two pairs of movups / movaps for 17-31 byte copies (see the sketch after this entry).
  2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is, e.g. x86 and ARM.
  3. When copying from a constant string, do *not* replace the load with a constant if it's not possible to materialize an integer immediate with a single instruction (this required a new target hook: TLI.isIntImmLegal()).
  4. Use unaligned loads / stores more aggressively if the target hooks indicate they are "fast".
  5. Update the ARM target hooks to use unaligned loads / stores, e.g. vld1.8 / vst1.8, and increase the threshold to something reasonable (8 for memset, 4 pairs for memcpy).
  This significantly improves Dhrystone, up to 50% on ARM iOS devices.
  rdar://12760078
  llvm-svn: 169791
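  A hedged sketch of item 1 above: a 17-byte copy can be emitted as two overlapping 16-byte load/store pairs, at offsets 0 and 1, instead of a 16-byte pair plus a 1-byte tail (the IR below is illustrative and uses the era's memcpy signature with an explicit alignment operand):
    declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32, i1)

    define void @copy17(i8* %d, i8* %s) {
      ; 17 bytes: the trailing byte is covered by re-copying bytes [1,17)
      ; with a second unaligned 16-byte operation.
      call void @llvm.memcpy.p0i8.p0i8.i32(i8* %d, i8* %s, i32 17, i32 1, i1 false)
      ret void
    }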
* Added Mapping Symbols for ARM ELF (Tim Northover, 2012-12-07, 1 file, -2/+2)
  Before this patch, when you ran objdump on an LLVM-compiled file, objdump tried to decode data-in-code sections as if they were code. This patch adds the missing mapping symbols, as defined by "ELF for the ARM Architecture" (ARM IHI 0044D).
  Patch based on work by Greg Fitzgerald.
  llvm-svn: 169609
* Fix typos in CHECK lines. (Dmitri Gribenko, 2012-12-06, 1 file, -1/+1)
  Patch by Alexander Zinenko.
  llvm-svn: 169547
* Properly fix the test. (Evan Cheng, 2012-12-06, 1 file, -2/+1)
  llvm-svn: 169464
* llvm/test/CodeGen/ARM/extload-knownzero.ll: Try to unbreak by adding -O0. I guess Chad expects fast-isel here. (NAKAMURA Takumi, 2012-12-06, 1 file, -1/+1)
  llvm-svn: 169463
* [arm fast-isel] Make the fast-isel implementation of memcpy respect alignment. (Chad Rosier, 2012-12-06, 1 file, -3/+94)
  rdar://12821569
  llvm-svn: 169460
* Let targets provide hooks that compute known zero and ones for any_extend and extload's. (Evan Cheng, 2012-12-06, 1 file, -0/+27)
  If they are implemented as zero-extend, or implicitly zero-extend, then this can enable more demanded-bits optimizations, e.g.
    define void @foo(i16* %ptr, i32 %a) nounwind {
    entry:
      %tmp1 = icmp ult i32 %a, 100
      br i1 %tmp1, label %bb1, label %bb2
    bb1:
      %tmp2 = load i16* %ptr, align 2
      br label %bb2
    bb2:
      %tmp3 = phi i16 [ 0, %entry ], [ %tmp2, %bb1 ]
      %cmp = icmp ult i16 %tmp3, 24
      br i1 %cmp, label %bb3, label %exit
    bb3:
      call void @bar() nounwind
      br label %exit
    exit:
      ret void
    }
  This compiled to the following before:
      push {lr}
      mov r2, #0
      cmp r1, #99
      bhi LBB0_2
    @ BB#1:      @ %bb1
      ldrh r2, [r0]
    LBB0_2:      @ %bb2
      uxth r0, r2
      cmp r0, #23
      bhi LBB0_4
    @ BB#3:      @ %bb3
      bl _bar
    LBB0_4:      @ %exit
      pop {lr}
      bx lr
  The uxth is not needed since ldrh implicitly zero-extends the high bits. With this change it's eliminated.
  rdar://12771555
  llvm-svn: 169459
* ARM custom lower ctpop for vector types. Patch by Pete Couperus. (Evan Cheng, 2012-12-04, 1 file, -0/+191)
  llvm-svn: 169325
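  A minimal IR sketch of what now gets custom lowering (names are illustrative; on NEON, a vector popcount is typically built from vcnt.8 followed by pairwise widening adds, though the exact sequence depends on the element type):
    declare <4 x i32> @llvm.ctpop.v4i32(<4 x i32>)

    define <4 x i32> @popcount(<4 x i32> %x) {
      ; Per-lane population count on a vector type.
      %c = call <4 x i32> @llvm.ctpop.v4i32(<4 x i32> %x)
      ret <4 x i32> %c
    }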
* Use the 'count' attribute to calculate the upper bound of an array. (Bill Wendling, 2012-12-04, 2 files, -2/+2)
  The count attribute is more accurate with regards to the size of an array. It also obviates the upper bound attribute in the subrange. We can also better handle an unbounded array by setting the count to -1 instead of the lower bound to 1 and the upper bound to 0.
  llvm-svn: 169312
* Add a 'count' field to the DWARF subrange. (Bill Wendling, 2012-12-04, 2 files, -2/+2)
  The count field is necessary because there isn't a difference between the 'lo' and 'hi' attributes for a one-element array and a zero-element array. When the count is 0, we know that this is a zero-element array. When it's >= 1, it's a normal constant-sized array. When it's -1, the array is unbounded.
  llvm-svn: 169218
* Stack Alignment: when creating stack objects in MachineFrameInfo, make sure the alignment is clamped to TargetFrameLowering.getStackAlignment if the target does not support stack realignment or the option "realign-stack" is off. (Manman Ren, 2012-12-04, 1 file, -0/+48)
  Without the clamping, this can cause a miscompile if the address is treated as aligned and an add is replaced with an or in DAGCombine. Added a bool StackRealignable to TargetFrameLowering to check whether stack realignment is implemented for the target, and a bool RealignOption to MachineFrameInfo to check whether the option "realign-stack" is on.
  rdar://12713765
  llvm-svn: 169197
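  An illustrative sketch of the hazard (names made up): an over-aligned stack object whose requested alignment the target cannot realize.
    define i32* @second_slot() {
      ; If align 64 cannot actually be realized, DAGCombine must not
      ; assume the low bits of %buf are zero; otherwise it may rewrite
      ; the pointer arithmetic below from an add into an or.
      %buf = alloca [16 x i32], align 64
      %p = getelementptr inbounds [16 x i32]* %buf, i32 0, i32 1
      ret i32* %p
    }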
* Simplify REG_SEQUENCE lowering. (Jakob Stoklund Olesen, 2012-12-01, 1 file, -3/+3)
  The TwoAddressInstructionPass takes the machine code out of SSA form by expanding REG_SEQUENCE instructions into copies. It is no longer necessary to rewrite the registers used by a REG_SEQUENCE instruction because the new coalescer algorithm can handle that; REG_SEQUENCE is now just converted to a sequence of sub-register copies.
  llvm-svn: 169067
* Codegen failure for vmull with small vectors (Sebastian Pop, 2012-11-30, 1 file, -0/+150)
  Codegen was failing with an assertion because of unexpected vector operands when legalizing the selection DAG for a MUL instruction. The asserting code was legalizing multiplies for vectors of size 128 bits. It uses a custom lowering to try to detect cases where it can use a VMULL instruction instead of a VMOVL + VMUL. The code was looking for input operands to the MUL that had been sign- or zero-extended. If it found the extended operands, it would drop the sign/zero extension and use the original vector size as input to a VMULL instruction. The code assumed that the original input vector was 64 bits, so that after dropping the extension it would fit directly into a D register and could be used as an operand of a VMULL instruction. The input code that triggered the failure used a vector of <4 x i8> that was sign-extended to <4 x i32>. It was not safe to drop the sign extension in this case because the original vector is only 32 bits wide. The fix is to insert a sign extension for the vector to reach the required 64-bit size; in this particular example, the vector would need to be sign-extended to a <4 x i16>.
  llvm-svn: 169024
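  A minimal IR sketch of the failing pattern described above (names are illustrative):
    define <4 x i32> @mul_i8(<4 x i8> %a, <4 x i8> %b) {
      ; <4 x i8> is only 32 bits wide, so the sign extension cannot simply
      ; be dropped; the operands must first be widened to <4 x i16> so they
      ; fill a D register and can feed vmull.s16.
      %sa = sext <4 x i8> %a to <4 x i32>
      %sb = sext <4 x i8> %b to <4 x i32>
      %m = mul <4 x i32> %sa, %sb
      ret <4 x i32> %m
    }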
* Handle the situation where CodeGenPrepare removes a reference to a BB that has the last invoke instruction in the function. (Bill Wendling, 2012-11-29, 1 file, -0/+23)
  This also removes the last landing pad in a function, which is fine. But with SjLj EH code we've already placed a bunch of code in the 'entry' block, which expects the landing pad to stick around. When we get to the situation where CGP has removed the last landing pad, go ahead and nuke the SjLj instructions from the 'entry' block.
  <rdar://problem/12721258>
  llvm-svn: 168930
* Added atomic 64-bit min/max/umin/umax intrinsics support in the ARM backend. (Silviu Baranga, 2012-11-29, 1 file, -0/+61)
  llvm-svn: 168886
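  A minimal IR sketch of one of the newly supported operations (names are illustrative; on ARM these expand to a compare-and-store loop built on ldrexd/strexd, which is where the condition codes fixed in r173437 above come in):
    define i64 @atomic_min(i64* %p, i64 %v) {
      ; Signed 64-bit atomic minimum; returns the previous value.
      %old = atomicrmw min i64* %p, i64 %v seq_cst
      ret i64 %old
    }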
* Avoid rewriting instructions twice. (Jakob Stoklund Olesen, 2012-11-29, 1 file, -0/+41)
  This could cause miscompilations on targets where sub-register composition is not always idempotent (ARM).
  <rdar://problem/12758887>
  llvm-svn: 168837
* ARM: Implement CanLowerReturn so large vectors get expanded into sret. (Benjamin Kramer, 2012-11-28, 1 file, -0/+12)
  Fixes PR14337.
  llvm-svn: 168809
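  An illustrative sketch of a return type too large for ARM's return registers, which should now be demoted to a hidden sret pointer argument (names made up):
    define <16 x i32> @big_return(<16 x i32> %v) {
      ; A 512-bit vector result does not fit in registers under the ARM
      ; calling convention, so the caller passes an sret pointer and the
      ; value is returned through memory.
      ret <16 x i32> %v
    }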