| Commit message | Author | Age | Files | Lines |
| |
If the subtraction of the higher 32-bit parts gives a 0 result, we need to do the store operation.
llvm-svn: 173437
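As background, a minimal C++ sketch of the split-word pattern this describes (names invented, not the patched code): on a 32-bit target a 64-bit compare-and-store is decomposed into 32-bit halves, and the store must still run when the subtraction of the high 32-bit parts yields 0, i.e. when the high halves match.

#include <cstdint>

// Illustrative only: a 64-bit compare-and-store split into 32-bit halves.
static void storeIfEqual64(uint32_t *lo, uint32_t *hi,
                           uint64_t expected, uint64_t value) {
  uint32_t expLo = static_cast<uint32_t>(expected);
  uint32_t expHi = static_cast<uint32_t>(expected >> 32);
  // A high-part subtraction of 0 means the high halves are equal; the
  // store must then still happen (given the low halves also match).
  if ((*hi - expHi) == 0 && (*lo - expLo) == 0) {
    *lo = static_cast<uint32_t>(value);
    *hi = static_cast<uint32_t>(value >> 32);
  }
}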
|
| |
llvm-svn: 172874
|
| |
This removes previous special cases for each floating-point type in favour of a
shared codepath.
llvm-svn: 172189
|
| |
requirement when creating stack objects in MachineFrameInfo.
Add CreateStackObjectWithMinAlign to throw an error when the minimal alignment
can't be achieved and to clamp the alignment when the preferred alignment
can't be achieved. The same is true for CreateVariableSizedObject.
No error is emitted in CreateSpillStackObject or CreateStackObject.
As long as callers of CreateStackObject do not assume the object will be
aligned at the requested alignment, we should not have miscompile since
later optimizations which look at the object's alignment will have the correct
information.
rdar://12713765
llvm-svn: 172027
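A minimal C++ sketch of the clamping policy described above (hypothetical helper, not the actual MachineFrameInfo code): a stated minimum may fail with an error, while a merely preferred alignment is silently clamped.

#include <algorithm>
#include <cstdio>

// Hypothetical illustration of the alignment policy.
unsigned chooseAlignment(unsigned Preferred, unsigned MinRequired,
                         unsigned StackAlign, bool CanRealignStack) {
  if (CanRealignStack)
    return Preferred;                 // realignment satisfies any request
  if (MinRequired > StackAlign) {
    std::fprintf(stderr, "error: minimal alignment not achievable\n");
    return StackAlign;                // clamp after reporting the error
  }
  return std::min(Preferred, StackAlign);  // clamp the preferred alignment
}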
|
| |
This was an experimental option, but needs to be defined
per-target. e.g. PPC A2 needs to aggressively hide latency.
I converted some in-order scheduling tests to A2. Hal is working on
more test cases.
llvm-svn: 171946
|
| |
This avoids FileCheck failing over different comment characters in
assembly (notably powerpc64 on Linux vs Darwin) and should fix David's
build-bot.
llvm-svn: 171886
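Illustrative FileCheck lines (invented, not the actual test change): matching the comment character with a regex keeps a test portable across assemblers that use different comment characters.

; Brittle: hard-codes '#', the Linux powerpc64 comment character.
; CHECK: # some comment text
; Portable: a FileCheck regex accepts '#' (Linux) or ';' (Darwin).
; CHECK: {{[#;]}} some comment text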
|
| |
llvm-svn: 171866
|
| |
the global variables. We partition the set of globals by their address space, and apply the same transformation as before to merge them.
llvm-svn: 171730
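A schematic C++ sketch of the partitioning step (invented types and names, not the actual GlobalMerge code): bucket the globals by address space, then run the pre-existing merge on each bucket.

#include <map>
#include <vector>

struct GlobalVar { unsigned AddrSpace; /* ... */ };

// The pre-existing merge transformation, applied per bucket.
static void mergeBucket(std::vector<GlobalVar*> &Bucket) { /* ... */ }

static void mergeAllGlobals(std::vector<GlobalVar*> &Globals) {
  std::map<unsigned, std::vector<GlobalVar*>> Buckets;
  for (GlobalVar *G : Globals)
    Buckets[G->AddrSpace].push_back(G);   // partition by address space
  for (auto &Entry : Buckets)
    mergeBucket(Entry.second);            // same transformation as before
}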
|
| |
This reverts r170694. The operations can be represented in IR without
adding any new intrinsics.
llvm-svn: 170765
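For example (illustrative IR), a plain NEON-width vector subtract is already expressible as a first-class IR instruction, so no target intrinsic is needed:

; Instead of a call to a target intrinsic, the operation is just:
%r = sub <4 x i32> %a, %b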
|
| |
llvm.arm.neon.vsub[su].* intrinsics.
Patch by Pete Couperus <pjcoup@gmail.com>
llvm-svn: 170694
|
| |
((x & 0xff00) >> 8) << 2
to
(x >> 6) & 0x3fc
This is general goodness since it folds a left shift into the mask. However,
the trailing zeros in the mask prevent the ARM backend from using the bit
extraction instructions, and worse, the mask materialization may require an
additional instruction. This comes up fairly frequently when the result of
the bit twiddling is used as a memory address. e.g.
= ptr[(x & 0xFF0000) >> 16]
We want to generate:
ubfx r3, r1, #16, #8
ldr.w r3, [r0, r3, lsl #2]
vs.
mov.w r9, #1020
and.w r2, r9, r1, lsr #14
ldr r2, [r0, r2]
Add a late ARM-specific isel optimization to
ARMDAGToDAGISel::PreprocessISelDAG(). It folds the left shift into the
'base + offset' address computation and changes the mask to one without
trailing zeros, enabling the use of ubfx.
Note the optimization has to be done late since it's target specific and we
don't want to change the DAG normalization. It's also fairly restrictive,
as shifter operands are not always free. It's only done for lsl #1 / #2,
which are known to be free on some CPUs and are the most common for address
computation.
This is a slight win for blowfish, rijndael, etc.
rdar://12870177
llvm-svn: 170581
|
| |
To avoid over-constraining the scheduler for ARM in Thumb mode, some code-size optimizations specific to ARM Thumb are blocked when they add a dependency (such as a write-after-read dependency).
This patch disables that check when code size is the priority, i.e., when code is compiled with -Oz.
llvm-svn: 170462
|
| |
llvm-svn: 170454
|
| |
1. Teach it to use overlapping unaligned load / store to copy / set the trailing
bytes. e.g. on x86, use two pairs of movups / movaps for 17 - 31 byte copies
(see the sketch below).
2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is, e.g.
x86 and ARM.
3. When expanding memcpy from a constant string, do *not* replace the load with a
constant if it's not possible to materialize an integer immediate with a single
instruction (this required a new target hook: TLI.isIntImmLegal()).
4. Use unaligned load / stores more aggressively if target hooks indicate they
are "fast".
5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8.
Also increase the threshold to something reasonable (8 for memset, 4 pairs
for memcpy).
This significantly improves Dhrystone, up to 50% on ARM iOS devices.
rdar://12760078
llvm-svn: 169791
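A C++ sketch of the overlapping trick in item 1 above (names invented, illustrative only): two 16-byte moves whose ranges overlap in the middle cover any 17-31 byte copy without a scalar tail loop.

#include <cstddef>
#include <cstdint>
#include <cstring>

// Illustrative only: copy n bytes (17 <= n <= 31) with two 16-byte moves
// that overlap in the middle, the way two movups pairs would on x86.
static void copyOverlapping(uint8_t *dst, const uint8_t *src, std::size_t n) {
  std::memcpy(dst, src, 16);                    // head: bytes [0, 16)
  std::memcpy(dst + n - 16, src + n - 16, 16);  // tail: bytes [n-16, n)
}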
|
| |
Before this patch, when you ran objdump on an LLVM-compiled file, objdump tried to
decode data-in-code sections as if they were code. This patch adds the missing
Mapping Symbols, as defined by "ELF for the ARM Architecture" (ARM IHI 0044D).
Patch based on work by Greg Fitzgerald.
llvm-svn: 169609
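Conceptually (illustrative listing, not from the patch), mapping symbols annotate where code and data begin so a disassembler stops decoding embedded constants as instructions:

$a:                         @ ARM code starts here ($t would mark Thumb)
        ldr     r0, .Lval
        bx      lr
$d:                         @ data-in-code starts here
.Lval:
        .word   0x12345678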
|
| |
Patch by Alexander Zinenko.
llvm-svn: 169547
|
| |
llvm-svn: 169464
|
| |
guess Chad expects fastisel here.
llvm-svn: 169463
|
| |
rdar://12821569
llvm-svn: 169460
|
| |
and extloads. If they are implemented as zero-extend, or implicitly
zero-extend, then this can enable more demanded-bits optimizations. e.g.
define void @foo(i16* %ptr, i32 %a) nounwind {
entry:
%tmp1 = icmp ult i32 %a, 100
br i1 %tmp1, label %bb1, label %bb2
bb1:
%tmp2 = load i16* %ptr, align 2
br label %bb2
bb2:
%tmp3 = phi i16 [ 0, %entry ], [ %tmp2, %bb1 ]
%cmp = icmp ult i16 %tmp3, 24
br i1 %cmp, label %bb3, label %exit
bb3:
call void @bar() nounwind
br label %exit
exit:
ret void
}
This compiles to the following before:
push {lr}
mov r2, #0
cmp r1, #99
bhi LBB0_2
@ BB#1: @ %bb1
ldrh r2, [r0]
LBB0_2: @ %bb2
uxth r0, r2
cmp r0, #23
bhi LBB0_4
@ BB#3: @ %bb3
bl _bar
LBB0_4: @ %exit
pop {lr}
bx lr
The uxth is not needed since ldrh implicitly zero-extends the high bits. With
this change it is eliminated.
rdar://12771555
llvm-svn: 169459
|
| |
llvm-svn: 169325
|
| |
The count attribute is more accurate with regard to the size of an array. It
also obviates the upper bound attribute in the subrange. We can also better
handle an unbounded array by setting the count to -1 instead of the lower bound
to 1 and the upper bound to 0.
llvm-svn: 169312
|
| |
The count field is necessary because there isn't a difference between the 'lo'
and 'hi' attributes for a one-element array and a zero-element array. When the
count is '0', we know that this is a zero-element array. When it's >=1, then
it's a normal constant sized array. When it's -1, then the array is unbounded.
llvm-svn: 169218
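Illustrative C declarations (hypothetical) mapped to the count values just described:

int a[4];   /* count =  4: normal constant-sized array      */
int b[1];   /* count =  1: one element, distinct from zero  */
int c[0];   /* count =  0: zero-element array               */
int d[];    /* count = -1: unbounded array                  */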
|
| |
the alignment is clamped to TargetFrameLowering.getStackAlignment if the target
does not support stack realignment or the option "realign-stack" is off.
This could cause a miscompile if the address is treated as aligned and an add is
replaced with an or in DAGCombine (see the worked example below).
Added a bool StackRealignable to TargetFrameLowering to check whether stack
realignment is implemented for the target. Also added a bool RealignOption
to MachineFrameInfo to check whether the option "realign-stack" is on.
rdar://12713765
llvm-svn: 169197
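A worked instance of the add-to-or hazard (invented address): the rewrite of add into or is only sound when the claimed alignment guarantees the low bits are clear.

claimed: ptr is 8-byte aligned, so ptr + 4 == ptr | 4
reality: ptr == 0x100c (only 4-byte aligned)
         ptr + 4 == 0x1010, but ptr | 4 == 0x100c  -- a miscompile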
|
| |
The TwoAddressInstructionPass takes the machine code out of SSA form by
expanding REG_SEQUENCE instructions into copies. It is no longer
necessary to rewrite the registers used by a REG_SEQUENCE instruction
because the new coalescer algorithm can do it now.
REG_SEQUENCE is just converted to a sequence of sub-register copies now.
llvm-svn: 169067
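Schematically (register and sub-register names invented), the expansion is now just:

%vreg5<def> = REG_SEQUENCE %vreg1, ssub_0, %vreg2, ssub_1
is rewritten to
%vreg5:ssub_0<def> = COPY %vreg1
%vreg5:ssub_1<def> = COPY %vreg2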
|
| |
Codegen was failing with an assertion because of unexpected vector
operands when legalizing the selection DAG for a MUL instruction.
The asserting code was legalizing multiplies for vectors of size 128
bits. It uses a custom lowering to try and detect cases where it can
use a VMULL instruction instead of a VMOVL + VMUL. The code was
looking for input operands to the MUL that had been sign or zero
extended. If it found the extended operands it would drop the
sign/zero extension and use the original vector size as input to a
VMULL instruction.
The code assumed that the original input vector was 64 bits so that
after dropping the extension it would fit directly into a D register
and could be used as an operand of a VMULL instruction. The input
code that triggered the failure used a vector of <4 x i8> that was
sign extended to <4 x i32>. It was not safe to drop the sign
extension in this case because the original vector is only 32 bits
wide. The fix is to insert a sign extension for the vector to reach
the required 64 bit size. In this particular example, the vector would
need to be sign extended to a <4 x i16>.
llvm-svn: 169024
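Illustrative IR for the failing shape (function name invented): the multiply's operands are <4 x i8> values sign extended to <4 x i32>, so dropping the extension outright would leave only a 32-bit-wide vector rather than the 64-bit D-register operand VMULL needs; with the fix they are first sign extended to <4 x i16>.

define <4 x i32> @vmull_sketch(<4 x i8> %a, <4 x i8> %b) {
  %sa = sext <4 x i8> %a to <4 x i32>
  %sb = sext <4 x i8> %b to <4 x i32>
  %m  = mul <4 x i32> %sa, %sb      ; lowered via VMULL after the fix
  ret <4 x i32> %m
}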
|
| |
the last invoke instruction in the function. This also removes the last landing
pad in a function. This is fine, but with SjLj EH code, we've already placed a
bunch of code in the 'entry' block, which expects the landing pad to stick
around.
When we get to the situation where CGP has removed the last landing pad, go
ahead and nuke the SjLj instructions from the 'entry' block.
<rdar://problem/12721258>
llvm-svn: 168930
|
| |
llvm-svn: 168886
|
| |
This could cause miscompilations in targets where sub-register
composition is not always idempotent (ARM).
<rdar://problem/12758887>
llvm-svn: 168837
|
| |
Fixes PR14337.
llvm-svn: 168809
|
| |
llvm-svn: 168723
|
| |
boundaries.
Given the following case:
BB0
%vreg1<def> = SUBrr %vreg0, %vreg7
%vreg2<def> = COPY %vreg7
BB1
%vreg10<def> = SUBrr %vreg0, %vreg2
We should be able to CSE between SUBrr in BB0 and SUBrr in BB1.
rdar://12462006
llvm-svn: 168717
|
| |
argument. Instead, use a pair of .local and .comm directives.
This avoids spurious differences between binaries built by the
integrated assembler vs. those built by the external assembler,
since the external assembler may impose alignment requirements
on .lcomm symbols where the integrated assembler does not.
llvm-svn: 168704
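For illustration (symbol name invented), the replacement emits:

# before: alignment argument handled differently across assemblers
        .lcomm  buf, 64, 4
# after: explicit pair with the same effect
        .local  buf
        .comm   buf, 64, 4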
|
| |
llvm-svn: 168658
|
| |
+ Take account of clobbers
+ Give outputs priority over inputs since they happen later.
llvm-svn: 168360
|
| |
It turned out that ARM wants a different layout of type infos.
This is yet another patch in an attempt to fix PR7187.
llvm-svn: 168325
|
| |
Couperus.
llvm-svn: 168240
|
| |
test cases require fixes to fast-isel before the verifier can be enabled.
Part of rdar://12594152
llvm-svn: 168233
|
| |
This patch replaces the hard-coded GPR pair [R0, R1] of
Intrinsic:arm_ldrexd and [R2, R3] of Intrinsic:arm_strexd with the
even/odd GPRPair reg class, similar to the lowering of the atomic_64
operation.
llvm-svn: 168207
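For context (illustrative), the instructions in question require an even/odd consecutive register pair, which the register class now expresses instead of pinning fixed registers:

        ldrexd  r0, r1, [r2]        @ Rt must be even, Rt2 must be Rt+1
        strexd  r4, r2, r3, [r5]    @ same pairing constraint for the source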
|
| |
This fixes PR14359
llvm-svn: 168200
|
| |
case to vector legalization so this actually works.
Patch by Pete Couperus. Fixes PR12540.
llvm-svn: 168107
|
| |
is a small negative number.
This patch changes the definition of negative from -0..-255 to -1..-255. I am changing this because of
a bug that we had in some of the patterns that assumed that "subs" of zero does not set the carry flag.
rdar://12028498
llvm-svn: 167963
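A one-line worked example of the flag behaviour behind the fix: on ARM the carry flag after a subtraction is the inverted borrow, so subtracting zero always sets it.

        subs    r0, r1, #0      @ r1 - 0 never borrows, so C = 1;
                                @ patterns assuming C = 0 here were wrong,
                                @ hence "negative" now excludes -0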
|
| |
eh table and handler data if there are no landing pads in the function.
Patch by Logan Chien with some cleanups from me.
llvm-svn: 167945
|
| |
Do some cleanup of the code while here.
Inspired by a patch by Logan Chien!
llvm-svn: 167904
|
| |
Block priorities still apply outside loops.
llvm-svn: 167793
|
| |
This adds support for weak DAG edges to the general scheduling
infrastructure in preparation for MachineScheduler support for
heuristics based on weak edges.
llvm-svn: 167738
|
| |
mov lr, pc
b.w _foo
The "mov" instruction doesn't set bit zero to one, it's putting incorrect
value in lr. It messes up backtraces.
rdar://12663632
llvm-svn: 167657
|
| |
Improve ARM build attribute emission for architecture types.
This also changes the default architecture emitted for a generic CPU to "v7".
llvm-svn: 167574
|
| |
llvm-svn: 167318
|
| |
llvm-svn: 167020
|