| Commit message | Author | Age | Files | Lines |
...
| |
rewritten as a compare against a constant 0 with the opposite condition.
llvm-svn: 170495
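
The subject line above is truncated, so here is one plausible instance of such a rewrite as a self-contained C++ check (the unsigned case is an assumption, not confirmed by the message): a compare against the constant 1 is equivalent to a compare against 0 with the condition sense flipped.

#include <cassert>
#include <cstdint>

int main() {
  for (uint32_t x : {0u, 1u, 2u, 0xFFFFFFFFu}) {
    assert((x >= 1) == (x != 0)); // compare against 0, opposite sense
    assert((x < 1) == (x == 0));
  }
  return 0;
}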
| |
A register can be associated with several distinct register classes.
For example, on PPC, the floating point registers are each associated with
both F4RC (which holds f32) and F8RC (which holds f64). As a result, this code
would fail when provided with a floating point register and an f64 operand
because it would happen to find the register in the F4RC class first and
return that. From the F4RC class, SDAG would extract f32 as the register
type and then assert because of the invalid implied conversion between
the f64 value and the f32 register.
Instead, search all register classes. If a register class containing the
requested register has the requested type, then return that register
class. Otherwise, as before, return the first register class found that
contains the requested register.
llvm-svn: 170436
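
A minimal standalone model of the fix (hypothetical types and names, not the actual LLVM code): prefer a containing class whose value type matches the request, and only fall back to the first containing class.

#include <cassert>
#include <vector>

struct RegClass {
  int VT;                    // value type this class holds (e.g. 32 = f32)
  std::vector<int> Regs;     // registers in the class
  bool contains(int R) const {
    for (int X : Regs)
      if (X == R)
        return true;
    return false;
  }
};

// Search all register classes: prefer one that both contains Reg and
// has the requested type; otherwise fall back to the first class
// containing Reg (the old behavior).
const RegClass *findRegClass(const std::vector<RegClass> &Classes,
                             int Reg, int ReqVT) {
  const RegClass *First = nullptr;
  for (const RegClass &RC : Classes) {
    if (!RC.contains(Reg))
      continue;
    if (RC.VT == ReqVT)
      return &RC;            // type also matches: the fixed behavior
    if (!First)
      First = &RC;
  }
  return First;
}

int main() {
  // Model of the PPC case: the same registers live in both classes.
  std::vector<RegClass> Classes = {{32, {1, 2, 3}},   // "F4RC", f32
                                   {64, {1, 2, 3}}};  // "F8RC", f64
  assert(findRegClass(Classes, 1, 64)->VT == 64);     // no longer "F4RC"
  return 0;
}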
| |
TargetLowering::getRegClassFor).
Some isSimple() guards were missing, or getSimpleVT() calls were hoisted
too far, resulting in asserts on valid LLVM assembly input.
llvm-svn: 170336
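
A sketch of the guard pattern the fix restores (the surrounding code is illustrative; isSimple(), getSimpleVT(), and getRegClassFor are the APIs named above): getSimpleVT() must stay behind an isSimple() check, because extended types have no MVT.

// Hoisting getSimpleVT() above the check asserts on extended EVTs.
if (VT.isSimple()) {
  MVT SimpleVT = VT.getSimpleVT();  // only valid once isSimple() holds
  const TargetRegisterClass *RC = TLI.getRegClassFor(SimpleVT);
  // ...
}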
| |
EVT.
llvm-svn: 170183
| |
llvm-svn: 170148
| |
EVT.
Accordingly, change RegDefIter to contain MVTs instead of EVTs.
llvm-svn: 170140
| |
Accordingly, add helper functions getSimpleValueType (in parallel to
getValueType) in SDValue, SDNode, and TargetLowering.
This is the first in a series of patches.
This is the second attempt. In the first attempt (r169837), a few
getSimpleVT() calls were hoisted too far, detected by bootstrap failures.
llvm-svn: 170104
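
A hedged sketch of the shape of one such helper (the actual LLVM definitions may differ in detail): getSimpleValueType is getValueType plus the assertion, inside getSimpleVT(), that the type is simple.

// Illustrative; the real code routes through SDNode.
MVT SDValue::getSimpleValueType() const {
  return getValueType().getSimpleVT();  // asserts isSimple()
}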
| |
before referencing them. rdar://12868039
llvm-svn: 170078
| |
load / store pair. It's not legal to use a wider load than the size of
the remaining bytes if it's the first pair of load / store.
llvm-svn: 170018
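
A worked instance with hypothetical sizes (comment form; not code from the patch) makes the rule concrete:

// Suppose the expansion uses 4-byte operations.
//   7 bytes left, later pair:  pair 1 copies [0,4); pair 2 can widen
//   and shift back to copy [3,7), overlapping already-copied bytes.
//   3 bytes total, first pair: there is no earlier pair to overlap,
//   so a 4-byte load would read one byte past the end. The first pair
//   must not be wider than the remaining bytes.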
| |
mention the inline memcpy / memset expansion code is a mess?
This patch splits the ZeroOrLdSrc argument into two: IsMemset and ZeroMemset.
The former indicates whether it is expanding a memset or a memcpy / memmove.
The latter indicates whether the memset is a memset of zero. It's entirely
possible (likely, even) that targets may want to do different things for
memcpy and memset of zero.
llvm-svn: 169959
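
A hedged sketch of the resulting hook signature (IsMemset and ZeroMemset come from the message above; the remaining parameters are assumed from the TargetLowering API of the time and may differ):

virtual EVT getOptimalMemOpType(uint64_t Size,
                                unsigned DstAlign, unsigned SrcAlign,
                                bool IsMemset,    // memset vs. memcpy/memmove
                                bool ZeroMemset,  // memset of zero?
                                bool MemcpyStrSrc,
                                MachineFunction &MF) const;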
| |
Also added more comments to explain why it is generally OK to return true.
- Rename getOptimalMemOpType argument IsZeroVal to ZeroOrLdSrc. It's meant to
be true for a loaded source (memcpy) or zero constants (memset). The poor
name choice is probably some kind of legacy issue.
llvm-svn: 169954
| |
rdar://12838504
llvm-svn: 169951
| |
f64 load / store on non-SSE2 x86 targets.
llvm-svn: 169944
| |
ScalarTargetTransformInfo::getIntImmCost() instead. "Legal" is a poorly defined
term for something like integer immediate materialization. It is always possible
to materialize an integer immediate. Whether to use it for memcpy expansion is
more of a "cost" concern.
llvm-svn: 169929
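
A hedged sketch of the replacement query (the call site, variables, and threshold are illustrative; getIntImmCost is the hook named above):

// Ask the target how costly it is to materialize Imm of type Ty,
// instead of asking a yes/no "is it legal" question.
unsigned Cost = STTI->getIntImmCost(Imm, Ty);
bool CheapEnough = Cost <= 1;  // illustrative threshold: ~one instruction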
| |
llvm-svn: 169854
| |
instead of EVTs.
llvm-svn: 169851
| |
MVTs, instead of EVTs.
Accordingly, add bitsLT (and similar) to MVT.
llvm-svn: 169850
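
A plausible shape for the new MVT helpers, mirroring the existing EVT ones (hedged sketch; in LLVM these are inline members):

bool MVT::bitsLT(MVT VT) const {
  // Strictly smaller total bit width, same meaning as EVT::bitsLT.
  return getSizeInBits() < VT.getSizeInBits();
}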
| |
from EVT.
llvm-svn: 169849
| |
EVTs.
llvm-svn: 169848
| |
EVTs.
llvm-svn: 169847
| |
of EVT.
llvm-svn: 169845
| |
instead of EVTs.
llvm-svn: 169844
| |
llvm-svn: 169843
| |
EVT.
llvm-svn: 169842
| |
llvm-svn: 169841
| |
llvm-svn: 169840
| |
llvm-svn: 169839
| |
EVT.
Accordingly, change RegDefIter to contain MVTs instead of EVTs.
llvm-svn: 169838
| |
Accordingly, add helper functions getSimpleValueType (in parallel to
getValueType) in SDValue, SDNode, and TargetLowering.
This is the first in a series of patches.
llvm-svn: 169837
| |
try to reduce the width of this load, and would end up transforming:
(truncate (lshr (sextload i48 <ptr> as i64), 32) to i32)
to
(truncate (zextload i32 <ptr+4> as i64) to i32)
We lost the sext attached to the load while building the narrower i32
load, and replaced it with a zext because lshr always zero-extends its
results. Instead, bail out of this combine when there is a conflict
between a sextload and a zext narrowing. The rest of the DAG combiner
still optimizes the code down to the proper single instruction:
movswl 6(...),%eax
Which is exactly what we wanted. Previously we read past the end *and*
missed the sign extension:
movl 6(...), %eax
llvm-svn: 169802
| |
This shouldn't affect codegen for -O0 compiles as tail call markers are not
emitted in unoptimized compiles. Testing with the external/internal nightly
test suite reveals no change in compile time performance. Testing with -O1,
-O2 and -O3 with fast-isel enabled did not cause any compile-time or
execution-time failures. All tests were performed on my x86 machine.
I'll monitor our arm testers to ensure no regressions occur there.
In an upcoming clang patch I will be marking the objc_autoreleaseReturnValue
and objc_retainAutoreleaseReturnValue as tail calls unconditionally. While
it's theoretically true that this is just an optimization, it's an
optimization that we very much want to happen even at -O0, or else ARC
applications become substantially harder to debug.
Part of rdar://12553082
llvm-svn: 169796
| |
1. Teach it to use overlapping unaligned load / store to copy / set the trailing
bytes. e.g. On x86, use two pairs of movups / movaps for 17 - 31 byte copies
(see the sketch after this entry).
2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g.
x86 and ARM.
3. When memcpy from a constant string, do *not* replace the load with a constant
if it's not possible to materialize an integer immediate with a single
instruction (this required a new target hook: TLI.isIntImmLegal()).
4. Use unaligned load / stores more aggressively if target hooks indicate they
are "fast".
5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8.
Also increase the threshold to something reasonable (8 for memset, 4 pairs
for memcpy).
This significantly improves Dhrystone, up to 50% on ARM iOS devices.
rdar://12760078
llvm-svn: 169791
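
A standalone C++ model of point 1 (the function and buffers are hypothetical, not LLVM code): a copy of 17 to 31 bytes done as exactly two pairs of 16-byte unaligned loads and stores, where the second pair is shifted back to overlap the first instead of falling back to byte-sized cleanup.

#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy n bytes, 17 <= n <= 31, with two 16-byte load/store pairs.
void copy17to31(uint8_t *dst, const uint8_t *src, size_t n) {
  uint8_t lo[16], hi[16];
  std::memcpy(lo, src, 16);           // load pair 1: bytes [0,16)
  std::memcpy(hi, src + n - 16, 16);  // load pair 2: [n-16,n), overlapping
  std::memcpy(dst, lo, 16);           // store pair 1
  std::memcpy(dst + n - 16, hi, 16);  // store pair 2
}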
| |
llvm-svn: 169776
| |
llvm-svn: 169773
| |
llvm-svn: 169772
| |
llvm-svn: 169727
| |
llvm-svn: 169692
| |
or all 0s. These cases can show up when vectors are split during
legalization. Fix some tests that were dependent on these cases not being
combined.
llvm-svn: 169684
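
A hedged sketch of the combine in SelectionDAG terms (ISD::isBuildVectorAllOnes / isBuildVectorAllZeros are real helpers; the surrounding code and names are illustrative):

// A vselect whose mask is a build_vector of all 1s or all 0s picks
// one operand outright.
if (ISD::isBuildVectorAllOnes(Mask.getNode()))
  return LHS;   // every lane takes the "true" operand
if (ISD::isBuildVectorAllZeros(Mask.getNode()))
  return RHS;   // every lane takes the "false" operand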
| |
understand target implementation of any_extend / extload, just generate
zero_extend in place of any_extend for liveouts when the target knows the
zero_extend will be implicit (e.g. ARM ldrb / ldrh) or folded (e.g. x86 movz).
rdar://12771555
llvm-svn: 169536
| |
check if loads that happen in between stores alias with the first store in the
chain, only with the second store onwards.
llvm-svn: 169516
| |
and extloads. If they are implemented as a zero-extend, or implicitly
zero-extend, then this can enable more demanded-bits optimizations. e.g.
define void @foo(i16* %ptr, i32 %a) nounwind {
entry:
%tmp1 = icmp ult i32 %a, 100
br i1 %tmp1, label %bb1, label %bb2
bb1:
%tmp2 = load i16* %ptr, align 2
br label %bb2
bb2:
%tmp3 = phi i16 [ 0, %entry ], [ %tmp2, %bb1 ]
%cmp = icmp ult i16 %tmp3, 24
br i1 %cmp, label %bb3, label %exit
bb3:
call void @bar() nounwind
br label %exit
exit:
ret void
}
This compiles to the following before:
push {lr}
mov r2, #0
cmp r1, #99
bhi LBB0_2
@ BB#1: @ %bb1
ldrh r2, [r0]
LBB0_2: @ %bb2
uxth r0, r2
cmp r0, #23
bhi LBB0_4
@ BB#3: @ %bb3
bl _bar
LBB0_4: @ %exit
pop {lr}
bx lr
The uxth is not needed since ldrh implicitly zero-extends the high bits. With
this change it is eliminated.
rdar://12771555
llvm-svn: 169459
| |
missed in the first pass because the script didn't yet handle include
guards.
Note that the script is now able to handle all of these headers without
manual edits. =]
llvm-svn: 169224
| |
llvm-svn: 169198
| |
llvm-svn: 169196
| |
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
| |
llvm-svn: 169111
| |
llvm-svn: 168932
| |
If we need to split the operand of a VSELECT, it must be the mask operand. We
split the entire VSELECT operand with EXTRACT_SUBVECTOR.
llvm-svn: 168883
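
A hedged sketch of that split (EXTRACT_SUBVECTOR is the real node; the surrounding calls follow the SelectionDAG API of the time and are illustrative):

EVT LoVT, HiVT;
std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(Mask.getValueType());
SDValue Lo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, LoVT, Mask,
                         DAG.getIntPtrConstant(0));
SDValue Hi = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, HiVT, Mask,
                         DAG.getIntPtrConstant(LoVT.getVectorNumElements()));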
| |
computing the legalization method for vectors
For some targets, it is desirable to prefer scalarizing <N x i1> instead of promoting to a larger legal type, such as <N x i32>.
llvm-svn: 168882
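
A hedged sketch of what such a target override could look like (the hook name, signature, and target class here are assumptions, not confirmed by the message above):

// Hypothetical target override: scalarize <N x i1> rather than
// promote it to something like <N x i32>.
TargetLowering::LegalizeTypeAction
MyTargetLowering::getPreferredVectorAction(EVT VT) const {
  if (VT.isVector() && VT.getVectorElementType() == MVT::i1)
    return TypeScalarizeVector;
  return TargetLowering::getPreferredVectorAction(VT);
}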
| |
loads do not alias.
llvm-svn: 168832