| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
|
| |
Reviewers: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13092
llvm-svn: 248486
|
| |
|
|
|
|
|
|
| |
Fixes the overflow case of llvm.*absdiff intrinsic also updats the tests and LangRef.rst accordingly.
Differential Revision: http://reviews.llvm.org/D11678
llvm-svn: 248483
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The byte-swap recognizer can now notice that this
```
uint32_t bswap(uint32_t x)
{
x = (x & 0x0000FFFF) << 16 | (x & 0xFFFF0000) >> 16;
x = (x & 0x00FF00FF) << 8 | (x & 0xFF00FF00) >> 8;
return x;
}
```
is a bswap. Fixes PR23863.
Reviewers: nlewycky, hfinkel, hans, jmolloy, rengolin
Subscribers: majnemer, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D12637
llvm-svn: 248482
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow a target to do something other than search for copies
that will avoid cross register bank copies.
Implement for SI by only rewriting the most basic copies,
so it should look through anything like a subregister extract.
I'm not entirely satisified with this because it seems like
eliminating a reg_sequence that isn't fully used should work
generically for all targets without them having to override
something. However, it seems to be tricky to have a simple
implementation of this without rewriting to invalid kinds
of subregister copies on some targets.
I'm not sure if there is currently a generic way to easily check
if a subregister index would be valid for the current use.
The current set of TargetRegisterInfo::get*Class functions don't
quite behave like I would expect (e.g. getSubClassWithSubReg
returns the maximal register class rather than the minimal), so
I'm not sure how to make the generic test keep searching if
SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making
the default implementation to check for simple copies breaks
a variety of ARM and x86 tests by producing illegal subregister uses.
The ARM tests are not actually changed since it should still be using
the same sharesSameRegisterFile implementation, this just relaxes
them to not check for specific registers.
llvm-svn: 248478
|
| |
|
|
| |
llvm-svn: 248476
|
| |
|
|
| |
llvm-svn: 248475
|
| |
|
|
| |
llvm-svn: 248474
|
| |
|
|
| |
llvm-svn: 248472
|
| |
|
|
| |
llvm-svn: 248471
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the stores are storing values from loads which partially
alias the stores, we could end up placing the merged loads
and stores on the same chain which has the potential to break.
Each store may have a different chain dependency on only some
of the original loads. Create a new TokenFactor to capture all
of the required dependencies of the stores rather than assuming
all stores can use the same chain.
The testcase is a situation where this happens, although
it does not have an observable change from this. The DAG nodes
just happened to not be reordered before despite this missing
chain dependency.
This is based on an off-list report for an out of tree target
which regressed due to r246307 and I haven't managed to find a case
where the nodes do end up reordered with an in tree target.
llvm-svn: 248468
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of always inserting a copy in case
the super register is itself a subregister,
only extract to the super reg class if this is
actually the case.
This shouldn't really change codegen, but
makes looking at the output of SIFixSGPRCopies
easier to read.
llvm-svn: 248467
|
| |
|
|
| |
llvm-svn: 248465
|
| |
|
|
| |
llvm-svn: 248464
|
| |
|
|
|
|
|
| |
This makes the header more readable and cleans up some unnecessary
header differences between NDEBUG and !NDEBUG.
llvm-svn: 248462
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Loop unswitching produces conditional branches with constant condition,
and it's beneficial for later passes to clean this up with simplify-cfg.
We do this after the second invocation of loop-unswitch, but not after
the first one. Not doing so might cause problem for passes like
LoopUnroll, whose estimate of loop body size would be less accurate.
Reviewers: hfinkel
Differential Revision: http://reviews.llvm.org/D13064
llvm-svn: 248460
|
| |
|
|
|
|
|
| |
A use can be emitted before def in a function with stack restore
points but no static allocas.
llvm-svn: 248455
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
With this change, subclasses of `llvm::User` will be able to co-allocate
a variable number of bytes (called a "descriptor") with the `llvm::User`
instance. The co-allocated descriptor can later be accessed using
`llvm::User::getDescriptor`. This will be used in later changes to
implement operand bundles.
This change steals one bit from `NumUserOperands`, but given that it is
still 28 bits wide I don't think this will be a practical issue.
This change does not allow allocating hung off uses with descriptors.
This only for simplicity, not for any fundamental reason; and we can
easily add this functionality later if needed.
Reviewers: reames, chandlerc, dexonsmith, kmod, majnemer, pete, JosephTremoulet
Subscribers: pete, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12455
llvm-svn: 248453
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
with unconditional.
Nothing is expected to change, except we do less redundant work in
clean-up.
Reviewers: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12951
llvm-svn: 248444
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In -fprofile-instr-generate compilation, to remove the redundant profile
variables for the COMDAT functions, these variables are placed in the same
COMDAT group as its associated function. This way when the COMDAT function
is not picked by the linker, those profile variables will also not be
output in the final binary. This may cause warning when mix link objects
built w and wo -fprofile-instr-generate.
This patch puts the profile variables for COMDAT functions to its own COMDAT
group to avoid the problem.
Patch by xur.
Differential Revision: http://reviews.llvm.org/D12248
llvm-svn: 248440
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This time, the issue is that we weren't accounting for the possibility that
aligned DPRs could have been stored after the final "push" in a prologue. When
that happened we effectively moved a "sub sp, #N" from below the aligned stores
to above them, and everything went to pot.
To make it worse, I'd actually committed something testing that we produced
wrong code, so the test update is tiny.
llvm-svn: 248437
|
| |
|
|
|
|
|
|
|
|
| |
Patch by: simoncook
Unlike BitCasts, AddrSpaceCasts do not always produce an output the same size as its input, which was previously assumed. This fixes cases where two address spaces do not have the same size pointer, as an assertion failure would occur when trying to prove deferenceability. LoopUnswitch is used in the particular test, but LICM also exhibits the same problem.
Differential Revision: http://reviews.llvm.org/D13008
llvm-svn: 248422
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch changes the order of GEPs generated by Splitting GEPs
pass, specially when one of the GEPs has constant and the base is
loop invariant, then we will generate the GEP with constant first
when beneficial, to expose more cases for LICM.
If originally Splitting GEP generate the following:
do.body.i:
%idxprom.i = sext i32 %shr.i to i64
%2 = bitcast %typeD* %s to i8*
%3 = shl i64 %idxprom.i, 2
%uglygep = getelementptr i8, i8* %2, i64 %3
%uglygep7 = getelementptr i8, i8* %uglygep, i64 1032
...
Now it genereates:
do.body.i:
%idxprom.i = sext i32 %shr.i to i64
%2 = bitcast %typeD* %s to i8*
%3 = shl i64 %idxprom.i, 2
%uglygep = getelementptr i8, i8* %2, i64 1032
%uglygep7 = getelementptr i8, i8* %uglygep, i64 %3
...
For no-loop cases, the original way of generating GEPs seems to
expose more CSE cases, so we don't change the logic for no-loop
cases, and only limit our change to the specific case we are
interested in.
llvm-svn: 248420
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
arguments.
Make sure InstCombiner::FoldPHIArgLoadIntoPHI doesn't drop the following
metadata:
MD_tbaa
MD_alias_scope
MD_noalias
MD_invariant_load
MD_nonnull
MD_range
rdar://problem/17617709
Differential Revision: http://reviews.llvm.org/D12710
llvm-svn: 248419
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
on FP scalars
Turn this:
movd %xmm0, %eax
movd %xmm1, %ecx
xorl %eax, %ecx
movd %ecx, %xmm0
into this:
xorps %xmm1, %xmm0
This is related to, but does not solve:
https://llvm.org/bugs/show_bug.cgi?id=22428
This is an extension of:
http://reviews.llvm.org/rL248395
llvm-svn: 248415
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
FP scalars
Turn this:
movd %xmm0, %eax
movd %xmm1, %ecx
orl %eax, %ecx
movd %ecx, %xmm0
into this:
orps %xmm1, %xmm0
This is related to, but does not solve:
https://llvm.org/bugs/show_bug.cgi?id=22428
This is an extension of:
http://reviews.llvm.org/rL248395
llvm-svn: 248409
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add two new ways of accessing the unsafe stack pointer:
* At a fixed offset from the thread TLS base. This is very similar to
StackProtector cookies, but we plan to extend it to other backends
(ARM in particular) soon. Bionic-side implementation here:
https://android-review.googlesource.com/170988.
* Via a function call, as a fallback for platforms that provide
neither a fixed TLS slot, nor a reasonable TLS implementation (i.e.
not emutls).
This is a re-commit of a change in r248357 that was reverted in
r248358.
llvm-svn: 248405
|
| |
|
|
|
|
|
| |
The BEXTR comments didn't make sense before, we may want to extend the
FP logic transform to work on vectors, and this way is more beautiful.
llvm-svn: 248404
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is the first part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848.
When range metadata is provided, it should be used to constant fold comparisons with constant values.
Reviewers: sanjoy, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12988
llvm-svn: 248402
|
| |
|
|
|
|
|
|
|
| |
This is a follow-on to:
http://reviews.llvm.org/rL248395
so we can add the call to the or/xor combines too.
llvm-svn: 248399
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
on FP scalars
Turn this:
movd %xmm0, %eax
movd %xmm1, %ecx
andl %eax, %ecx
movd %ecx, %xmm0
into this:
andps %xmm1, %xmm0
This is related to, but does not solve:
https://llvm.org/bugs/show_bug.cgi?id=22428
Differential Revision: http://reviews.llvm.org/D13065
llvm-svn: 248395
|
| |
|
|
|
|
|
|
|
|
| |
This reverts r248388 and fixes the underlying bug: hasAddr64 was initialized
in runOnMachineFunction, but runOnMachineFunction isn't ever called in
CodeGen/WebAssembly/global.ll since that testcase has no functions. The fix
here is to use AsmPrinter's getPointerSize() as needed to determine the
pointer size instead.
llvm-svn: 248394
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This changes the behavior of AddAligntmentAssumptions to match its
comment. I.e, prove the asserted alignment in the context of the caller,
not the callee.
Thanks to Mehdi Amini for seeing the issue here! Also to Artur Pilipenko
who also saw a fix for the issue.
rdar://22521387
Differential Revision: http://reviews.llvm.org/D12997
llvm-svn: 248390
|
| |
|
|
| |
llvm-svn: 248388
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Invoking a function which returns an aggregate can sometimes be
transformed to return a scalar value. However, this means that we need
to create an insertvalue instruction(s) to recreate the correct
aggregate type. We achieved this by inserting an insertvalue
instruction at the invoke's normal successor. However, this is not
feasible if the normal successor uses the invoke's return value inside a
PHI node.
Instead, split the edge between the invoke and the unwind successor and
create the insertvalue instruction in the new basic block. The new
basic block's successor will be the old invoke successor which leaves
us with IR which is well behaved.
This fixes PR24906.
llvm-svn: 248387
|
| |
|
|
|
|
| |
function. NFC.
llvm-svn: 248377
|
| |
|
|
|
|
|
|
| |
This change allows dead store elimination to remove zero and null stores into memory freshly allocated with calloc-like function.
Differential Revision: http://reviews.llvm.org/D13021
llvm-svn: 248374
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The ARM backend has some logic that only allows the fast-isel to be enabled for
subtargets where it is known to be stable. This adds a backend option to
override this and force the fast-isel to be used for any target, to allow it to
be tested.
This is an ARM-specific option, because no other backend disables the fast-isel
on a per-subtarget basis.
llvm-svn: 248369
|
| |
|
|
|
|
|
|
|
|
| |
This patches removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead.
LLVM counterpart to D12835
Differential Revision: http://reviews.llvm.org/D13002
llvm-svn: 248368
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It is fairly common to call SE->getConstant(Ty, 0) or
SE->getConstant(Ty, 1); this change makes such uses a little bit
briefer.
I've refactored the call sites I could find easily to use getZero /
getOne.
Reviewers: hfinkel, majnemer, reames
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12947
llvm-svn: 248362
|
| |
|
|
|
|
|
| |
test/Transforms/SafeStack/abi.ll breaks when target is not supported;
needs refactoring.
llvm-svn: 248358
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add two new ways of accessing the unsafe stack pointer:
* At a fixed offset from the thread TLS base. This is very similar to
StackProtector cookies, but we plan to extend it to other backends
(ARM in particular) soon. Bionic-side implementation here:
https://android-review.googlesource.com/170988.
* Via a function call, as a fallback for platforms that provide
neither a fixed TLS slot, nor a reasonable TLS implementation (i.e.
not emutls).
llvm-svn: 248357
|
| |
|
|
|
|
| |
Fixed the issue that when there is an edge from the jump table to the default statement, we should check it directly instead of checking if the sibling of the jump table header is a successor of the jump table header, which may not be the default statment but a successor of it.
llvm-svn: 248354
|
| |
|
|
| |
llvm-svn: 248340
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
We may have subregister defs which are unused but not discovered and
cleaned up prior to liveness analysis. This creates multiple connected
components in the resulting live range which are forbidden in the
MachineVerifier because they would unnecesarily constrain the register
allocator. Rewrite those dead definitions to define a newly created
virtual register.
Differential Revision: http://reviews.llvm.org/D13035
llvm-svn: 248335
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ConnectedVNInfoEqClasses::Distribute()
This improves ConnectedVNInfoEqClasses::Distribute() to distribute the
segments and value numbers in the subranges instead of conservatively
clearing all subregister info.
No separate test here, just clearing the subrange instead of properly
distributing them would however break my upcoming fix regarding dead super
register definitions.
Differential Revision: http://reviews.llvm.org/D13075
llvm-svn: 248334
|
| |
|
|
| |
llvm-svn: 248333
|
| |
|
|
|
|
|
|
| |
Apart from checking that GlobalVariable is a constant, we should check
that it's not a weak constant, in which case we can't propagate its
value.
llvm-svn: 248327
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ARM counterpart to r248291:
In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.
Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.
Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".
Differential Revision: http://reviews.llvm.org/D13033
llvm-svn: 248294
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.
Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.
Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".
Differential Revision: http://reviews.llvm.org/D13033
llvm-svn: 248291
|
| |
|
|
| |
llvm-svn: 248278
|