| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
llvm-svn: 293514
|
|
|
|
|
|
|
|
| |
fcmp (fneg x), c, pred -> fcmp x, -c, (swap pred)
InstCombine already does this.
llvm-svn: 293512
|
|
|
|
|
|
|
|
| |
This reverts commit r293503.
Revert while I investigate some of the buildbot failures.
llvm-svn: 293509
|
|
|
|
|
|
| |
with splat constants
llvm-svn: 293507
|
|
|
|
| |
llvm-svn: 293504
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.
Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm
Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris
Differential Revision: https://reviews.llvm.org/D26730
llvm-svn: 293503
|
|
|
|
| |
llvm-svn: 293502
|
|
|
|
|
|
|
|
|
| |
This reverts commit r293196
Besides making things look nicer, ATM, we'd like to preserve analysis
more than we'd like to destroy the CFG. We'll probably revisit in the future
llvm-svn: 293501
|
|
|
|
|
|
| |
target shuffles
llvm-svn: 293500
|
|
|
|
|
|
|
| |
This fixes emitting conversions of constants on targets
without legal f16 that need to use these for legalization.
llvm-svn: 293499
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original shift is bigger, so this may qualify as 'obvious',
but here's an attempt at an Alive-based proof:
Name: exact
Pre: (C1 u< C2)
%a = shl i8 %x, C1
%b = lshr exact i8 %a, C2
=>
%c = lshr exact i8 %x, C2 - C1
%b = and i8 %c, ((1 << width(C1)) - 1) u>> C2
Optimization is correct!
llvm-svn: 293498
|
|
|
|
| |
llvm-svn: 293496
|
|
|
|
|
|
| |
This significantly reduces the noise level of these messages.
llvm-svn: 293492
|
|
|
|
| |
llvm-svn: 293489
|
|
|
|
| |
llvm-svn: 293487
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
AMDGPU has two register classes with the same set of registers, and this
was causing this tablegen backend would get stuck in infinite recursion.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: tpr, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D29049
llvm-svn: 293483
|
|
|
|
| |
llvm-svn: 293478
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm, tstellarAMD
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D28992
llvm-svn: 293476
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D28079
llvm-svn: 293470
|
|
|
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D28354
llvm-svn: 293469
|
|
|
|
|
|
|
|
| |
broadcast. We can use COPY_TO_REGCLASS like AVX does.
This causes stack spill slots be oversized sometimes, but the same should already be happening with AVX.
llvm-svn: 293464
|
|
|
|
|
|
| |
Largely based on LLD test for dtrace.
llvm-svn: 293451
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: joe.abbey, craig.topper
Reviewed By: craig.topper
Subscribers: majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D29201
llvm-svn: 293447
|
|
|
|
|
|
|
|
| |
lower half.
Previously this test case fired an assertion in getNode because we tried to create an insert_subvector with both input types the same size and the index pointing to half the vector width.
llvm-svn: 293446
|
|
|
|
|
|
|
|
|
|
| |
Replaces an xor+movd/movq with an xorps which will be shorter in codesize, avoid an int-fpu transfer, allow modern cores to fast path the result during decode and helps other combines recognise an all-zero vector.
The only reason I can think of that we'd want to keep scalar_to_vector in this case is to help recognise the upper elts are undef but this doesn't seem to be a problem.
Differential Revision: https://reviews.llvm.org/D29097
llvm-svn: 293438
|
|
|
|
| |
llvm-svn: 293437
|
|
|
|
|
|
| |
with splats
llvm-svn: 293435
|
|
|
|
| |
llvm-svn: 293434
|
|
|
|
|
|
|
|
|
|
| |
Support lowering AEABI TLS access (__aeabi_read_tp) with long calls.
This requires adjusting the call sequence to use an indirect call to get
full addressability.
Resolves PR31769!
llvm-svn: 293433
|
|
|
|
|
|
|
|
|
|
|
| |
PACKUSWB converts Signed word to Unsigned byte, (the same about DW) and it can't be used for umin+truncate pattern.
AVX-512 VPMOVUS* instructions fit the pattern since they convert Unsigned to Unsigned.
See https://llvm.org/bugs/show_bug.cgi?id=31773
Differential Revision: https://reviews.llvm.org/D29196
llvm-svn: 293431
|
|
|
|
|
|
|
|
| |
llvm/test/CodeGen/AArch64/GlobalISel/gisel-abort.ll.
Unsupported target might be induced if default target is neither macho nor elf. (e.g. *-win32)
llvm-svn: 293430
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add limited (i8/i16/i32/i64) argument lowering support to the IRTranslator.
Inspired by commit 289940.
Reviewers: t.p.northover, qcolombet, ab, zvi, rovka
Reviewed By: rovka
Subscribers: dberris, rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D28987
llvm-svn: 293427
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adds the following instructions:
* mfpmr
* mtpmr
* icblc
* icblq
* icbtls
Fix the scheduling for mtspr on e5500, which uses CFX0, instead of
SFX0/SFX1 as on e500mc.
Addresses PR 31538.
Differential Revision: https://reviews.llvm.org/D29002
llvm-svn: 293417
|
|
|
|
|
|
| |
are XORs.
llvm-svn: 293403
|
|
|
|
|
|
|
|
| |
to the AND are XORs.
The matching code tries to canonicalize XOR to the left, but if there are two XORs and only one is a vnot, this canonicalization can prevent matching.
llvm-svn: 293402
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
non-consecutive (jumbled) way.
The jumbled scalar loads will be sorted while building the tree and these accesses will be marked to generate shufflevector after the vectorized load with proper mask.
Reviewers: hfinkel, mssimpso, mkuper
Differential Revision: https://reviews.llvm.org/D26905
Change-Id: I9c0c8e6f91a00076a7ee1465440a3f6ae092f7ad
llvm-svn: 293386
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Support for barrier synchronization between a subset of threads
in a CTA through one of sixteen explicitly specified barriers.
These intrinsics are not directly exposed in CUDA but are
critical for forthcoming support of OpenMP on NVPTX GPUs.
The intrinsics allow the synchronization of an arbitrary
(multiple of 32) number of threads in a CTA at one of 16
distinct barriers. The two intrinsics added are as follows:
call void @llvm.nvvm.barrier.n(i32 10)
waits for all threads in a CTA to arrive at named barrier #10.
call void @llvm.nvvm.barrier(i32 15, i32 992)
waits for 992 threads in a CTA to arrive at barrier #15.
Detailed description of these intrinsics are available in the PTX manual.
http://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions
Reviewers: hfinkel, jlebar
Differential Revision: https://reviews.llvm.org/D17657
llvm-svn: 293384
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch by Michele Scandale
(with a small tweak to 'CHECK-NOT' the last DILocation in the test)
Subscribers: bogner, llvm-commits
Differential Revision: https://reviews.llvm.org/D27980
llvm-svn: 293377
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Along with https://reviews.llvm.org/D27804, debug locations need to be merged when hoisting store instructions as well. Not sure if just dropping debug locations would make more sense for this case, but as the branch instruction will have at least different discriminator with the hoisted store instruction, I think there will be no difference in practice.
Reviewers: aprantl, andreadb, danielcdh
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29062
llvm-svn: 293372
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the OperandsMapper creates virtual registers, it used to just create
plain scalar register with the right size. This may confuse the
instruction selector because we lose the information of the instruction
using those registers what supposed to do. The MachineVerifier complains
about that already.
With this patch, the OperandsMapper still creates plain scalar register,
but the expectation is for the mapping function to remap the type
properly. The default mapping function has been updated to do that.
rdar://problem/30231850
llvm-svn: 293362
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In r292621, the recommit fixes a bug related with live interval update
after the partial redundent copy is moved.
This recommit solves an additional bug related to the lack of update of
subranges.
The original patch is to solve the performance problem described in
PR27827. Register coalescing sometimes cannot remove a copy because of
interference. But if we can find a reverse copy in one of the predecessor
block of the copy, the copy is partially redundent and we may remove the
copy partially by moving it to the predecessor block without the
reverse copy.
Differential Revision: https://reviews.llvm.org/D28585
Re-apply r292621
Revert "Revert rL292621. Caused some internal build bot failures in apple."
This reverts commit r292984.
Original patch: Wei Mi <wmi@google.com>
Subrange fix: Mostly Matthias Braun <matze@braunis.de>
llvm-svn: 293353
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
inf-loop (PR31751)
This is a minimal patch to avoid the infinite loop in:
https://llvm.org/bugs/show_bug.cgi?id=31751
But the general problem is bigger: we're not canonicalizing all of the min/max forms reported
by value tracking's matchSelectPattern(), and we don't define min/max consistently. Some code
uses matchSelectPattern(), other code uses matchers like m_Umax, and others have their own
inline definitions which may be subtly different from any of the above.
The reason that the test cases in this patch need a cast op to trigger is because we don't
(yet) canonicalize all min/max forms based on matchSelectPattern() in
canonicalizeMinMaxWithConstant(), but we do make min/max+cast transforms based on
matchSelectPattern() in visitSelectInst().
The location of the icmp transforms that trigger the inf-loop seems arbitrary at best, so
I'm moving those behind the min/max fence in visitICmpInst() as the quick fix.
llvm-svn: 293345
|
|
|
|
|
|
|
| |
Since it's not actually a generic MI, its register operands need a RegClass,
which is conveniently the target's pointer RegClass.
llvm-svn: 293335
|
|
|
|
|
|
| |
Should fix machine verifier failures.
llvm-svn: 293334
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Small change to get the FREEP instruction to decode properly.
Reviewers: craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29193
llvm-svn: 293314
|
|
|
|
|
|
|
|
|
|
|
| |
Accomplishes what r292982 was supposed to, which ended up
only really making the necessary test changes.
This should be applied to the 4.0 branch.
Patch by Vedran Miletić <vedran@miletic.net>
llvm-svn: 293310
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The interleaved access pass is an IR-to-IR transformation that runs before code
generation. It matches interleaved memory operations to target-specific
intrinsics (that are later lowered to load and store multiple instructions on
ARM/AArch64). We place tests for similar passes (e.g., GlobalMergePass) under
test/Transforms. This patch moves the InterleavedAccessPass tests out of
test/CodeGen and into target-specific directories under
test/Transforms/InterleavedAccess.
Although the pass is an IR pass, many of the existing tests were llc tests
rather opt tests. For example, the tests would check for ldN/stN instructions
generated by llc rather than the intrinsic calls the pass actually inserts.
Thus, this patch updates all tests to be opt tests that check for the inserted
intrinsics. We already have separate CodeGen tests that ensure we lower the
interleaved access intrinsics to their corresponding ldN/stN instructions. In
addition to migrating the tests to opt, this patch also performs some minor
clean-up (to ensure consistent naming, etc.).
Differential Revision: https://reviews.llvm.org/D29184
llvm-svn: 293309
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This change prevent the signed value of cost from being negative as the value is passed as an unsigned argument.
Reviewers: mcrosier, jmolloy, qcolombet, javed.absar
Reviewed By: mcrosier, qcolombet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28871
llvm-svn: 293307
|
|
|
|
|
|
|
|
|
| |
With the adjustPassManager interface that is now possible to use
custom early module passes.
Differential Revision: https://reviews.llvm.org/D29189
llvm-svn: 293300
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is fixing pr31761: BasicAA is deducing NoAlias
on the result of the GEP if the base pointer is itself NoAlias.
This is possible only if the NoAlias on the base pointer is
deduced with a non-sized query: this should guarantee that
the pointers are belonging to different memory allocation
and that the GEP can't legally jump from one to another.
Differential Revision: https://reviews.llvm.org/D29216
llvm-svn: 293293
|