Commit messages
llvm-svn: 298865
A majority of loads and stores at O0 access an alloca.
It's trivial to fold the G_FRAME_INDEX into the instruction; do it.
llvm-svn: 298864
We're not yet at the point of supporting the load/store patterns
(because they extensively use PatFrags), but in the meantime we can
implement some of the simplest addressing modes.
llvm-svn: 298863
These occur very frequently, and are quite trivial to catch.
llvm-svn: 298862
Patch by Axel Davy (axel.davy@normalesup.org)
Differential revision: https://reviews.llvm.org/D30150
llvm-svn: 298861
This is more consistent with what we do for other operations. This shrinks the opt binary on my build by ~72k.
llvm-svn: 298858
Patch by Axel Davy (axel.davy@normalesup.org)
Differential revision: https://reviews.llvm.org/D30145
llvm-svn: 298857
CBZ/CBNZ represent a substantial portion of all conditional branches.
Look through G_ICMP to select them.
We can't use tablegen yet because the existing patterns match an
AArch64ISD node.
llvm-svn: 298856
Use it to compare immediate operands.
llvm-svn: 298855
Enabled clamp and omod for v_cvt_* opcodes whose src0 has an integer type.
Reviewers: vpykhtin, arsenm
Differential Revision: https://reviews.llvm.org/D31327
llvm-svn: 298852
Among other things, this allows Machine LICM to hoist a costly 'mrs'
instruction from within a loop.
Differential Revision: http://reviews.llvm.org/D31151
llvm-svn: 298851
Since we introduced the target triple environments amdgiz and amdgizcl, the
address space values are no longer enums; we have to decide the values by
target triple.
The basic idea is to use a struct, AMDGPUAS, to represent address space values.
For address space values that do not depend on the target triple, use static
const members, so that they don't occupy extra memory and are equivalent to
compile-time constants.
Since the struct is lightweight and cheap, it can be created on the fly at
the point of use, or it can be added as a member of a pass and created at
the beginning of the run* function.
Differential Revision: https://reviews.llvm.org/D31284
llvm-svn: 298846
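A minimal sketch of the struct idea described in r298846 above; the member names and values here are illustrative, not the actual AMDGPUAS definition:

    // Illustrative only -- not the actual AMDGPUAS definition or values.
    struct AMDGPUAS {
      // Triple-independent values: static const members take no per-object
      // storage and behave like compile-time constants.
      static const unsigned LOCAL_ADDRESS = 3;
      static const unsigned CONSTANT_ADDRESS = 2;
      // Triple-dependent values: ordinary members, filled in from the target
      // triple when the struct is created -- on the fly at the point of use,
      // or once at the start of a pass's run* function.
      unsigned FLAT_ADDRESS;
      unsigned PRIVATE_ADDRESS;
    };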
element is a vector type
Summary:
We are incorrectly folding selects into phi nodes when the incoming value of a phi
node is a constant vector. This optimization is done in `FoldOpIntoPhi` when the
select condition is a phi node with constant incoming values.
Without the fix, we are miscompiling (i.e. incorrectly folding the
select into the phi node) when the vector contains non-zero
elements.
This patch fixes the miscompile and we will correctly fold based on the
select vector operand (see added test cases).
Reviewers: majnemer, sanjoy, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31189
llvm-svn: 298845
Summary:
It should return <0, 0, or >0 for less-than, equal, and greater-than, like
strcmp() (according to the history, it used to be implemented with
strcmp()), but it actually returned 0 or 1 for not-equal and equal.
Reviewers: qcolombet
Reviewed By: qcolombet
Subscribers: qcolombet, llvm-commits
Differential Revision: https://reviews.llvm.org/D30996
llvm-svn: 298844
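An illustration of the contract change described in r298844 above, using a hypothetical string comparator; the point is the strcmp()-style signed ordering versus a bare equality flag:

    #include <cstring>

    // Old shape: collapses the result to an equality flag (1 for equal,
    // 0 for not-equal), so callers expecting an ordering misbehave.
    int compareOld(const char *A, const char *B) {
      return std::strcmp(A, B) == 0;
    }

    // Fixed shape: <0, 0, or >0 for less-than, equal, and greater-than,
    // matching the strcmp() contract callers rely on.
    int compareNew(const char *A, const char *B) {
      return std::strcmp(A, B);
    }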
in AVX2
This is a patch for ongoing Bugzilla bug 21281 on the X86 code generated for an 8x8 matrix transpose subroutine, which requires vector interleaving. The code currently generated for AVX2 is non-optimal and requires 60 instructions, as opposed to only 40 instructions generated for AVX1.
The patch includes a fix for the AVX2 case, where vector unpack instructions use fewer operations than the vector blend operations available in AVX2.
In this case using vector unpack instructions is more efficient.
Reviewers: zvi, delena, igorb, craig.topper, guyblank, eladcohen, m_zuckerman, aymanmus, RKSimon
llvm-svn: 298840
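A small illustration of the instruction-count point in r298840 above: a single AVX2 unpack interleaves two vectors in one instruction, where a shuffle-plus-blend sequence needs more. This sketches only the interleaving primitive, not the actual transpose8x8 code generation:

    #include <immintrin.h>

    // Interleave the low half of each 128-bit lane of two 8 x i32 vectors:
    // result = a0 b0 a1 b1 | a4 b4 a5 b5 -- a single vpunpckldq.
    __m256i interleave_lo(__m256i A, __m256i B) {
      return _mm256_unpacklo_epi32(A, B);
    }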
instead of the non-const version. NFCI
This removes a const_cast of the this pointer.
llvm-svn: 298831
const_cast. NFCI
llvm-svn: 298830
BasicBlock using the const version instead of the non-const version
Summary:
During post-commit review of a previous change I made, it was pointed out that const_cast-ing 'this' is technically bad practice. This patch re-implements all of the BasicBlock methods that did this to call the const BasicBlock version and const_cast the return value instead.
I think there are still many other classes that do similar things. I may look at more in the future.
Reviewers: dblaikie
Reviewed By: dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31377
llvm-svn: 298827
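The pattern described in r298827 above, sketched on a hypothetical class rather than the actual BasicBlock code: the non-const overload calls the const one and const_casts the result, instead of const_cast-ing 'this':

    struct Node {
      const Node *NextNode;

      // The const version holds the real logic.
      const Node *getNext() const { return NextNode; }

      // The non-const version forwards to the const overload and casts the
      // result back, rather than const_cast-ing 'this'.
      Node *getNext() {
        return const_cast<Node *>(
            static_cast<const Node *>(this)->getNext());
      }
    };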
static version.
llvm-svn: 298826
llvm-svn: 298825
llvm-svn: 298823
References in cloned blocks must be remapped prior to dominator
calculation.
Differential Revision: https://reviews.llvm.org/D31281
llvm-svn: 298811
instructions
llvm-svn: 298806
Fixed -verify-machineinstrs errors in fast-isel-select-sse.ll (one of many in PR27481)
The VMOVSSZrr/VMOVSSZrrk and VMOVSDZrr/VMOVSDZrrk instructions were assuming both source registers were V128X when the second is actually supposed to be FR32X/FR64X
Differential Revision: https://reviews.llvm.org/D31200
llvm-svn: 298805
Summary:
Support G_FRAME_INDEX instruction selection.
Reviewers: zvi, rovka, ab, qcolombet
Reviewed By: ab
Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank
Differential Revision: https://reviews.llvm.org/D30980
llvm-svn: 298800
The first variant contains all current transformations except
transforming switches into lookup tables. The second variant
contains all current transformations.
The switch-to-lookup-table conversion results in code that is more
difficult to analyze and optimize by other passes. Most importantly,
it can inhibit Dead Code Elimination. As such it is often beneficial to
only apply this transformation very late. A common example is inlining,
which can often result in range restrictions for the switch expression.
Changes in execution time according to LNT:
SingleSource/Benchmarks/Misc/fp-convert +3.03%
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk -11.20%
MultiSource/Benchmarks/Olden/perimeter/perimeter -10.43%
and a couple of smaller changes. For perimeter it also results in a 2.6%
smaller binary.
Differential Revision: https://reviews.llvm.org/D30333
llvm-svn: 298799
This moves it to the iterator facade utilities giving it full random
access semantics, etc. It can also now be used with standard algorithms
like std::all_of and std::any_of and range adaptors like llvm::reverse.
Also make the semantics of iterating match what every other iterator
uses and forbid decrementing past the begin iterator. This was used as
a hacky way to work around iterator invalidation. However, every
instance trying to do this failed to actually avoid touching invalid
iterators despite the clear documentation that the removed and all
subsequent iterators become invalid including the end iterator. So I've
added a return of the next iterator to removeCase and rewritten the
loops that were doing this to correctly follow the iterator pattern of
either incrementing or removing and assigning fresh values to the
iterator and the end.
In one case we were trying to go backwards to make this cleaner but it
doesn't actually work. I've made that code match the code we use
everywhere else to remove cases as we iterate. This changes the order of
cases in one test output and I moved that test to CHECK-DAG so it
wouldn't care -- the order isn't semantically meaningful anyway.
llvm-svn: 298791
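A sketch of the loop shape described in r298791 above, assuming SI is a SwitchInst* and that removeCase(I) now returns the iterator to the next valid case; shouldRemove is a placeholder predicate:

    for (auto I = SI->case_begin(), E = SI->case_end(); I != E;) {
      if (shouldRemove(*I)) {
        // Removal invalidates subsequent iterators, including the end
        // iterator: take the returned successor and refresh the end.
        I = SI->removeCase(I);
        E = SI->case_end();
      } else {
        ++I;
      }
    }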
llvm-svn: 298783
(NumSignBits-1))
Part 3 of 3.
Differential Revision: https://reviews.llvm.org/D31347
llvm-svn: 298782
Part 2 of 3.
Differential Revision: https://reviews.llvm.org/D31347
llvm-svn: 298780
Patch to generalize combinePCMPAnd1 (for handling SETCC + ZEXT cases) to work for any input that has zero/all bits set masked with an 'all low bits' mask.
Replaced the implicit assumption of shift availability with a call to SupportedVectorShiftWithImm.
Part 1 of 3.
Differential Revision: https://reviews.llvm.org/D31347
llvm-svn: 298779
This is the payoff for D31156 - if a target has efficient comparison instructions for vector-sized equality,
we can replace memcmp calls with inline code that is both smaller and faster.
Differential Revision: https://reviews.llvm.org/D31290
llvm-svn: 298775
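Roughly what the expansion described in r298775 above buys on a target with vector-sized equality compares, written here as a hand-coded SSE2 equivalent of memcmp(a, b, 16) == 0; this illustrates the idea rather than the exact code the backend emits:

    #include <emmintrin.h>

    // Equality-only memcmp of 16 bytes: load, compare byte-wise, and collapse
    // the 16 lane results into a mask; equal iff every lane matched.
    bool eq16(const void *A, const void *B) {
      __m128i VA = _mm_loadu_si128(static_cast<const __m128i *>(A));
      __m128i VB = _mm_loadu_si128(static_cast<const __m128i *>(B));
      return _mm_movemask_epi8(_mm_cmpeq_epi8(VA, VB)) == 0xFFFF;
    }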
the instruction and operand instead of the Use.
The first thing it did was get the User for the Use to get the instruction back. This requires looking through the Uses for the User using the waymarking walk. That's pretty fast, but it's probably still better to just pass the Instruction we already had.
llvm-svn: 298772
llvm-svn: 298768
This avoids 'used but not defined' warnings in Release builds
with GCC.
llvm-svn: 298760
Switch the data layout by target triple environment: amdgiz and amdgizcl indicate the use of an address space mapping in which the generic address space is 0.
amdgiz is for the non-OpenCL environment where the generic address space is 0.
amdgizcl is for the OpenCL environment where the generic address space is 0.
Differential Revision: https://reviews.llvm.org/D31211
llvm-svn: 298758
llvm-svn: 298757
When possible, put ASan ctor/dtor in comdat.
The only reason not to is global registration, which can be
TU-specific. This is not the case when there are no instrumented
globals. This is also limited to ELF targets, because MachO does
not have comdat, and COFF linkers may GC comdat constructors.
The benefit of this is far fewer __asan_init() calls: one per DSO
instead of one per TU. It's also necessary for the upcoming
gc-sections-for-globals change on Linux, where multiple references to
section start symbols trigger quadratic behaviour in the gold linker.
llvm-svn: 298756
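A minimal sketch of the mechanism described in r298756 above: on ELF targets, place a module constructor in a COMDAT group keyed on its own name. Function and variable names are illustrative; this is not the actual ASan instrumentation code:

    #include "llvm/ADT/Triple.h"
    #include "llvm/IR/Comdat.h"
    #include "llvm/IR/Function.h"
    #include "llvm/IR/Module.h"

    static void placeCtorInComdat(llvm::Module &M, llvm::Function *Ctor,
                                  const llvm::Triple &TT) {
      // MachO has no comdat and COFF linkers may GC comdat constructors,
      // so only do this for ELF.
      if (!TT.isOSBinFormatELF())
        return;
      llvm::Comdat *C = M.getOrInsertComdat(Ctor->getName());
      Ctor->setComdat(C);  // the linker keeps one copy per DSO
    }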
minimization if another bug was found during minimization (https://github.com/google/oss-fuzz/issues/452)
llvm-svn: 298755
llvm-svn: 298752
llvm-svn: 298751
If we have an array of user-defined aggregates for which there was an
ODR violation, then the array size will not necessarily match the number
of elements times the size of the element.
Fixes PR32383
llvm-svn: 298750
When I tested r298734, I thought that red zones were enabled by default as they
are on X86. Since red zones are behind a flag on AArch64, that testing wasn't valid.
llvm-svn: 298747
llvm-svn: 298746
Add/Sub in SimplifyDemandedUseBits without recursing into ComputeKnownBits"
Tsan bot is failing.
llvm-svn: 298745
crash minimization (https://github.com/google/oss-fuzz/issues/250)
llvm-svn: 298740
If the branch condition for a loop was a phi which itself
was fed from a phi from a loop, it isn't safe to try
to delete the phi until after the loop is handled.
llvm-svn: 298737
llvm-svn: 298736
Reason: breaks linking Chromium with LLD + ThinLTO (a pass crashes)
LLVM bug: https://bugs.llvm.org//show_bug.cgi?id=32413
Original change description:
[LV] Vectorize GEPs
This patch adds support for vectorizing GEPs. Previously, we only generated
vector GEPs on-demand when creating gather or scatter operations. All GEPs from
the original loop were scalarized by default, and if a pointer was to be stored
to memory, we would have to build up the pointer vector with insertelement
instructions.
With this patch, we will vectorize all GEPs that haven't already been marked
for scalarization.
The patch refines collectLoopScalars to more exactly identify the scalar GEPs.
The function now more closely resembles collectLoopUniforms. And the patch
moves vector GEP creation out of vectorizeMemoryInstruction and into the main
vectorization loop. The vector GEPs needed for gather and scatter operations
will have already been generated before vectorizing the memory accesses.
Original Differential Revision: https://reviews.llvm.org/D30710
llvm-svn: 298735
AArch64 doesn't require -mno-red-zone; stack fixups are sufficient here. This was
unnecessarily copied over from the X86 target.
(You can now outline with red zones! Yay!)
Removing the requirement passes all Single/MultiSource tests.
llvm-svn: 298734