------------------------------------------------------------------------
into smaller BUILD_VECTORs
Only do this before operations are legalized or if BUILD_VECTOR is Legal for the target.
Differential Revision: https://reviews.llvm.org/D37186
llvm-svn: 311892
------------------------------------------------------------------------
possible sets of x86 prefixes. This patch is the first step toward closing PR7709 and PR17697. Follow-up patches will close the related PRs.
Differential Revision: https://reviews.llvm.org/D36788
M lib/Target/X86/Disassembler/X86DisassemblerDecoder.cpp
M lib/Target/X86/Disassembler/X86DisassemblerDecoder.h
A test/MC/Disassembler/X86/prefixes-i386.s
A test/MC/Disassembler/X86/prefixes-x86_64.s
M test/MC/Disassembler/X86/prefixes.txt
llvm-svn: 311882
------------------------------------------------------------------------
This patch completely replaces the instruction scheduling information for the Haswell architecture target by modifying the file X86SchedHaswell.td under the X86 target.
We used scheduling information retrieved from the Haswell architects to replace and modify the existing scheduling model.
The patch continues the scheduling replacement effort started with the SNB target in r307529 and r310792.
The information includes the latency, number of micro-ops, and ports used by each HSW instruction.
Please expect some performance fluctuations due to code alignment effects.
Reviewers: RKSimon, zvi, aymanmus, craig.topper, m_zuckerman, igorb, dim, chandlerc, aaboud
Differential Revision: https://reviews.llvm.org/D36663
llvm-svn: 311879
------------------------------------------------------------------------
llvm-svn: 311875
------------------------------------------------------------------------
using X86ISD::UNPCKL in reduceVMULWidth.
This runs fairly early, so we should use target-independent nodes if possible.
llvm-svn: 311873
------------------------------------------------------------------------
when SSE2 is disabled.
Without this, the madd.ll and sad.ll test cases both trigger assertions if you run them with SSE2 disabled.
llvm-svn: 311872
------------------------------------------------------------------------
struct string {
  ~string();
};
void f2();
void f1(int) { f2(); }
void run(int c) {
  string body;
  while (true) {
    if (c)
      f1(c);
    else
      f1(c);
  }
}
Will recommit once the issue is fixed.
llvm-svn: 311864
------------------------------------------------------------------------
This patch enables generation of NMADD and NMSUB instructions when an fneg
node is present. These instructions were previously generated only when an
fsub node was present.
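For illustration, a hedged sketch of source-level patterns containing the fneg-of-fused-multiply shape this patch targets (the function names are hypothetical, and actual NMADD/NMSUB selection depends on the MIPS subtarget and FMA-contraction settings):

// Illustrative only: with FMA formation enabled on a suitable MIPS target,
// the fneg wrapped around the fused operation can now select NMADD/NMSUB
// instead of a fused operation followed by a separate negate.
float negated_madd(float a, float b, float c) {
  return -(a * b + c); // fneg(fadd(fmul(a, b), c)) -> NMADD
}
float negated_msub(float a, float b, float c) {
  return -(a * b - c); // fneg(fsub(fmul(a, b), c)) -> NMSUB
}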
Patch by Stanislav Ocovaj.
Differential Revision: https://reviews.llvm.org/D34507
llvm-svn: 311862
------------------------------------------------------------------------
Move condition code support functions to Utils and remove code duplication.
Reviewed by: @fhahn, @asb
Differential Revision: https://reviews.llvm.org/D37179
llvm-svn: 311860
------------------------------------------------------------------------
the lowest subvector. This time with bitcasts between the vselect and the extract.
llvm-svn: 311856
------------------------------------------------------------------------
As noted in the FIXME, this could be improved more, but this is the smallest fix
that helps:
https://bugs.llvm.org/show_bug.cgi?id=34111
llvm-svn: 311853
------------------------------------------------------------------------
Simplify getDRegFromQReg function
Reviewed by: @fhahn, @asb
Differential Revision: https://reviews.llvm.org/D37118
llvm-svn: 311850
------------------------------------------------------------------------
Original commit r311077 of D32871 was reverted in r311304 due to failures
reported in PR34248.
This recommit fixes PR34248 by packing predicated scalars into vectors only
when vectorizing, and not when unrolling without vectorizing. Added a test
derived from the reproducer of PR34248.
llvm-svn: 311849
------------------------------------------------------------------------
all zero/one build_vectors.
llvm-svn: 311841
------------------------------------------------------------------------
llvm-svn: 311840
------------------------------------------------------------------------
llvm-svn: 311838
------------------------------------------------------------------------
between the vselect and the extract_subvector. Remove the late DAG combine.
We used to do a late DAG combine to move the bitcasts out of the way, but I'm starting to think that it's better to canonicalize extract_subvector's type to match the type of its input. I've seen some cases where we formed two different extract_subvectors from the same node, where one had a bitcast and the other didn't.
Add some more test cases to ensure we've also got most of the zero masking covered too.
llvm-svn: 311837
------------------------------------------------------------------------
Summary:
Remove redundant explicit template instantiation.
This was reported by Andrew Kelley building release_50 with gcc7.2.0 on MacOS: duplicate symbol llvm::DominatorTreeBase.
Reviewers: kuhar, andrewrk, davide, hans
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37185
llvm-svn: 311835
------------------------------------------------------------------------
Summary:
If all the operands of a BUILD_VECTOR extract elements from the same vector,
then split the vector efficiently based on the maximum vector access index.
This will also fix PR33784.
Reviewers: zvi, delena, RKSimon, thakis
Reviewed By: RKSimon
Subscribers: chandlerc, eladcohen, llvm-commits
Differential Revision: https://reviews.llvm.org/D35788
llvm-svn: 311833
------------------------------------------------------------------------
Summary: This reverts commit rL311247.
Differential Revision: https://reviews.llvm.org/D36927
llvm-svn: 311832
------------------------------------------------------------------------
for loads, not just when we do it for stores.
llvm-svn: 311829
------------------------------------------------------------------------
We were suppressing most uses of INC/DEC, but this one seems to have been missed.
llvm-svn: 311828
------------------------------------------------------------------------
Summary:
Add options -print-bfi/-print-bpi that dump block frequency and branch
probability info like -view-block-freq-propagation-dags and
-view-machine-block-freq-propagation-dags do, but in text form.
This is useful when the graph is very large and complex (the dot command
crashes, lines/edges are too close to tell apart, and navigation without
textual search is hard) or simply when text is preferred.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37165
llvm-svn: 311822
------------------------------------------------------------------------
extract_subvector of the lowest subvector.
This only supports 32- and 64-bit element sizes for now, but we could probably handle 16- and 8-bit elements with BWI.
llvm-svn: 311821
------------------------------------------------------------------------
to instructions.
These can't be reasonably matched in tablegen due to the handling of
flags, so we have to do this in C++ code. We only did it for `inc` and
`dec` historically; this starts fleshing that out to more interesting
instructions. Notably, this handles transferring operands to `add` and
`sub`.
Currently this forces them into a register. The next patch will add
support for keeping immediate operands as immediates. Then I'll extend
this beyond just `add` and `sub`.
I'm not super thrilled by the repeated switches in the code but
everything else I tried was really ugly or problematic.
Many thanks to Craig Topper for the suggestions about where to even
begin here and how to make this stuff work.
Differential Revision: https://reviews.llvm.org/D37130
llvm-svn: 311806
------------------------------------------------------------------------
Fixes PR34325.
llvm-svn: 311805
------------------------------------------------------------------------
Prior to this change (and after r311371), we computed it
unconditionally, causing severe compile time regressions (in some
cases, 5 to 10x).
llvm-svn: 311804
------------------------------------------------------------------------
This reverts r311801 due to a bot failure.
llvm-svn: 311803
------------------------------------------------------------------------
Summary:
- Don't sanitize __sancov_lowest_stack.
- Don't instrument leaf functions.
- Add CoverageStackDepth to Fuzzer and FuzzerNoLink.
Reviewers: vitalybuka, kcc
Reviewed By: kcc
Subscribers: cfe-commits, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D37156
llvm-svn: 311801
------------------------------------------------------------------------
llvm-svn: 311794
------------------------------------------------------------------------
Change the early exit condition from Cost > Threshold to Cost >= Threshold
because the inline condition is Cost < Threshold.
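A minimal sketch of the boundary reasoning (illustrative names, not the actual InlineCost code):

#include <cassert>

// The final decision is Cost < Threshold, so once Cost reaches Threshold
// the call can never be inlined and the analysis may stop early.
static bool keepAnalyzing(int Cost, int Threshold) {
  return Cost < Threshold; // early exit as soon as Cost >= Threshold
}

int main() {
  assert(keepAnalyzing(99, 100));   // below threshold: still viable
  assert(!keepAnalyzing(100, 100)); // boundary case: can never be inlined
}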
Differential Revision: https://reviews.llvm.org/D37087
llvm-svn: 311791
------------------------------------------------------------------------
bit of Add/Sub is demanded.
Just create an all 1s demanded mask and continue recursing like normal. The recursive calls should be able to handle an all 1s mask and do the right thing.
The only time we should care about knowing whether the upper bit was demanded is when we need to know if we should clear the NSW/NUW flags.
Now that we have a consistent path through the code for all cases, use KnownBits::computeForAddSub to compute the known bits at the end since we already have the LHS and RHS.
My larger goal here is to move the code that turns add into xor (when only one bit is demanded and no bits below it are non-zero) from InstCombiner::OptAndOp to here. This will allow it to be more general, instead of just looking for 'add' and 'and' with a constant RHS.
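A standalone illustration of the add-to-xor fact relied on above (not LLVM code; the constants are arbitrary):

#include <cassert>
#include <cstdint>

int main() {
  // If only bit K is demanded and the constant RHS has no set bits below
  // bit K, no carry can reach bit K, so add and xor agree on that bit.
  const unsigned K = 4;
  const uint32_t DemandedBit = 1u << K;
  const uint32_t C = 0x30; // bits below bit 4 are all zero
  for (uint32_t X = 0; X < 1024; ++X)
    assert(((X + C) & DemandedBit) == ((X ^ C) & DemandedBit));
}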
Differential Revision: https://reviews.llvm.org/D36486
llvm-svn: 311789
------------------------------------------------------------------------
Summary:
SimplifyIndVar may introduce zext instructions to widen arguments of the
loop exit check. They should not prevent us from splitting the loop at
the induction variable, but maybe the check should be more conservative,
e.g. making sure it only extends arguments used by a comparison?
Reviewers: karthikthecool, mcrosier, mzolotukhin
Reviewed By: mcrosier
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D34879
llvm-svn: 311783
------------------------------------------------------------------------
Since the lambda isn't escaped (via a std::function or similar) it's
fine/better to use default capture-by-ref to provide semantics similar
to language-level nested scopes (if/for/while/etc).
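A minimal sketch of the idiom (not the code touched by this commit):

#include <cstdio>

int main() {
  int Sum = 0;
  // The lambda never escapes this scope (it is not stored in a
  // std::function), so default capture-by-ref reads like an ordinary
  // nested if/for/while block.
  auto Accumulate = [&](int V) { Sum += V; };
  for (int I = 1; I <= 4; ++I)
    Accumulate(I);
  std::printf("Sum = %d\n", Sum); // prints: Sum = 10
}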
llvm-svn: 311782
------------------------------------------------------------------------
llvm-svn: 311781
------------------------------------------------------------------------
Commit r297442 introduced mixed CRLF/LF line endings to two files.
Normalize to LF-only line endings.
llvm-svn: 311774
------------------------------------------------------------------------
convert AShr to LShr.
There are cases where AShr has a better chance of being optimized than LShr, especially when the demanded bits are not known to be zero and are also known to be similar to the sign bit.
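For intuition, a small standalone check (not LLVM code) of where the two shift forms agree and where only AShr keeps the sign information:

#include <cassert>
#include <cstdint>

int main() {
  // ashr and lshr by K agree on every result bit below bit (32 - K); the
  // K high bits are sign copies for ashr and zeros for lshr, so keeping
  // the ashr form preserves sign information for later folds.
  const unsigned K = 3;
  const uint32_t LowMask = (1u << (32 - K)) - 1;
  const int32_t Vals[] = {-100, -1, 0, 1, 100};
  for (int32_t X : Vals) {
    // X >> K on negative X is an arithmetic shift on mainstream compilers
    // (implementation-defined before C++20).
    uint32_t A = static_cast<uint32_t>(X >> K); // ashr: sign-filled
    uint32_t L = static_cast<uint32_t>(X) >> K; // lshr: zero-filled
    assert((A & LowMask) == (L & LowMask));
  }
}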
Differential Revision: https://reviews.llvm.org/D36936
llvm-svn: 311773
------------------------------------------------------------------------
getOpcode on that. NFC
llvm-svn: 311765
------------------------------------------------------------------------
compares. NFCI
While there, use a local variable instead of calling C->getZExtValue() repeatedly.
llvm-svn: 311764
------------------------------------------------------------------------
https://reviews.llvm.org/D37018
llvm-svn: 311763
------------------------------------------------------------------------
coro.resumes followed by a suspend)
Summary:
Add musttail to any resume instruction that is immediately followed by a
suspend (i.e. ret). We do this even at -O0 to support guaranteed tail calls
for symmetrical coroutine control transfer (C++ Coroutines TS extension).
This transformation is done only in the resume part of the coroutine, which
has an identical signature and calling convention to the coro.resume call.
Reviewers: GorNishanov
Reviewed By: GorNishanov
Subscribers: EricWF, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D37125
llvm-svn: 311751
------------------------------------------------------------------------
to handle other x86 pseudos that carry flags and thus can't be matched
by our ISel patterns with fused memory accesses.
Differential Revision: https://reviews.llvm.org/D37088
llvm-svn: 311749
------------------------------------------------------------------------
This extracts the code out of a giant switch in preparation for expanding it to
handle operations other than `inc` and `dec`. Add a FIXME indicating what's
coming here.
Differential Revision: https://reviews.llvm.org/D37045
llvm-svn: 311748
------------------------------------------------------------------------
This allows us to remove "test" instructions and use the flags from the TBM instructions directly.
llvm-svn: 311747
------------------------------------------------------------------------
Add a reference to the PC array in llvm.used so that linkers that
aggressively dead strip (like ld64) don't remove it.
llvm-svn: 311742
------------------------------------------------------------------------
FeatureSlowUAMem32.
The idea was to mark things that are slow on widely available processors
as slow in the generic CPU so that the code generated for that CPU would
be fast across those processors. However, for this feature that doesn't
work out very well at all.
The problem here is that you can very easily enable AVX or AVX2 on top
of this generic CPU. For example, this can happen just by using AVX2
intrinsics from Clang within a region of code guarded by a dynamic CPU
feature test. When you do that, the generated code with SlowUAMem32 set
is ... amazingly slower. The problem is that there really aren't very
good alternatives to the unaligned loads, and so our vector codegen
regresses significantly.
The other issue is that there are plenty of AMD CPUs with AVX1 that
don't set FeatureSlowUAMem32 and so we shouldn't just check for AVX2
instead of this special feature. =/
It would be nice to have the target attribute logic be able to
enable/disable more than just one feature at a time and control this in
a more fine-grained and useful way, but that doesn't seem easy. Given
that it is only Sandybridge and Ivybridge that set this feature, for now
I'm just backing it out of the generic CPU. That has the additional
advantage of going back to the previous state that people seemed vaguely
happy with.
llvm-svn: 311740
------------------------------------------------------------------------
The comment for this code indicated that it should work similar to our
handling of add lowering above: if we see uses of an instruction other
than flag usage and store usage, it tries to avoid the specialized
X86ISD::* nodes that are designed for flag+op modeling and emits an
explicit test.
Problem is, only the add case actually did this. In all the other cases,
the logic was incomplete and inverted. Any time the value was used by
a store, we bailed on the specialized X86ISD node. All of this appears
to have been historical where we had different logic here. =/
Turns out, we have quite a few patterns designed around these nodes. We
should actually form them. I fixed the code to match what we do for add,
and it has quite a positive effect just within some of our test cases.
The only thing close to a regression I see is using:
notl %r
testl %r, %r
instead of:
xorl -1, %r
But we can add a pattern or something to fold that back out. The
improvements seem more than worth this.
I've also worked with Craig to update the comments to no longer be
actively contradicted by the code. =[ Some of this still remains
a mystery to both Craig and myself, but this seems like a large step in
the direction of consistency and slightly more accurate comments.
Many thanks to Craig for help figuring out this nasty stuff.
Differential Revision: https://reviews.llvm.org/D37096
llvm-svn: 311737
------------------------------------------------------------------------
This goes back to a discussion about IR canonicalization. We'd like to preserve and convert
more IR to 'select' than we currently do because that's likely the best choice in IR:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105335.html
...but that's often not true for codegen, so we need to account for this pattern coming in
to the backend and transform it to better DAG ops.
Steps in this patch:
1. Add an EVT param to the existing convertSelectOfConstantsToMath() TLI hook to more finely
enable this transform. Other targets will probably want that anyway to distinguish scalars
from vectors. We're using that here to exclude AVX512 targets, but it may not be necessary.
2. Convert a vselect to ext+add. This eliminates a constant load/materialization, and the
vector ext is often free.
Implementing a more general fold using xor+and can be a follow-up for targets that don't have
a legal vselect. It's also possible that we can remove the TLI hook for the special case fold
implemented here because we're eliminating a constant, but it needs to be tested on other
targets.
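The scalar shape of the step-2 fold, as a hedged standalone check (not the DAG code):

#include <cassert>
#include <cstdint>

int main() {
  // select(cond, C+1, C) == zext(cond) + C: the ext+add form avoids
  // materializing a constant vector, and the vector extend is often free.
  const bool Conds[] = {false, true};
  for (int32_t C = -4; C <= 4; ++C) {
    for (bool Cond : Conds) {
      int32_t ViaSelect = Cond ? C + 1 : C;
      int32_t ViaExtAdd = static_cast<int32_t>(Cond) + C; // zext + add
      assert(ViaSelect == ViaExtAdd);
    }
  }
}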
Differential Revision: https://reviews.llvm.org/D36840
llvm-svn: 311731
------------------------------------------------------------------------
Reviewers: mehdi_amini, dexonsmith, dblaikie, davide, chandlerc, davidxl, echristo, efriedma
Reviewed By: dblaikie
Subscribers: rsmith, mgorny, emaste, llvm-commits
Differential Revision: https://reviews.llvm.org/D35043
llvm-svn: 311730
------------------------------------------------------------------------
Take-2 after fixing bugs in the original patch.
Differential Revision: http://reviews.llvm.org/D36864
llvm-svn: 311727
------------------------------------------------------------------------