| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recommitting rL319773, which was reverted due to a recursive issue
causing timeouts. This happened because I failed to check whether
the discovered loads could be narrowed further. In the case of a tree
with one or more narrow loads, that could not be further narrowed, as
well as a node that would need masking, an AND could be introduced
which could then be visited and recombined again with the same load.
This could again create the masking load, with would be combined
again... We now check that the load can be narrowed so that this
process stops.
Original commit message:
Search from AND nodes to find whether they can be propagated back to
loads, so that the AND and load can be combined into a narrow load.
We search through OR, XOR and other AND nodes and all bar one of the
leaves are required to be loads or constants. The exception node then
needs to be masked off meaning that the 'and' isn't removed, but the
loads(s) are narrowed still.
Differential Revision: https://reviews.llvm.org/D41177
llvm-svn: 320679
|
|
|
|
|
|
| |
We should be able to support ANY_EXTEND for any types we support ZERO_EXTEND for.
llvm-svn: 320675
|
|
|
|
|
|
|
|
|
|
| |
for AVX512F.
A v32i1 CONCAT_VECTORS of v16i1 uses promotion to v32i8 to legalize the v32i1. This results in a bunch of extract_vector_elts and a build_vector that ultimately gets scalarized.
This patch checks to see if v16i8 is legal and inserts a any_extend to that so that we can concat v16i8 to v32i8 and avoid creating the extracts.
llvm-svn: 320674
|
|
|
|
|
|
| |
These calls already exist earlier under AVX2 feature.
llvm-svn: 320673
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
D30041 extended SCEVPredicateRewriter to improve handling of Phi nodes whose
update chain involves casts; PSCEV can now build an AddRecurrence for some
forms of such phi nodes, under the proper runtime overflow test. This means
that we can identify such phi nodes as an induction, and the loop-vectorizer
can now vectorize such inductions, however inefficiently. The vectorizer
doesn't know that it can ignore the casts, and so it vectorizes them.
This patch records the casts in the InductionDescriptor, so that they could
be marked to be ignored for cost calculation (we use VecValuesToIgnore for
that) and ignored for vectorization/widening/scalarization (i.e. treated as
TriviallyDead).
In addition to marking all these casts to be ignored, we also need to make
sure that each cast is mapped to the right vector value in the vector loop body
(be it a widened, vectorized, or scalarized induction). So whenever an
induction phi is mapped to a vector value (during vectorization/widening/
scalarization), we also map the respective cast instruction (if exists) to that
vector value. (If the phi-update sequence of an induction involves more than one
cast, then the above mapping to vector value is relevant only for the last cast
of the sequence as we allow only the "last cast" to be used outside the
induction update chain itself).
This is the last step in addressing PR30654.
llvm-svn: 320672
|
|
|
|
|
|
|
|
|
|
| |
account whether the input type also needs to be promoted.
If so go ahead and get the promoted input vector to extract from. Previously, we would create a bunch of any_extends of extract_vector_elts with illegal input type that needs to be promoted. The legalization of those extract_vector_elts would then potentially introduce a truncate. So now we have a bunch of any_extends of truncates. By legalizing both parts together we avoid creating these extra nodes.
The test changes seem to be because we were previously combining the build_vector with the any_extend before the any_extend got combined with the truncate.
llvm-svn: 320669
|
|
|
|
|
|
|
|
|
|
| |
Factor out duplicated code emitting mach-o version-min specifiers.
This should be NFC but happens to fix a bug where the code in
MCMachoStreamer didn't take the version skew between darwin and macos
versions into account.
llvm-svn: 320666
|
|
|
|
|
|
|
|
| |
LC_BUILD_VERSION is a new load command superseding the previously used
LC_XXX_MIN_VERSION commands. This adds an assembler directive along with
encoding/streaming support.
llvm-svn: 320661
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the number of scheduler entries needed for FMA3 instructions."
I've hopefully sidestepped the MSVC issue that caused it to be reverted. We no longer include the Sched enum from X86GenInstrInfo.inc on the X86 target. So hopefully MSVC's preprocessor will skip over it and nothing will notice the 11000 character enum name.
Original commit message:
When the scheduler tables are generated by tablegen, the instructions are divided up into groups based on their default scheduling information and how they are referenced by groups for each processor. For any set of instructions that are matched by a specific InstRW line, that group of instructions is guaranteed to not be in a group with any other instructions. So in general, the more InstRW class definitions are created, the more groups we end up with in the generated files. Particularly if a lot of the InstRW lines only match to single instructions, which is true of a large number of the Intel scheduler models.
This change alone reduces the number of instructions groups from ~6000 to ~5500. And there's lots more we could do.
llvm-svn: 320655
|
|
|
|
|
|
|
|
| |
Extends https://reviews.llvm.org/rL320640
Differential Revision: https://reviews.llvm.org/D41136
llvm-svn: 320653
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Two issues were found about machine inst scheduler when compiling ProRender
with -g for amdgcn target:
GCNScheduleDAGMILive::schedule tries to update LiveIntervals for DBG_VALUE, which it
should not since DBG_VALUE is not mapped in LiveIntervals.
when DBG_VALUE is the last instruction of MBB, ScheduleDAGInstrs::buildSchedGraph and
ScheduleDAGMILive::scheduleMI does not move RPTracker properly, which causes assertion.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D41132
llvm-svn: 320650
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently this is an LLVM extension to the COFF spec which is
experimental and intended to speed up linking. For now it is
behind a hidden cl::opt flag, but in the future we can move it
to a "real" cc1 flag and have the driver pass it through whenever
it is appropriate.
The patch to actually make use of this section in lld will come
in a followup.
Differential Revision: https://reviews.llvm.org/D40917
llvm-svn: 320649
|
|
|
|
| |
llvm-svn: 320648
|
|
|
|
| |
llvm-svn: 320645
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D41202
llvm-svn: 320642
|
|
|
|
|
|
|
| |
Stage 2 bootstrap failed:
http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules-2/builds/14434
llvm-svn: 320641
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR35642)
As shown in:
https://bugs.llvm.org/show_bug.cgi?id=35642
...we can have different forms of min/max, so we should recognize those here in EarlyCSE
similar to how we already handle binops and compares that can commute.
Differential Revision: https://reviews.llvm.org/D41136
llvm-svn: 320640
|
|
|
|
| |
llvm-svn: 320636
|
|
|
|
| |
llvm-svn: 320635
|
|
|
|
| |
llvm-svn: 320634
|
|
|
|
| |
llvm-svn: 320633
|
|
|
|
| |
llvm-svn: 320629
|
|
|
|
| |
llvm-svn: 320628
|
|
|
|
| |
llvm-svn: 320627
|
|
|
|
| |
llvm-svn: 320626
|
|
|
|
| |
llvm-svn: 320625
|
|
|
|
| |
llvm-svn: 320624
|
|
|
|
| |
llvm-svn: 320623
|
|
|
|
| |
llvm-svn: 320622
|
|
|
|
| |
llvm-svn: 320621
|
|
|
|
| |
llvm-svn: 320620
|
|
|
|
| |
llvm-svn: 320619
|
|
|
|
| |
llvm-svn: 320618
|
|
|
|
| |
llvm-svn: 320617
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Stores failed to decode at all since they didn't have a
DecoderNamespace set. Loads worked, but did not change
the register width displayed to match the numbmer of
enabled channels.
The number of printed registers for vaddr is still wrong,
but I don't think that's encoded in the instruction so
there's not much we can do about that.
Image atomics are still broken. MIMG is the same
encoding for SI/VI, but the image atomic classes
are split up into encoding specific versions unlike
every other MIMG instruction. They have isAsmParserOnly
set on them for some reason. dmask is also special for
these, so we probably should not have it as an explicit
operand as it is now.
llvm-svn: 320614
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
See D37528 for a previous (non-deferred) version of this
patch and its description.
Preserves dominance in a deferred manner using a new class
DeferredDominance. This reduces the performance impact of
updating the DominatorTree at every edge insertion and
deletion. A user may call DDT->flush() within JumpThreading
for an up-to-date DT. This patch currently has one flush()
at the end of runImpl() to ensure DT is preserved across
the pass.
LVI is also preserved to help subsequent passes such as
CorrelatedValuePropagation. LVI is simpler to maintain and
is done immediately (not deferred). The code to perfom the
preversation was minimally altered and was simply marked
as preserved for the PassManager to be informed.
This extends the analysis available to JumpThreading for
future enhancements. One example is loop boundary threading.
Reviewers: dberlin, kuhar, sebpop
Reviewed By: kuhar, sebpop
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D40146
llvm-svn: 320612
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
w.r.t. the paper
"A Practical Improvement to the Partial Redundancy Elimination in SSA Form"
(https://sites.google.com/site/jongsoopark/home/ssapre.pdf)
Proper dominance check was missing here, so having a loopinfo should not be required.
Committing this diff as this fixes the bug, if there are
further concerns, I'll be happy to work on them.
Differential Revision: https://reviews.llvm.org/D39781
llvm-svn: 320607
|
|
|
|
|
|
|
|
|
|
| |
Shrink wrapping should ignore DBG_VALUEs referring to frame indices,
since the presence of debug information must not affect code
generation.
Differential Revision: https://reviews.llvm.org/D41187
llvm-svn: 320606
|
|
|
|
| |
llvm-svn: 320589
|
|
|
|
|
|
| |
llvm-clang-x86_64-expensive-checks-win.
llvm-svn: 320588
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The initial implementation of an MI SSA pass to reduce cr-logical operations.
Currently, the only operations handled by the pass are binary operations where
both CR-inputs come from the same block and the single use is a conditional
branch (also in the same block).
Committing this off by default to allow for a period of field testing. Will
enable it by default in a follow-up patch soon.
Differential Revision: https://reviews.llvm.org/D30431
llvm-svn: 320584
|
|
|
|
|
|
| |
Add missing RDTSCP itinerary
llvm-svn: 320581
|
|
|
|
|
|
|
|
| |
Unfortunately these aren't defined explicitly in the privileged spec, but the
GNU assembler does accept `sfence.vma` and `sfence.vma rs` as well as the
usual `sfence.vma rs, rt`.
llvm-svn: 320575
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D41109
llvm-svn: 320573
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D41112
llvm-svn: 320571
|
|
|
|
|
|
|
|
|
|
| |
Pass the input vector through SimplifyDemandedBits as we only need the sign bit from each vector element of MOVMSK
We'd probably get more hits if SimplifyDemandedBits was better at handling vectors...
Differential Revision: https://reviews.llvm.org/D41119
llvm-svn: 320570
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds the assembler aliases for the floating point instructions
which can be mapped to a single canonical instruction. The missing
pseudo instructions (flw, fld, fsw, fsd) are marked as TODO. Other
things, like for example PCREL_LO, have to be implemented first.
This patch builds upon D40902.
Differential Revision: https://reviews.llvm.org/D41071
Patch by Mario Werner.
llvm-svn: 320569
|
|
|
|
|
|
| |
OpenGL issues should be fixed by now.
llvm-svn: 320568
|
|
|
|
|
|
|
|
| |
Add missing case that was not implemented yet.
Differential Revision: https://reviews.llvm.org/D38942
llvm-svn: 320567
|
|
|
|
|
|
|
|
|
|
| |
debug output
Work towards the unification of MIR and debug output by printing `%jump-table.0` instead of `<jt#0>`.
Only debug syntax is affected.
llvm-svn: 320566
|