| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 296476
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.
Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.
Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar
Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D30046
llvm-svn: 296474
|
|
|
|
|
|
|
|
|
|
|
|
| |
Lower i32, float and double parameters that need to live on the stack. This
boils down to creating some G_GEPs starting from the stack pointer and storing
the values there. During the process we also keep track of the stack size and
use the final value in the ADJCALLSTACKDOWN/UP instructions.
We currently assert for smaller types, since they usually require extensions.
They will be handled in a separate patch.
llvm-svn: 296473
|
|
|
|
|
|
| |
Put it into a register by means of a MOVi.
llvm-svn: 296471
|
|
|
|
|
|
|
| |
Like G_FRAME_INDEX, G_CONSTANT has one register operand and one non-register
operand.
llvm-svn: 296469
|
|
|
|
| |
llvm-svn: 296468
|
|
|
|
| |
llvm-svn: 296464
|
|
|
|
|
|
|
|
| |
registered as permanent libraries."
It broke clang/test/Analysis/checker-plugins.c
llvm-svn: 296463
|
|
|
|
|
|
| |
At this point, G_GEP is just an add, so we treat it exactly like a G_ADD.
llvm-svn: 296462
|
|
|
|
|
|
|
|
|
|
|
|
| |
In Thumb2, instructions which write to the PC are UNPREDICTABLE if they are in
an IT block but not the last instruction in the block.
Previously, we only diagnosed this for LDM instructions, this patch extends the
diagnostic to cover all of the relevant instructions.
Differential Revision: https://reviews.llvm.org/D30398
llvm-svn: 296459
|
|
|
|
|
|
| |
This should be the same as the mapping for G_ADD etc.
llvm-svn: 296455
|
|
|
|
|
|
|
| |
At the moment we're only interested in GEPs for putting call parameters on the
stack, so we'll stick to 32-bit offsets.
llvm-svn: 296452
|
|
|
|
| |
llvm-svn: 296447
|
|
|
|
| |
llvm-svn: 296443
|
|
|
|
|
|
|
|
|
|
| |
This is also useful in cases when llvm is in a shared library. First we dlopen
the llvm shared library and then we register it as a permanent library in order
to keep the JIT and other services working.
Patch reviewed by Vedant Kumar (D29955)!
llvm-svn: 296442
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
With this change ImplicitNullCheck optimization uses alias analysis
and can use load/store memory access for implicit null check if there
are other load/store before but memory accesses do not alias.
Patch by Serguei Katkov!
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30331
llvm-svn: 296440
|
|
|
|
| |
llvm-svn: 296438
|
|
|
|
| |
llvm-svn: 296432
|
|
|
|
|
|
|
|
| |
Revert Machine Outliner for now, as it breaks the asan bot.
This reverts commit r296418.
llvm-svn: 296426
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a patch for the outliner described in the RFC at:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html
The outliner is a code-size reduction pass which works by finding
repeated sequences of instructions in a program, and replacing them with
calls to functions. This is useful to people working in low-memory
environments, where sacrificing performance for space is acceptable.
This adds an interprocedural outliner directly before printing assembly.
For reference on how this would work, this patch also includes X86
target hooks and an X86 test.
The outliner is run like so:
clang -mno-red-zone -mllvm -enable-machine-outliner file.c
Patch by Jessica Paquette<jpaquette@apple.com>!
rdar://29166825
Differential Revision: https://reviews.llvm.org/D26872
llvm-svn: 296418
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.
This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.
Differential Revision: https://reviews.llvm.org/D29916
llvm-svn: 296416
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before the endianness was specified on each call to read
or write of the StreamReader / StreamWriter, but in practice
it's extremely rare for streams to have data encoded in
multiple different endiannesses, so we should optimize for the
99% use case.
This makes the code cleaner and more general, but otherwise
has NFC.
llvm-svn: 296415
|
|
|
|
|
|
| |
other minor fixes (NFC).
llvm-svn: 296413
|
|
|
|
|
|
|
| |
We may get a VL where the first element is a load, but the others
aren't. Trying to sort such VLs can only lead to sorrow.
llvm-svn: 296411
|
|
|
|
|
|
| |
This fixes -filetype=null errors introduced in r296403.
llvm-svn: 296410
|
|
|
|
|
|
|
| |
Constant fold, canonicalize constants to RHS,
reduce to minnum/maxnum when inputs are nan/undef.
llvm-svn: 296409
|
|
|
|
| |
llvm-svn: 296407
|
|
|
|
|
|
|
|
|
|
| |
The Fuchsia ASan runtime reserves the low part of the address space.
Patch by Roland McGrath
Differential Revision: https://reviews.llvm.org/D30426
llvm-svn: 296405
|
|
|
|
|
|
| |
other minor fixes (NFC).
llvm-svn: 296404
|
|
|
|
|
|
|
|
|
| |
Instead of requiring every non-COFF MCObjectStreamer to implement the
COFF hooks just to do an llvm_unreachable to say that they're not
supported, do the llvm_unreachable in the default implementation, as
suggested by rnk in https://reviews.llvm.org/D26722.
llvm-svn: 296403
|
|
|
|
| |
llvm-svn: 296402
|
|
|
|
| |
llvm-svn: 296401
|
|
|
|
|
|
|
| |
CFG sorting was already an independent algorithm from block/loop insertion;
this change makes it more convenient to debug.
llvm-svn: 296399
|
|
|
|
|
|
|
|
| |
attributes"
It causes miscompiles e.g. during self-host of Clang (PR32082).
llvm-svn: 296398
|
|
|
|
| |
llvm-svn: 296396
|
|
|
|
|
|
|
|
|
|
| |
createSwiftErrorEntriesInEntryBlock
Otherwise, it will insert instructions before it.
rdar://30536186
llvm-svn: 296395
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was reverted because it was breaking some builds, and
because of incorrect error code usage. Since the CL was
large and contained many different things, I'm resubmitting
it in pieces.
This portion is NFC, and consists of:
1) Renaming classes to follow a consistent naming convention.
2) Fixing the const-ness of the interface methods.
3) Adding detailed doxygen comments.
4) Fixing a few instances of passing `const BinaryStream& X`. These
are now passed as `BinaryStreamRef X`.
llvm-svn: 296394
|
|
|
|
|
|
|
| |
This reverts r295782. This could potentially result in some
legalization loops and I avoided the need for this.
llvm-svn: 296393
|
|
|
|
| |
llvm-svn: 296392
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Should use the Valuekind read from the profile.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits, xur
Differential Revision: https://reviews.llvm.org/D30420
llvm-svn: 296391
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
of the condition
The transform in question claims to be doing:
// fold (add (select cc, 0, c), x) -> (select cc, x, (add, x, c))
...starting in PerformADDCombineWithOperands(), but it wasn't actually checking for a setcc node
for the sext/zext patterns.
This is exactly the opposite of a transform I'd like to add to DAGCombiner's foldSelectOfConstants(),
so I was seeing infinite loops with my draft of a patch applied.
The changes in select_const.ll look positive (less instructions). The change in arm-and-tst-peephole.ll
is unrelated. We're changing the input IR in that test to preserve the intent of the test, but that's
not affected by this code change.
Differential Revision:
https://reviews.llvm.org/D30355
llvm-svn: 296389
|
|
|
|
| |
llvm-svn: 296382
|
|
|
|
|
|
|
|
|
|
| |
DAGCombiner already supports peeking thorough shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered.
This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where its worth it.
Differential Revision: https://reviews.llvm.org/D30176
llvm-svn: 296381
|
|
|
|
|
|
|
| |
Add packed types as legal so they may be used with inlineasm.
Keep all operations expanded for now.
llvm-svn: 296379
|
|
|
|
|
|
|
| |
Doesn't fix any practical problems because clamp/omod
are currently folded after peephole optimizer.
llvm-svn: 296375
|
|
|
|
| |
llvm-svn: 296372
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Existing implementation of duplicateSimpleBB function drops DebugLoc metadata of branch instructions during the transformation. This patch addresses this issue by making newly created branch instructions to keep the metadata of replaced branch instructions.
Reviewers: qcolombet, craig.topper, aprantl, MatzeB, sanjoy, dblaikie
Reviewed By: dblaikie
Subscribers: dblaikie, llvm-commits
Differential Revision: https://reviews.llvm.org/D30026
llvm-svn: 296371
|
|
|
|
|
|
| |
Mostly useful for writing tests for f16 features.
llvm-svn: 296370
|
|
|
|
|
|
|
|
| |
Add a few non-VOP3P but instructions related to packed.
Includes hack with dummy operands for the benefit of the assembler
llvm-svn: 296368
|
|
|
|
|
|
|
|
|
|
|
| |
This was suggested in D27855: have the inliner add assumptions, so we don't
lose nonnull info provided by argument attributes.
This still doesn't solve PR28430 (dyn_cast), but this gets us closer.
https://reviews.llvm.org/D29999
llvm-svn: 296366
|