| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
UseAA is enabled."
This reverts commit r282600 due to test failures with MCJIT
llvm-svn: 282604
|
| |
enabled.
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner, as it separates
non-interfering loads/stores from the store-merging logic.
When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across
code paths
3. Re-adds the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations.
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.ll -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging
succeeding, as we do not do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and merges two stores
CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.
CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.
CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
This test appears to work but no longer exhibits the spill
behavior.
Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 282600
|
|
|
|
|
|
|
|
|
|
|
|
| |
This check currently doesn't seem to do anything useful on any in-tree target:
On non-x86, it always evaluates to false, so we never hit the code path that
creates the shuffle with zero.
On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to
query in general, but doesn't make sense if only restricted to zero blends.
Differential Revision: https://reviews.llvm.org/D24625
llvm-svn: 282567
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Many high-performance processors have a dedicated branch predictor for
indirect branches, commonly used with jump tables. As sophisticated as such
branch predictors are, they tend to have well defined limits beyond which
their effectiveness is hampered or even nullified. One such limit is the
number of possible destinations for a given indirect branch that such
branch predictors can handle.
This patch considers a limit that a target may set to the number of
destination addresses in a jump table.
Patch by: Evandro Menezes <e.menezes@samsung.com>, Aditya Kumar
<aditya.k7@samsung.com>, Sebastian Pop <s.pop@samsung.com>.
Differential revision: https://reviews.llvm.org/D21940
llvm-svn: 282412
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D23984
llvm-svn: 282381
|
|
|
|
|
|
|
|
|
|
|
|
| |
Correctly use alignment size from loaded size not output value size.
Reviewers: jyknight, tstellarAMD, arsenm
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23356
llvm-svn: 282177
|
|
|
|
|
|
|
|
|
| |
ISel does not handle them correctly yet, i.e., we crash trying to emit tail call
code.
radar://28407842
llvm-svn: 282088
|
|
|
|
|
|
|
|
| |
rounding mode encoding in the second operand. This immediate should only be 0 or 1 and indicates if the truncation loses precision.
Also enhance an assert in SelectionDAG::getNode to flag this sort of problem in the future.
llvm-svn: 281868
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
as sitofp
With D24253 we can now use SelectionDAG::SignBitIsZero with vector operations.
This patch uses SelectionDAG::SignBitIsZero to recognise that a zero sign bit means that we can use a sitofp instead of a uitofp (which is not directly supported on pre-AVX512 hardware).
While AVX512 does provide support for uitofp, the conversion to sitofp should not cause any regressions.
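The equivalence this relies on can be checked with a small Python model. This is only a sketch: the 32-bit lane width and the helper names are assumptions for illustration, not LLVM APIs.

```python
SIGN_BIT = 0x80000000  # sign bit of a 32-bit lane (lane width is an assumption)

def sign_bit_is_zero(bits):
    """True when the top bit of the 32-bit pattern is clear."""
    return (bits & SIGN_BIT) == 0

def uitofp(bits):
    """Model uitofp: convert the unsigned value of the bit pattern."""
    return float(bits)

def sitofp(bits):
    """Model sitofp: convert the two's-complement value of the bit pattern."""
    signed = bits - (1 << 32) if bits & SIGN_BIT else bits
    return float(signed)

# Lanes whose sign bit is zero convert identically either way.
lanes = [0, 1, 12345, 0x7FFFFFFF]
assert all(sign_bit_is_zero(x) for x in lanes)
assert [uitofp(x) for x in lanes] == [sitofp(x) for x in lanes]

# Once the sign bit is set, the two conversions disagree.
assert uitofp(0x80000000) != sitofp(0x80000000)
```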
Differential Revision: https://reviews.llvm.org/D24343
llvm-svn: 281852
|
|
|
|
|
|
|
| |
It is a trivial change which could make the testcase easier to reuse
for the store splitting in CodeGenPrepare.
llvm-svn: 281846
|
|
|
|
|
|
|
| |
analyzeBranch was renamed to start with a lowercase letter; rename
the related set to match.
llvm-svn: 281506
|
|
|
|
|
|
| |
; NFCI
llvm-svn: 281498
|
|
|
|
| |
llvm-svn: 281495
|
|
|
|
| |
llvm-svn: 281493
|
|
|
|
| |
llvm-svn: 281490
|
|
|
|
| |
llvm-svn: 281489
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: When expanding mul in type legalization make sure the type for shift amount can actually fit the value. This fixes PR30354 https://llvm.org/bugs/show_bug.cgi?id=30354.
Reviewers: hfinkel, majnemer, RKSimon
Subscribers: RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D24478
llvm-svn: 281403
|
|
|
|
|
|
|
|
|
|
|
| |
This allows us to, in some cases, create a vector_shuffle out of a build_vector, when
the inputs to the build are extract_elements from two different vectors, at least one
of which is wider than the output. (E.g. a <8 x i16> being constructed out of
elements from a <16 x i16> and a <8 x i16>).
Differential Revision: https://reviews.llvm.org/D24491
llvm-svn: 281402
|
|
|
|
|
|
|
|
|
|
| |
test
To avoid assertion, we must ensure that the inner shift constant is within range before calling ConstantSDNode::getZExtValue(). We already know that the outer shift constant is in range.
Followup to D23007
llvm-svn: 281362
|
|
|
|
|
|
|
|
| |
Fix failure to detect out of range shift constants leading to assert in ConstantSDNode::getZExtValue()
Followup to D23007
llvm-svn: 281354
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D23764
llvm-svn: 281308
|
|
|
|
|
|
|
| |
This should make it easier to add cases that we currently don't cover,
like supporting more kinds of type mismatches and more than 2 input vectors.
llvm-svn: 281283
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
An IR load can be invariant, dereferenceable, neither, or both. But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.
This patch splits up the notions of invariance and dereferenceability at
the MI level. It's NFC, so it adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D23371
llvm-svn: 281151
|
|
|
|
|
|
|
|
| |
ISel makes assumptions about the order of phi nodes.
rdar://28190150
llvm-svn: 281095
|
|
|
|
|
|
| |
Fixes issue with rL280927 identified by Mikael Holmén
llvm-svn: 281042
|
|
|
|
|
|
| |
This bloats codesize - all of the non-leaf nodes are extra code.
llvm-svn: 280932
|
|
|
|
|
|
|
|
|
|
|
|
| |
SimplifyDemandedBits
Add the ability to computeKnownBits and SimplifyDemandedBits to extract the known zero/one bits from BUILD_VECTOR, returning the known bits that are shared by every vector element.
This is an initial step towards determining the sign bits of a vector (PR29079).
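A rough Python model of "known bits shared by every vector element" follows. It assumes constant elements for simplicity, and the function name is hypothetical; the real analysis works from each operand's own known bits rather than from constants.

```python
def known_bits_of_build_vector(elements, width):
    """Return (known_zero, known_one) masks common to all elements.

    A bit is known zero only if it is zero in every element, and known
    one only if it is one in every element.
    """
    mask = (1 << width) - 1
    known_zero = mask
    known_one = mask
    for e in elements:
        e &= mask
        known_zero &= ~e & mask  # keep bits that are zero in this element too
        known_one &= e           # keep bits that are one in this element too
    return known_zero, known_one

known_zero, known_one = known_bits_of_build_vector([0x10, 0x12, 0x16, 0x14], 8)
assert known_one == 0x10
assert known_zero == 0xE9
# The sign bit (0x80) is known zero for every lane of this vector.
assert known_zero & 0x80
```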
Differential Revision: https://reviews.llvm.org/D24253
llvm-svn: 280927
|
|
|
|
|
|
|
|
| |
Allow AND combines to use a vector splatted constant as well as a constant scalar.
Preliminary part of D24253.
llvm-svn: 280926
|
|
|
|
|
|
|
|
|
|
|
| |
https://llvm.org/bugs/show_bug.cgi?id=29058.
During node legalization we tried to legalize the node's operands.
If an operand node is replaced during legalization, the user node may be destroyed.
Differential Revision: https://reviews.llvm.org/D24244
llvm-svn: 280862
|
|
|
|
|
|
|
|
| |
This reapplies r252565 and r252674, effectively reverting r252956.
This allows VS_32/VS_64 to be unallocatable like they should be.
llvm-svn: 280783
|
| |
I might have called this "r246507, the sequel". It fixes the same issue, as the
issue has cropped up in a few more places. The underlying problem is that
isSetCCEquivalent can pick up select_cc nodes with a result type that is not
legal for a setcc node to have, and if we use that type to create new setcc
nodes, nothing fixes that (and so we've violated the contract that the
infrastructure has with the backend regarding setcc node types).
Fixes PR30276.
For convenience, here's the commit message from r246507, which explains the
problem in greater detail:
[DAGCombine] Fixup SETCC legality checking
SETCC is one of those special node types for which operation actions (legality,
etc.) are keyed off of an operand type, not the node's value type. This makes
sense because the value type of a legal SETCC node is determined by its
operands' value type (via the TLI function getSetCCResultType). When the
SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value
type, or directly with the value type provided by TLI.getSetCCResultType.
The first problem being fixed here is that DAGCombine had several places
querying TLI.isOperationLegal on SETCC, but providing the return of
getSetCCResultType, instead of the operand type directly. This does not mean
what the author thought, and "luckily", most in-tree targets have SETCC with
Custom lowering, instead of marking them Legal, so these checks return false
anyway.
The second problem being fixed here is that two of the DAGCombines could create
SETCC nodes with arbitrary (integer) value types; specifically, those that
would simplify:
(setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
(which is possible for some combinations of (op1, op2))
If the operands of the and|or node are actual setcc nodes, then this is not an
issue (because the and|or must share the same type), but, the relevant code in
DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls
DAGCombiner::isSetCCEquivalent on each operand, and that function will
recognise setcc-like select_cc nodes with other return types. And, thus, when
creating new SETCC nodes, we need to be careful to respect the value-type
constraint. This is even true before type legalization, because it is quite
possible for the SELECT_CC node to have a legal type that does not happen to
match the corresponding TLI.getSetCCResultType type.
To be explicit, there is nothing that later fixes the value types of SETCC
nodes (if the type is legal, but does not happen to match
TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to
work only because, either MVT::i1 is not legal, or it is what
TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change,
however. For the time being, restrict the relevant transformations to produce
only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1
prior to type legalization).
Fixes PR24636.
llvm-svn: 280767
|
|
|
|
|
|
|
|
|
|
|
|
| |
), Idx ) -> In
If we are extracting a subvector that has just been inserted then we should just use the original inserted subvector.
This has come up in several x86 shuffle lowering cases where we are crossing 128-bit lanes.
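A minimal Python model of the fold, with vectors as lists. The helper names mirror the node names but are otherwise illustrative, and the sketch assumes the extracted subvector has the same length as the inserted one.

```python
def insert_subvector(vec, sub, idx):
    """Overwrite len(sub) elements of vec starting at idx."""
    return vec[:idx] + sub + vec[idx + len(sub):]

def extract_subvector(vec, idx, length):
    """Read length elements of vec starting at idx."""
    return vec[idx:idx + length]

vec = [0, 1, 2, 3, 4, 5, 6, 7]
sub = [90, 91, 92, 93]

# Extracting at the index that was just inserted to yields the inserted
# subvector, so the insert/extract pair can be replaced by `sub` itself.
for idx in (0, 4):
    inserted = insert_subvector(vec, sub, idx)
    assert extract_subvector(inserted, idx, len(sub)) == sub
```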
Differential Revision: https://reviews.llvm.org/D24254
llvm-svn: 280715
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the store of a wide value merged from a pair of values, especially an int-fp pair,
it is sometimes more efficient to split it into separate narrow stores, which can
remove the bitwise instructions or sink them to colder places.
For now the feature is only enabled on the x86 target, and only the store of an int-fp pair is
split. The scope may be extended in the future, given supporting performance
evidence.
Differential Revision: https://reviews.llvm.org/D22840
llvm-svn: 280505
|
|
| |
This fixes a regression introduced by revision 268094.
Revision 268094 added the following dag combine rule:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2
That rule converts a truncate of a shift-by-constant into a shift of a truncated
value. We do this only if the shift count is less than half the size in bits of
the truncated value (K < vt.size / 2).
The problem is that the constraint on the shift count is incorrect, so the rule
doesn't work well in some cases involving vector types. The combine rule should
have been written instead like this:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.getScalarSizeInBits()
Basically, if K is smaller than the "scalar size in bits" of the truncated value
then we know that by "sinking" the truncate into the operand of the shift we
would never accidentally make the shift undefined.
This patch fixes the check on the shift count, and adds test cases to make sure
that we don't regress the behavior.
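The corrected constraint can be checked exhaustively on small widths with a Python model. This is a sketch under stated assumptions: the 8-bit and 4-bit widths are arbitrary, and an out-of-range shift is modelled as an exception. If vt.size means the total size of a vector type, a type such as <8 x i16> has size / 2 = 64, which would admit shift counts that are out of range for a 16-bit lane.

```python
def trunc(x, bits):
    """Keep the low `bits` bits of x."""
    return x & ((1 << bits) - 1)

def shl(x, k, bits):
    """Shift left within a `bits`-wide integer; k >= bits is undefined."""
    if k >= bits:
        raise ValueError("shift amount out of range: result undefined")
    return (x << k) & ((1 << bits) - 1)

def fold_is_valid(x, k, wide_bits, narrow_bits):
    """Compare trunc (shl x, K) with shl (trunc x), K."""
    before = trunc(shl(x, k, wide_bits), narrow_bits)
    after = shl(trunc(x, narrow_bits), k, narrow_bits)
    return before == after

WIDE, NARROW = 8, 4

# For every K below the narrow scalar width, the two forms agree.
assert all(fold_is_valid(x, k, WIDE, NARROW)
           for x in range(1 << WIDE) for k in range(NARROW))

# K == NARROW is a defined shift in the wide type, but sinking the
# truncate would make the narrow shift undefined.
assert shl(1, NARROW, WIDE) == 16
try:
    shl(trunc(1, NARROW), NARROW, NARROW)
    narrow_shift_defined = True
except ValueError:
    narrow_shift_defined = False
assert not narrow_shift_defined
```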
Differential Revision: https://reviews.llvm.org/D24154
llvm-svn: 280482
|
|
|
|
|
|
|
|
| |
Reviewers: hans, evandro, sebpop
Differential Revision: https://reviews.llvm.org/D24112
llvm-svn: 280430
|
|
|
|
|
|
|
|
|
|
|
| |
When expanding a SETCC for which the low half is known to evaluate to false,
we can only throw it away for LT/GT comparisons, not LE/GE.
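The difference between the strict and non-strict cases can be brute-forced in Python. This sketch assumes unsigned 8-bit values split into 4-bit halves and a hypothetical helper name. It searches for operand pairs where the low-half comparison is false, yet using the high-half comparison alone gives the wrong answer.

```python
import operator

HALF = 4                  # bits per half (an arbitrary choice for the demo)
MASK = (1 << HALF) - 1

def counterexamples(cmp):
    """Pairs (a, b) where the low compare is false but the high compare
    alone disagrees with the full-width compare."""
    bad = []
    for a in range(1 << (2 * HALF)):
        for b in range(1 << (2 * HALF)):
            a_hi, a_lo = a >> HALF, a & MASK
            b_hi, b_lo = b >> HALF, b & MASK
            if cmp(a_lo, b_lo):
                continue  # only consider cases where the low half is false
            if cmp(a, b) != cmp(a_hi, b_hi):
                bad.append((a, b))
    return bad

# LT: dropping a false low half and using only the high half is safe.
assert counterexamples(operator.lt) == []

# LE: it is not. With equal high halves, hi <= hi is true although the
# full comparison is false.
assert counterexamples(operator.le)[0] == (1, 0)
```

The same asymmetry holds for GT versus GE by swapping the operands.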
This fixes PR29170.
Differential Revision: https://reviews.llvm.org/D24151
llvm-svn: 280424
|
|
| |
Prior to this, we could generate a vector_shuffle from an IR shuffle when the
size of the result was exactly the sum of the sizes of the input vectors.
If the output vector was narrower - e.g. a <12 x i8> being formed by a shuffle
with two <8 x i8> inputs - we would lower the shuffle to a sequence of extracts
and inserts.
Instead, we can form a larger vector_shuffle, and then extract a subvector
of the right size - e.g. shuffle the two <8 x i8> inputs into a <16 x i8>
and then extract a <12 x i8>.
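The widen-then-extract idea can be modelled in Python with lists. The helper names and the use of -1 for undef mask entries are assumptions of this sketch, not the DAGBuilder implementation.

```python
UNDEF = -1  # marker for an undefined mask entry (an assumption of this model)

def shuffle(a, b, mask):
    """Pick elements from the concatenation of a and b; UNDEF yields None."""
    src = a + b
    return [None if m == UNDEF else src[m] for m in mask]

def shuffle_via_wide(a, b, mask):
    """Pad the mask to len(a) + len(b), shuffle, then extract the prefix."""
    wide_len = len(a) + len(b)
    if len(mask) > wide_len:
        raise ValueError("mask is wider than the combined inputs")
    padded = mask + [UNDEF] * (wide_len - len(mask))
    wide = shuffle(a, b, padded)   # the <16 x i8> shuffle in the example above
    return wide[:len(mask)]        # the <12 x i8> subvector extract

a = list(range(8))           # first <8 x i8> input
b = list(range(100, 108))    # second <8 x i8> input
mask = [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13]   # 12 output elements

assert shuffle_via_wide(a, b, mask) == shuffle(a, b, mask)
assert shuffle_via_wide(a, b, mask) == [0, 100, 1, 101, 2, 102,
                                        3, 103, 4, 104, 5, 105]
```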
This also includes a target-specific X86 combine that in the presence of
AVX2 combines:
(vector_shuffle <mask> (concat_vectors t1, undef)
(concat_vectors t2, undef))
into:
(vector_shuffle <mask> (concat_vectors t1, t2), undef)
in cases where this allows us to form VPERMD/VPERMQ.
(This is not a separate commit, as that pattern does not appear without
the DAGBuilder change.)
llvm-svn: 280418
|
|
|
|
| |
llvm-svn: 280391
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Legalization tends to create anyext(trunc) patterns. This should always be
combined - into either a single trunc, a single ext, or nothing if the
types match exactly. But if we happen to combine the trunc first, we may pull
the trunc away from the anyext or make it implicit (e.g. the truncate(extract)
-> extract(bitcast) fold).
To prevent this, we can avoid doing the fold, similarly to how we already handle
fpround(fpextend).
Differential Revision: https://reviews.llvm.org/D23893
llvm-svn: 280386
|
|
| |
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible
__builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently
broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is
lowered using:
ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET)
where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86,
FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not
work for PowerPC. Because of the way that the stack layout works, the canonical
frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC
(there is a lower save-area offset as well), so it is not just a matter of
implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its
semantics -- We can do that, since it is currently used only for
@llvm.eh.dwarf.cfa lowering, but it is better to directly lower the CFA construct
itself (since it can be easily represented as a fixed-offset FrameIndex)). Mips
currently does this, but by using a custom lowering for ADD that specifically
recognizes the (FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern.
This change introduces an ISD::EH_DWARF_CFA node, which by default expands using
the existing logic, but can be directly lowered by the target. Mips is updated
to use this method (which simplifies its implementation, and I suspect makes it
more robust), and PowerPC is updated to do the same.
Fixes PR26761.
Differential Revision: https://reviews.llvm.org/D24038
llvm-svn: 280350
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
deopt bundles
This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics so I've chosen to start there. For those curious, my use case is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes.
The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate.
Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)
My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics. Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.
Differential Revision: https://reviews.llvm.org/D24000
llvm-svn: 280250
|
|
|
|
|
|
| |
Patch by Pranav Bhandarkar.
llvm-svn: 279998
|
|
|
|
|
|
|
|
|
| |
The problem occurs when the node is not updated in place and UpdateNodeOperation() returns a node that already exists.
In this case an assert fails in PromoteIntegerOperand(), because N has 2 results (val + chain).
Differential Revision: http://reviews.llvm.org/D23756
llvm-svn: 279961
|
|
|
|
|
|
|
| |
Right now, this cannot happen, but with the fall back path of GlobalISel
it will show up eventually.
llvm-svn: 279877
|
|
|
|
| |
llvm-svn: 279767
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This greatly simplifies our handling of SDNode::SubclassData.
NFC, hopefully. :)
See D23035 for discussion of the API design of these
bitfields.
Reviewers: chandlerc
Subscribers: llvm-commits, rnk
Differential Revision: https://reviews.llvm.org/D23036
llvm-svn: 279537
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
That commit added a new version of Intrinsic::getName which should only
be called when the intrinsic has no overloaded types. There are several
debugging paths, such as SDNode::dump which are printing the name of the
intrinsic but don't have the overloaded types. These paths should be ok
to just print the name instead of crashing.
The fix here is ultimately to just add a 'None' second argument as that
calls the overload capable getName, which is less efficient, but this is a
debugging path anyway, and not perf critical.
Thanks to Björn Pettersson for pointing out that there were more crashes.
llvm-svn: 279528
|
|
|
|
| |
llvm-svn: 279381
|
|
|
|
|
|
|
|
| |
The heuristic above this code is incredibly suspect, but, disregarding that, it mutates the cast opcode, so we need to check the *mutated* opcode later to see if we need to emit an AssertSext or AssertZext node.
Fixes PR29041.
llvm-svn: 279223
|
|
|
|
|
|
| |
Follow up to r278902. I had missed "fall through", with a space.
llvm-svn: 278970
|