| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Correct offset calculation in load-store forwarding for big-endian
targets.
Reviewers: rnk, RKSimon, waltl
Subscribers: sdardis, nemanjai, hiraditya, jrtc27, atanasyan, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D53147
llvm-svn: 344272
|
|
|
|
| |
llvm-svn: 344255
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Extend analysis forwarding loads from preceeding stores to work with
extended loads and truncated stores to the same address so long as the
load is fully subsumed by the store.
Hexagon's swp-epilog-phis.ll and swp-memrefs-epilog1.ll test are
deleted as they've no longer seem to be relevant.
Reviewers: RKSimon, rnk, kparzysz, javed.absar
Subscribers: sdardis, nemanjai, hiraditya, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D49200
llvm-svn: 344142
|
|
|
|
|
|
| |
Help stop bugs like rL343935 by making the 'original' DemandedBits arg more obviously not the mask that is actually used.
llvm-svn: 344138
|
|
|
|
|
|
| |
Part of a minor cleanup to make all the switch statements more consistent prior to improving vector support.
llvm-svn: 344136
|
|
|
|
|
|
|
|
|
|
| |
SimplifyDemandedBits/SimplifyDemandedVectorElts
Similar to what already happens in the DAGCombiner wrappers, this patch adds the root nodes back onto the worklist if the DCI wrappers' SimplifyDemandedBits/SimplifyDemandedVectorElts were successful.
Differential Revision: https://reviews.llvm.org/D53026
llvm-svn: 344132
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already do the following combines:
(bitcast int (and (bitcast fp X to int), 0x7fff...) to fp) -> fabs X
(bitcast int (xor (bitcast fp X to int), 0x8000...) to fp) -> fneg X
When the target has "bit preserving fp logic". This patch just extends it
to also combine:
(bitcast int (or (bitcast fp X to int), 0x8000...) to fp) -> fneg (fabs X)
As some targets have fnabs and even those that don't can efficiently lower
both the fabs and the fneg.
Differential revision: https://reviews.llvm.org/D44548
llvm-svn: 344093
|
|
|
|
|
|
|
|
| |
SimplifyDemandedBits
Fix for AVX1 masked load/store regression on D52964
llvm-svn: 344043
|
|
|
|
| |
llvm-svn: 343997
|
|
|
|
|
|
|
|
|
|
| |
r126518 introduced a a type parameter to the getShiftAmountTy target hook. It
produces the type of the shift (RHSTy), parameterised by the type of the value
being shifted (LHSTy). SelectionDAGBuilder::visitShift passed RHSTy rather
than LHSTy and this patch corrects this. The change is a no-op because in LLVM
IR the LHS and RHS types for a shift must be equal anyway.
llvm-svn: 343955
|
|
|
|
|
|
|
|
| |
ArrayRef instead a pointer to an array. Add assert on size of array. NFC"
The assert is failing some asan tests on the bots.
llvm-svn: 343950
|
|
|
|
|
|
| |
instead a pointer to an array. Add assert on size of array. NFC
llvm-svn: 343948
|
|
|
|
|
|
|
|
|
|
| |
LegalizeVectorOps to LegalizeDAG.
This is where we legalize gather and masked load so this is consistent.
Since these ops are always on vectors I've chosen to go with LegalizeDAG since that's what we do for other vector only ops like BUILD_VECTOR, VECTOR_SHUFFLE, etc. The ScalarizeMaskedMemIntrinsic pass should take care of scalarizing these before SelectionDAG so hopefully we don't need to worry about illegally typed scalar ops being emitted in the legalizing. If we did we would need to do this in LegalizeVectorOps so we could get the second type legalization that runs between LegalizeVectorOps and LegalizeDAG.
llvm-svn: 343947
|
|
|
|
| |
llvm-svn: 343945
|
|
|
|
| |
llvm-svn: 343942
|
|
|
|
|
|
|
|
| |
This change is proposed as a part of D44548, but we
need this independently to avoid regressions from improved
undef propagation in SimplifyDemandedVectorElts().
llvm-svn: 343940
|
|
|
|
| |
llvm-svn: 343939
|
|
|
|
|
|
|
|
|
|
|
|
| |
SimplifyDemandedVectorElts simplification
rL343913 was using SimplifyDemandedBits's original demanded mask instead of the adjusted 'NewMask' that accounts for multiple uses of the op (those variable names really need improving....).
Annoyingly many of the test changes (back to pre-rL343913 state) are actually safe - but only because their multiple uses are all by PMULDQ/PMULUDQ.
Thanks to Jan Vesely (@jvesely) for bisecting the bug.
llvm-svn: 343935
|
|
|
|
|
|
|
|
| |
the result number of the SDValue passed in.
It was always returning the chain which seems to be the result number of the SDValue in the lit tests we have. But I don't know if that's guaranteed.
llvm-svn: 343933
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
simplification
This patch enables SimplifyDemandedBits to call SimplifyDemandedVectorElts in cases where the demanded bits mask covers entire elements of a bitcasted source vector.
There are a couple of cases here where simplification at a deeper level (such as through bitcasts) prevents further simplification - CommitTargetLoweringOpt only adds immediate uses/users back to the worklist when we might want to combine the original caller again to see what else it can simplify.
As well as that I had to disable handling of bool vector until SimplifyDemandedVectorElts better supports some of their opcodes (SETCC, shifts etc.).
Fixes PR39178
Differential Revision: https://reviews.llvm.org/D52935
llvm-svn: 343913
|
|
|
|
|
|
|
|
|
| |
And use that to transform fsub with zero constant operands.
The integer part isn't used yet, but it is proposed for use in
D44548, so adding both enhancements here makes that
patch simpler.
llvm-svn: 343865
|
|
|
|
|
|
|
|
|
|
| |
LowerLoad. Remove special case code in LegalizeVectorOps that allowed us to only return one result.
Previously we replaced the chain use ourself and return the data result. LegalizeVectorOps then detected that we'd done this and assumed the chain had already been handled.
This commit instead returns a MERGE_VALUES node with two results joined from nodes. This allows LegalizeVectorOps to do all the replacements for us without any special casing. The MERGE_VALUES will be removed by DAG combine.
llvm-svn: 343817
|
|
|
|
| |
llvm-svn: 343750
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mismatching vector types
This fixes a case of bad index calculation when merging mismatching
vector types. This changes the existing code to just use the existing
extract_{subvector|element} and a bitcast (instead of bitcast first and
then newly created extract_xxx) so we don't need to adjust any indices
in the first place.
rdar://44584718
Differential Revision: https://reviews.llvm.org/D52681
llvm-svn: 343493
|
|
|
|
|
|
|
|
| |
The SINT_TO_FP<->UINT_TO_FP combines for non-negative integers should only occur for legal ops once LegalOperations = true
No test case to hand, noticed when investigating PR38226 + PR38970
llvm-svn: 343405
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52661
llvm-svn: 343349
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: The convenience wrapper in STLExtras is available since rL342102.
Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb
Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D52573
llvm-svn: 343163
|
|
|
|
|
|
|
|
| |
helper. NFCI.
Handles SrcVT == DstVT as well.
llvm-svn: 343121
|
|
|
|
| |
llvm-svn: 343101
|
|
|
|
|
|
|
|
|
|
|
| |
Adding NonNull as attributes to returned pointers has the unfortunate side
effect of disabling tail calls. This patch ignores the NonNull attribute when
we decide whether to tail merge, in the same way that we ignore the NoAlias
attribute, as it has no affect on the call sequence.
Differential Revision: https://reviews.llvm.org/D52238
llvm-svn: 343091
|
|
|
|
|
|
|
|
|
| |
VerifyDAGDiverence costs compilation time, avoid running it in non-debug
builds.
Differential Revision: https://reviews.llvm.org/D52454
llvm-svn: 343086
|
|
|
|
|
|
|
|
| |
returning a UDIVREM or SDIVREM node.
This shouldn't be possible and is a leftover from when we used to recursively call combine here.
llvm-svn: 343049
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We have `llvm::addLandingPadInfo` and `MachineFunction::addLandingPad`,
both of which add landing pad information to populate `LandingPadInfo`
but are called from different locations, which was confusing. This patch
unifies them with one `MachineFunction::addLandingPad` function, which
now has functionlities of both functions.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52428
llvm-svn: 343018
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the final (I hope!) problem pattern mentioned in PR37749:
https://bugs.llvm.org/show_bug.cgi?id=37749
We are trying to avoid an AVX1 sinkhole caused by having 256-bit bitwise logic ops but no other 256-bit integer ops.
We've already solved the simple logic ops, but 'andn' is an x86 special. I looked at alternative solutions like
extending the generic DAG combine or trying to wait until the ANDNP node is created, but those are bigger patches
that can over-reach. Ie, splitting to 128-bit does not look like a win in most cases with >1 256-bit op.
The pattern matching is cluttered with bitcasts because of our i64 element canonicalization. For the affected test,
we have this vector-type-legalized sequence:
t29: v8i32 = concat_vectors t27, t28
t30: v4i64 = bitcast t29
t18: v8i32 = BUILD_VECTOR Constant:i32<-1>, Constant:i32<-1>, ...
t31: v4i64 = bitcast t18
t32: v4i64 = xor t30, t31
t9: v8i32 = BUILD_VECTOR Constant:i32<255>, Constant:i32<255>, ...
t34: v4i64 = bitcast t9
t35: v4i64 = and t32, t34
t36: v8i32 = bitcast t35
t37: v4i32 = extract_subvector t36, Constant:i64<0>
t38: v4i32 = extract_subvector t36, Constant:i64<4>
Differential Revision: https://reviews.llvm.org/D52318
llvm-svn: 343008
|
|
|
|
|
|
| |
ExpandExtractFromVectorThroughStack. NFCI.
llvm-svn: 342985
|
|
|
|
|
|
|
| |
Reuse search space bookkeeping across multiple predecessor checks
qdone to avoid redundancy. This should cut search cost by ~4x.
llvm-svn: 342984
|
|
|
|
|
|
| |
NFCI.
llvm-svn: 342983
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DAGCombine will try to fold two loads that feed a SELECT or SELECT_CC
after the select, resulting in a select of an address and a single
load after.
If either of the loads depend on the other, this is not legal as it
could introduce cycles. However, it only checked this if the opcode
was a SELECT, and not for a SELECT_CC.
Unfortunately, the only reproducer I have for this is for our
downstream target. I've tried getting it to trigger on an upstream one
but haven't been successful.
Patch thanks to Bevin Hansson.
llvm-svn: 342980
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a preliminary step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613
If we have an 'add' instruction that sets flags, we can use that to eliminate an
explicit compare instruction or some other instruction (cmn) that sets flags for
use in the later select.
As shown in the unchanged tests that use 'icmp ugt %x, %a', we're effectively
reversing an IR icmp canonicalization that replaces a variable operand with a
constant:
https://rise4fun.com/Alive/V1Q
But we're not using 'uaddo' in those cases via DAG transforms. This happens in
CGP after D8889 without checking target lowering to see if the op is supported.
So AArch already shows 'uaddo' codegen for the i8/i16/i32/i64 test variants with
"using_cmp_sum" in the title. That's the pattern that CGP matches as an unsigned
saturated add and converts to uaddo without checking target capabilities.
This patch is gated by isOperationLegalOrCustom(ISD::UADDO, VT), so we see only
see AArch diffs for i32/i64 in the tests with "using_cmp_notval" in the title
(unlike x86 which sees improvements for all sizes because all sizes are 'custom').
But the AArch code (like x86) looks better when translated to 'uaddo' in all cases.
So someone that is involved with AArch may want to set i8/i16 to 'custom' for UADDO,
so this patch will fire on those tests.
Another possibility given the existing behavior: we could remove the legal-or-custom
check altogether because we're assuming that a UADDO sequence is canonical/optimal
before we ever reach here. But that seems like a bug to me. If the target doesn't
have an add-with-flags op, then it's not likely that we'll get optimal DAG combining
using a UADDO node. This is similar justification for why we don't canonicalize IR to
the overflow math intrinsic sibling (llvm.uadd.with.overflow) for UADDO in the first
place.
Differential Revision: https://reviews.llvm.org/D51929
llvm-svn: 342886
|
|
|
|
| |
llvm-svn: 342863
|
|
|
|
|
|
| |
This code handled SCALAR_TO_VECTOR being returned by the recursion, but the code that used to return SCALAR_TO_VECTOR was removed in 2015.
llvm-svn: 342856
|
|
|
|
|
|
| |
This comment was misleading about why we were restricting to before legalize types. The reason given would only apply to before legalize ops. But there is a before legalize types reason that should also be listed.
llvm-svn: 342851
|
|
|
|
| |
llvm-svn: 342850
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
negation
This is an alternative to https://reviews.llvm.org/D37896. We can't decompose
multiplies generically without a target hook to tell us when it's profitable.
ARM and AArch64 may be able to remove some existing code that overlaps with
this transform.
This extends D52195 and may resolve PR34474:
https://bugs.llvm.org/show_bug.cgi?id=34474
(still an open question about transforming legal vector multiplies, but we
could open another bug report for those)
llvm-svn: 342844
|
|
|
|
| |
llvm-svn: 342826
|
|
|
|
| |
llvm-svn: 342809
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
x86 had 2 versions of peekThroughBitcast. DAGCombiner had 1. Plus, it had a 1-off implementation for the one-use variant.
Move the x86 versions of the code to SelectionDAG, so we don't have different copies of the code.
No functional change intended.
I'm putting this next to isBitwiseNot() because I am planning to use it in there. Another option is next to the
helpers in the ISD namespace (eg, ISD::isConstantSplatVector()). But if there's no good reason for those to be
there, I'd prefer to pull other helpers over to SelectionDAG in follow-up steps.
Differential Revision: https://reviews.llvm.org/D52285
llvm-svn: 342669
|
|
|
|
|
|
|
| |
The test diff in not-and-simplify.ll is from a use in SimplifyDemandedBits,
and the test diff in add.ll is from a DAGCombiner transform.
llvm-svn: 342594
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This patch adds a GlobalIsel copy utility into MI for flags and updates the instruction emitter for the SDAG path. Some tests show new behavior and I added one for GlobalIsel which mirrors an SDAG test for handling nsw/nuw.
Reviewers: spatel, wristow, arsenm
Reviewed By: arsenm
Subscribers: wdng
Differential Revision: https://reviews.llvm.org/D52006
llvm-svn: 342576
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
shift/add
This is an alternative to D37896. I don't see a way to decompose multiplies
generically without a target hook to tell us when it's profitable.
ARM and AArch64 may be able to remove some duplicate code that overlaps with
this transform.
As a first step, we're only getting the most clear wins on the vector examples
requested in PR34474:
https://bugs.llvm.org/show_bug.cgi?id=34474
As noted in the code comment, it's likely that the x86 constraints are tighter
than necessary, but it may not always be a win to replace a pmullw/pmulld.
Differential Revision: https://reviews.llvm.org/D52195
llvm-svn: 342554
|