| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The replaceAllDbgUsesWith utility helps passes preserve debug info when
replacing one value with another.
This improves upon the existing insertReplacementDbgValues API by:
- Updating debug intrinsics in-place, while preventing use-before-def of
the replacement value.
- Falling back to salvageDebugInfo when a replacement can't be made.
- Moving the responsibiliy for rewriting llvm.dbg.* DIExpressions into
common utility code.
Along with the API change, this teaches replaceAllDbgUsesWith how to
create DIExpressions for three basic integer and pointer conversions:
- The no-op conversion. Applies when the values have the same width, or
have bit-for-bit compatible pointer representations.
- Truncation. Applies when the new value is wider than the old one.
- Zero/sign extension. Applies when the new value is narrower than the
old one.
Testing:
- check-llvm, check-clang, a stage2 `-g -O3` build of clang,
regression/unit testing.
- This resolves a number of mis-sized dbg.value diagnostics from
Debugify.
Differential Revision: https://reviews.llvm.org/D48676
llvm-svn: 336451
|
| |
|
|
|
|
|
|
| |
As discussed in D48987 and D48893, there are many different
ways to go wrong depending on the binop (and as shown here
we already do go wrong in some cases).
llvm-svn: 336450
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we only iterated over functions reachable from the set of
external functions in the module. But since some of the passes under
this (notably the always-inliner and coroutine lowerer) are required for
correctness, they need to run over everything.
This just adds an extra layer of iteration over the CallGraph to keep
track of which functions we've already visited and get the next batch of
SCCs.
Should fix PR38029.
llvm-svn: 336419
|
| |
|
|
|
|
| |
are done"
llvm-svn: 336410
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Better NaN handling for AMDGCN fmed3.
All operands are checked for NaN now. The checks
were moved before the canonicalization to provide
a better mapping from fclamp. Changed the behaviour
of fmed3(x,y,NaN) to return max(x,y) instead of
min(x,y) in light of this. Updated tests as a result
and added some new cases to cover the fix.
Patch by Alan Baker
llvm-svn: 336375
|
| |
|
|
|
|
| |
independent FMA and extractelement/insertelement.
llvm-svn: 336315
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have bailout hacks based on min/max in various places in instcombine
that shouldn't be necessary. The affected test was added for:
D48930
...which is a consequence of the improvement in:
D48584 (https://reviews.llvm.org/rL336172)
I'm assuming the visitTrunc bailout in this patch was added specifically
to avoid a change from SimplifyDemandedBits, so I'm just moving that
below the EvaluateInDifferentType optimization. A narrow min/max is still
a min/max.
llvm-svn: 336293
|
| |
|
|
|
|
| |
That makes it easier to mix and match lines into other tests.
llvm-svn: 336289
|
| |
|
|
| |
llvm-svn: 336268
|
| |
|
|
| |
llvm-svn: 336258
|
| |
|
|
|
|
|
|
|
| |
When creating `phi` instructions to resume at the scalar part of the loop,
copy the DebugLoc from the original phi over to the new one.
Differential Revision: https://reviews.llvm.org/D48769
llvm-svn: 336256
|
| |
|
|
|
|
|
|
|
|
|
| |
When zext is EvaluatedInDifferentType, InstCombine
drops the dbg.value intrinsic. This patch tries to
preserve said DI, by inserting the zext's old DI in the
resulting instruction. (Only for integer type for now)
Differential Revision: https://reviews.llvm.org/D48331
llvm-svn: 336254
|
| |
|
|
|
|
| |
This adds coverage for a planned enhancement for ConstantExpr::getBinOpIdentity() noted in D48830.
llvm-svn: 336220
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As the test diffs show, the current users of getBinOpIdentity()
are InstCombine and Reassociate. SLP vectorizer is a candidate
for using this functionality too (D28907).
The InstCombine shuffle improvements are part of the planned
enhancements noted in D48830.
InstCombine actually has several other uses of getBinOpIdentity()
via SimplifyUsingDistributiveLaws(), but we don't call that for
any FP ops. Fixing that might be another part of removing the
custom reassociation in InstCombine that is only done for fadd+fmul.
llvm-svn: 336215
|
| |
|
|
| |
llvm-svn: 336214
|
| |
|
|
| |
llvm-svn: 336211
|
| |
|
|
| |
llvm-svn: 336208
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the last significant change suggested in PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806#c5
...though there are several follow-ups noted in the code comments
in this patch to complete this transform.
It's possible that a binop feeding a select-shuffle has been eliminated
by earlier transforms (or the code was just written like this in the 1st
place), so we'll fail to match the patterns that have 2 binops from:
D48401,
D48678,
D48662,
D48485.
In that case, we can try to materialize identity constants for the remaining
binop to fill in the "ghost" lanes of the vector (where we just want to pass
through the original values of the source operand).
I added comments to ConstantExpr::getBinOpIdentity() to show planned follow-ups.
For now, we only handle the 5 commutative integer binops (add/mul/and/or/xor).
Differential Revision: https://reviews.llvm.org/D48830
llvm-svn: 336196
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When salvaging a dbg.declare/dbg.addr we should not add
DW_OP_stack_value to the DIExpression
(see test/Transforms/InstCombine/salvage-dbg-declare.ll).
Consider this example
%vla = alloca i32, i64 2
call void @llvm.dbg.declare(metadata i32* %vla, metadata !1, metadata !DIExpression())
Instcombine will turn it into
%vla1 = alloca [2 x i32]
%vla1.sub = getelementptr inbounds [2 x i32], [2 x i32]* %vla, i64 0, i64 0
call void @llvm.dbg.declare(metadata [2 x i32]* %vla1.sub, metadata !19, metadata !DIExpression())
If the GEP can be eliminated, then the dbg.declare will be salvaged
and we should get
%vla1 = alloca [2 x i32]
call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression())
The problem was that salvageDebugInfo did not recognize dbg.declare
as being indirect (%vla1 points to the value, it does not hold the
value), so we incorrectly got
call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression(DW_OP_stack_value))
I also made sure that llvm::salvageDebugInfo and
DIExpression::prependOpcodes do not add DW_OP_stack_value to
the DIExpression in case no new operands are added to the
DIExpression. That way we avoid to, unneccessarily, turn a
register location expression into an implicit location expression
in some situations (see test11 in test/Transforms/LICM/sinking.ll).
Reviewers: aprantl, vsk
Reviewed By: aprantl, vsk
Subscribers: JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D48837
llvm-svn: 336191
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
unswitching loops.
Original patch trying to address this was sent in D47624, but that
didn't quite handle things correctly. There are two key principles used
to select whether and how to invalidate SCEV-cached information about
loops:
1) We must invalidate any info SCEV has cached before unswitching as we
may change (or destroy) the loop structure by the act of unswitching,
and make it hard to recover everything we want to invalidate within
SCEV.
2) We need to invalidate all of the loops whose CFGs are mutated by the
unswitching. Notably, this isn't the *entire* loop nest, this is
every loop contained by the outermost loop reached by an exit block
relevant to the unswitch.
And we need to do this even when doing trivial unswitching.
I've added more focused tests that directly check that SCEV starts off
with imprecise information and after unswitching (and simplifying
instructions) re-querying SCEV will produce precise information. These
tests also specifically work to check that an *outer* loop's information
becomes precise.
However, the testing here is still a bit imperfect. Crafting test cases
that reliably fail to be analyzed by SCEV before unswitching and succeed
afterward proved ... very, very hard. It took me several hours and
careful work to build these, and I'm not optimistic about necessarily
coming up with more to cover more elaborate possibilities. Fortunately,
the code pattern we are testing here in the pass is really
straightforward and reliable.
Thanks to Max Kazantsev for the initial work on this as well as the
review, and to Hal Finkel for helping me talk through approaches to test
this stuff even if it didn't come to much.
Differential Revision: https://reviews.llvm.org/D47624
llvm-svn: 336183
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch changes order of transform in InstCombineCompares to avoid
performing transforms based on ranges which produce complex bit arithmetics
before more simple things (like folding with constants) are done. See PR37636
for the motivating example.
Differential Revision: https://reviews.llvm.org/D48584
Reviewed By: spatel, lebedev.ri
llvm-svn: 336172
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Comment on Transforms/LoopVersioning/incorrect-phi.ll: With the change
SCEV is able to prove that the loop doesn't wrap-self (due to zext i16
to i64), disabling the entire loop versioning pass. Removed the zext and
just use i64.
Reviewers: sanjoy
Subscribers: jlebar, hiraditya, javed.absar, bixia, llvm-commits
Differential Revision: https://reviews.llvm.org/D48409
llvm-svn: 336140
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization.
%1 = extractelement <2 x i32> %a, i32 0
%2 = extractelement <2 x i32> %a, i32 1
%cond = icmp sgt i32 %1, %2
%3 = extractelement <2 x i32> %a, i32 0
%4 = extractelement <2 x i32> %a, i32 1
%select = select i1 %cond, i32 %3, i32 %4
Author: FarhanaAleen
Reviewed By: ABataev, RKSimon, spatel
Differential Revision: https://reviews.llvm.org/D47608
llvm-svn: 336130
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
folding
This extends D48485 to allow another pair of binops (add/or) to be combined either
with or without a leading shuffle:
or X, C --> add X, C (when X and C have no common bits set)
Here, we need value tracking to determine that the 'or' can be reversed into an 'add',
and we've added general infrastructure to allow extending to other opcodes or moving
to where other passes could use that functionality.
Differential Revision: https://reviews.llvm.org/D48662
llvm-svn: 336128
|
| |
|
|
|
|
| |
Alternate opcode handling only supports binary operators, these tests demonstrate a missed opportunity to vectorize ceil/floor calls
llvm-svn: 336125
|
| |
|
|
| |
llvm-svn: 336111
|
| |
|
|
|
|
|
|
|
| |
Due to current limitations in constant analysis, we need flags
on add or mul to show propagation for the potential transform
suggested in these tests (no other binops currently report
identity constants).
llvm-svn: 336101
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
parameters.
This version contains a fix to add values for which the state in ParamState change
to the worklist if the state in ValueState did not change. To avoid adding the
same value multiple times, mergeInValue returns true, if it added the value to
the worklist. The value is added to the worklist depending on its state in
ValueState.
Original message:
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to use
ValueLatticeElement for all values.
Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.
Reviewers: dberlin, mssimpso, davide, efriedma
Reviewed By: mssimpso
Differential Revision: https://reviews.llvm.org/D43762
llvm-svn: 336098
|
| |
|
|
|
|
| |
This is another pattern mentioned in PR37806.
llvm-svn: 336096
|
| |
|
|
|
|
|
|
|
|
| |
handle SK_Select patterns.
We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case.
This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now...
llvm-svn: 336095
|
| |
|
|
| |
llvm-svn: 336078
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch introduce new intrinsic -
strip.invariant.group that was described in the
RFC: Devirtualization v2
Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar
Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47103
Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 336073
|
| |
|
|
| |
llvm-svn: 336065
|
| |
|
|
|
|
| |
This is similar to the m_Not change from D44076.
llvm-svn: 336064
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a simple implementation of the unroll-and-jam classical loop
optimisation.
The basic idea is that we take an outer loop of the form:
for i..
ForeBlocks(i)
for j..
SubLoopBlocks(i, j)
AftBlocks(i)
Instead of doing normal inner or outer unrolling, we unroll as follows:
for i... i+=2
ForeBlocks(i)
ForeBlocks(i+1)
for j..
SubLoopBlocks(i, j)
SubLoopBlocks(i+1, j)
AftBlocks(i)
AftBlocks(i+1)
Remainder Loop
So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now jammed loops.
To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).
Differential Revision: https://reviews.llvm.org/D41953
llvm-svn: 336062
|
| |
|
|
|
|
| |
Alternate opcode handling only supports binary operators, these tests demonstrate missed opportunities to vectorize some sitofp/uitofp and fptosi/fptoui style casts as well as some (successful) float bits manipulations
llvm-svn: 336060
|
| |
|
|
|
|
| |
Recommit of r335324 after buildbot failure fix
llvm-svn: 336059
|
| |
|
|
| |
llvm-svn: 336050
|
| |
|
|
|
|
|
| |
The mul+shl tests add coverage for the fold enabled with D48678.
The and+or tests are not handled yet; that's D48662.
llvm-svn: 335984
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR37806)
This was discussed in D48401 as another improvement for:
https://bugs.llvm.org/show_bug.cgi?id=37806
If we have 2 different variable values, then we shuffle (select) those lanes,
shuffle (select) the constants, and then perform the binop. This eliminates a binop.
The new shuffle uses the same shuffle mask as the existing shuffle, so there's no
danger of creating a difficult shuffle.
All of the earlier constraints still apply, but we also check for extra uses to
avoid creating more instructions than we'll remove.
Additionally, we're disallowing the fold for div/rem because that could expose a
UB hole.
Differential Revision: https://reviews.llvm.org/D48678
llvm-svn: 335974
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
An alternative to D48597.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=37936 | PR37936 ]].
The problem is as follows:
1. `indvars` marks `%dec` as `NUW`.
2. `loop-instsimplify` runs `instsimplify`, which constant-folds `%dec` to -1 (D47908)
3. `loop-reduce` tries to do some further modification, but crashes
with an type assertion in cast, because `%dec` is no longer an `Instruction`,
If the runline is split into two, i.e. you first run `-indvars -loop-instsimplify`,
store that into a file, and then run `-loop-reduce`, there is no crash.
So it looks like the problem is due to `-loop-instsimplify` not discarding SCEV.
But in this case we can just not crash if it's not an `Instruction`.
This is just a local fix, unlike D48597, so there may very well be other problems.
Reviewers: mkazantsev, uabelho, sanjoy, silviu.baranga, wmi
Reviewed By: mkazantsev
Subscribers: evstupac, javed.absar, spatel, llvm-commits
Differential Revision: https://reviews.llvm.org/D48599
llvm-svn: 335950
|
| |
|
|
|
|
| |
Use xor for the extra uses test because div/rem have other problems.
llvm-svn: 335924
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The InlinerFunctionImportStats will collect and dump stats regarding how
many function inlined into the module were imported by ThinLTO.
Reviewers: wmi, dexonsmith
Subscribers: mehdi_amini, inglorion, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D48729
llvm-svn: 335914
|
| |
|
|
|
|
|
|
|
| |
When rewriting an alloca partition copy the DL from the
old alloca over the the new one.
Differential Revision: https://reviews.llvm.org/D48640
llvm-svn: 335904
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is an enhancement to D48401 that was discussed in:
https://bugs.llvm.org/show_bug.cgi?id=37806
We can convert a shift-left-by-constant into a multiply (we canonicalize IR in the other
direction because that's generally better of course). This allows us to remove the shuffle
as we do in the regular opcodes-are-the-same cases.
This requires a small hack to make sure we don't introduce any extra poison:
https://rise4fun.com/Alive/ZGv
Other examples of opcodes where this would work are add+sub and fadd+fsub, but we already
canonicalize those subs into adds, so there's nothing to do for those cases AFAICT. There
are planned enhancements for opcode transforms such or -> add.
Note that there's a different fold needed if we've already managed to simplify away a binop
as seen in the test based on PR37806, but we manage to get that one case here because this
fold is positioned above the demanded elements fold currently.
Differential Revision: https://reviews.llvm.org/D48485
llvm-svn: 335888
|
| |
|
|
|
|
|
|
|
|
|
|
| |
SCCP does not change the CFG, so we can mark it as preserved.
Reviewers: dberlin, efriedma, davide
Reviewed By: davide
Differential Revision: https://reviews.llvm.org/D47149
llvm-svn: 335820
|
| |
|
|
|
|
|
| |
If a trunc has a user in a block which is not reachable from entry,
we can safely perform trunc elimination as if this user didn't exist.
llvm-svn: 335816
|
| |
|
|
| |
llvm-svn: 335778
|
| |
|
|
| |
llvm-svn: 335756
|
| |
|
|
|
|
|
|
| |
that don't take a mask as input to exclude '.mask.' from their name.
I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask".
llvm-svn: 335744
|