| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: PR44149
Reviewers: jdoerfert
Subscribers: mehdi_amini, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70737
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I'm not sure what the effect of this change will be on all of the affected
tests or a larger benchmark, but it fixes the horizontal add/sub problems
noted here:
https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc
The costs are based on reciprocal throughput numbers in Agner's tables for
PEXTR*; these appear to be very slow ops on Silvermont.
This is a small step towards the larger motivation discussed in PR43605:
https://bugs.llvm.org/show_bug.cgi?id=43605
Also, it seems likely that insert/extract is the source of perf regressions on
other CPUs (up to 30%) that were cited as part of the reason to revert D59710,
so maybe we'll extend the table-based approach to other subtargets.
Differential Revision: https://reviews.llvm.org/D70607
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pass at the O1 described there.""
This reapplies: 8ff85ed905a7306977d07a5cd67ab4d5a56fafb4
Original commit message:
As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there.
This change doesn't include any change to move from selection dag to fast isel
and that will come with other numbers that should help inform that decision.
There also haven't been any real debuggability studies with this pipeline yet,
this is just the initial start done so that people could see it and we could start
tweaking after.
Test updates: Outside of the newpm tests most of the updates are coming from either
optimization passes not run anymore (and without a compelling argument at the moment)
that were largely used for canonicalization in clang.
Original post:
http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65410
This reverts commit c9ddb02659e3ece7a0d9d6b4dac7ceea4ae46e6d.
|
|
|
|
|
|
|
| |
This is correct for any value including NaN/inf.
We don't have this fold directly in the backend either,
but x86 manages to get it after converting things to bitops.
|
| |
|
|
|
|
| |
InstCombine doesn't have any transforms for copysign currently.
|
|
|
|
|
|
|
| |
Pulling a couple of extra tests out of
D64258
before abandoning in favor of
D70714
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
return memccpy(d, "helloworld", 'r', 20)
=>
return memcpy(d, "helloworld", 8 /* pos of 'r' in string */), d + 8
Reviewers: efriedma, jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68089
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch enables us to track GEP instruction in align deduction.
If a pointer `B` is defined as `A+Offset` and known to have alignment `C`, there exists some integer Q such that
```
A + Offset = C * Q = B
```
So we can say that the maximum power of two which is a divisor of gcd(Offset, C) is an alignment.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70392
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the O1 described there."
This reverts commit 8ff85ed905a7306977d07a5cd67ab4d5a56fafb4.
This commit introduced 9 new failures on lldb buildbot host at http://lab.llvm.org:8014/builders/lldb-aarch64-ubuntu
Following tests were failing:
lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq1/TestAmbiguousTailCallSeq1.py
lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq2/TestAmbiguousTailCallSeq2.py
lldb-api :: functionalities/tail_call_frames/disambiguate_call_site/TestDisambiguateCallSite.py
lldb-api :: functionalities/tail_call_frames/disambiguate_paths_to_common_sink/TestDisambiguatePathsToCommonSink.py
lldb-api :: functionalities/tail_call_frames/disambiguate_tail_call_seq/TestDisambiguateTailCallSeq.py
lldb-api :: functionalities/tail_call_frames/inlining_and_tail_calls/TestInliningAndTailCalls.py
lldb-api :: functionalities/tail_call_frames/sbapi_support/TestTailCallFrameSBAPI.py
lldb-api :: functionalities/tail_call_frames/thread_step_out_message/TestArtificialFrameStepOutMessage.py
lldb-api :: functionalities/tail_call_frames/thread_step_out_or_return/TestSteppingOutWithArtificialFrames.py
lldb-api :: functionalities/tail_call_frames/unambiguous_sequence/TestUnambiguousTailCalls.py
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65410
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
described there.
This change doesn't include any change to move from selection dag to fast isel
and that will come with other numbers that should help inform that decision.
There also haven't been any real debuggability studies with this pipeline yet,
this is just the initial start done so that people could see it and we could start
tweaking after.
Test updates: Outside of the newpm tests most of the updates are coming from either
optimization passes not run anymore (and without a compelling argument at the moment)
that were largely used for canonicalization in clang.
Original post:
http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65410
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
order recurrences."
This version contains 2 fixes for reported issues:
1. Make sure we do not try to sink terminator instructions.
2. Make sure we bail out, if we try to sink an instruction that needs to
stay in place for another recurrence.
Original message:
If the recurrence PHI node has a single user, we can sink any
instruction without side effects, given that all users are dominated by
the instruction computing the incoming value of the next iteration
('Previous'). We can sink instructions that may cause traps, because
that only causes the trap to occur later, but not on any new paths.
With the relaxed check, we also have to make sure that we do not have a
direct cycle (meaning PHI user == 'Previous), which indicates a
reduction relation, which potentially gets missed by
ReductionDescriptor.
As follow-ups, we can also sink stores, iff they do not alias with
other instructions we move them across and we could also support sinking
chains of instructions and multiple users of the PHI.
Fixes PR43398.
Reviewers: hsaito, dcaballe, Ayal, rengolin
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D69228
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the assertion in updateSuccessor is overly strict in some
cases and overly relaxed in other cases. For branches to the inner and
outer loop preheader it is too strict, because they can either be
unconditional branches or conditional branches with duplicate targets.
Both cases are fine and we can allow updating multiple successors.
On the other hand, we have to at least update one successor. This patch
adds such an assertion.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
And simultaneously enhance SimplifyDemandedVectorElts() to rcognize that
pattern. That preserves some of the old optimizations in IR.
Given a shuffle that includes undef elements in an otherwise identity mask like:
define <4 x float> @shuffle(<4 x float> %arg) {
%shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3>
ret <4 x float> %shuf
}
We were simplifying that to the input operand.
But as discussed in PR43958:
https://bugs.llvm.org/show_bug.cgi?id=43958
...that means that per-vector-element poison that would be stopped by the shuffle can now
leak to the result.
Also note that we still have (and there are tests for) the same transform with no undef
elements in the mask (a fully-defined identity mask). I don't think there's any
controversy about that case - it's a valid transform under any interpretation of
shufflevector/undef/poison.
Looking at a few of the diffs into codegen, I don't see any difference in final asm. So
depending on your perspective, that's good (no real loss of optimization power) or bad
(poison exists in the DAG, so we only partially fixed the bug).
Differential Revision: https://reviews.llvm.org/D70246
|
|
|
|
|
|
|
| |
This reverts commit 854e956219e78cb8d7ef3b021d7be6b5d6b6af04.
It broke tests:
Transforms/Inline/redundant-loads.ll
Transforms/SampleProfile/inline-callee-update.ll
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently every time we encounter an indirect call of a known function,
we try to evaluate the inline cost of that function. In case of a
recursion, that evaluation never stops.
The solution presented is to evaluate only the indirect call of the
function, while any further indirect calls (of a known function) will be
treated just as direct function calls, which, actually, never tries to
evaluate the call.
Fixes PR35469.
Differential Revision: https://reviews.llvm.org/D69349
|
|
|
|
|
|
| |
Patch by Chris Ye!
Differential Revision: https://reviews.llvm.org/D68004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Vector aggregate is homogeneous aggregate of vectors like `{ <2 x float>, <2 x float> }`.
This patch allows `findBuildAggregate()` to consider vector aggregates as
well as scalar ones. For instance, `{ <2 x float>, <2 x float> }` maps to `<4 x float>`.
Fixes vector part of llvm.org/PR42022
Reviewers: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70068
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Without this patch, the jump threading pass ignores profiling data
whenever we invoke the pass with the new pass manager.
Specifically, JumpThreadingPass::run calls runImpl with class variable
HasProfileData always set to false. In turn, runImpl sets
HasProfileData to false again:
HasProfileData = HasProfileData_;
In the end, we don't use profiling data at all with the new pass
manager.
This patch fixes the problem by passing F.hasProfileData() to runImpl.
The bug appears to have been introduced at:
https://reviews.llvm.org/D41461
which removed local variable HasProfileData in JumpThreadingPass::run
even though there was one more use left in the same function. As a
result, the remaining use ended referring to the class variable
instead.
Note that F.hasProfileData is an extremely lightweight function, so I
don't see the need to cache its result. Once this patch is approved,
I'm planning to stop caching the result of F.hasProfileData in
runOnFunction.
Reviewers: wmi, eli.friedman
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70509
|
|
|
|
| |
We may end up with a case where we have a widenable branch above the loop, but not all widenable branches within the loop have been removed. Since a widenable branch inhibit SCEVs ability to reason about exit counts (by design), we have a tradeoff between effectiveness of this optimization and allowing future widening of the branches within the loop. LoopPred is thought to be one of the most important optimizations for range check elimination, so let's pay the cost.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bit-Tracking Dead Code Elimination (bdce) do not mark dbg.value as undef after
deleting instruction. which shows invalid state of variable in debugger. This
patches fixes this by marking the dbg.value as undef which depends on dead
instruction.
This fixes https://bugs.llvm.org/show_bug.cgi?id=41925
Patch by kamlesh kumar!
Differential Revision: https://reviews.llvm.org/D70040
|
|
|
|
|
|
|
|
|
|
|
|
| |
As a reminder, a "widenable branch" is the pattern "br i1 (and i1 X, WC()), label %taken, label %untaken" where "WC" is the widenable condition intrinsics. The semantics of such a branch (derived from the semantics of WC) is that a new condition can be added into the condition arbitrarily without violating legality.
Broaden the definition in two ways:
Allow swapped operands to the br (and X, WC()) form
Allow widenable branch w/trivial condition (i.e. true) which takes form of br i1 WC()
The former is just general robustness (e.g. for X = non-instruction this is what instcombine produces). The later is specifically important as partial unswitching of a widenable range check produces exactly this form above the loop.
Differential Revision: https://reviews.llvm.org/D70502
|
|
|
|
|
|
|
|
| |
Follow-up of cb47b8783: don't query TTI->preferPredicateOverEpilogue when
option -prefer-predicate-over-epilog is set to false, i.e. when we prefer not
to predicate the loop.
Differential Revision: https://reviews.llvm.org/D70382
|
| |
|
|
|
|
|
|
|
|
|
|
| |
testcases.
After speaking with Sanjay - seeing a number of miscompiles and working
on tracking down a testcase. None of the follow on patches seem to
have helped so far.
This reverts commit 8a0aa5310bccbb42d16d11db090419fcefdd1376.
|
|
|
|
|
|
|
|
| |
patterns""
as there were testcase changes after that need to also be reverted.
This reverts commit cd8748a15f2d18861b3548eb26ed2b52e5ee50b4.
|
|
|
|
|
|
|
|
| |
After speaking with Sanjay - seeing a number of miscompiles and working
on tracking down a testcase. None of the follow on patches seem to
have helped so far.
This reverts commit 7ff57705ba196ce649d6034614b3b9df57e1f84f.
|
| |
|
| |
|
|
|
|
| |
This code has never been enabled. While it is tested, it's complicating some refactoring. If we decide to re-implement this, doing it in SimplifyCFG would probably make more sense anyways.
|
|
|
|
| |
Unswitch (and other loop transforms) like to generate loop exit blocks with unconditional successors, and phi nodes (LCSSA, or simple multiple exiting blocks sharing an exit). Generalize the "likely very rare exit" check slightly to handle this form.
|
|
|
|
| |
detect more NoNaNs
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pair up inlined AutoreleaseRV calls with their matching RetainRV or
ClaimRV.
- RetainRV cancels out AutoreleaseRV. Delete both instructions.
- ClaimRV is a peephole for RetainRV+Release. Delete AutoreleaseRV and
replace ClaimRV with Release.
This avoids problems where more aggressive inlining triggers memory
regressions.
This patch is happy to skip over non-callable instructions and non-ARC
intrinsics looking for the pair. It is likely sound to also skip over
opaque function calls, but that's harder to reason about, and it's not
relevant to the goal here: if there's an opaque function call splitting
up a pair, it's very unlikely that a handshake would have happened
dynamically without inlining.
Note that this patch also subsumes the previous logic that looked
backwards from ReleaseRV.
https://reviews.llvm.org/D70370
rdar://problem/46509586
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The 1st attempt was reverted because it revealed an existing
bug where we could produce invalid IR (use of value before
definition). That should be fixed with:
rG39de82ecc9c2
The bug manifests as replacing a reduction operand with an undef
value.
The problem appears to be limited to cases where a min/max reduction
has extra uses of the compare operand to the select.
In the general case, we are tracking "ExternallyUsedValues" and
an "IgnoreList" of the reduction operations, but those may not apply
to the final compare+select in a min/max reduction.
For that, we use replaceAllUsesWith (RAUW) to ensure that the new
vectorized reduction values are transferred to all subsequent users.
Differential Revision: https://reviews.llvm.org/D70148
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering
for MVE. This works the same way as Neon, recognising the load/shuffles
combination and converting them into intrinsics in a pre-isel pass,
which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad
and lowerInterleavedStore.
The main difference to Neon is that we do not have a VLD3 instruction.
Otherwise most of the code works very similarly, with just some minor
differences in the form of the intrinsics to work around. VLD3 is
disabled by making isLegalInterleavedAccessType return false for those
cases.
We may need some other future adjustments, such as VLD4 take up half the
available registers so should maybe cost more. This patch should get the
basics in though.
Differential Revision: https://reviews.llvm.org/D69392
|
| |
|
|
|
|
|
|
| |
As discussed in D70148 (and caused a revert of the original commit):
if we insert at the select, then we can produce invalid IR because
the replacement for the compare may have uses before the select.
|
|
|
|
| |
See D70148 for discussion.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Most of IR instructions got better code size estimations after commit 47a5c36b.
So default parameters values should be updated to improve inlining and
unrolling for the target.
Reviewers: rampitec, arsenm
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70391
|
|
|
|
|
|
|
|
| |
uses (PR43948)"
as it causes an ICE on valid. A testcase was followed up on the original thread.
This reverts commit a3e61946c5bd7bdfab15af76b292e52d6ffa27f7.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements a version of the predicateLoopExits transform from IndVarSimplify extended to exploit widenable conditions - and thus be much wider in scope of legality. The code structure ends up being almost entirely different, so I chose to duplicate this into the LoopPredication pass instead of trying to reuse the code in the IndVars.
The core notions of the transform are as follows:
If we have a widenable condition which controls entry into the loop, we're allowed to widen it arbitrarily. Given that, it's simply a *profitability* question as to what conditions to fold into the widenable branch.
To avoid pass ordering issues, we want to avoid widening cases that would otherwise be dischargeable. Or... widen in a form which can still be discharged. Thus, we phrase the transform as selecting one analyzeable exit from the set of analyzeable exits to keep. This avoids creating pass ordering complexities.
Since none of the above proves that we actually exit through our analyzeable exits - we might exit through something else entirely - we limit ourselves to cases where a) the latch is analyzeable and b) the latch is predicted taken, and c) the exit being removed is statically cold.
Differential Revision: https://reviews.llvm.org/D69830
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If you're writing C code using the ACLE MVE intrinsics that passes the
result of a vcmp as input to a predicated intrinsic, e.g.
mve_pred16_t pred = vcmpeqq(v1, v2);
v_out = vaddq_m(v_inactive, v3, v4, pred);
then clang's codegen for the compare intrinsic will create calls to
`@llvm.arm.mve.pred.v2i` to convert the output of `icmp` into an
`mve_pred16_t` integer representation, and then the next intrinsic
will call `@llvm.arm.mve.pred.i2v` to convert it straight back again.
This will be visible in the generated code as a `vmrs`/`vmsr` pair
that move the predicate value pointlessly out of `p0` and back into it again.
To prevent that, I've added InstCombine rules to remove round trips of
the form `v2i(i2v(x))` and `i2v(v2i(x))`. Also I've taught InstCombine
about the known and demanded bits of those intrinsics. As a result,
you now get just the generated code you wanted:
vpt.u16 eq, q1, q2
vaddt.u16 q0, q3, q4
Reviewers: ostannard, MarkMurrayARM, dmgreen
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70313
|
|
|
|
|
| |
The binary operator cast implies an instruction, but the matcher for shift does not:
https://bugs.llvm.org/show_bug.cgi?id=44028
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we miss folds with undef and identity values for binary ops
that do not fold to undef in general.
We can generalize the identity simplifications and do them before
checking for undef in particular.
Alive checks:
* OR - https://rise4fun.com/Alive/8OsK
* AND - https://rise4fun.com/Alive/e3tE
This will also allow us to remove some now redundant cases throughout
the function, but I would like to do this as follow-up. That should make
tracking down potential issues easier.
Reviewers: spatel, RKSimon, lebedev.ri
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D70169
|
|
|
|
|
|
|
|
| |
Reviewers: jdoerfert, uenoku
Subscribers:
Differential Revision: https://reviews.llvm.org/D70140
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to/extension of D70208 (rGee0882bdf866), but this one
may finally allow closing motivating bugs.
This is another step towards having FMF apply only to FP values
rather than those + fcmp. See PR38086 for one of the original
discussions/motivations:
https://bugs.llvm.org/show_bug.cgi?id=38086
And the test here is derived from PR39535:
https://bugs.llvm.org/show_bug.cgi?id=39535
Currently, we lose FMF when converting any phi to select in
SimplifyCFG. There are a small number of similar changes needed
to correct within SimplifyCFG, so it should be quick to patch
this pass up.
FMF was extended to select and phi with:
D61917
D67564
|