| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Cloning basic blocks in the loop for runtime loop unroller depends on loop being
in rotated form (i.e. loop latch target is the exit block).
Assert that this is true, so that callers of runtime loop unroller pass in
canonical loops.
The single caller of this function has that check recently added:
https://reviews.llvm.org/rL301239
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32801
llvm-svn: 302058
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently, loop deletion deletes loop where the only values
that are used outside the loop are loop-invariant.
This patch adds logic to delete loops where the loop is proven to be
never executed (i.e. the only predecessor of the loop preheader has a
constant conditional branch as terminator, and the preheader is not the
taken target). This will remove loops that become dead after
loop-unswitching generates constant conditional branches.
The next steps are:
1. moving the loop deletion implementation to LoopUtils.
2. Add logic in loop-simplifyCFG which will support changing conditional
constant branches to unconditional branches. If loops become unreachable in this
process, they can be removed using `deleteDeadLoop` function.
Reviewers: chandlerc, efriedma, sanjoy, reames
Reviewed by: sanjoy
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D32494
llvm-svn: 302015
|
|
|
|
|
|
| |
No change in which intrinsics should be speculated.
llvm-svn: 301995
|
|
|
|
|
|
|
|
|
|
| |
AttributeList"
This time, I fixed, built, and tested clang.
This reverts r301712.
llvm-svn: 301981
|
|
|
|
| |
llvm-svn: 301974
|
|
|
|
|
|
|
| |
This is a follow up to the previous
inline cost patch for quicker filtering.
llvm-svn: 301959
|
|
|
|
|
|
|
| |
Just let TTI's cost do this instead of arbitrarily restricting
this.
llvm-svn: 301950
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was originally checked in here:
https://reviews.llvm.org/rL301923
And reverted here:
https://reviews.llvm.org/rL301924
Because there's a clang test that would fail after this. I fixed/removed the
offending CHECK lines in:
https://reviews.llvm.org/rL301928
So let's try this again. Original commit message:
This is the fold that causes the infinite loop in BoringSSL
(https://github.com/google/boringssl/blob/master/crypto/cipher/e_rc2.c)
when we fix instcombine demanded bits to prefer 'not' ops as in https://reviews.llvm.org/D32255.
There are 2 or 3 problems with dyn_castNotVal, and I don't think we can
reinstate https://reviews.llvm.org/D32255 until dyn_castNotVal is completely eliminated.
1. As shown here, it transforms 'not' into random xor. This transform is harmful to SCEV and codegen because 'not' can often be folded while random xor cannot.
2. It does not transform vector constants. This is actually a good thing, but if you don't believe the above argument, then we shouldn't have excluded vectors.
3. It tries to avoid transforming not(not(X)). That's nice, but it doesn't match the greedy nature of instcombine. If we DeMorganize a pattern that has an extra 'not' in it: ~(~(~X) & Y) --> (~X | ~Y)
That's just another case of DeMorgan, so we should trust that we'll fold that pattern too: (~X | ~ Y) --> ~(X & Y)
Differential Revision: https://reviews.llvm.org/D32665
llvm-svn: 301929
|
|
|
|
|
|
| |
There's a clang test that is wrongly using -O1 and failing after this commit.
llvm-svn: 301924
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the fold that causes the infinite loop in BoringSSL
(https://github.com/google/boringssl/blob/master/crypto/cipher/e_rc2.c)
when we fix instcombine demanded bits to prefer 'not' ops as in D32255.
There are 2 or 3 problems with dyn_castNotVal, and I don't think we can
reinstate D32255 until dyn_castNotVal is completely eliminated.
1. As shown here, it transforms 'not' into random xor. This transform is
harmful to SCEV and codegen because 'not' can often be folded while
random xor cannot.
2. It does not transform vector constants. This is actually a good thing,
but if you don't believe the above argument, then we shouldn't have
excluded vectors.
3. It tries to avoid transforming not(not(X)). That's nice, but it doesn't
match the greedy nature of instcombine. If we DeMorganize a pattern
that has an extra 'not' in it:
~(~(~X) & Y) --> (~X | ~Y)
That's just another case of DeMorgan, so we should trust that we'll fold
that pattern too:
(~X | ~ Y) --> ~(X & Y)
Differential Revision: https://reviews.llvm.org/D32665
llvm-svn: 301923
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D32666
llvm-svn: 301894
|
|
|
|
| |
llvm-svn: 301878
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the testcase attached, we believe %tmp1 implies %tmp4.
where:
br i1 %tmp1, label %bb2, label %bb7
br i1 %tmp4, label %bb5, label %bb7
because Wwhile looking at PredicateInfo stuffs we end up calling
isImpliedTrueByMatchingCmp() with the arguments backwards.
Differential Revision: https://reviews.llvm.org/D32718
llvm-svn: 301849
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we have ~(~X & Y), it only makes sense to transform it to (X | ~Y) when we do not need
the intermediate (~X & Y) value. In that case, we would need an extra instruction to
generate ~Y + 'or' (as shown in the test changes).
It's ok if we have multiple uses of ~X or Y, however. In those cases, we may not reduce the
instruction count or critical path, but we might improve throughput because we can generate
~X and ~Y in parallel. Whether that actually makes perf sense or not for a target is something
we can't answer in IR.
Differential Revision: https://reviews.llvm.org/D32703
llvm-svn: 301848
|
|
|
|
| |
llvm-svn: 301835
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D32195
llvm-svn: 301832
|
|
|
|
|
|
|
|
| |
We may not be able to rewrite indirect branch target, but we also want to take it into
account when folding, i.e. if it and all its successor's predecessors go to the same
destination, we can fold, i.e. no need to thread.
llvm-svn: 301816
|
|
|
|
|
|
| |
This relands r301424.
llvm-svn: 301812
|
|
|
|
| |
llvm-svn: 301808
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: [JumpThread] Do RAUW in case Cond folds to a constant in the CFG
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32407
llvm-svn: 301804
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
programUndefinedIfPoison makes more sense, given what the function
does; and I'm about to add a function with a name similar to
isKnownNotFullPoison (so do the rename to avoid confusion).
Reviewers: broune, majnemer, bjarke.roune
Reviewed By: broune
Subscribers: mcrosier, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D30444
llvm-svn: 301776
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
negative/nonnegative number and add methods for changing the negative/nonnegative state
Summary: This patch adds isNegative, isNonNegative for querying whether the sign bit is known. It also adds makeNegative and makeNonNegative for controlling the sign bit.
Reviewers: RKSimon, spatel, davide
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32651
llvm-svn: 301747
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
retainAutoreleasedReturnValue that retains the returned value.
This commit fixes a bug in ARC optimizer where it moves a release
between a call and a retainAutoreleasedReturnValue, causing the returned
object to be released before the retainAutoreleasedReturnValue can
retain it.
This commit accomplishes that by doing a lookahead and checking whether
the call prevents the release from moving upwards. In the long term, we
should treat the region between the retainAutoreleasedReturnValue and
the call as a critical section and disallow moving anything there
(possibly using operand bundles).
rdar://problem/20449878
llvm-svn: 301724
|
|
|
|
|
|
|
|
|
| |
I fixed my miscompile in r301722 and I hope I don't have to take
a look at this code again now that Chandler has a new LoopUnswitch
pass, but maybe this could be of use for somebody else in the
meanwhile.
llvm-svn: 301723
|
|
|
|
|
|
|
|
| |
This fixes PR32818.
Differential Revision: https://reviews.llvm.org/D32664
llvm-svn: 301722
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
AttributeList"
This broke the Clang build. (Clang-side patch missing?)
Original commit message:
> [IR] Make add/remove Attributes use AttrBuilder instead of
> AttributeList
>
> This change cleans up call sites and avoids creating temporary
> AttributeList objects.
>
> NFC
llvm-svn: 301712
|
|
|
|
|
|
|
| |
These are pretty common when using local memory, and the 64-bit generic
addressing is much more expensive to compute.
llvm-svn: 301711
|
|
|
|
|
|
|
|
|
|
|
| |
While looking at pure addressing expressions, it's possible
for the value to appear later in Postorder.
I haven't been able to come up with a testcase where this
exhibits an actual issue, but if you insert a dump before
the value map lookup, a few testcases crash.
llvm-svn: 301705
|
|
|
|
|
|
|
|
| |
Eliminates some more cases where some subset of the addressing
computation remains flat. Some cases with addrspacecasts
in nested constant expressions are still left behind however.
llvm-svn: 301704
|
|
|
|
| |
llvm-svn: 301702
|
|
|
|
|
|
|
|
|
| |
This change cleans up call sites and avoids creating temporary
AttributeList objects.
NFC
llvm-svn: 301697
|
|
|
|
|
|
|
|
|
| |
While debugging a miscompile I realized loopunswitch doesn't
put newlines when printing the instruction being replacement.
Ending up with a single line with many instruction replaced isn't
the best for readability and/or mental sanity.
llvm-svn: 301692
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The method is called "get *Param* Alignment", and is only used for
return values exactly once, so it should take argument indices, not
attribute indices.
Avoids confusing code like:
IsSwiftError = CS->paramHasAttr(ArgIdx, Attribute::SwiftError);
Alignment = CS->getParamAlignment(ArgIdx + 1);
Add getRetAlignment to handle the one case in Value.cpp that wants the
return value alignment.
This is a potentially breaking change for out-of-tree backends that do
their own call lowering.
llvm-svn: 301682
|
|
|
|
| |
llvm-svn: 301673
|
|
|
|
| |
llvm-svn: 301672
|
|
|
|
|
|
|
| |
Avoids use of AttributeList::getNumSlots, making it easier to change the
underlying implementation.
llvm-svn: 301671
|
|
|
|
|
|
|
|
|
|
|
| |
This eliminates many extra 'Idx' induction variables in loops over
arguments in CodeGen/ and Target/. It also reduces the number of places
where we assume that ReturnIndex is 0 and that we should add one to
argument numbers to get the corresponding attribute list index.
NFC
llvm-svn: 301666
|
|
|
|
| |
llvm-svn: 301662
|
|
|
|
| |
llvm-svn: 301656
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Skip memops if the total value profiled count is 0, we can't correctly
scale up the counts and there is no point anyway.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32624
llvm-svn: 301645
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a follow up to the fix in r298360 to improve the handling of debug
values when redundant LEAs are removed. The fix in r298360 effectively
discarded the debug values. This patch now attempts to preserve the debug
values by using the DWARF DW_OP_stack_value operation via prependDIExpr.
Moved functions appendOffset and prependDIExpr from Local.cpp to
DebugInfoMetadata.cpp and made them available as static member functions of
DIExpression.
Differential Revision: https://reviews.llvm.org/D31604
llvm-svn: 301630
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
EarlyCSE should not just ignore assumes. It should use the fact that its condition is true for all dominated instructions.
Reviewers: sanjoy, reames, apilipenko, anna, skatkov
Reviewed By: reames, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32482
llvm-svn: 301625
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a condition is calculated only once, and there are multiple guards on this condition, we should be able
to remove all guards dominated by the first of them. This patch allows EarlyCSE to try to find the condition
of a guard among the known values, and if it is true, remove the guard. Otherwise we keep the guard and
mark its condition as 'true' for future consideration.
Reviewers: sanjoy, reames, apilipenko, skatkov, anna, dberlin
Reviewed By: reames, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32476
llvm-svn: 301623
|
|
|
|
| |
llvm-svn: 301612
|
|
|
|
|
|
|
| |
Use a SmallSetSetVector instead of a SmallPtrSet as iterating
over the latter is not stable ('<' relies on addresses).
llvm-svn: 301599
|
|
|
|
|
|
|
| |
Matching any random value would be very wrong:
https://bugs.llvm.org/show_bug.cgi?id=32830
llvm-svn: 301594
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use a combination of !associated, comdat, @llvm.compiler.used and
custom sections to allow dead stripping of globals and their asan
metadata. Sometimes.
Currently this works on LLD, which supports SHF_LINK_ORDER with
sh_link pointing to the associated section.
This also works on BFD, which seems to treat comdats as
all-or-nothing with respect to linker GC. There is a weird quirk
where the "first" global in each link is never GC-ed because of the
section symbols.
At this moment it does not work on Gold (as in the globals are never
stripped).
This is a second re-land of r298158. This time, this feature is
limited to -fdata-sections builds.
llvm-svn: 301587
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When possible, put ASan ctor/dtor in comdat.
The only reason not to is global registration, which can be
TU-specific. This is not the case when there are no instrumented
globals. This is also limited to ELF targets, because MachO does
not have comdat, and COFF linkers may GC comdat constructors.
The benefit of this is a lot less __asan_init() calls: one per DSO
instead of one per TU. It's also necessary for the upcoming
gc-sections-for-globals change on Linux, where multiple references to
section start symbols trigger quadratic behaviour in gold linker.
This is a second re-land of r298756. This time with a flag to disable
the whole thing to avoid a bug in the gold linker:
https://sourceware.org/bugzilla/show_bug.cgi?id=19002
llvm-svn: 301586
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, this pass only focuses on *trivial* loop unswitching. At that
reduced problem it remains significantly better than the current loop
unswitch:
- Old pass is worse than cubic complexity. New pass is (I think) linear.
- New pass is much simpler in its design by focusing on full unswitching. (See
below for details on this).
- New pass doesn't carry state for thresholds between pass iterations.
- New pass doesn't carry state for correctness (both miscompile and
infloop) between pass iterations.
- New pass produces substantially better code after unswitching.
- New pass can handle more trivial unswitch cases.
- New pass doesn't recompute the dominator tree for the entire function
and instead incrementally updates it.
I've ported all of the trivial unswitching test cases from the old pass
to the new one to make sure that major functionality isn't lost in the
process. For several of the test cases I've worked to improve the
precision and rigor of the CHECKs, but for many I've just updated them
to handle the new IR produced.
My initial motivation was the fact that the old pass carried state in
very unreliable ways between pass iterations, and these mechansims were
incompatible with the new pass manager. However, I discovered many more
improvements to make along the way.
This pass makes two very significant assumptions that enable most of these
improvements:
1) Focus on *full* unswitching -- that is, completely removing whatever
control flow construct is being unswitched from the loop. In the case
of trivial unswitching, this means removing the trivial (exiting)
edge. In non-trivial unswitching, this means removing the branch or
switch itself. This is in opposition to *partial* unswitching where
some part of the unswitched control flow remains in the loop. Partial
unswitching only really applies to switches and to folded branches.
These are very similar to full unrolling and partial unrolling. The
full form is an effective canonicalization, the partial form needs
a complex cost model, cannot be iterated, isn't canonicalizing, and
should be a separate pass that runs very late (much like unrolling).
2) Leverage LLVM's Loop machinery to the fullest. The original unswitch
dates from a time when a great deal of LLVM's loop infrastructure was
missing, ineffective, and/or unreliable. As a consequence, a lot of
complexity was added which we no longer need.
With these two overarching principles, I think we can build a fast and
effective unswitcher that fits in well in the new PM and in the
canonicalization pipeline. Some of the remaining functionality around
partial unswitching may not be relevant today (not many test cases or
benchmarks I can find) but if they are I'd like to add support for them
as a separate layer that runs very late in the pipeline.
Purely to make reviewing and introducing this code more manageable, I've
split this into first a trivial-unswitch-only pass and in the next patch
I'll add support for full non-trivial unswitching against a *fixed*
threshold, exactly like full unrolling. I even plan to re-use the
unrolling thresholds, as these are incredibly similar cost tradeoffs:
we're cloning a loop body in order to end up with simplified control
flow. We should only do that when the total growth is reasonably small.
One of the biggest changes with this pass compared to the previous one
is that previously, each individual trivial exiting edge from a switch
was unswitched separately as a branch. Now, we unswitch the entire
switch at once, with cases going to the various destinations. This lets
us unswitch multiple exiting edges in a single operation and also avoids
numerous extremely bad behaviors, where we would introduce 1000s of
branches to test for thousands of possible values, all of which would
take the exact same exit path bypassing the loop. Now we will use
a switch with 1000s of cases that can be efficiently lowered into
a jumptable. This avoids relying on somehow forming a switch out of the
branches or getting horrible code if that fails for any reason.
Another significant change is that this pass actively updates the CFG
based on unswitching. For trivial unswitching, this is actually very
easy because of the definition of loop simplified form. Doing this makes
the code coming out of loop unswitch dramatically more friendly. We
still should run loop-simplifycfg (at the least) after this to clean up,
but it will have to do a lot less work.
Finally, this pass makes much fewer attempts to simplify instructions
based on the unswitch. Something like loop-instsimplify, instcombine, or
GVN can be used to do increasingly powerful simplifications based on the
now dominating predicate. The old simplifications are things that
something like loop-instsimplify should get today or a very, very basic
loop-instcombine could get. Keeping that logic separate is a big
simplifying technique.
Most of the code in this pass that isn't in the old one has to do with
achieving specific goals:
- Updating the dominator tree as we go
- Unswitching all cases in a switch in a single step.
I think it is still shorter than just the trivial unswitching code in
the old pass despite having this functionality.
Differential Revision: https://reviews.llvm.org/D32409
llvm-svn: 301576
|
|
|
|
|
|
|
|
|
| |
Just calling dropAllReferences leaves pointers to the ConstantExpr
behind, so we would eventually crash with a null pointer dereference.
Differential Revision: https://reviews.llvm.org/D32551
llvm-svn: 301575
|