| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
if the top level addition in (D + (C-D + x + ...)) could be proven to
not wrap, where the choice of D also maximizes the number of trailing
zeroes of (C-D + x + ...), ensuring homogeneous behaviour of the
transformation and better canonicalization of such expressions.
This enables better canonicalization of expressions like
1 + zext(5 + 20 * %x + 24 * %y) and
zext(6 + 20 * %x + 24 * %y)
which get both transformed to
2 + zext(4 + 20 * %x + 24 * %y)
This pattern is common in address arithmetics and the transformation
makes it easier for passes like LoadStoreVectorizer to prove that 2 or
more memory accesses are consecutive and optimize (vectorize) them.
Reviewed By: mzolotukhin
Differential Revision: https://reviews.llvm.org/D48853
llvm-svn: 337859
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SCEV tries to constant-fold arguments of trunc operands in SCEVAddExpr, and when it does
that, it passes wrong flags into the recursion. It is only valid to pass flags that are proved for
narrow type into a computation in wider type if we can prove that trunc instruction doesn't
actually change the value. If it did lose some meaningful bits, we may end up proving wrong
no-wrap flags for sum of arguments of trunc.
In the provided test we end up with `nuw` where it shouldn't be because of this bug.
The solution is to conservatively pass `SCEV::FlagAnyWrap` which is always a valid thing to do.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D49471
llvm-svn: 337435
|
|
|
|
| |
llvm-svn: 337379
|
|
|
|
| |
llvm-svn: 337075
|
|
|
|
|
|
| |
This reverts commit r336140. Our tests shows that LSR assert fails with it.
llvm-svn: 336473
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Comment on Transforms/LoopVersioning/incorrect-phi.ll: With the change
SCEV is able to prove that the loop doesn't wrap-self (due to zext i16
to i64), disabling the entire loop versioning pass. Removed the zext and
just use i64.
Reviewers: sanjoy
Subscribers: jlebar, hiraditya, javed.absar, bixia, llvm-commits
Differential Revision: https://reviews.llvm.org/D48409
llvm-svn: 336140
|
|
|
|
|
|
|
|
|
| |
We can have AddRec with loops having many predecessors.
This changes an assert to an early return.
Differential Revision: https://reviews.llvm.org/D48766
llvm-svn: 335965
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This initiates a discussion on changing Polly accordingly while re-applying r335197 (D48338).
I have never worked on Polly. The proposed change to param_div_div_div_2.ll is not educated, but just patterns that match the output.
All LLVM files are already reviewed in D48338.
Reviewers: jdoerfert, bollu, efriedma
Subscribers: jlebar, sanjoy, hiraditya, llvm-commits, bixia
Differential Revision: https://reviews.llvm.org/D48453
llvm-svn: 335292
|
|
|
|
|
|
| |
This reverts commit r335197, as some bots are not happy.
llvm-svn: 335198
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Try to match udiv and urem patterns, and sink zext down to the leaves.
I'm not entirely sure why some unrelated tests change, but the added <nsw>s seem right.
Reviewers: sanjoy
Subscribers: jlebar, hiraditya, bixia, llvm-commits
Differential Revision: https://reviews.llvm.org/D48338
llvm-svn: 335197
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Related to https://bugs.llvm.org/show_bug.cgi?id=37793, https://reviews.llvm.org/D46760#1127287
We'd like to do this canonicalization https://rise4fun.com/Alive/Gmc
But it is currently restricted by rL155136 / rL155362, which says:
```
// This is a constant shift of a constant shift. Be careful about hiding
// shl instructions behind bit masks. They are used to represent multiplies
// by a constant, and it is important that simple arithmetic expressions
// are still recognizable by scalar evolution.
//
// The transforms applied to shl are very similar to the transforms applied
// to mul by constant. We can be more aggressive about optimizing right
// shifts.
//
// Combinations of right and left shifts will still be optimized in
// DAGCombine where scalar evolution no longer applies.
```
I think these tests show that for *constants*, SCEV has no issues with that canonicalization.
Reviewers: mkazantsev, spatel, efriedma, sanjoy
Reviewed By: mkazantsev
Subscribers: sanjoy, javed.absar, llvm-commits, stoklund, bixia
Differential Revision: https://reviews.llvm.org/D48229
llvm-svn: 335101
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts r334428. It incorrectly marks some multiplications as nuw. Tim
Shen is working on a proper fix.
Original commit message:
[SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags where safe.
Summary:
Previously we would add them for adds, but not multiplies.
llvm-svn: 335016
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Specifically, we transform
zext(2^K * (trunc X to iN)) to iM ->
2^K * (zext(trunc X to i{N-K}) to iM)<nuw>
This is helpful because pulling the 2^K out of the zext allows further
optimizations.
Reviewers: sanjoy
Subscribers: hiraditya, llvm-commits, timshen
Differential Revision: https://reviews.llvm.org/D48158
llvm-svn: 334737
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Previously we would do this simplification only if it did not introduce
any new truncs (excepting new truncs which replace other cast ops).
This change weakens this condition: If the number of truncs stays the
same, but we're able to transform trunc(X + Y) to X + trunc(Y), that's
still simpler, and it may open up additional transformations.
While we're here, also clean up some duplicated code.
Reviewers: sanjoy
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48160
llvm-svn: 334736
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
IndVarSimplify sometimes makes transforms basing on users that are trivially dead. In particular,
if DCE wasn't run before it, there may be a dead `sext/zext` in loop that will trigger widening
transforms, however it makes no sense to do it.
This patch teaches IndVarsSimplify ignore the mist trivial cases of that.
Differential Revision: https://reviews.llvm.org/D47974
Reviewed By: sanjoy
llvm-svn: 334567
|
|
|
|
| |
llvm-svn: 334434
|
|
|
|
|
|
|
|
|
|
|
|
| |
...)<nuw>.
Reviewers: sanjoy
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48041
llvm-svn: 334429
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Previously we would add them for adds, but not multiplies.
Reviewers: sanjoy
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D48038
llvm-svn: 334428
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: FWIW InstCombine already folds this. Also avoid the case where C1*C2 overflows.
Reviewers: sunfish, sanjoy
Subscribers: hiraditya, bixia, llvm-commits
Differential Revision: https://reviews.llvm.org/D47965
llvm-svn: 334425
|
|
|
|
|
|
|
|
|
|
|
|
| |
An expression like
(zext i2 {(trunc i32 (1 + %B) to i2),+,1}<%while.body> to i32)
will become zero exactly when the nested value becomes zero in its type.
Strip injective operations from the input value in howFarToZero to make
the value simpler.
Differential Revision: https://reviews.llvm.org/D47951
llvm-svn: 334318
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to set breakpoints on labels and list source code around
labels, we need collect debug information for labels, i.e., label
name, the function label belong, line number in the file, and the
address label located. In order to keep these information in LLVM
IR and to allow backend to generate debug information correctly.
We create a new kind of metadata for labels, DILabel. The format
of DILabel is
!DILabel(scope: !1, name: "foo", file: !2, line: 3)
We hope to keep debug information as much as possible even the
code is optimized. So, we create a new kind of intrinsic for label
metadata to avoid the metadata is eliminated with basic block.
The intrinsic will keep existing if we keep it from optimized out.
The format of the intrinsic is
llvm.dbg.label(metadata !1)
It has only one argument, that is the DILabel metadata. The
intrinsic will follow the label immediately. Backend could get the
label metadata through the intrinsic's parameter.
We also create DIBuilder API for labels to be used by Frontend.
Frontend could use createLabel() to allocate DILabel objects, and use
insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR.
Differential Revision: https://reviews.llvm.org/D45024
Patch by Hsiangkai Wang.
llvm-svn: 331841
|
|
|
|
|
|
|
|
|
|
|
| |
This patch was temporarily reverted because it has exposed bug 37229 on
PowerPC platform. The bug is unrelated to the patch and was just a general
bug in the optimization done for PowerPC platform only. The bug was fixed
by the patch rL331410.
This patch returns the disabled commit since the bug was fixed.
llvm-svn: 331427
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 023c8be90980e0180766196cba86f81608b35d38.
This patch triggers miscompile of zlib on PowerPC platform. Most likely it is
caused by some pre-backend PPC-specific pass, but we don't clearly know the
reason yet. So we temporally revert this patch with intention to return it
once the problem is resolved. See bug 37229 for details.
llvm-svn: 330893
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current implementation of `computeExitLimit` has a big piece of code
the only purpose of which is to prove that after the execution of this
block the latch will be executed. What it currently checks is actually a
subset of situations where the exiting block dominates latch.
This patch replaces all these checks for simple particular cases with
domination check over loop's latch which is the only necessary condition
of taking the exiting block into consideration. This change allows to
calculate exact loop taken count for simple loops like
for (int i = 0; i < 100; i++) {
if (cond) {...} else {...}
if (i > 50) break;
. . .
}
Differential Revision: https://reviews.llvm.org/D44677
Reviewed By: efriedma
llvm-svn: 329047
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, `getExact` fails if it sees two exit counts in different blocks. There is
no solid reason to do so, given that we only calculate exact non-taken count
for exiting blocks that dominate latch. Using this fact, we can simply take min
out of all exits of all blocks to get the exact taken count.
This patch makes the calculation more optimistic with enforcing our assumption
with asserts. It allows us to calculate exact backedge taken count in trivial loops
like
for (int i = 0; i < 100; i++) {
if (i > 50) break;
. . .
}
Differential Revision: https://reviews.llvm.org/D44676
Reviewed By: fhahn
llvm-svn: 328611
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is re-land of https://reviews.llvm.org/rL327362 with a fix
and regression test.
The crash was due to it is possible that for found MDL loop,
LHS or RHS may contain an invariant unknown SCEV which
does not dominate the MDL. Please see regression
test for an example.
Reviewers: sanjoy, mkazantsev, reames
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44553
llvm-svn: 327822
|
|
|
|
|
|
|
|
|
|
| |
The range of SCEVUnknown Phi which merges values `X1, X2, ..., XN`
can be evaluated as `U(Range(X1), Range(X2), ..., Range(XN))`.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D43810
llvm-svn: 326418
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current implementation of `getPostIncExpr` invokes `getAddExpr` for two recurrencies
and expects that it always returns it a recurrency. But this is not guaranteed to happen if we
have reached max recursion depth or refused to make SCEV simplification for other reasons.
This patch changes its implementation so that now it always returns SCEVAddRec without
relying on `getAddExpr`.
Differential Revision: https://reviews.llvm.org/D42953
llvm-svn: 324866
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
attributes (Step 1)
Summary:
This is a resurrection of work first proposed and discussed in Aug 2015:
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
and initially landed (but then backed out) in Nov 2015:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.
This change is the first in a series that allows source and dest to each
have their own alignments by using the alignment attribute on their arguments.
In this change we:
1) Remove the alignment argument.
2) Add alignment attributes to the source & dest arguments. We, temporarily,
require that the alignments for source & dest be equal.
For example, code which used to read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)
Downstream users may have to update their lit tests that check for
@llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script
may help with updating the majority of your tests, but it does not catch all possible
patterns so some manual checking and updating will be required.
s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g
The remaining changes in the series will:
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments.
Step 3) Update Clang to use the new IRBuilder API.
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use
getDestAlignment() and getSourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reviewers: pete, hfinkel, lhames, reames, bollu
Reviewed By: reames
Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits
Differential Revision: https://reviews.llvm.org/D41675
llvm-svn: 322965
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is fix for the crash caused by ScalarEvolution::getTruncateExpr.
It expects that if it checked the condition that SCEV is not in UniqueSCEVs cache in
the beginning that it will not be there inside this method.
However during recursion and transformation/simplification for sub expression,
it is possible that these modifications will end up with the same SCEV as we started from.
So we must always check whether SCEV is in cache and do not insert item if it is already there.
Reviewers: sanjoy, mkazantsev, craig.topper
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41380
llvm-svn: 321472
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In this method, we invoke `SimplifyICmpOperands` which takes the `Cond` predicate
by reference and may change it along with `LHS` and `RHS` SCEVs. But then we invoke
`computeShiftCompareExitLimit` with Values from which the SCEVs have been derived,
these Values have not been modified while `Cond` could be.
One of possible outcomes of this is that we may falsely prove that an infinite loop ends
within some finite number of iterations.
In this patch, we save the original `Cond` and pass it along with original operands.
This logic may be removed in future once `computeShiftCompareExitLimit` works
with SCEVs instead of value operands.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D40953
llvm-svn: 320142
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Given loops `L1` and `L2` with AddRecs `AR1` and `AR2` varying in them respectively.
When identifying loop disposition of `AR2` w.r.t. `L1`, we only say that it is varying if
`L1` contains `L2`. But there is also a possible situation where `L1` and `L2` are
consecutive sibling loops within the parent loop. In this case, `AR2` is also varying
w.r.t. `L1`, but we don't correctly identify it.
It can lead, for exaple, to attempt of incorrect folding. Consider:
AR1 = {a,+,b}<L1>
AR2 = {c,+,d}<L2>
EXAR2 = sext(AR1)
MUL = mul AR1, EXAR2
If we incorrectly assume that `EXAR2` is invariant w.r.t. `L1`, we can end up trying to
construct something like: `{a * {c,+,d}<L2>,+,b * {c,+,d}<L2>}<L1>`, which is incorrect
because `AR2` is not available on entrance of `L1`.
Both situations "`L1` contains `L2`" and "`L1` preceeds sibling loop `L2`" can be handled
with one check: "header of `L1` dominates header of `L2`". This patch replaces the old
insufficient check with this one.
Differential Revision: https://reviews.llvm.org/D39453
llvm-svn: 318819
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If a compare instruction is same or inverse of the compare in the
branch of the loop latch, then return a constant evolution node.
This shall facilitate computations of loop exit counts in cases
where compare appears in the evolution chain of induction variables.
Will fix PR 34538
Reviewers: sanjoy, hfinkel, junryoungju
Reviewed By: sanjoy, junryoungju
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38494
llvm-svn: 318050
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Max backedge taken count is always expected to be a constant; and this is
usually true by construction -- it is a SCEV expression with constant inputs.
However, if the max backedge expression ends up being computed to be a udiv with
a constant zero denominator[0], SCEV does not fold the result to a constant
since there is no constant it can fold it to (SCEV has no representation for
"infinity" or "undef").
However, in computeMaxBECountForLT we already know the denominator is positive,
and thus at least 1; and we can use this fact to avoid dividing by zero.
[0]: We can end up with a constant zero denominator if the signed range of the
stride is more precise than the unsigned range.
llvm-svn: 316615
|
|
|
|
|
|
|
| |
This reverts commit r316054. There was some confusion over the review process:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171016/495884.html
llvm-svn: 316129
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If a compare instruction is same or inverse of the compare in the
branch of the loop latch, then return a constant evolution node.
Currently scope of evaluation is limited to SCEV computation for
PHI nodes.
This shall facilitate computations of loop exit counts in cases
where compare appears in the evolution chain of induction variables.
Will fix PR 34538
Reviewers: sanjoy, hfinkel, junryoungju
Reviewed By: junryoungju
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38494
llvm-svn: 316054
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch teaches SCEV to calculate the maxBECount when the end bound
of the loop can vary. Note that we cannot calculate the exactBECount.
This will only be done when both conditions are satisfied:
1. the loop termination condition is strictly LT.
2. the IV is proven to not overflow.
This provides more information to users of SCEV and can be used to
improve identification of finite loops.
Reviewers: sanjoy, mkazantsev, silviu.baranga, atrick
Reviewed by: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38825
llvm-svn: 315683
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In LLVM IR the following code:
%r = urem <ty> %t, %b
is equivalent to
%q = udiv <ty> %t, %b
%s = mul <ty> nuw %q, %b
%r = sub <ty> nuw %t, %q ; (t / b) * b + (t % b) = t
As UDiv, Mul and Sub are already supported by SCEV, URem can be implemented
with minimal effort using that relation:
%r --> (-%b * (%t /u %b)) + %t
We implement two special cases:
- if %b is 1, the result is always 0
- if %b is a power-of-two, we produce a zext/trunc based expression instead
That is, the following code:
%r = urem i32 %t, 65536
Produces:
%r --> (zext i16 (trunc i32 %a to i16) to i32)
Note that while this helps get a tighter bound on the range analysis and the
known-bits analysis, this exposes some normalization shortcoming of SCEVs:
%div = udim i32 %a, 65536
%mul = mul i32 %div, 65536
%rem = urem i32 %a, 65536
%add = add i32 %mul, %rem
Will usually not be reduced.
llvm-svn: 312329
|
|
|
|
|
|
|
|
|
|
| |
Pushes the sext onto the operands of a Sub if NSW is present.
Also adds support for propagating the nowrap flags of the
llvm.ssub.with.overflow intrinsic during analysis.
Differential Revision: https://reviews.llvm.org/D35256
llvm-svn: 310117
|
|
|
|
|
|
|
|
|
|
| |
The patch rL309080 was reverted because it did not clean up the cache on "forgetValue"
method call. This patch re-enables this change, adds the missing check and introduces
two new unit tests that make sure that the cache is cleaned properly.
Differential Revision: https://reviews.llvm.org/D36087
llvm-svn: 309925
|
|
|
|
|
|
|
|
|
| |
This reverts commit r309080. The patch needs to clear out the
ScalarEvolution::ExitLimits cache in forgetMemoizedResults.
I've replied on the commit thread for the patch with more details.
llvm-svn: 309357
|
|
|
|
|
|
|
|
|
| |
This patch adds a cache for computeExitLimit to save compilation time. A lot of examples of
tests that take extensive time to compile are attached to the bug 33494.
Differential Revision: https://reviews.llvm.org/D35827
llvm-svn: 309080
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When SCEV calculates product of two SCEVAddRecs from the same loop, it
tries to combine them into one big AddRecExpr. If the sizes of the initial
SCEVs were `S1` and `S2`, the size of their product is `S1 + S2 - 1`, and every
operand of the resulting SCEV is combined from operands of initial SCEV and
has much higher complexity than they have.
As result, if we try to calculate something like:
%x1 = {a,+,b}
%x2 = mul i32 %x1, %x1
%x3 = mul i32 %x2, %x1
%x4 = mul i32 %x3, %x2
...
The size of such SCEVs grows as `2^N`, and the arguments
become more and more complex as we go forth. This leads
to long compilation and huge memory consumption.
This patch sets a limit after which we don't try to combine two
`SCEVAddRecExpr`s into one. By default, max allowed size of the
resulting AddRecExpr is set to 16.
Differential Revision: https://reviews.llvm.org/D35664
llvm-svn: 308847
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
and indvars"
The patch was reverted due to a bug. The bug was that if the IV is the 2nd operand of the icmp
instruction, then the "Pred" variable gets swapped and differs from the instruction's predicate.
In this patch we use the original predicate to do the transformation.
Also added a test case that exercises this situation.
Differentian Revision: https://reviews.llvm.org/D35107
llvm-svn: 307477
|
|
|
|
|
|
|
|
|
| |
non-negative values and indvars"""
It appears that the problem is still there. Needs more analysis to understand why
SaturatedMultiply test fails.
llvm-svn: 307249
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
values and indvars""
It seems that the patch was reverted by mistake. Clang testing showed failure of the
MathExtras.SaturatingMultiply test, however I was unable to reproduce the issue on the
fresh code base and was able to confirm that the transformation introduced by the change
does not happen in the said test. This gives a strong confidence that the actual reason of
the failure of the initial patch was somewhere else, and that problem now seems to be
fixed. Re-submitting the change to confirm that.
llvm-svn: 307244
|
|
|
|
|
|
|
|
|
|
|
| |
indvars"
This patch seems to cause failures of test MathExtras.SaturatingMultiply on
multiple buildbots. Reverting until the reason of that is clarified.
Differential Revision: https://reviews.llvm.org/rL307126
llvm-svn: 307135
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
-If there is a IndVar which is known to be non-negative, and there is a value which is also non-negative,
then signed and unsigned comparisons between them produce the same result. Both of those can be
seen in the same loop. To allow other optimizations to simplify them, we turn all instructions like
%c = icmp slt i32 %iv, %b
to
%c = icmp ult i32 %iv, %b
if both %iv and %b are known to be non-negative.
Differential Revision: https://reviews.llvm.org/D34979
llvm-svn: 307126
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In rL300494 there was an attempt to deal with excessive compile time on
invocations of getSign/ZeroExtExpr using local caching. This approach only
helps if we request the same SCEV multiple times throughout recursion. But
in the bug PR33431 we see a case where we request different values all the time,
so caching does not help and the size of the cache grows enormously.
In this patch we remove the local cache for this methods and add the recursion
depth limit instead, as we do for arithmetics. This gives us a guarantee that the
invocation sequence is limited and reasonably short.
Differential Revision: https://reviews.llvm.org/D34273
llvm-svn: 306785
|
|
|
|
|
|
|
| |
Failing test case:
Transforms/LoopVectorize.iv_outside_user.ll
llvm-svn: 306723
|