| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When peeling loops basing on phis becoming invariants, we make a wrong loop size check.
UP.Threshold should be compared against the total numbers of instructions after the transformation,
which is equal to 2 * LoopSize in case of peeling one iteration.
We should also check that the maximum allowed number of peeled iterations is not zero.
Reviewers: sanjoy, anna, reames, mkuper
Reviewed By: mkuper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31753
llvm-svn: 300441
|
| |
|
|
| |
llvm-svn: 300439
|
| |
|
|
|
|
| |
vectors. NFC
llvm-svn: 300438
|
| |
|
|
| |
llvm-svn: 300437
|
| |
|
|
|
|
| |
version isn't currently supported. NFC
llvm-svn: 300436
|
| |
|
|
| |
llvm-svn: 300435
|
| |
|
|
|
|
| |
into udiv. NFC
llvm-svn: 300434
|
| |
|
|
|
|
|
|
|
| |
This patch adds new optimization (Folding cmp(sub(a,b),0) into cmp(a,b))
to instCombineCall pass and was written specific for X86 CMP intrinsics.
Differential Revision: https://reviews.llvm.org/D31398
llvm-svn: 300422
|
| |
|
|
|
|
| |
vector constants
llvm-svn: 300402
|
| |
|
|
| |
llvm-svn: 300401
|
| |
|
|
|
|
|
|
|
|
| |
...when C1 differs from C2 by one bit and C1 <u C2:
http://rise4fun.com/Alive/Vuo
And move related folds to a helper function. This reduces code duplication and
will make it easier to remove the scalar-only restriction as a follow-up step.
llvm-svn: 300364
|
| |
|
|
|
|
|
|
|
|
|
|
| |
We currently only support folding a subtract into a select but not a PHI. This fixes that.
I had to fix an assumption in FoldOpIntoPhi that assumed the PHI node was always in operand 0. Now we pass it in like we do for FoldOpIntoSelect. But we still require some dancing to find the Constant when we create the BinOp or ConstantExpr. This is based code is similar to what we do for selects.
Since I touched all call sites, this also renames FoldOpIntoPhi to foldOpIntoPhi to match coding standards.
Differential Revision: https://reviews.llvm.org/D31686
llvm-svn: 300363
|
| |
|
|
| |
llvm-svn: 300360
|
| |
|
|
| |
llvm-svn: 300357
|
| |
|
|
| |
llvm-svn: 300351
|
| |
|
|
|
|
|
|
| |
have >1 value, unless we can prove the phi node is cycle free.
Fixes PR 32607.
llvm-svn: 300299
|
| |
|
|
| |
llvm-svn: 300253
|
| |
|
|
|
|
|
|
|
|
|
| |
Switch from Euclid's algorithm to Stein's algorithm for computing GCD. This
avoids the (expensive) APInt division operation in favour of bit operations.
Remove all memory allocation from within the GCD loop by tweaking our `lshr`
implementation so it can operate in-place.
Differential Revision: https://reviews.llvm.org/D31968
llvm-svn: 300252
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Bug noticed by inspection.
Extend the test to handle invokes as well as calls, and rewrite it to
not depend on the inliner and other passes.
Also simplify the call site replacement code with CallSite, similar to
what I did to dead arg elimination and arg promotion (rL300235 and
rL300229).
Reviewers: danielcdh, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32041
llvm-svn: 300251
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
callsite_location+callee_name
Summary: For iterative SamplePGO, an indirect call can be speculatively promoted to multiple direct calls and get inlined. All these promoted direct calls will share the same callsite location (offset+discriminator). With the current implementation, we cannot distinguish between different promotion candidates and its inlined instance. This patch adds callee_name to the key of the callsite sample map. And added helper functions to get all inlined callee samples for a given callsite location. This helps the profile annotator promote correct targets and inline it before annotation, and ensures all indirect call targets to be annotated correctly.
Reviewers: davidxl, dnovillo
Reviewed By: davidxl
Subscribers: andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D31950
llvm-svn: 300240
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In first order recurrences where phi's are used outside the loop,
we should generate an additional vector.extract of the second last element from
the vectorized phi update.
This is because we require the phi itself (which is the value at the second last
iteration of the vector loop) and not the phi's update within the loop.
Also fix the code gen when we just unroll, but don't vectorize.
Fixes PR32396.
Reviewers: mssimpso, mkuper, anemet
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D31979
llvm-svn: 300238
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is effectively a retry of:
https://reviews.llvm.org/rL299851
but now we have tests and an assert to make sure the bug
that was exposed with that attempt will not happen again.
I'll fix the code duplication and missing sibling fold next,
but I want to make this change as small as possible to reduce
risk since I messed it up last time.
This should fix:
https://bugs.llvm.org/show_bug.cgi?id=32524
llvm-svn: 300236
|
| |
|
|
|
|
|
|
|
|
| |
Noticed by inspection while doing attribute work. DAE, InstCombineCalls,
and ArgPromotion have a fair amount of duplicated code for hacking on
call sites, and you can find bugs by comparing them.
Add a test case for this.
llvm-svn: 300229
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
* Add a bitreverse case in the demanded bits analysis pass.
* Add tests for the bitreverse (and bswap) intrinsic in the
demanded bits pass.
* Add a test case to the BDCE tests: that manipulations to
high-order bits are eliminated once the bits are reversed
and then right-shifted.
Reviewers: mkuper, jmolloy, hfinkel, trentxintong
Reviewed By: jmolloy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31857
llvm-svn: 300215
|
| |
|
|
|
|
|
| |
If we had these tests, the bug caused by https://reviews.llvm.org/rL299851 would have been caught sooner.
There's also an assert in the code that should have caught that bug, but the assert line itself has a bug.
llvm-svn: 300201
|
| |
|
|
|
|
| |
This reverts commit r296872 now that PR32153 has been fixed.
llvm-svn: 300200
|
| |
|
|
| |
llvm-svn: 300161
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
As discussed in:
https://bugs.llvm.org/show_bug.cgi?id=32486
...the canonicalization of vector select to shufflevector does not hold up
when undef elements are present in the condition vector.
Try to make the undef handling clear in the code and the LangRef.
Differential Revision: https://reviews.llvm.org/D31980
llvm-svn: 300092
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
instruction that has multiple uses, if we know all the bits for the demanded bits for this context we can go ahead and create a constant.
Currently if we reach an instruction with multiples uses we know we can't do any optimizations to that instruction itself since we only have the demanded bits for one of the users. But if we know all of the bits are zero/one for that one user we can still go ahead and create a constant to give to that user.
This might then reduce the instruction to having a single use and allow additional optimizations on the other path.
This picks up an additional case that r300075 didn't catch.
Differential Revision: https://reviews.llvm.org/D31552
llvm-svn: 300084
|
| |
|
|
| |
llvm-svn: 300081
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
below the highest demanded bit can be simplified
If we are adding/subtractings 0s below the highest demanded bit we can just use the other operand and remove the operation.
My primary motivation is observing that we can call ShrinkDemandedConstant for the add/sub and create a 0 constant, rather than removing the add completely. In the case I saw, we modified the constant on an add instruction to a 0, but the add is not put into the worklist. So we didn't revisit it until the next InstCombine iteration. This caused an IR modification to remove add and a subsequent iteration to be ran.
With this change we get bypass the add in the first iteration and prevent the second iteration from changing anything.
Differential Revision: https://reviews.llvm.org/D31120
llvm-svn: 300075
|
| |
|
|
|
|
|
|
|
|
|
|
| |
One potential way to make InstCombine (very slightly?) faster is to recycle instructions
when possible instead of creating new ones. It's not explicitly stated AFAIK, but we don't
consider this an "InstSimplify". We could, however, make a new layer to house transforms
like this if that makes InstCombine more manageable (just throwing out an idea; not sure
how much opportunity is actually here).
Differential Revision: https://reviews.llvm.org/D31863
llvm-svn: 300067
|
| |
|
|
|
|
|
|
|
|
|
| |
Use '2>&1 |' and not '|&' to pipe debug output to FileCheck
Hopefully handles a "shell parser error" on
llvm-clang-x86_64-expensive-checks-win
test/Transforms/SLPVectorizer/SystemZ/SLP-cmp-cost-query.ll
llvm-svn: 300064
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
In getEntryCost(), make the scalar type for a compare instruction that of the
operands, not i1. This is needed in order to call getCmpSelInstrCost() for a
compare in a sensible way, the same way as the LoopVectorizer does.
New test: test/Transforms/SLPVectorizer/SystemZ/SLP-cmp-cost-query.ll
Review: Matthew Simpson
https://reviews.llvm.org/D31601
llvm-svn: 300061
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cost for a branch after vectorization is very different depending on if
the vectorizer will if-convert the block (branch is eliminated), or if
scalarized and predicated blocks will be produced (branch duplicated before
each block). There is also the case of remaining scalar branches, such as the
back-edge branch.
This patch handles these cases differently with TTI based cost estimates.
Review: Matthew Simpson
https://reviews.llvm.org/D31175
llvm-svn: 300058
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since SystemZ supports vector element load/store instructions, there is no
need for extracts/inserts if a vector load/store gets scalarized.
This patch lets Target specify that it supports such instructions by means of
a new TTI hook that defaults to false.
The use for this is in the LoopVectorizer getScalarizationOverhead() method,
which will with this patch produce a smaller sum for a vector load/store on
SystemZ.
New test: test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll
Review: Adam Nemet
https://reviews.llvm.org/D30680
llvm-svn: 300056
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.
Interleaved access vectorization enabled.
BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.
Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631
llvm-svn: 300052
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Dead basic blocks may be forming a loop, for which SSA form is
fulfilled, but with a circular def-use chain. LoadCombine could
enter an infinite loop when analysing such dead code. This patch
solves the problem by simply avoiding to analyse all basic blocks
that aren't forward reachable, from function entry, in LoadCombine.
Fixes https://bugs.llvm.org/show_bug.cgi?id=27065
Reviewers: mehdi_amini, chandlerc, grosser, Bigcheese, davide
Reviewed By: davide
Subscribers: dberlin, zzheng, bjope, grandinj, Ka-Ka, materi, jholewinski, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D31032
llvm-svn: 300034
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
COFF requires that every comdat contain a symbol with the same name as
the comdat. ThinLTOBitcodeWriter renames symbols, which may cause this
requirement to be violated. This change avoids such violations by
renaming comdats if their leaders are renamed. It also keeps comdats
together when splitting modules.
Reviewers: pcc, mehdi_amini, tejohnson
Reviewed By: pcc
Subscribers: rnk, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31963
llvm-svn: 300019
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Fold:
shuffle (splat-shuffle), undef, M --> splat-shuffle
Reviewers: spatel, RKSimon, craig.topper
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31527
llvm-svn: 299990
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the vectorization of first order recurrence, we vectorize such
that the last element in the vector will be the one extracted to pass into the
scalar remainder loop. However, this is not true when there is a phi (other
than the primary induction variable) is used outside the loop.
In such a case, we need the value from the second last iteration (i.e.
the phi value), not the last iteration (which would be the phi update).
I've added a test case for this. Also see PR32396.
A follow up patch would generate the correct code gen for such cases,
and turn this vectorization on.
Differential Revision: https://reviews.llvm.org/D31910
Reviewers: mssimpso
llvm-svn: 299985
|
| |
|
|
| |
llvm-svn: 299984
|
| |
|
|
|
|
|
|
| |
Analysis, it has Analysis passes, and once NewGVN is made an Analysis,
this removes the cross dependency from Analysis to Transform/Utils.
NFC.
llvm-svn: 299980
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this patch, pass AddDiscriminators always avoided to assign
discriminators to intrinsic calls. This was done mainly for two reasons:
1) We wanted to minimize the number of based discriminators used.
2) We wanted to avoid non-deterministic discriminator assignment for
different debug levels.
Unfortunately, that approach was problematic for MemIntrinsic calls.
MemIntrinsic calls can be split by SROA into loads and stores, and each new
load/store instruction would obtain the debug location from the original
intrinsic call.
If we don't assign a discriminator to MemIntrinsic calls, then we cannot
correctly set the discriminator for the newly created loads and stores.
This may have a negative impact on the basic block weight computation
performed by the SampleLoader.
This patch fixes the issue by letting MemIntrinsic calls have a discriminator.
Differential Revision: https://reviews.llvm.org/D31900
llvm-svn: 299972
|
| |
|
|
| |
llvm-svn: 299971
|
| |
|
|
| |
llvm-svn: 299969
|
| |
|
|
|
|
| |
This is a candidate culprit for multiple bot fails, so reverting pending investigation.
llvm-svn: 299955
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In rL299692 I improved strip-dead-debug-info's ability to drop CUs that are not
referenced from the current module. However, in doing so I neglected to realize
that some SPs could be referenced entirely from inlined functions. It appears
I was not the only one to make this mistake, because DebugInfoFinder, doesn't
find those SPs either. Fix this in DebugInfoFinder and then use that to make
sure not to drop those CUs in strip-dead-debug-info.
Reviewers: aprantl
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31904
llvm-svn: 299936
|
| |
|
|
| |
llvm-svn: 299915
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(h/t to Chandler for pointing this out)
The test in question was not at all testing what it was supposed to
test. We do not //care// about placing `!make.implicit` in inner
constant branch (since it will be folded away anyway). We care about
placing `!make.implicit` in the outer branch that switches between
either version of the loop.
Having said that, it is _correct_ to leave behind the `!make.implicit`
in the inner branch, but there is no need to do so.
llvm-svn: 299912
|