| Commit message (Collapse) | Author | Age | Files | Lines |
| |
In ConstructSSAForLoadSet, if an available value is actually the load that we're
doing SSA construction to eliminate, then we can omit it, as SSAUpdate will add
in the value for the phi that will be replacing it anyway. This can result in
simpler IR, which can allow further optimisation.
Differential Revision: https://reviews.llvm.org/D44160
llvm-svn: 337686
|
Bug fix for PR37445. The underlying problem and its fix are similar to PR37808.
The bug lies in MemorySSAUpdater::getPreviousDefRecursive(), where PhiOps is
computed before the call to tryRemoveTrivialPhi() and it ends up being out of
date, pointing to stale data. We have now turned each of the PhiOps into a
TrackingVH<MemoryAccess>.
Differential Revision: https://reviews.llvm.org/D49425
llvm-svn: 337680
|
Bug fix for PR36787. When reasoning about whether it is safe to hoist a load, we
want to make sure that the defining memory access dominates the new
insertion point of the hoisted instruction. safeToHoistLdSt calls
firstInBB(InsertionPoint, DefiningAccess), which returns false if
InsertionPoint == DefiningAccess, and therefore it wrongly concludes that
it is safe to hoist.
Differential Revision: https://reviews.llvm.org/D49555
llvm-svn: 337674
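The equality corner case described above can be illustrated with a toy model. This is only a sketch and not the GVNHoist code: instructions are plain indices within one basic block, `strictlyBefore` is a hypothetical stand-in for `firstInBB`, and the guarded variant shows one possible way to reject the coinciding case, which is an assumption rather than the committed fix.

```cpp
#include <cassert>

// Toy model: each instruction is identified by its index inside one basic
// block. strictlyBefore() is a hypothetical stand-in for firstInBB(A, B).
bool strictlyBefore(int A, int B) { return A < B; }

// Buggy shape: hoisting is rejected only when the insertion point comes
// strictly before the defining access, so the "same instruction" case
// falls through and is reported as safe.
bool safeToHoistBuggy(int InsertionPoint, int DefiningAccess) {
  if (strictlyBefore(InsertionPoint, DefiningAccess))
    return false;
  return true;
}

// One possible guard (an assumption, not necessarily the committed fix):
// also reject the case where both positions coincide.
bool safeToHoistGuarded(int InsertionPoint, int DefiningAccess) {
  if (InsertionPoint == DefiningAccess)
    return false;
  return !strictlyBefore(InsertionPoint, DefiningAccess);
}

void demoHoistCheck() {
  assert(safeToHoistBuggy(3, 3));     // wrongly considered safe
  assert(!safeToHoistGuarded(3, 3));  // rejected once equality is handled
  assert(safeToHoistGuarded(5, 3));   // the def at 3 precedes insertion point 5
  assert(!safeToHoistGuarded(1, 3));  // the insertion point precedes the def
}
```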
|
Differential Revision: https://reviews.llvm.org/D49382
llvm-svn: 337642
|
This reapplies commit r337489, which was reverted by r337541.
Additionally, this commit contains a speculative fix for the issue reported in r337541
(the report does not contain an actionable reproducer, just a stack trace).
llvm-svn: 337606
|
llvm-svn: 337549
|
parameters.
This version contains a fix to add values for which the state in ParamState changes
to the worklist if the state in ValueState did not change. To avoid adding the
same value multiple times, mergeInValue returns true if it added the value to
the worklist. The value is added to the worklist depending on its state in
ValueState.
Original message:
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to using
ValueLatticeElement for all values.
Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.
Reviewers: dberlin, mssimpso, davide, efriedma
Reviewed By: mssimpso
Differential Revision: https://reviews.llvm.org/D43762
llvm-svn: 337548
|
Differential Revision: https://reviews.llvm.org/D49423
llvm-svn: 337545
|
negatived.
llvm-svn: 337543
|
This reverts commit r337489.
It causes asserts to fire in some TensorFlow tests, e.g.
tensorflow/compiler/tests/gather_test.py on GPU.
Example stack trace:
Start test case: GatherTest.testHigherRank
assertion failed at third_party/llvm/llvm/lib/Support/APInt.cpp:819 in llvm::APInt llvm::APInt::trunc(unsigned int) const: width && "Can't truncate to 0 bits"
@ 0x5559446ebe10 __assert_fail
@ 0x55593ef32f5e llvm::APInt::trunc()
@ 0x55593d78f86e (anonymous namespace)::Vectorizer::lookThroughComplexAddresses()
@ 0x55593d78f2bc (anonymous namespace)::Vectorizer::areConsecutivePointers()
@ 0x55593d78d128 (anonymous namespace)::Vectorizer::isConsecutiveAccess()
@ 0x55593d78c926 (anonymous namespace)::Vectorizer::vectorizeInstructions()
@ 0x55593d78c221 (anonymous namespace)::Vectorizer::vectorizeChains()
@ 0x55593d78b948 (anonymous namespace)::Vectorizer::run()
@ 0x55593d78b725 (anonymous namespace)::LoadStoreVectorizer::runOnFunction()
@ 0x55593edf4b17 llvm::FPPassManager::runOnFunction()
@ 0x55593edf4e55 llvm::FPPassManager::runOnModule()
@ 0x55593edf563c (anonymous namespace)::MPPassManager::runOnModule()
@ 0x55593edf5137 llvm::legacy::PassManagerImpl::run()
@ 0x55593edf5b71 llvm::legacy::PassManager::run()
@ 0x55593ced250d xla::gpu::IrDumpingPassManager::run()
@ 0x55593ced5033 xla::gpu::(anonymous namespace)::EmitModuleToPTX()
@ 0x55593ced40ba xla::gpu::(anonymous namespace)::CompileModuleToPtx()
@ 0x55593ced33d0 xla::gpu::CompileToPtx()
@ 0x55593b26b2a2 xla::gpu::NVPTXCompiler::RunBackend()
@ 0x55593b21f973 xla::Service::BuildExecutable()
@ 0x555938f44e64 xla::LocalService::CompileExecutable()
@ 0x555938f30a85 xla::LocalClient::Compile()
@ 0x555938de3c29 tensorflow::XlaCompilationCache::BuildExecutable()
@ 0x555938de4e9e tensorflow::XlaCompilationCache::CompileImpl()
@ 0x555938de3da5 tensorflow::XlaCompilationCache::Compile()
@ 0x555938c5d962 tensorflow::XlaLocalLaunchBase::Compute()
@ 0x555938c68151 tensorflow::XlaDevice::Compute()
@ 0x55593f389e1f tensorflow::(anonymous namespace)::ExecutorState::Process()
@ 0x55593f38a625 tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady()::$_1::operator()()
*** SIGABRT received by PID 7798 (TID 7837) from PID 7798; ***
llvm-svn: 337541
|
It's more aggressive than we need to be, and leads to strange
workarounds in other places like call return value inference. Instead,
just directly mark an edge viable.
Tests by Florian Hahn.
Differential Revision: https://reviews.llvm.org/D49408
llvm-svn: 337507
|
This is mostly preparatory work for adding limited support for
select instructions. That proved difficult because of the size and
irregularity of Vectorizer::isConsecutiveAccess, which I believe this
patch fixes.
It also turned out that these changes make it simpler to finish one of
the TODOs and fix a number of other small issues, namely:
1. Looking through bitcasts to a type of a different size (requires
careful tracking of the original load/store size and some math
converting sizes in bytes to expected differences in indices of GEPs).
2. Reusing partial analysis of pointers done by first attempt in proving
them consecutive instead of starting from scratch. This added limited
support for nested GEPs co-existing with difficult sext/zext
instructions. This also required a careful handling of negative
differences between constant parts of offsets.
3. Handling a case where the first pointer index is not an add, but
something else (a function parameter for instance).
I observe an increased number of successful vectorizations on a large
set of shader programs. Only a few shaders are affected, but those that
are affected have >5% fewer loads and stores than before the patch.
Reviewed By: rampitec
Differential-Revision: https://reviews.llvm.org/D49342
llvm-svn: 337489
|
pointers.
Summary: Currently, isConsecutiveAccess() detects two pointers (PtrA and PtrB) as consecutive by
comparing PtrB with BaseDelta+PtrA. This works when both pointers are factorized or
both of them are not factorized. But isConsecutiveAccess() fails if one of the
pointers is factorized but the other one is not.
Here is an example:
PtrA = 4 * (A + B)
PtrB = 4 + 4A + 4B
This patch uses getMinusSCEV() to compute the distance between two pointers.
getMinusSCEV() allows combining the expressions and computing the simplified distance.
Author: FarhanaAleen
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D49516
llvm-svn: 337471
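Why subtracting the two expressions helps can be sketched with a toy affine-expression type. This is not ScalarEvolution and does not call getMinusSCEV(); `Affine`, `var`, `constant`, `add`, `scale` and `constantDistance` are hypothetical names for this illustration only. Once both pointers are brought into one canonical form, the factorized and unfactorized spellings from the example above differ by a plain constant.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical canonical form: Constant + sum(Coeff[v] * v).
struct Affine {
  std::int64_t Constant = 0;
  std::map<std::string, std::int64_t> Coeff;
};

Affine var(const std::string &Name) {
  Affine E;
  E.Coeff[Name] = 1;
  return E;
}

Affine constant(std::int64_t C) {
  Affine E;
  E.Constant = C;
  return E;
}

Affine add(Affine L, const Affine &R) {
  L.Constant += R.Constant;
  for (const auto &Term : R.Coeff)
    L.Coeff[Term.first] += Term.second;
  return L;
}

Affine scale(Affine E, std::int64_t K) {
  E.Constant *= K;
  for (auto &Term : E.Coeff)
    Term.second *= K;
  return E;
}

// Computes B - A. Returns true and sets Distance only when every variable
// term cancels, i.e. when the difference is a known constant.
bool constantDistance(const Affine &A, const Affine &B,
                      std::int64_t &Distance) {
  Affine D = add(B, scale(A, -1));
  for (const auto &Term : D.Coeff)
    if (Term.second != 0)
      return false;
  Distance = D.Constant;
  return true;
}

void demoFactorizedPointers() {
  // PtrA = 4 * (A + B) and PtrB = 4 + 4A + 4B, as in the example above.
  Affine PtrA = scale(add(var("A"), var("B")), 4);
  Affine PtrB =
      add(constant(4), add(scale(var("A"), 4), scale(var("B"), 4)));
  std::int64_t Distance = 0;
  bool Known = constantDistance(PtrA, PtrB, Distance);
  assert(Known && Distance == 4);
}
```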
|
Summary:
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]
As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.
The DAGCombine will reverse this transform, see
https://reviews.llvm.org/D49266
This transform is surprisingly frustrating.
This does not deal with non-splat shift amounts, or with undef shift amounts.
I've outlined what I think the solution should be:
```
// Potential handling of non-splats: for each element:
// * if both are undef, replace with constant 0.
// Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0.
// * if both are not undef, and are different, bailout.
// * else, only one is undef, then pick the non-undef one.
```
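A minimal sketch of the three per-element rules outlined above, assuming that "both" refers to the shl amount and the ashr amount in the same vector lane. The `Lane` type and `mergeShiftAmounts` are hypothetical helpers for this illustration and are not InstCombine code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One vector lane of a shift-amount constant; IsUndef models an undef lane.
struct Lane {
  bool IsUndef;
  unsigned Amount;  // ignored when IsUndef is true
};

// Merges the per-lane shift amounts of the two shifts following the three
// rules listed above. Returns false on bailout.
bool mergeShiftAmounts(const std::vector<Lane> &Shl,
                       const std::vector<Lane> &AShr,
                       std::vector<unsigned> &Merged) {
  if (Shl.size() != AShr.size())
    return false;
  Merged.clear();
  for (std::size_t I = 0; I < Shl.size(); ++I) {
    const Lane &A = Shl[I];
    const Lane &B = AShr[I];
    if (A.IsUndef && B.IsUndef) {
      Merged.push_back(0);  // both undef: use constant 0
    } else if (!A.IsUndef && !B.IsUndef) {
      if (A.Amount != B.Amount)
        return false;  // both defined but different: bail out
      Merged.push_back(A.Amount);
    } else {
      Merged.push_back(A.IsUndef ? B.Amount : A.Amount);  // pick defined one
    }
  }
  return true;
}

void demoMergeShiftAmounts() {
  const std::vector<Lane> Shl = {{false, 4}, {true, 0}, {true, 0}};
  const std::vector<Lane> AShr = {{false, 4}, {false, 3}, {true, 0}};
  const std::vector<unsigned> Expected = {4, 3, 0};
  std::vector<unsigned> Merged;
  bool Ok = mergeShiftAmounts(Shl, AShr, Merged);
  assert(Ok && Merged == Expected);
}
```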
This is a re-commit, as the original patch, committed in rL337190
was reverted in rL337344 as it broke chromium build:
https://bugs.llvm.org/show_bug.cgi?id=38204 and
https://crbug.com/864832
Proofs that the fixed folds are ok: https://rise4fun.com/Alive/VYM
Differential Revision: https://reviews.llvm.org/D49320
llvm-svn: 337376
|
Those initially broke chromium build:
https://bugs.llvm.org/show_bug.cgi?id=38204 and
https://crbug.com/864832
llvm-svn: 337364
|
signed truncation' pattern""
We want the test to remain good anyway.
I think the fix is incoming.
This reverts part of commit rL337344.
llvm-svn: 337359
|
This reverts r337190 (and a few follow-up commits), which caused the
Chromium build to fail. See
https://bugs.llvm.org/show_bug.cgi?id=38204 and
https://crbug.com/864832
llvm-svn: 337344
|
InstCombine has a cast transform that matches a cast-of-select:
Orig = cast (Src = select Cond TV FV)
And tries to replace it with a select which has the cast folded in:
NewSel = select Cond (cast TV) (cast FV)
The combiner does RAUW(Orig, NewSel), so any debug values for Orig would
survive the transform. But debug values for Src would be lost.
This patch teaches InstCombine to replace all debug uses of Src with
NewSel (taking care of doing any necessary DIExpression rewriting).
Differential Revision: https://reviews.llvm.org/D49270
llvm-svn: 337310
|
llvm-svn: 337309
|
Once we resolved an undef in a function we can run Solve, which could
lead to finding a constant return value for the function, which in turn
could turn undefs into constants in other functions that call it, before
resolving undefs there.
Computationally, the amount of work we are doing stays the same; only the
order in which we process things is slightly different, and potentially there are
a few fewer undefs to resolve.
We are still relying on the order of functions in the IR, which means
depending on the order, we are able to resolve the optimal undef first
or not. For example, if @test1 comes before @testf, we find the constant
return value of @testf too late and we cannot use it while solving
@test1.
This on its own does not lead to more constants removed in the
test-suite, probably because currently we have to be very lucky to visit
applicable functions in the right order.
Maybe we can come up with a better way of resolving undefs in more
'profitable' functions first.
Reviewers: efriedma, mssimpso, davide
Reviewed By: efriedma, davide
Differential Revision: https://reviews.llvm.org/D49385
llvm-svn: 337283
|
TTI::getMinMaxReductionCost typically can't handle pointer types. Until this is changed, it's better to limit horizontal reduction to integer/float vector types only.
llvm-svn: 337280
|
This completes the optimization already done for the add instruction in D49216 and
the sdiv instruction in D49382. This patch handles the srem instruction.
llvm-svn: 337270
|
Differential Revision: https://reviews.llvm.org/D49409
llvm-svn: 337230
|
We are using i8 for these tests and shifting by 4,
which is exactly half the bit width of i8.
But as can be seen from the proofs at https://rise4fun.com/Alive/mgu,
KeptBits = bitwidth(%x) - MaskedBits,
so by using shifts by 4, we are not really testing that
we properly handle the other cases, with shifts that are not by half the width.
llvm-svn: 337208
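The concern above can be checked exhaustively on i8 with a small sketch. It assumes the fold compares the shl/ashr round trip against an add-and-unsigned-compare form, `(x + (1 << (KeptBits - 1))) u< (1 << KeptBits)`; that target form is stated here as an assumption, and all function names are hypothetical. The last two assertions show why a shift by 4 hides a mix-up between KeptBits and MaskedBits while other shift amounts expose it.

```cpp
#include <cassert>

constexpr unsigned BitWidth = 8;  // the tests use i8

// Sign-extends the low KeptBits bits of an 8-bit pattern. This models
// (x << MaskedBits) a>> MaskedBits without relying on signed shifts.
int signExtendLow(unsigned Bits, unsigned KeptBits) {
  unsigned Low = Bits & ((1u << KeptBits) - 1u);
  unsigned Sign = 1u << (KeptBits - 1u);
  return static_cast<int>(Low ^ Sign) - static_cast<int>(Sign);
}

// Shift form of the check: does x survive the shl/ashr round trip?
bool shiftForm(int X, unsigned MaskedBits) {
  unsigned KeptBits = BitWidth - MaskedBits;
  return signExtendLow(static_cast<unsigned>(X) & 0xFFu, KeptBits) == X;
}

// Add/compare form: (x + (1 << (KeptBits - 1))) u< (1 << KeptBits).
bool addForm(int X, unsigned KeptBits) {
  unsigned Sum = (static_cast<unsigned>(X) + (1u << (KeptBits - 1u))) & 0xFFu;
  return Sum < (1u << KeptBits);
}

// Counts the i8 values on which the two forms disagree when the add form
// is evaluated with KeptBitsUsed.
int mismatches(unsigned MaskedBits, unsigned KeptBitsUsed) {
  int Count = 0;
  for (int X = -128; X <= 127; ++X)
    if (shiftForm(X, MaskedBits) != addForm(X, KeptBitsUsed))
      ++Count;
  return Count;
}

void demoKeptBits() {
  for (unsigned M = 1; M < BitWidth; ++M)
    assert(mismatches(M, BitWidth - M) == 0);  // correct KeptBits
  assert(mismatches(4, 4) == 0);  // shift by half: the mix-up is invisible
  assert(mismatches(3, 3) != 0);  // any other shift exposes it
}
```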
|
Summary:
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]
As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.
Proofs for this transform: https://rise4fun.com/Alive/mgu
This transform is surprisingly frustrating.
This does not deal with non-splat shift amounts, or with undef shift amounts.
I've outlined what I think the solution should be:
```
// Potential handling of non-splats: for each element:
// * if both are undef, replace with constant 0.
// Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0.
// * if both are not undef, and are different, bailout.
// * else, only one is undef, then pick the non-undef one.
```
The DAGCombine will reverse this transform, see
https://reviews.llvm.org/D49266
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: JDevlieghere, rkruppe, llvm-commits
Differential Revision: https://reviews.llvm.org/D49320
llvm-svn: 337190
|
This reverts commit r337081, therefore restoring r337050 (and fix in
r337059), with test fix for bot failure described after the original
description below.
In order to always import the same copy of a linkonce function,
even when encountering it with different thresholds (a higher one then a
lower one), keep track of the summary we decided to import.
This ensures that the backend only gets a single definition to import
for each GUID, so that it doesn't need to choose one.
Move the largest threshold the GUID was considered for import into the
current module out of the ImportMap (which is part of a larger map
maintained across the whole index), and into a new map just maintained
for the current module we are computing imports for. This saves some
memory since we no longer have the thresholds maintained across the
whole index (and throughout the in-process backends when doing a normal
non-distributed ThinLTO build), at the cost of some additional
information being maintained for each invocation of ComputeImportForModule
(the selected summary pointer for each import).
There is an additional map lookup for each callee being considered for
importing, however, this was able to subsume a map lookup in the
Worklist iteration that invokes computeImportForFunction. We also are
able to avoid calling selectCallee if we already failed to import at the
same or higher threshold.
I compared the run time and peak memory for the SPEC2006 471.omnetpp
benchmark (running in-process ThinLTO backends), as well as for a large
internal benchmark with a distributed ThinLTO build (so just looking at
the thin link time/memory). Across a number of runs with and without
this change there was no significant change in the time and memory.
(I tried a few other variations of the change but they also didn't
improve time or peak memory).
The new commit removes a test that no longer makes sense
(Transforms/FunctionImport/hotness_based_import2.ll), as exposed by the
reverse-iteration bot. The test depends on the order of processing the
summary call edges, and actually depended on the old problematic
behavior of selecting more than one summary for a given GUID when
encountered with different thresholds. There was no guarantee even
before that we would eventually pick the linkonce copy with the hottest
call edges, it just happened to work with the test and the old code, and
there was no guarantee that we would end up importing the selected
version of the copy that had the hottest call edges (since the backend
would effectively import only one of the selected copies).
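The bookkeeping described above can be sketched roughly as follows. This is a simplified model and not the FunctionImport code: `Summary`, `ImportTracker`, `consider` and the size-based `selectCallee` are hypothetical, and the per-module map simply remembers the largest threshold tried and the summary that was selected, if any.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one copy (summary) of a linkonce function.
struct Summary {
  std::string Module;
  unsigned InstCount;
};

class ImportTracker {
public:
  unsigned SelectCalls = 0;  // how often the selection logic actually ran

  // Candidates must outlive the tracker, since pointers into it are kept.
  const Summary *consider(std::uint64_t GUID, unsigned Threshold,
                          const std::vector<Summary> &Candidates) {
    auto It = Tried.find(GUID);
    if (It != Tried.end()) {
      if (It->second.Selected)
        return It->second.Selected;  // always hand back the same copy
      if (Threshold <= It->second.MaxThreshold)
        return nullptr;  // already failed at the same or a higher threshold
    }
    const Summary *S = selectCallee(Candidates, Threshold);
    Tried[GUID] = Entry{Threshold, S};
    return S;
  }

private:
  struct Entry {
    unsigned MaxThreshold;
    const Summary *Selected;
  };
  std::map<std::uint64_t, Entry> Tried;  // maintained per importing module

  // Toy selection rule: the first copy that fits under the threshold.
  const Summary *selectCallee(const std::vector<Summary> &Candidates,
                              unsigned Threshold) {
    ++SelectCalls;
    for (const Summary &S : Candidates)
      if (S.InstCount <= Threshold)
        return &S;
    return nullptr;
  }
};

void demoImportTracker() {
  const std::vector<Summary> Copies = {{"a.o", 50}, {"b.o", 10}};
  ImportTracker T;
  const Summary *Failed = T.consider(1, 5, Copies);   // nothing fits
  const Summary *Skipped = T.consider(1, 3, Copies);  // no new selection
  assert(Failed == nullptr && Skipped == nullptr && T.SelectCalls == 1);
  const Summary *First = T.consider(1, 20, Copies);
  const Summary *Again = T.consider(1, 100, Copies);  // a.o would fit now
  assert(First != nullptr && First->Module == "b.o");
  assert(Again == First && T.SelectCalls == 2);
}
```

In this model a later query with a larger threshold still returns the copy chosen earlier, which mirrors the goal of handing the backend a single definition per GUID.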
Reviewers: davidxl
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D48670
llvm-svn: 337184
|
and non-overflow
Differential Revision: https://reviews.llvm.org/D49365
llvm-svn: 337179
|
Bug fix for PR37808. The regression test is a reduced version of the
original reproducer attached to the bug report. As stated in the report,
the problem was that InsertedPHIs was keeping dangling pointers to
deleted MemoryPhis. MemoryPhis are created eagerly and sometimes get
zapped shortly afterwards. I've used WeakVH instead of an expensive
removal operation from the active workset.
Differential Revision: https://reviews.llvm.org/D48372
llvm-svn: 337149
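The idea of a handle that goes null when its target is deleted can be illustrated with the standard library. This is only an analogy: `std::weak_ptr` stands in for llvm::WeakVH, and `Phi` and `livePhiIds` are hypothetical names. The point is that the worklist does not have to be searched and pruned on every deletion; stale entries are simply skipped when visited.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Phi {  // hypothetical stand-in for a MemoryPhi
  int Id;
};

// Returns the ids of the phis that are still alive, skipping handles whose
// target has been destroyed.
std::vector<int> livePhiIds(const std::vector<std::weak_ptr<Phi>> &Inserted) {
  std::vector<int> Ids;
  for (const std::weak_ptr<Phi> &Handle : Inserted)
    if (std::shared_ptr<Phi> P = Handle.lock())
      Ids.push_back(P->Id);
  return Ids;
}

void demoWeakHandles() {
  std::shared_ptr<Phi> A = std::make_shared<Phi>(Phi{1});
  std::shared_ptr<Phi> B = std::make_shared<Phi>(Phi{2});
  std::vector<std::weak_ptr<Phi>> Inserted;
  Inserted.push_back(A);
  Inserted.push_back(B);
  A.reset();  // phi 1 is deleted shortly after being created
  const std::vector<int> Expected(1, 2);  // only phi 2 remains
  assert(livePhiIds(Inserted) == Expected);
}
```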
|
Differential Revision: https://reviews.llvm.org/D49238
llvm-svn: 337143
|
Differential Revision: https://reviews.llvm.org/D49283
llvm-svn: 337141
|
llvm-svn: 337129
|
This fold is repeated/misplaced in instcombine, but I'm
not sure if it's safe to remove that yet because some
other folds appear to be asserting that the transform
has occurred within instcombine itself.
This isn't the best fix for PR37776, but it probably
hides the bug with the given code example:
https://bugs.llvm.org/show_bug.cgi?id=37776
We have another test to demonstrate the more general bug.
llvm-svn: 337127
|
This isn't the best fix for PR37776, but it probably
hides the bug with the given code example:
https://bugs.llvm.org/show_bug.cgi?id=37776
We have another test to demonstrate the more general
bug.
llvm-svn: 337126
|
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O
This pattern is not commutative!
We must make sure not to fold the commuted version!
llvm-svn: 337111
|
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O
This pattern is not commutative!
We must make sure not to fold the commuted version!
llvm-svn: 337110
|
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O
This pattern is not commutative!
We must make sure not to fold the commuted version!
llvm-svn: 337109
|
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O
This pattern is not commutative!
We must make sure not to fold the commuted version!
llvm-svn: 337108
|
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O
This pattern is not commutative!
We must make sure not to fold the commuted version!
llvm-svn: 337107
|
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O
This pattern is not commutative!
We must make sure not to fold the commuted version!
llvm-svn: 337106
|
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O
This pattern is not commutative!
We must make sure not to fold the commuted version!
llvm-svn: 337105
|
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O
This pattern is not commutative!
We must make sure not to fold the commuted version!
llvm-svn: 337104
|
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/Fqp
This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.
llvm-svn: 337102
|
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/Fqp
This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.
llvm-svn: 337101
|
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/JvS
This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.
llvm-svn: 337100
|
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/JvS
This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.
llvm-svn: 337099
|
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/ocb
This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.
llvm-svn: 337098
|
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/ocb
This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.
llvm-svn: 337097
|
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/azI
This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.
llvm-svn: 337096
|
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/azI
This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.
llvm-svn: 337095
|
foldICmpWithLowBitMaskedVal()
llvm-svn: 337094
|