| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
| |
llvm-svn: 356069
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
user later.
Summary:
A number of optimizations are inhibited by single-use TokenFactors not
being merged into the TokenFactor using them. This makes us consider whether
we can do the merge immediately.
Most test changes here are due to the change in visitation causing
minor reorderings and associated reassociation of paired memory
operations.
CodeGen tests with non-reordering changes:
X86/aligned-variadic.ll -- memory-based add folded into stored leaq
value.
X86/constant-combiners.ll -- Optimizes out overlap between stores.
X86/pr40631_deadstore_elision -- folds constant byte store into
preceding quad word constant store.
Reviewers: RKSimon, craig.topper, spatel, efriedma, courbet
Reviewed By: courbet
Subscribers: dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, eraman, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59260
llvm-svn: 356068
|
| |
|
|
|
|
| |
Always check candidates for hasOtherUses(), not only stores.
llvm-svn: 356050
|
| |
|
|
|
|
|
|
|
|
| |
First step towards PR40800 - I intend to move the float case in a separate future patch.
I had to tweak the (overly reduced) thumb2 test. The x86 widening test change is annoying (no longer rematerializable), but we should address this separately.
Differential Revision: https://reviews.llvm.org/D59244
llvm-svn: 356040
|
| |
|
|
|
|
| |
Update the INC pass to allow folding unordered atomics. This is the first optimization unblocked by the changes landed from D57601.
llvm-svn: 356006
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Every time a physical register reference was parsed, this would
initialize a string map for every register in the target, and discard
it for the next. The same applies to the other fields initialized
from target information.
Follow along with how the function state is tracked, and add a new
tracking class for target information.
The string->register class/register bank maps were, for some reason, kept
separately, so track them in the same place.
llvm-svn: 355970
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The existing statepoint lowering code does something odd; it adds machine memory operands post instruction selection. This was copied from the stackmap/patchpoint implementation, but appears to be non-idiomatic.
This change is largely NFC. It moves the MMO creation logic into SelectionDAG building. It ends up not quite being NFC because the size of the stack slot is reflected in the MMO. The old code blindly used pointer size for the MMO size, which appears to have always been incorrect for larger values. It just happened that nothing actually relied on the MMOs, so it worked out okay.
For context, I'm planning on removing the MOVolatile flag from these in a future commit, and then removing the MOStore flag from deopt spill slots in a separate one. Doing so is motivated by a small test case where we should be able to better schedule spill slots, but don't do so due to a memory use/def implied by the statepoint.
Differential Revision: https://reviews.llvm.org/D59106
llvm-svn: 355953
|
| |
|
|
|
|
|
|
|
|
|
| |
Expand MULO with constant power of two operand into a shift. The
overflow is checked with (x << shift) >> shift == x, where the right
shift will be logical for umulo and arithmetic for smulo (with
exception for multiplications by signed_min).
Differential Revision: https://reviews.llvm.org/D59041
llvm-svn: 355937
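The shift-based overflow check described above can be illustrated outside of SelectionDAG. The following is a minimal sketch on 32-bit integers, not the code from the patch: the names `umuloPow2`, `smuloPow2`, `UMulResult` and `SMulResult` are hypothetical, and the signed variant assumes the compiler implements `>>` on negative values as an arithmetic shift (guaranteed from C++20, common practice before). The multiplication-by-signed_min exception mentioned in the message is excluded by requiring `shift < 31`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result types for illustration only.
struct UMulResult { std::uint32_t value; bool overflow; };
struct SMulResult { std::int32_t value; bool overflow; };

// Unsigned x * 2^shift: overflow iff shifting back (logically) loses bits.
inline UMulResult umuloPow2(std::uint32_t x, unsigned shift) {
  assert(shift < 32 && "shift must be smaller than the bit width");
  std::uint32_t product = x << shift;
  return {product, (product >> shift) != x};
}

// Signed x * 2^shift: same check, but the right shift must be arithmetic.
// shift == 31 (multiplication by signed_min) needs separate handling and is
// deliberately not covered by this sketch.
inline SMulResult smuloPow2(std::int32_t x, unsigned shift) {
  assert(shift < 31 && "signed_min multiplier is not handled here");
  // Shift in the unsigned domain to avoid signed-overflow undefined behavior.
  std::int32_t product =
      static_cast<std::int32_t>(static_cast<std::uint32_t>(x) << shift);
  // Assumption: >> on a negative int32_t is an arithmetic shift.
  return {product, (product >> shift) != x};
}
```

For example, `umuloPow2(0x40000000u, 2)` reports overflow because the two top bits are shifted out and cannot be recovered by shifting back.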
|
| |
|
|
| |
llvm-svn: 355932
|
| |
|
|
|
|
|
|
|
| |
Targets can potentially emit more efficient code if they know address
computations never overflow. For example ILP32 code on AArch64 (which only has
64-bit address computation) can ignore the possibility of overflow with this
extra information.
llvm-svn: 355926
|
| |
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D59140
llvm-svn: 355904
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change from original commit: move test (that uses an X86 triple) into the X86
subdirectory.
Original description:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.
Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal
Reviewed By: sdesmalen
Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57728
llvm-svn: 355889
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Swift now generates PDBs for debugging on Windows. llvm and lldb
need a language enumerator value to properly handle the output
emitted by swiftc.
Subscribers: jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59231
llvm-svn: 355882
|
| |
|
|
|
|
| |
This reverts commit r355868. Breaks hexagon.
llvm-svn: 355873
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: It is incomplete and has no users AFAIK.
Reviewers: pcc, vitalybuka
Subscribers: srhines, kubamracek, mgorny, krytarowski, eraman, hiraditya, jdoerfert, #sanitizers, llvm-commits, thakis
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D59154
llvm-svn: 355870
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.
Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal
Reviewed By: sdesmalen
Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57728
llvm-svn: 355868
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Overloaded intrinsics aren't necessarily safe for instruction selection. One
such intrinsic is aarch64.neon.addp.*.
This is a temporary workaround to ensure that we always fall back on that
intrinsic. Eventually this will be replaced with a proper solution.
https://bugs.llvm.org/show_bug.cgi?id=40968
Differential Revision: https://reviews.llvm.org/D59062
llvm-svn: 355865
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes https://bugs.llvm.org/show_bug.cgi?id=36796.
Implement basic legalizations (PromoteIntRes, PromoteIntOp,
ExpandIntRes, ScalarizeVecOp, WidenVecOp) for VECREDUCE opcodes.
There are more legalizations missing (esp float legalizations),
but there's no way to test them right now, so I'm not adding them.
This also includes a few more changes to make this work somewhat
reasonably:
* Add support for expanding VECREDUCE in SDAG. Usually
experimental.vector.reduce is expanded prior to codegen, but if the
target does have native vector reduce, it may of course still be
necessary to expand due to legalization issues. This uses a shuffle
reduction if possible, followed by a naive scalar reduction.
* Allow the result type of integer VECREDUCE to be larger than the
vector element type. For example we need to be able to reduce a v8i8
into a (nominally) i32 result type on AArch64.
* Use the vector operand type rather than the scalar result type to
determine the action, so we can control exactly which vector types are
supported. Also change the legalize vector op code to handle
operations that only have vector operands, but no vector results, as
is the case for VECREDUCE.
* Default VECREDUCE to Expand. On AArch64 (the only target using VECREDUCE),
explicitly specify for which vector types the reductions are supported.
This does not handle anything related to VECREDUCE_STRICT_*.
Differential Revision: https://reviews.llvm.org/D58015
llvm-svn: 355860
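The expansion strategy mentioned above ("a shuffle reduction if possible, followed by a naive scalar reduction") can be pictured on a plain array. This is only a sketch under assumptions, not the SelectionDAG code: `reduceAdd` is a hypothetical name, the "shuffle" step is modeled as adding the upper half of the vector onto the lower half, and that step is assumed to be possible only while the length is a power of two.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration of an add reduction over a vector of lanes.
inline std::uint32_t reduceAdd(std::vector<std::uint32_t> lanes) {
  auto isPow2 = [](std::size_t n) { return n != 0 && (n & (n - 1)) == 0; };

  // Shuffle-style steps: fold the upper half onto the lower half while the
  // lane count is a power of two greater than one.
  while (lanes.size() > 1 && isPow2(lanes.size())) {
    std::size_t half = lanes.size() / 2;
    for (std::size_t i = 0; i < half; ++i)
      lanes[i] += lanes[i + half];
    lanes.resize(half);
  }

  // Naive scalar reduction over whatever lanes remain.
  std::uint32_t acc = 0;
  for (std::uint32_t lane : lanes)
    acc += lane;
  return acc;
}
```

An 8-lane input takes three halving steps and leaves a single lane, while a 3-lane input skips the halving loop and is summed lane by lane.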
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As a fix for https://bugs.llvm.org/show_bug.cgi?id=40986 ("excessive compile
time building opencollada"), this patch makes sure that no phys reg is hinted
more than once from getRegAllocationHints().
This handles the case where many virtual registers are assigned to the same
physreg. The previous compile time fix (r343686) in weightCalcHelper() only
made sure that physical/virtual registers are passed no more than once to
addRegAllocationHint().
Review: Dimitry Andric, Quentin Colombet
https://reviews.llvm.org/D59201
llvm-svn: 355854
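The idea of never hinting the same physical register twice amounts to an order-preserving de-duplication. The sketch below shows that technique in isolation. `PhysReg` and `uniqueHints` are hypothetical names, and this is not the actual getRegAllocationHints() implementation.

```cpp
#include <unordered_set>
#include <vector>

using PhysReg = unsigned; // hypothetical stand-in for a physical register id

// Keep the first occurrence of each candidate and drop later repeats, so the
// caller never sees the same physical register hinted more than once.
inline std::vector<PhysReg> uniqueHints(const std::vector<PhysReg> &candidates) {
  std::unordered_set<PhysReg> seen;
  std::vector<PhysReg> hints;
  for (PhysReg reg : candidates)
    if (seen.insert(reg).second) // true only the first time reg is inserted
      hints.push_back(reg);
  return hints;
}
```

With many virtual registers mapped to the same physreg, the candidate list can be long while the set of distinct hints stays small, which is where the compile time saving would come from.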
|
| |
|
|
| |
llvm-svn: 355847
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Extract the functionality of eliminating unreachable basic blocks
within a function, previously encapsulated within the
-unreachableblockelim pass, and make it available as a function within
BlockUtils.h. No functional change intended other than making the logic
reusable.
Exposing this logic makes it easier to implement
https://reviews.llvm.org/D59068, which fixes coroutines bug
https://bugs.llvm.org/show_bug.cgi?id=40979.
Reviewers: mkazantsev, wmi, davidxl, silvas, davide
Reviewed By: davide
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59069
llvm-svn: 355846
|
| |
|
|
| |
llvm-svn: 355845
|
| |
|
|
|
|
|
|
| |
constant/commute folds.
Noticed while looking at PR40800 (and also D57921)
llvm-svn: 355828
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Inserting an overflowing arithmetic intrinsic can increase register
pressure by producing two values at a point where only one is needed,
while the second use may be several blocks away. This increase in
pressure is likely to be more detrimental to performance than
rematerialising one of the original instructions.
So, check that the arithmetic and compare instructions are no further
apart than their immediate successor/predecessor.
Differential Revision: https://reviews.llvm.org/D59024
llvm-svn: 355823
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The control flow here cannot ever use the uninitialized value, but it's
too hard for the compiler to figure that out. Clang warns:
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2600:28: error: variable 'CarrySum' is used uninitialized whenever 'for' loop exits because its condition is false [-Werror,-Wsometimes-uninitialized]
for (unsigned i = 2; i < Factors.size(); ++i)
^~~~~~~~~~~~~~~~~~
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2604:26: note: uninitialized use occurs here
CarrySumPrevDstIdx = CarrySum;
^~~~~~~~
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2600:28: note: remove the condition if it is always true
for (unsigned i = 2; i < Factors.size(); ++i)
^~~~~~~~~~~~~~~~~~
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2583:22: note: initialize the variable 'CarrySum' to silence this warning
unsigned CarrySum;
^
= 0
llvm-svn: 355818
|
| |
|
|
|
|
|
|
|
|
| |
NarrowScalar G_UMULH in LegalizerHelper
using multiplyRegisters helper function.
NarrowScalar G_UMULH for MIPS32.
Differential Revision: https://reviews.llvm.org/D58825
llvm-svn: 355815
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Narrow Scalar G_MUL for MIPS32.
Revisit NarrowScalar implementation in LegalizerHelper.
Introduce new helper function multiplyRegisters.
It performs generic multiplication of values held in multiple registers.
Generated instructions use only types NarrowTy and i1.
The destination can be the same size as the source or twice its size.
Differential Revision: https://reviews.llvm.org/D58824
llvm-svn: 355814
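Multiplying values that are split across several narrow registers is essentially schoolbook multiplication on limbs. The sketch below illustrates that on 32-bit limbs (least significant first). It is an assumption-laden illustration, not the LegalizerHelper code: `multiplyLimbs` is a hypothetical name, and it uses a 64-bit intermediate for the partial products, whereas the commit states that the generated instructions use only NarrowTy and i1.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: multiply two little-endian limb vectors and keep the
// low dstLimbs limbs of the product (same size as the source, or twice it).
inline std::vector<std::uint32_t>
multiplyLimbs(const std::vector<std::uint32_t> &a,
              const std::vector<std::uint32_t> &b, std::size_t dstLimbs) {
  std::vector<std::uint32_t> dst(dstLimbs, 0);
  for (std::size_t i = 0; i < a.size() && i < dstLimbs; ++i) {
    std::uint64_t carry = 0;
    for (std::size_t j = 0; i + j < dstLimbs; ++j) {
      // Limbs of b past its end are treated as zero so the carry still
      // propagates into the higher destination limbs.
      std::uint64_t bj = j < b.size() ? b[j] : 0;
      std::uint64_t cur = std::uint64_t(a[i]) * bj + dst[i + j] + carry;
      dst[i + j] = static_cast<std::uint32_t>(cur); // low 32 bits
      carry = cur >> 32;                            // high 32 bits
    }
  }
  return dst;
}
```

Passing `dstLimbs` equal to the source limb count gives a truncating multiply, and passing twice that count gives the full product.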
|
| |
|
|
| |
llvm-svn: 355794
|
| |
|
|
| |
llvm-svn: 355791
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary."
Includes a fix to emit a CheckOpcode for build_vector when immAllZerosV/immAllOnesV is used as a pattern root. This means it can't be used to look through bitcasts when used as a root, but that's probably ok. This extra CheckOpcode will ensure that the first match in the isel table will be a SwitchOpcode which is needed by the caching optimization in the ISel Matcher.
Original commit message:
Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally the ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead, we have to canonicalize the types of the all zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts.
By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up.
This removes something like 40,000 bytes from the X86 isel table.
Differential Revision: https://reviews.llvm.org/D58595
llvm-svn: 355784
|
| |
|
|
|
|
| |
on the pattern matched. NFC
llvm-svn: 355769
|
| |
|
|
|
|
|
|
|
|
|
|
| |
uint32_t/uint64_t for getelementptr, extractelement, and insertelement.
This saves needing to call getInt32 ourselves, making the code a little shorter.
The test changes are because insert/extract use getInt64 internally. Shouldn't be a functional issue.
This cleanup is because I plan to write similar code for expandload/compressstore.
llvm-svn: 355767
|
| |
|
|
| |
llvm-svn: 355759
|
| |
|
|
|
|
| |
repeated lookup operations
llvm-svn: 355757
|
| |
|
|
|
|
|
|
|
|
| |
were added.
There are special cases in the scalarization for constant masks. If we hit one of the special cases we don't need to reset the iteration.
Noticed while starting work on adding expandload/compressstore to this pass.
llvm-svn: 355754
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r44412 fixed a huge compile time regression, but it needed the ModifiedDT flag to be
maintained correctly in the optimizations in optimizeBlock() and optimizeInst().
The function optimizeSelectInst() does not update the flag.
This patch propagates the flag in optimizeSelectInst() back to
optimizeBlock().
This patch also removes ModifiedDT in CodeGenPrepare class (which is not used).
The property of ModifiedDT is now recorded in a ref parameter.
Differential Revision: https://reviews.llvm.org/D59139
llvm-svn: 355751
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This avoids breaking possible value dependencies when sorting loads by
offset.
AMDGPU has some load instructions that write into the high or low bits
of the destination register, and have a tied input for the other input
bits. These can easily have the same base pointer, but be a swizzle so
the high address load needs to come first. This was inserting glue
forcing the opposite ordering, producing a cycle the InstrEmitter
would assert on. It may be potentially expensive to look for the
dependency between the other loads, so just skip any where this could
happen.
Fixes bug 40936 by reverting r351379, which added a hacky attempt to
fix this by adding chains in this case, which I think was just working
around broken glue before the InstrEmitter. The core of the patch is
re-implementing the fix for that problem.
llvm-svn: 355728
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This pattern is sometimes created after legalization.
Reviewers: efriedma, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58874
llvm-svn: 355716
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
many valnos.
Recently we found compile timeouts in several cases when
SpeculativeLoadHardening was enabled. Most of the compile time was spent
in the register coalescing pass, where the register coalescer tried to join many
other live intervals with some very large live intervals with many valnos.
Specifically, every time JoinVals::mapValues is called, computeAssignment is
called getNumValNums() times for the target live interval. If the large
live interval has N valnos and N copies associated with it, trying to
coalesce those copies costs at least N^2.
The patch limits the effort spent trying to join those very large live
intervals with others. By default, once a live interval with > 100 valnos
has been coalesced with other live intervals more than 100 times,
we stop coalescing that live interval. That puts a compile
time cap on the N^2 algorithm and effectively solves the compile time
problem we saw.
Differential revision: https://reviews.llvm.org/D59143
llvm-svn: 355714
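The cut-off described above boils down to a two-threshold test. The following is a minimal sketch of that heuristic as stated in the message, with hypothetical names (`JoinBudget`, `IntervalStats`, `shouldGiveUpJoining`); the actual option names and bookkeeping in the register coalescer are not shown in this log and are not reproduced here.

```cpp
#include <cstddef>

// Hypothetical thresholds, using the default values quoted in the message.
struct JoinBudget {
  std::size_t valNoThreshold = 100; // "large" interval: more valnos than this
  std::size_t joinThreshold = 100;  // joins allowed for a large interval
};

// Hypothetical per-interval bookkeeping.
struct IntervalStats {
  std::size_t numValNos;   // number of value numbers in the live interval
  std::size_t timesJoined; // how often it has already been coalesced
};

// Stop coalescing only when the interval is large AND has already been
// joined more often than the budget allows.
inline bool shouldGiveUpJoining(const IntervalStats &stats,
                                const JoinBudget &budget = JoinBudget()) {
  return stats.numValNos > budget.valNoThreshold &&
         stats.timesJoined > budget.joinThreshold;
}
```

Small intervals are never affected, however often they are joined, so the cap would only bite in the quadratic case the message describes.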
|
| |
|
|
| |
llvm-svn: 355690
|
| |
|
|
| |
llvm-svn: 355689
|
| |
|
|
| |
llvm-svn: 355688
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The logic in the -unreachableblockelim pass does the following:
1. It traverses the function it's given in depth-first order and
creates a set of basic blocks that are unreachable from the
function's entry node.
2. It iterates over each of those unreachable blocks and (1) removes any
successors' references to the dead block, and (2) replaces any uses of
instructions from the dead block with null.
The logic in (2) above is identical to what the `llvm::DeleteDeadBlocks`
function from `BasicBlockUtils.h` does. The only difference is that
`llvm::DeleteDeadBlocks` replaces uses of instructions from dead blocks
not with null, but with undef.
Replace the duplicate logic in the -unreachableblockelim pass with a
call to `llvm::DeleteDeadBlocks`. This results in less code but no
functional change (NFC).
Reviewers: mkazantsev, wmi, davidxl, silvas, davide
Reviewed By: davide
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59064
llvm-svn: 355634
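Step 1 of the summary above (finding blocks unreachable from the entry) can be sketched on a toy control-flow graph. This is an illustration under assumptions, not the pass itself: blocks are plain indices, block 0 is assumed to be the entry, `findUnreachable` is a hypothetical name, and the clean-up of step 2 (what `llvm::DeleteDeadBlocks` does) is not modeled.

```cpp
#include <cstddef>
#include <vector>

// successors[b] lists the successor blocks of block b; block 0 is the entry.
// Returns the indices of all blocks that a depth-first walk never visits.
inline std::vector<std::size_t>
findUnreachable(const std::vector<std::vector<std::size_t>> &successors) {
  std::vector<bool> reachable(successors.size(), false);
  std::vector<std::size_t> worklist;
  if (!successors.empty()) {
    reachable[0] = true;
    worklist.push_back(0);
  }
  // Iterative depth-first traversal from the entry block.
  while (!worklist.empty()) {
    std::size_t block = worklist.back();
    worklist.pop_back();
    for (std::size_t succ : successors[block]) {
      if (!reachable[succ]) {
        reachable[succ] = true;
        worklist.push_back(succ);
      }
    }
  }
  std::vector<std::size_t> dead;
  for (std::size_t i = 0; i < reachable.size(); ++i)
    if (!reachable[i])
      dead.push_back(i);
  return dead;
}
```

Note that a dead block may still branch into a live one, which is why the pass also has to remove the dead block from its successors' bookkeeping before deleting it.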
|
| |
|
|
| |
llvm-svn: 355542
|
| |
|
|
|
|
|
|
|
|
| |
Restore a reverted commit, with the silly mistake fixed. Sorry for the previous breakage.
Be consistent about how we treat atomics in non-zero address spaces. If we get to the backend, we tend to lower them as if in address space 0. Do the same if we need to insert a libcall instead.
Differential Revision: https://reviews.llvm.org/D58760
llvm-svn: 355540
|
| |
|
|
|
|
|
|
| |
Move the x86 combine from D58974 into the DAGCombine VSELECT code and update the SELECT version to use the isBooleanFlip helper as well.
Requested by @spatel on D59006
llvm-svn: 355533
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D58965
llvm-svn: 355517
|
| |
|
|
|
|
| |
We had 2 local variable names for the same type.
llvm-svn: 355516
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
with unreachable default"
This reverts commit 2a0f2c5ef3330846149598220467d9f3c6e8b99c (r355490).
The commit causes an assertion failure when compiling LLVM code:
$ cat repro.cpp
class QQQ {
public:
  bool x() const;
  bool y() const;
  unsigned getSizeInBits() const {
    if (y() || x())
      return getScalarSizeInBits();
    return getScalarSizeInBits() * 2;
  }
  unsigned getScalarSizeInBits() const;
};
int f(const QQQ &Ty) {
  switch (Ty.getSizeInBits()) {
  case 1:
  case 8:
    return 0;
  case 16:
    return 1;
  case 32:
    return 2;
  case 64:
    return 3;
  default:
    __builtin_unreachable();
  }
}
$ clang -O2 -o repro.o repro.cpp
assert.h assertion failed at llvm/include/llvm/ADT/ilist_iterator.h:139 in llvm::ilist_iterator::reference llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::MachineInstr, true, true, void>, true, false>::operator*() const [OptionsT = llvm::ilist_detail::node_options<llvm::MachineInstr, true, true, void>, IsReverse = true, IsConst = false]: !NodePtr->isKnownSentinel()
*** Check failure stack trace: ***
@ 0x558aab4afc10 __assert_fail
@ 0x558aa885479b llvm::ilist_iterator<>::operator*()
@ 0x558aa8854715 llvm::MachineInstrBundleIterator<>::operator*()
@ 0x558aa92c33c3 llvm::X86InstrInfo::optimizeCompareInstr()
@ 0x558aa9a9c251 (anonymous namespace)::PeepholeOptimizer::optimizeCmpInstr()
@ 0x558aa9a9b371 (anonymous namespace)::PeepholeOptimizer::runOnMachineFunction()
@ 0x558aa99a4fc8 llvm::MachineFunctionPass::runOnFunction()
@ 0x558aab019fc4 llvm::FPPassManager::runOnFunction()
@ 0x558aab01a3a5 llvm::FPPassManager::runOnModule()
@ 0x558aab01aa9b (anonymous namespace)::MPPassManager::runOnModule()
@ 0x558aab01a635 llvm::legacy::PassManagerImpl::run()
@ 0x558aab01afe1 llvm::legacy::PassManager::run()
@ 0x558aa5914769 (anonymous namespace)::EmitAssemblyHelper::EmitAssembly()
@ 0x558aa5910f44 clang::EmitBackendOutput()
@ 0x558aa5906135 clang::BackendConsumer::HandleTranslationUnit()
@ 0x558aa6d165ad clang::ParseAST()
@ 0x558aa6a94e22 clang::ASTFrontendAction::ExecuteAction()
@ 0x558aa590255d clang::CodeGenAction::ExecuteAction()
@ 0x558aa6a94840 clang::FrontendAction::Execute()
@ 0x558aa6a38cca clang::CompilerInstance::ExecuteAction()
@ 0x558aa4e2294b clang::ExecuteCompilerInvocation()
@ 0x558aa4df6200 cc1_main()
@ 0x558aa4e1b37f ExecuteCC1Tool()
@ 0x558aa4e1a725 main
@ 0x7ff20d56abbd __libc_start_main
@ 0x558aa4df51c9 _start
llvm-svn: 355515
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In r354298 a DominatorTree construction was added via new function
combineToUSubWithOverflow, which was subsequently restructured into
replaceMathCmpWithIntrinsic in r354689. We are hitting a very long
compile time due to this repeated construction, once per math cmp in
the function.
We shouldn't need to build the DominatorTree more than once per
function, except when a transformation invalidates it. There is already
a boolean flag that is returned from these methods indicating whether
the DT has been modified. We can simply build the DT once per
Function walk in CodeGenPrepare::runOnFunction, since any time a change
is made we break out of the Function walk and restart it.
I modified the code so that both replaceMathCmpWithIntrinsic as well as
mergeSExts (which was also building a DT) use the DT constructed by the
run method.
From -mllvm -time-passes:
Before this patch: CodeGen Prepare user time is 328s
With this patch: CodeGen Prepare user time is 21s
Reviewers: spatel
Subscribers: jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58995
llvm-svn: 355512
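The principle of building an expensive analysis once and rebuilding it only after an invalidating change can be sketched with a small lazy cache. This is a hypothetical illustration, not the CodeGenPrepare code: `DomTreeStandIn` and `CachedDomTree` are invented names, and the build counter exists only to make the effect observable.

```cpp
#include <memory>

// Stand-in for an expensive-to-build analysis such as a dominator tree.
struct DomTreeStandIn {};

// Hypothetical cache: construct on first use, reuse until invalidated.
class CachedDomTree {
public:
  DomTreeStandIn &get() {
    if (!Tree) {
      Tree.reset(new DomTreeStandIn()); // the expensive construction
      ++BuildCount;
    }
    return *Tree;
  }

  // Call after a transformation that changes the CFG.
  void invalidate() { Tree.reset(); }

  int buildCount() const { return BuildCount; }

private:
  std::unique_ptr<DomTreeStandIn> Tree;
  int BuildCount = 0;
};
```

With this shape, any number of queries between two modifications costs a single construction, instead of one construction per query.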
|