| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
| |
Arm specific codegen prepare is implemented to perform type promotion
on icmp operands, which can enable the removal of uxtb and uxth
(unsigned extend) instructions. This is possible because performing
type promotion before ISel alleviates this duty from the DAG builder
which has to perform legalisation, but has a limited view on data
ranges.
The pass visits any instruction operand of an icmp and creates a
worklist to traverse the use-def tree to determine whether the values
can simply be promoted. Our concern is values in the registers
overflowing the narrow (i8, i16) data range, so instructions marked
with nuw can be promoted easily. For add and sub instructions, we are
able to use the parallel dsp instructions to operate on scalar data
types and avoid overflowing bits. Underflowing adds and subs are also
permitted when the result is only used by an unsigned icmp.
Differential Revision: https://reviews.llvm.org/D48832
llvm-svn: 337687
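As a rough illustration of why the overflow check matters, here is a standalone sketch (not the pass itself; all function names are hypothetical). An unsigned compare on an i8 add gives the same answer in 32-bit registers only when the narrow add cannot wrap. Otherwise the value has to be truncated again, which is the job of uxtb.

```cpp
#include <cstdint>

// Compare performed in the narrow type: the add wraps modulo 256.
bool narrowCompare(std::uint8_t A, std::uint8_t B, std::uint8_t Limit) {
  std::uint8_t Sum = static_cast<std::uint8_t>(A + B);
  return Sum < Limit;
}

// Same compare with the operands promoted to 32 bits and no re-truncation.
// Only equivalent to narrowCompare when A + B fits in 8 bits (the nuw case).
bool promotedCompare(std::uint8_t A, std::uint8_t B, std::uint8_t Limit) {
  std::uint32_t Sum = static_cast<std::uint32_t>(A) + static_cast<std::uint32_t>(B);
  return Sum < static_cast<std::uint32_t>(Limit);
}

// Promoted compare that masks the sum back to 8 bits, modelling uxtb.
// Always equivalent to narrowCompare, at the cost of the extra mask.
bool promotedCompareWithMask(std::uint8_t A, std::uint8_t B, std::uint8_t Limit) {
  std::uint32_t Sum = static_cast<std::uint32_t>(A) + static_cast<std::uint32_t>(B);
  return (Sum & 0xFFu) < static_cast<std::uint32_t>(Limit);
}
```

With A = 200, B = 100 the narrow sum wraps to 44, so the unmasked promoted compare disagrees with the narrow one; with A = 100, B = 50 nothing wraps and all three agree.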
|
| |
| |
In ConstructSSAForLoadSet if an available value is actually the load that we're
doing SSA construction to eliminate, then we can omit it as SSAUpdate will add
in the value for the phi that will be replacing it anyway. This can result in
simpler IR which can allow further optimisation.
Differential Revision: https://reviews.llvm.org/D44160
llvm-svn: 337686
|
| |
| |
Bug fix for PR37445. The underlying problem and its fix are similar to PR37808.
The bug lies in MemorySSAUpdater::getPreviousDefRecursive(), where PhiOps is
computed before the call to tryRemoveTrivialPhi() and it ends up being out of
date, pointing to stale data. We have now turned each of the PhiOps into a
TrackingVH<MemoryAccess>.
Differential Revision: https://reviews.llvm.org/D49425
llvm-svn: 337680
|
| |
| |
Summary: Clarify contract of StringSaver (it null-terminates, callers rely on it).
Reviewers: hokein
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49596
llvm-svn: 337677
|
| |
| |
partially written registers.
Summary:
Pretty mechanical follow-up for D49196.
As microarchitecture.pdf notes, "20 AMD Ryzen pipeline",
"20.8 Register renaming and out-of-order schedulers":
The integer register file has 168 physical registers of 64 bits each.
The floating point register file has 160 registers of 128 bits each.
"20.14 Partial register access":
The processor always keeps the different parts of an integer register together.
...
An instruction that writes to part of a register will therefore have a false dependence
on any previous write to the same register or any part of it.
Reviewers: andreadb, courbet, RKSimon, craig.topper, GGanesh
Reviewed By: GGanesh
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D49393
llvm-svn: 337676
|
| |
| |
Bug fix for PR36787. When reasoning if it's safe to hoist a load we
want to make sure that the defining memory access dominates the new
insertion point of the hoisted instruction. safeToHoistLdSt calls
firstInBB(InsertionPoint,DefiningAccess) which returns false if
InsertionPoint == DefiningAccess, and therefore it falsely thinks
it's safe to hoist.
Differential Revision: https://reviews.llvm.org/D49555
llvm-svn: 337674
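The equality case can be modelled in a few lines. This is a sketch under assumptions: instructions are represented by their index within one basic block, the two safeToHoist* helpers are hypothetical, and the explicit equality check is only one way to close the gap (the actual patch may differ).

```cpp
// Strict "comes before" test on positions within one block.
bool firstInBB(int A, int B) { return A < B; }

// Buggy shape: only rejects the hoist when the defining access comes
// strictly after the insertion point, so the equal case slips through.
bool safeToHoistBuggy(int InsertionPoint, int DefiningAccess) {
  if (firstInBB(InsertionPoint, DefiningAccess))
    return false;
  return true;
}

// One possible fix: treat InsertionPoint == DefiningAccess as unsafe too,
// because the definition does not come before the insertion point.
bool safeToHoistFixed(int InsertionPoint, int DefiningAccess) {
  if (InsertionPoint == DefiningAccess)
    return false;
  if (firstInBB(InsertionPoint, DefiningAccess))
    return false;
  return true;
}
```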
|
| |
| |
a call, and then again as a return.
Also added a comment to try and explain better why we would be doing
what we're doing when hardening the (non-call) returns.
llvm-svn: 337673
|
| |
| |
This provides an overview of the algorithm used to harden specific
loads. It also brings our terminology further in line with
hardening rather than checking.
Differential Revision: https://reviews.llvm.org/D49583
llvm-svn: 337667
|
| |
| |
llvm-svn: 337657
|
| |
| |
and rely on splitOpsAndApply to handle splitting.
This seems to be a net improvement. There's still an issue under avx512f where we have a 512-bit vpaddd, but not vpmaddwd, so we end up doing two 256-bit vpmaddwds and inserting the results before a 512-bit vpaddd. It might be better to do two 512-bit paddds with zeros in the upper half. Same number of instructions, but breaks a dependency.
llvm-svn: 337656
|
| |
| |
Summary:
This takes 22ms out of ~20s compiling sqlite3.c because we call it
for every unit of compilation and every pass.
Reviewers: paquette, anemet
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D49586
llvm-svn: 337654
|
| |
| |
APInt::getZExtValue to 0 in a place where we can't be sure contents of the APInt fit in a uint64_t.
This is used on an extract vector element index, which in most cases is going to be an i32 or i64, and the element will be a valid element number. But it is possible to construct IR with a larger type and a large out-of-range value.
llvm-svn: 337652
|
| |
| |
of 2 number of elements.
The check for the shuffle's uses probably isn't correct for non-power-of-2 vectors.
llvm-svn: 337651
|
| |
| |
Factor out register class selection for the global base register into a
separate function to avoid a long chain of ternary operators.
llvm-svn: 337647
|
| |
| |
This is a follow-up to rL335185. That commit added some WrapperPat
patterns for the microMIPS target, but the declaration of the WrapperPat
class is under the NotInMicroMips predicate, so the microMIPS patterns
cannot be selected: the predicate (Subtarget->inMicroMipsMode()) &&
(!Subtarget->inMicroMipsMode()) is always false.
This change moves the WrapperPat class declaration out of the
NotInMicroMips predicate and enables the microMIPS WrapperPat patterns.
Differential revision: https://reviews.llvm.org/D49533
llvm-svn: 337646
|
| |
| |
Reviewers: sebpop,davide,fhahn,trentxintong
Differential Revision: https://reviews.llvm.org/D49617
llvm-svn: 337643
|
| |
| |
Differential Revision: https://reviews.llvm.org/D49382
llvm-svn: 337642
|
| |
| |
llvm-svn: 337637
|
| |
| |
llvm-svn: 337626
|
| |
| |
(previously it was 128-byte)
We tested different cap values with a recent commit of Chromium. Our results show that the 32-byte cap yields the smallest binary and all the caps yield similar performance.
Based on the results, we propose to change the cap value to 32-byte.
Patch by Zhaomo Yang!
Differential Revision: https://reviews.llvm.org/D49405
llvm-svn: 337622
|
| |
| |
llvm-svn: 337621
|
| |
| |
SimplifyDemandedVectorElts"
This reverts commit r337547. It triggers an infinite loop.
llvm-svn: 337617
|
| |
| |
Patch by Martell Malone.
llvm-svn: 337614
|
| |
| |
This fixes PR36096.
Originally based on a patch by Martell Malone.
Differential Revision: https://reviews.llvm.org/D44357
llvm-svn: 337613
|
| |
| |
in preparation for"
Breaks the build with LLVM_ENABLE_THREADS=OFF.
llvm-svn: 337608
|
| |
| |
This reapplies commit r337489 reverted by r337541
Additionally, this commit contains a speculative fix to the issue reported in r337541
(the report does not contain an actionable reproducer, just a stack trace)
llvm-svn: 337606
|
| |
| |
llvm-svn: 337599
|
| |
| |
Incidentally all allocations that we currently perform were
properly aligned, but this was only an accident.
Thanks to Erik Pilkington for catching this.
llvm-svn: 337596
|
| |
| |
deprecating SymbolResolver and AsynchronousSymbolQuery.
Both lookup overloads take a VSO search order to perform the lookup. The first
overload is non-blocking and takes OnResolved and OnReady callbacks. The second
is blocking, takes a boolean flag to indicate whether to wait until all symbols
are ready, and returns a SymbolMap. Both overloads take a RegisterDependencies
function to register symbol dependencies (if any) on the query.
llvm-svn: 337595
|
| |
| |
This discards the unresolved symbols set and returns the flags map directly
(rather than mutating it via the first argument).
The unresolved symbols result made it easy to chain lookupFlags calls, but such
chaining should be rare to non-existent (especially now that symbol resolvers
are being deprecated) so the simpler method signature is preferable.
llvm-svn: 337594
|
| |
| |
A search order is a list of VSOs to be searched linearly to find symbols. Each
VSO now has a search order that will be used when fixing up definitions in that
VSO. Each VSO's search order defaults to just that VSO itself.
This is a first step towards removing symbol resolvers from ORC altogether. In
practice symbol resolvers tended to be used to implement a search order anyway,
sometimes with additional programmatic generation of symbols. Now that VSOs
support programmatic generation of definitions via fallback generators, search
orders provide a cleaner way to achieve the desired effect (while removing a lot
of boilerplate).
llvm-svn: 337593
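The linear search described above can be sketched as follows. This is a simplified model, not the ORC API: the VSO struct, its fields, and lookupInSearchOrder are hypothetical, fallback generators are left out, and an empty search order stands in for the "just this VSO" default.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a VSO.
struct VSO {
  std::map<std::string, std::uint64_t> Symbols; // name -> address
  std::vector<const VSO *> SearchOrder;         // empty models the default
};

// Walks From's search order front to back and returns the first match,
// or nullptr when no VSO in the order defines the name.
const std::uint64_t *lookupInSearchOrder(const VSO &From,
                                         const std::string &Name) {
  std::vector<const VSO *> Order = From.SearchOrder;
  if (Order.empty())
    Order.push_back(&From); // default: search only the VSO itself
  for (const VSO *V : Order) {
    auto It = V->Symbols.find(Name);
    if (It != V->Symbols.end())
      return &It->second;
  }
  return nullptr;
}
```

In this model, a VSO with the default order cannot see symbols defined elsewhere; appending another VSO to its SearchOrder makes those symbols resolvable without any resolver object.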
|
| |
| |
-Winconsistent-missing-override complains about this.
llvm-svn: 337592
|
| |
| |
Also remove a broken test case.
llvm-svn: 337591
|
| |
| |
Ideally our ISD node types going into the isel table would have types consistent with their instruction domain. This prevents us having to duplicate patterns with different types for the same instruction.
Unfortunately, it seems our shuffle combining is currently relying on this a little to remove some bitcasts. This seems to enable some switching between shufps and shufd. Hopefully there's some way we can address this in the combining.
Differential Revision: https://reviews.llvm.org/D49280
llvm-svn: 337590
|
| |
| |
CombineTo is most useful when you need to replace multiple results, avoid the worklist management, or do something else after the combine, etc. Otherwise you should be able to just return the new node and let DAGCombiner go through its usual worklist code.
All of the places changed in this patch look to be standard cases where we should be able to use the more standard behavior of just returning the new node.
Differential Revision: https://reviews.llvm.org/D49569
llvm-svn: 337589
|
| |
| |
This adds initial support for a demangling library (LLVMDemangle)
and tool (llvm-undname) for demangling Microsoft names. This
doesn't cover 100% of cases and there are some known limitations
which I intend to address in followup patches, at least until such
time that we have (near) 100% test coverage matching up with all
of the test cases in clang/test/CodeGenCXX/mangle-ms-*.
Differential Revision: https://reviews.llvm.org/D49552
llvm-svn: 337584
|
| |
| |
Summary:
When splitting predecessors in BasicBlockUtils, we create a new block as an immediate predecessor of the original BB, then we connect a given set of predecessors to the new block.
The API in this patch will be used to update MemoryPhis for this CFG change.
If all predecessors are being moved, we move the MemoryPhi directly. Otherwise we create a new MemoryPhi in the NewBB and populate its incoming values, while deleting them from BB's Phi.
[Split from D45299 for easier review]
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D49156
llvm-svn: 337581
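The two cases from the summary can be sketched with a toy model. Everything here is hypothetical: a MemoryPhi is reduced to a map from predecessor name to an access id, splitPhi is not the real API, and the moved predecessors are assumed to be distinct incoming blocks of the phi. The final step, where the old phi receives the new phi through NewBB, is an assumption about how the edge from NewBB is represented.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy MemoryPhi: predecessor block name -> id of the incoming access.
using Phi = std::map<std::string, int>;

struct SplitResult {
  Phi NewBlockPhi; // phi that lives in NewBB after the split
  Phi OldBlockPhi; // phi left in the original BB (empty if it moved)
};

SplitResult splitPhi(const Phi &Old, const std::vector<std::string> &MovedPreds,
                     const std::string &NewBB, int NewPhiId) {
  SplitResult R;
  if (MovedPreds.size() == Old.size()) {
    // All predecessors move: the phi itself moves to NewBB unchanged.
    R.NewBlockPhi = Old;
    return R;
  }
  // Otherwise build a new phi from the moved entries and delete them
  // from the original phi.
  R.OldBlockPhi = Old;
  for (const std::string &P : MovedPreds) {
    auto It = R.OldBlockPhi.find(P);
    if (It == R.OldBlockPhi.end())
      continue; // not an incoming block of this phi
    R.NewBlockPhi[P] = It->second;
    R.OldBlockPhi.erase(It);
  }
  // Assumption: the original phi now gets the new phi via the NewBB edge.
  R.OldBlockPhi[NewBB] = NewPhiId;
  return R;
}
```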
|
| |
| |
We can safely use getConstant here as we're still lowering, which allows constant folding to kick in and simplify the vector shift codegen.
Noticed while working on D49562.
llvm-svn: 337578
|
| |
| |
Make sure NewSI is used in materializeStores()
llvm-svn: 337577
|
| |
| |
Enable the optimization of operations on DPR and SPR via a feature instead
of checking the target.
Differential revision: https://reviews.llvm.org/D49463
llvm-svn: 337575
|
| |
| |
When pointer checking is enabled, it's important that every pointer is
checked before its value is used.
For stores MSan used to generate code that calculates shadow/origin
addresses from a pointer before checking it.
For userspace this isn't a problem, because the shadow calculation code
is quite simple and the compiler is able to move it after the check at -O2.
But for KMSAN getShadowOriginPtr() creates a runtime call, so we want the
check to be performed strictly before that call.
Swapping materializeChecks() and materializeStores() resolves the issue:
both functions insert code before the given IR location, so the new
insertion order guarantees that the code calculating shadow address is
between the address check and the memory access.
llvm-svn: 337571
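The ordering argument can be checked with a small model. This sketch uses a std::list of strings in place of IR, and buildBlock and the instruction labels are hypothetical. It relies only on the property stated above: each step inserts immediately before the same fixed location, so whatever is inserted first ends up earliest.

```cpp
#include <list>
#include <string>

using Block = std::list<std::string>;

// Builds a block containing one store, then inserts the pointer check and
// the shadow-address computation before it, in the requested order.
Block buildBlock(bool ChecksFirst) {
  Block B;
  B.push_back("store");
  Block::iterator Store = B.begin(); // stays valid across list insertions
  if (ChecksFirst) {
    B.insert(Store, "check pointer");          // materializeChecks() analogue
    B.insert(Store, "compute shadow address"); // materializeStores() analogue
  } else {
    B.insert(Store, "compute shadow address");
    B.insert(Store, "check pointer");
  }
  return B;
}
```

buildBlock(true) yields check, shadow computation, store; buildBlock(false) puts the shadow computation ahead of the check, which is the old behaviour the commit describes.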
|
| |
| |
Improve AVX1 256-bit vector HADD/HSUB matching by using SplitOpsAndApply to split into 128-bit instructions.
llvm-svn: 337568
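A scalar sketch of the splitting idea, assuming the 256-bit horizontal add works independently on each 128-bit lane (as the x86 vphaddd/vhaddps family does). The helper names are hypothetical and this models only the arithmetic, not the SplitOpsAndApply helper itself.

```cpp
#include <array>
#include <cstdint>

using V4 = std::array<std::int32_t, 4>; // one 128-bit lane of i32
using V8 = std::array<std::int32_t, 8>; // a 256-bit vector of i32

// 128-bit horizontal add: adjacent pairs of A, then adjacent pairs of B.
V4 hadd128(const V4 &A, const V4 &B) {
  return {{A[0] + A[1], A[2] + A[3], B[0] + B[1], B[2] + B[3]}};
}

// 256-bit horizontal add built from two 128-bit ones: split both operands
// into low and high halves, apply hadd128 per half, concatenate the results.
V8 hadd256BySplitting(const V8 &A, const V8 &B) {
  V4 ALo = {{A[0], A[1], A[2], A[3]}}, AHi = {{A[4], A[5], A[6], A[7]}};
  V4 BLo = {{B[0], B[1], B[2], B[3]}}, BHi = {{B[4], B[5], B[6], B[7]}};
  V4 Lo = hadd128(ALo, BLo);
  V4 Hi = hadd128(AHi, BHi);
  return {{Lo[0], Lo[1], Lo[2], Lo[3], Hi[0], Hi[1], Hi[2], Hi[3]}};
}
```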
|
| |
| |
shuffle removal
llvm-svn: 337566
|
| |
| |
I changed a variable's type from pointer to reference, but forgot to
update the assert-only code.
llvm-svn: 337564
|
| |
| |
Check for construction-time folding for incomplete AND nodes in
BackwardsPropagateMask.
Fixes PR38185.
Reviewers: RKSimon, samparker
Reviewed By: samparker
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D49444
llvm-svn: 337563
|
| |
| |
Summary:
Each of the four methods had a dozen lines and was doing almost exactly
the same thing: get the appropriate accelerator table kind and insert an
entry into it. I move this common logic to a helper function and make
these methods delegate to it.
This came up in the context of D49493, where I've needed to make adding
a string to a string pool slightly more complicated, and it seemed to
make sense to do it in one place instead of five.
To make this work I've needed to unify the interface of the AccelTable
data types, as some used to store DIE& and others DIE*. I chose to unify
to a reference as that's what the caller uses.
This technically isn't NFC, because it changes the StringPool used for
apple tables in the DWO case (now it uses the main file like DWARF v5
instead of the DWO file). However, that shouldn't matter, as DWO is not
a thing on apple targets (clang frontend simply ignores -gsplit-dwarf).
Reviewers: JDevlieghere, aprantl, probinson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49542
llvm-svn: 337562
|
| |
| |
shuffle removal
llvm-svn: 337561
|
| |
| |
When merging through a TokenFactor we need to check that the
load may be ordered such that no other aliasing memory operations may
happen. It is not sufficient to just check that the load is a member
of the chain token factor, as there may be an indirect chain. Require
that the load's chain has only one use.
This fixes PR37826.
Reviewers: spatel, davide, efriedma, craig.topper, RKSimon
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D49388
llvm-svn: 337560
|
| |
| |
llvm-svn: 337554
|
| |
| |
parameters.
This version contains a fix to add values for which the state in ParamState changes
to the worklist if the state in ValueState did not change. To avoid adding the
same value multiple times, mergeInValue returns true if it added the value to
the worklist. The value is added to the worklist depending on its state in
ValueState.
Original message:
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to using
ValueLatticeElement for all values.
Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.
Reviewers: dberlin, mssimpso, davide, efriedma
Reviewed By: mssimpso
Differential Revision: https://reviews.llvm.org/D43762
llvm-svn: 337548
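The worklist fix can be sketched with a toy solver. This is an illustration under assumptions, not the SCCP code: the lattice has only unknown/constant/overdefined (no constant ranges), and Lattice, Solver, mergeIn and mergeInParam are hypothetical names. Only the idea is taken from the message: mergeInValue reports whether it queued the value, and a ParamState change queues the value only if that did not already happen.

```cpp
#include <map>
#include <string>
#include <vector>

enum class Kind { Unknown, Constant, Overdefined };

struct Lattice {
  Kind K;
  int C;
  Lattice() : K(Kind::Unknown), C(0) {}
  Lattice(Kind K, int C) : K(K), C(C) {}
};

struct Solver {
  std::map<std::string, Lattice> ValueState, ParamState;
  std::vector<std::string> Worklist;

  // Merges In into Cur; returns true if Cur changed.
  static bool mergeIn(Lattice &Cur, const Lattice &In) {
    if (In.K == Kind::Unknown || Cur.K == Kind::Overdefined)
      return false;
    if (Cur.K == Kind::Unknown) {
      Cur = In;
      return true;
    }
    if (In.K == Kind::Constant && In.C == Cur.C)
      return false;
    Cur = Lattice(Kind::Overdefined, 0);
    return true;
  }

  // Returns true if the value was added to the worklist.
  bool mergeInValue(const std::string &V, const Lattice &In) {
    if (!mergeIn(ValueState[V], In))
      return false;
    Worklist.push_back(V);
    return true;
  }

  // Updates both states; queues V at most once per call.
  void mergeInParam(const std::string &V, const Lattice &In) {
    bool Queued = mergeInValue(V, In);
    bool ParamChanged = mergeIn(ParamState[V], In);
    if (ParamChanged && !Queued)
      Worklist.push_back(V);
  }
};
```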
|