| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Handling relocations was not needed when the loclists section was a
DWO-only thing. But since DWARF5, it is possible to use it in regular
objects too, and the standard permits embedding addresses into the
section directly. These addresses need to be relocated in unlinked
files.
Reviewers: JDevlieghere, dblaikie, probinson
Subscribers: aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68271
|
| |
|
|
|
|
|
|
|
| |
With a few things fixed:
- initialisaiton of the optimisation remark pass (this was causing the buildbot
failures on PPC),
- a test case.
Differential Revision: https://reviews.llvm.org/D69660
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
2*log2(bitwidth)+1 for legal types.
This better represents the kshift+binop we'd get for each stage
before the final extract. Its likely we'll do even better by
doing a kmov and a cmp with a GPR, but this is a good start.
The default handling was costing a worst case single source
permute shuffle of the vector before the binop. This worst
case assumes the shuffle might have to be emulated with
extracts and inserts. But since we know we're doing a reduction
we can assume we'll get kshift lowering.
There's still some room for improvement here, but this is
much better than it was.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- Define Instruction::Freeze, let it be UnaryOperator
- Add support for freeze to LLLexer/LLParser/BitcodeReader/BitcodeWriter
The format is `%x = freeze <ty> %v`
- Add support for freeze instruction to llvm-c interface.
- Add m_Freeze in PatternMatch.
- Erase freeze when lowering IR to SelDag.
Reviewers: deadalnix, hfinkel, efriedma, lebedev.ri, nlopes, jdoerfert, regehr, filcab, delcypher, whitequark
Reviewed By: lebedev.ri, jdoerfert
Subscribers: jfb, kristof.beyls, hiraditya, lebedev.ri, steven_wu, dexonsmith, xbolva00, delcypher, spatel, regehr, trentxintong, vsk, filcab, nlopes, mehdi_amini, deadalnix, llvm-commits
Differential Revision: https://reviews.llvm.org/D29011
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit fff2721286e1 ("[BPF] Fix CO-RE bugs with bitfields")
fixed CO-RE handling bitfield issues. But the implementation
introduced a use after free bug. The "Base" of the intrinsic
might be freed so later on accessing the Type of "Base"
might access the freed memory. The failed test case,
CodeGen/BPF/CORE/offset-reloc-middle-chain.ll
is exactly used to test such a case.
Similarly to previous attempt to remember Metadata etc,
remember "Base" pointee Alignment in advance to avoid
such use after free bug.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
enable 2-byte VEX encoding.
Summary:
The 2 source operands commutable instructions are encoded in the
VEX.VVVV field and the r/m field of the MODRM byte plus the VEX.B
field.
The VEX.B field is missing from the 2-byte VEX encoding. If the
VEX.VVVV source is 0-7 and the other register is 8-15 we can
swap them to avoid needing the VEX.B field. This works as long as
the VEX.W, VEX.mmmmm, and VEX.X fields are also not needed.
Fixes PR36706.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68550
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
bitfield handling is not robust with current implementation.
I have seen two issues as described below.
Issue 1:
struct s {
long long f1;
char f2;
char b1:1;
} *p;
The current approach will generate an access bit size
56 (from b1 to the end of structure) which will be
rejected as it is not power of 2.
Issue 2:
struct s {
char f1;
char b1:3;
char b2:5;
char b3:6:
char b4:2;
char f2;
};
The LLVM will group 4 bitfields together with 2 bytes. But
loading 2 bytes is not correct as it violates alignment
requirement. Note that sometimes, LLVM breaks a large
bitfield groups into multiple groups, but not in this case.
To resolve the above two issues, this patch takes a
different approach. The alignment for the structure is used
to construct the offset of the bitfield access. The bitfield
incurred memory access is an aligned memory access with alignment/size
equal to the alignment of the structure.
This also simplified the code.
This may not be the optimal memory access in terms of memory access
width. But this should be okay since extracting the bitfield value
will have the same amount of work regardless of what kind of
memory access width.
Differential Revision: https://reviews.llvm.org/D69837
|
| |
|
|
| |
Fix the costs of integer division.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We are duplicating predicates if several parts of the combined
predicate list contain the same condition. Added code to deduplicate
the list.
We have AssemblerPredicates and AssemblerPredicate in the
PredicateControl, but we never use AssemblerPredicates with an
actual list, so this one is dropped.
This addresses the first part of the llvm bug 43886:
https://bugs.llvm.org/show_bug.cgi?id=43886
Differential Revision: https://reviews.llvm.org/D69815
|
| |
|
|
|
|
| |
This class was a bit overengineered, and was triggering some PVS warnings.
Instead, put strings into a NameType and let clients unconditionally treat it
as a Node.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
-mvzeroupper will force the vzeroupper insertion pass to run on
CPUs that normally wouldn't. -mno-vzeroupper disables it on CPUs
where it normally runs.
To support this with the default feature handling in clang, we
need a vzeroupper feature flag in X86.td. Since this flag has
the opposite polarity of the fast-partial-ymm-or-zmm-write we
used to use to disable the pass, we now need to add this new
flag to every CPU except KNL/KNM and BTVER2 to keep identical
behavior.
Remove -fast-partial-ymm-or-zmm-write which is no longer used.
Differential Revision: https://reviews.llvm.org/D69786
|
| |
|
|
|
|
|
|
|
|
| |
slow path blocks
This transformation is a variation on the GuardWidening transformation we have checked in as it's own pass. Instead of focusing on merge (i.e. hoisting and simplifying) two widenable branches, this transform makes the observation that simply removing a second slowpath block (by reusing an existing one) is often a very useful canonicalization. This may lead to later merging, or may not. This is a useful generalization when the intermediate block has loads whose dereferenceability is hard to establish.
As noted in the patch, this can be generalized further, and will be.
Differential Revision: https://reviews.llvm.org/D69689
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Continuation of:
D69116
Contributes to a fix for PR43559:
https://bugs.llvm.org/show_bug.cgi?id=43559
See also D69099 and D69116
Use the TLI hook in DAGCombine.cpp to guard against creating
shift nodes that are not optimal for a target.
Patch by: @joanlluch (Joan LLuch)
Differential Revision: https://reviews.llvm.org/D69120
|
| | |
|
| | |
|
| |
|
|
| |
Fixes https://bugs.llvm.org/show_bug.cgi?id=43891
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch (second of two patches) lowers the generic PowerPC vector
entries to PowerPC subtarget-specific entries.
For instance, the PowerPC generic entry 'cbrtd2_massv' is lowered to
'cbrtd2_P9' or Power9 subtarget.
The first patch enables the vectorizer to recognize the IBM MASS vector
library routines. This patch specifically adds support for recognizing
the '-vector-library=MASSV' option, and defines mappings from IEEE
standard scalar math functions to generic PowerPC MASS vector
counterparts.
For instance, the generic PowerPC MASS vector entry for double-precision
'cbrt' function is '__cbrtd2_massv'
The overall support for MASS vector library is presented as such in two
patches for ease of review.
Patch by pjeeva01 (Jeeva P.)
Differential Revision: https://reviews.llvm.org/D59883
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 004ed2b0d1b86d424643ffc88fce20ad8bab6804.
Original commit hash 6d03890384517919a3ba7fe4c35535425f278f89
Summary:
This adds a clang option to disable inline line tables. When it is used,
the inliner uses the call site as the location of the inlined function instead of
marking it as an inline location with the function location.
https://reviews.llvm.org/D67723
|
| |
|
|
|
|
|
|
|
|
|
| |
Small refactoring in visitConstrainedFPIntrinsic that should make
it easier to create DAG nodes requiring extra arguments. That is
the case currently only for STRICT_FP_ROUND, but may be the case
for additional nodes (in particular compares) in the future.
Extracted from the patch for D69281.
NFC.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If the GEP instructions are going to be vectorized, the indices in those
GEP instructions must be of the same type. Otherwise, the compiler may
crash when trying to build the vector constant.
Reviewers: RKSimon, spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69627
|
| | |
|
| |
|
|
|
| |
Review: Ulrich Weigand
https://reviews.llvm.org/D68267
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MachineVerifier::visitMachineFunctionAfter() is extended to check the
live-through case for live-in lists. This is only done for registers without
aliases and that are neither allocatable or reserved, such as the SystemZ::CC
register.
The MachineVerifier earlier only catched the case of a live-in use without
an entry in the live-in list (as "using an undefined physical register").
A comment in LivePhysRegs.h has been added stating a guarantee that
addLiveOuts() can be trusted for a full register both before and after
register allocation.
Review: Quentin Colombet
https://reviews.llvm.org/D68267
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Arm backend will usually return false for isFMAFasterThanFMulAndFAdd,
where both the fused VFMA.f32 and a non-fused VMLA.f32 are usually
available for scalar code. For MVE we don't have the non-fused version
though. It makes more sense for isFMAFasterThanFMulAndFAdd to return
true, allowing us to simplify some of the existing ISel patterns.
The tests here are that non of the existing tests failed, and so we are
still selecting VFMA and VFMS. The one test that changed shows we can
now select from fast math flags, as opposed to just relying on the
isFMADLegalForFAddFSub option.
Differential Revision: https://reviews.llvm.org/D69115
|
| |
|
|
|
|
|
|
|
|
| |
As noted in follow-up to:
rGa1e8ad4f2fa7
It's not safe to assume that an element of the constant is always
non-null. It's definitely not an expected case for the current
instcombine user, but that may not hold if this function is
eventually called from arbitrary places.
|
| |
|
|
| |
Typo in comment. NFC.
|
| |
|
|
|
|
| |
(NFC)"
This reverts commit 2be17087f8c38934b7fc9208ae6cf4e9b4d44f4b. Fails ASAN.
|
| |
|
|
|
|
|
| |
Fill in the gaps for vrev32.16 f16 patterns, extending the existing i16
patterns.
Differential Revision: https://reviews.llvm.org/D69508
|
| |
|
|
|
| |
This is part of a series of patches needed to solve PR39535:
https://bugs.llvm.org/show_bug.cgi?id=39535
|
| |
|
|
|
|
|
| |
This is a special calling convention to be used by the GHC compiler.
Author: Stefan Schulze Frielinghaus
Differential Revision: https://reviews.llvm.org/D69024
|
| |
|
|
|
|
|
|
|
|
| |
DemandedElts mask (REAPPLIED)
If we don't demand all elements, then attempt to combine to a simpler shuffle.
At the moment we can only do this if Depth == 0 as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts (see D66004).
This reapplies rL368307 (reverted at rL369167) after the fix for the infinite loop reported at PR43024 was applied at rG3f087e38a2e7b87a5adaaac1c1b61e51220e7ff3
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: RKSimon, ostannard
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69795
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: The hook should work for any RISC-V register. Non-allocatable registers
do not need to be reserved, for the remaining the hook will only succeed
if you pass clang the -ffixed-xX flag. This builds upon D67185, which
currently only allows reserving GPRs.
Reviewers: asb, lenary
Reviewed By: lenary
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69130
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was an experiment made possible by a non-standard feature of the
Android dynamic loader.
It required introducing a flag to tell the compiler which ABI was being
targeted.
This flag is no longer needed, since the generated code now works for
both ABI's.
We leave that flag untouched for backwards compatibility. This also
means that if we need to distinguish between targeted ABI's again
we can do that without disturbing any existing workflows.
We leave a comment in the source code and mention in the help text to
explain this for any confused person reading the code in the future.
Patch by Matthew Malcomson
Differential Revision: https://reviews.llvm.org/D69574
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Demand that an immediate offset to a PC relative address fits in 32 bits, or
else load it into a register and perform a separate add.
Verify in the assembler that such immediate offsets fit the bitwidth.
Even though the final address of a Load Address Relative Long may fit in 32
bits even with a >32 bit offset (depending on where the symbol lives relative
to PC), the GNU toolchain demands the offset by itself to be in range. This
patch adapts the same behavior for llvm.
Review: Ulrich Weigand
https://reviews.llvm.org/D69749
|
| |
|
|
|
|
|
|
|
|
| |
The sink-after and interleave-group vectorization decisions were so far applied to
VPlan during initial VPlan construction, which complicates VPlan construction – also because of
their inter-dependence. This patch refactors buildVPlanWithRecipes() to construct a simpler
initial VPlan and later apply both these vectorization decisions, in order, as VPlan-to-VPlan
transformations.
Differential Revision: https://reviews.llvm.org/D68577
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch sets the FPSW (X87 floating-point status register) as a reserved
physical register and fix the test failure caused by [[ https://reviews.llvm.org/D68854| D68854 ]].
Before this patch, some tests will fail because it implicit uses FPSW without
define it. Setting the FPSW as a reserved physical register will skip liveness
analysis because it is always live.
Reviewers: pengfei, craig.topper
Reviewed By: craig.topper
Subscribers: craig.topper, hiraditya, llvm-commits
Patch by LiuChen.
Differential Revision: https://reviews.llvm.org/D69784
|
| |
|
|
|
|
|
|
| |
KnownZero if it removes an input.
This stops infinite loops where KnownUndef elements are converted to Zeroable, resulting in KnownZero elements which are then simplified (via SimplifyDemandedElts etc.) back to KnownUndef elements........
Prep fix for PR43024 which will allow rL368307 to be re-applied.
|
| |
|
|
| |
statement.' warning. NFCI.
|
| | |
|
| |
|
|
| |
NFCI.'
|
| |
|
|
|
|
| |
NFCI."
This reverts commit 8308187fd9bfa08ffad0a636d4dd1d25e7de5a76. This exposed a bug.
|
| | |
|
| | |
|
| |
|
|
|
|
| |
NFCI."
This reverts commit b8685cf3042f6a2e129061922bd6b18e3c42258e.
|
| | |
|
| | |
|
| | |
|