| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The following intrinsics for binary narrowing shift righ operations are
added:
* @llvm.aarch64.sve.shrnb
* @llvm.aarch64.sve.uqshrnb
* @llvm.aarch64.sve.sqshrnb
* @llvm.aarch64.sve.sqshrunb
* @llvm.aarch64.sve.uqrshrnb
* @llvm.aarch64.sve.sqrshrnb
* @llvm.aarch64.sve.sqrshrunb
* @llvm.aarch64.sve.shrnt
* @llvm.aarch64.sve.uqshrnt
* @llvm.aarch64.sve.sqshrnt
* @llvm.aarch64.sve.sqshrunt
* @llvm.aarch64.sve.uqrshrnt
* @llvm.aarch64.sve.sqrshrnt
* @llvm.aarch64.sve.sqrshrunt
Reviewers: sdesmalen, rengolin, efriedma
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71552
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) Fix an issue with the incorrect value being used for the number of
elements being passed to [d|w]lstp. We were trying to check that
the value was available at LoopStart, but this doesn't consider
that the last instruction in the block could also define the
register. Two helpers have been added to RDA for this.
2) Insert some code to now try to move the element count def or the
insertion point so that we can perform more tail predication.
3) Related to (1), the same off-by-one could prevent us from
generating a low-overhead loop when a mov lr could have been
the last instruction in the block.
4) Fix up some instruction attributes so that not all the
low-overhead loop instructions are labelled as branches and
terminators - as this is not true for dls/dlstp.
Differential Revision: https://reviews.llvm.org/D71609
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Record the discovered VPT blocks while checking for validity and, for
now, only handle blocks that begin with VPST and not VPT. We're now
allowing more than one instruction to define vpr, but each block must
somehow be predicated using the vctp. This leaves us with several
scenarios which need fixing up:
1) A VPT block with is only predicated by the vctp and has no
internal vpr defs.
2) A VPT block which is only predicated by the vctp but has an
internal vpr def.
3) A VPT block which is predicated upon the vctp as well as another
vpr def.
4) A VPT block which is not predicated upon a vctp, but contains it
and all instructions within the block are predicated upon in.
The changes needed are, for:
1) The easy one, just remove the vpst and unpredicate the
instructions in the block.
2) Remove the vpst and unpredicate the instructions up to the
internal vpr def. Need insert a new vpst to predicate the
remaining instructions.
3) No nothing.
4) The vctp will be inside a vpt and the instruction will be removed,
so adjust the size of the mask on the vpst.
Differential Revision: https://reviews.llvm.org/D71107
|
|
|
|
|
|
| |
Add VMULL and VQDMULL variants to our tail predication white list.
Differential Revision: https://reviews.llvm.org/D71465
|
|
|
|
|
|
|
|
|
|
|
|
| |
for STRICT_FCMP. NFCI
The only thing its getting from the X86TargetLowering class is
the subtarget which we can easily pass. This function only has
one call site now since this might help the compiler inline it.
Explicitly return both the flag result and the chain result for
STRICT_FCMP nodes. This removes an assumption in the caller that
getValue(1) is the right way to get the chain.
|
|
|
|
|
|
|
|
|
| |
constant and calling EmitCmp. NFCI
EmitCmp will just immediately call EmitTest and discard the null
constant only to have EmitTest create it again if it doesn't fold.
So just skip all that and go directly to EmitTest.
|
|
|
|
| |
Patch d9220b580b3 fixed the underlying issue that casued 298e183e813 to fail.
|
|
|
|
| |
This fixes the underlying bug that was exposed by 298e183e813.
|
|
|
|
|
|
|
|
| |
This allows cl::opt<int64_t>.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D71729
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
All the use cases of CallAnalyzer use the same call site parameter to
both construct the CallAnalyzer, and then pass to the analysis member.
This change removes this duplication.
Reviewers: davidxl, eraman, Jim
Reviewed By: davidxl
Subscribers: Jim, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71645
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
consideration (PR43267)
Summary:
It is pretty common to assume that something is not zero.
Even optimizer itself sometimes emits such assumptions
(e.g. `addAssumeNonNull()` in `PromoteMemoryToRegister.cpp`).
But we currently don't deal with such assumptions :)
The only way `isKnownNonZero()` handles assumptions is
by calling `computeKnownBits()` which calls `computeKnownBitsFromAssume()`.
But `x != 0` does not tell us anything about set bits,
it only says that there are *some* set bits.
So naturally, `KnownBits` does not get populated,
and we fail to make use of this assumption.
I propose to deal with this special case by special-casing it
via adding a `isKnownNonZeroFromAssume()` that returns boolean
when there is an applicable assumption.
While there, we also deal with other predicates,
mainly if the comparison is with constant.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=43267 | PR43267 ]].
Differential Revision: https://reviews.llvm.org/D71660
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
execution to successor
This is a pretty rare case, when CxtI and assume are
in the same basic block, with assume being located later.
We were already checking that assumption was guaranteed to be
executed, but we omitted CxtI itself from consideration,
and as the test (miscompile) shows, that is incorrect.
As noted in D71660 review by @nikic.
|
|
|
|
|
|
|
| |
A function marked `noreturn` may contain unreachable terminators: these
should not be considered cold, as the function may be a trampoline.
rdar://58068594
|
|
|
|
|
|
| |
Recommit after making the same API change in non-x86 targets. This has been build for all targets, and tested for effected ones. Why the difference? Because my disk filled up when I tried make check for all.
For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.
|
|
|
|
|
|
|
|
|
|
|
| |
DWARF optimizing part 2."
as it causes a layering violation/dependency cycle:
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp -> llvm/DebugInfo/DWARF/DWARFExpression.h
llvm/include/llvm/DebugInfo/DWARF/DWARFOptimizer.h -> llvm/CodeGen/NonRelocatableStringpool.h
This reverts commit abc7f6800df8a1f40e1e2c9ccce826abb0208284.
|
|
|
|
|
|
|
|
|
| |
Summary:
When we use undefined symbol with its qualname, we are not able
to generate that symbol because of the logic of early "continue"
that skip the qualname symbol. This patch fixes it.
Differential revision: https://reviews.llvm.org/D71667
|
|
|
|
|
|
| |
as it broke the aarch64 build.
This reverts commit bc7595d934b958ab481288d7b8e768fe5310be8f.
|
|
|
|
| |
For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.
|
|
|
|
| |
This is in advance of assembler padding directives support where we'll need to bundle the label w/the corresponding faulting instruction to avoid padding being inserted between.
|
|
|
|
| |
PromoteLegalINT_TO_FP to prevent an invalid strict fp node from being created by falling into non-strict code path.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71703
|
|
|
|
|
|
|
|
|
|
| |
Summary: Instead of crashing due to the `llvm_unreachable`, provide a proper
error when invalid fixups/relocations are encountered.
Reviewers: asb, lenary
Reviewed By: asb
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71536
|
|
|
|
|
|
|
| |
Emit the __mcount_loc section for all fentry calls.
Review: Ulrich Weigand
https://reviews.llvm.org/D71629
|
|
|
|
|
|
|
|
|
|
| |
This patch enables the machine outliner for RISC-V and adds the
necessary logic for checking whether sequences can be safely outlined,
and describing how they should be outlined. Outlined functions are
called using the register t0 (x5) as the return address register, which
must be available for an occurrence of a sequence to be safely outlined.
Differential Revision: https://reviews.llvm.org/D66210
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch associates ordinal numbers to the DDG Nodes allowing
the builder to order nodes within a pi-block in program order. The
algorithm works by simply assuming the order in which the BBList
is fed into the builder. The builder already relies on the blocks being
in program order so that it can compute the dependencies correctly.
Similarly the order of instructions in their parent basic blocks
determine their program order.
Authored By: bmahjour
Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert
Reviewed By: Meinersbur
Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70986
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The default static (non-PIC, non-PIE) model for 32-bit powerpc does not
use @PLT annotations and relocations in GCC. LLVM shouldn't use @PLT
annotations either, because it breaks secure-PLT linking with (some
versions of?) GNU LD.
Update the available-externally.ll test to reflect that default mode should be
the same as the static relocation, by using the same check prefix.
Reviewed by: sfertile
Differential Revision: https://reviews.llvm.org/D70570
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 23c28c40436143006be740533375c036d11c92cd.
It caused build failures in the following expensive checks builders:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1295
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/700
Reverting for now whilst I figure what the issue is.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Ignore looking at blocks that are unreachable from entry when
collecting candidates for hosting.
Normally the consthoist pass is executed in the llc pipeline,
just after unreachableblockelim. So it is abnormal to have code
that is unreachable from the entry block. But when running the
pass as part of opt, for example as part of fuzzy testing, we
might trigger various kinds of asserts when collecting candidates
if we include unreachable blocks in that analysis.
It seems like a waste of time to hoist constants in unreachble
blocks, so the solution is to simply ignore such blocks when
collecting the hoisting candidates.
The two added test cases used to end up in two different asserts,
and the intention with the checks is just to verify that we no
longer fail.
Fixes: PR43903
Reviewers: spatel
Reviewed By: spatel
Subscribers: hiraditya, uabelho, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71678
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adds the following intrinsics:
* @llvm.aarch64.sve.clasta
* @llvm.aarch64.sve.clasta_n
* @llvm.aarch64.sve.clastb
* @llvm.aarch64.sve.clastb_n
* @llvm.aarch64.sve.compact
* @llvm.aarch64.sve.ext
* @llvm.aarch64.sve.lasta
* @llvm.aarch64.sve.lastb
* @llvm.aarch64.sve.rev
* @llvm.aarch64.sve.splice
* @llvm.aarch64.sve.tbl
* @llvm.aarch64.sve.trn1
* @llvm.aarch64.sve.trn2
* @llvm.aarch64.sve.uzp1
* @llvm.aarch64.sve.uzp2
* @llvm.aarch64.sve.zip1
* @llvm.aarch64.sve.zip2
Reviewers: sdesmalen, efriedma, dancgr, mgudim, huntergr, rengolin
Reviewed By: sdesmalen, efriedma
Subscribers: kmclaughlin, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71401
|
|
|
|
|
|
|
|
|
|
|
| |
The debug line verbose printing was printing the wrong values for rows
added via DW_LNE_end_sequence, because the row was being printed AFTER
its state had been reset following it being appended to the line table.
This patch fixes this issue by printing the row before appending it.
Reviewers: dblaikie, MaskRay
Differential Revision: https://reviews.llvm.org/D71664
|
|
|
|
|
|
|
|
|
|
|
| |
part 2.
That patch is extracted from the D70709. It moves CompileUnit, DeclContext
into llvm/DebugInfo/DWARF. It also adds new file DWARFOptimizer with
AddressesMap class. AddressesMap generalizes functionality
from RelocationManager.
Differential Revision: https://reviews.llvm.org/D71271
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In certain situations after inlining and simplification we end up with
code that is _almost_ a min/max pattern, but contains constants that
have been demand-bit optimised to the wrong values, ending up with code
like:
%1 = icmp slt i32 %shr, -128
%2 = select i1 %1, i32 128, i32 %shr
%.inv = icmp sgt i32 %shr, 127
%spec.select.i = select i1 %.inv, i32 127, i32 %2
%conv7 = trunc i32 %spec.select.i to i8
This should be turned into a min/max pattern, but the -128 in the first
select was instead transformed into 128, as only the bottom byte was
ever demanded.
To fix this, I've put in further canonicalisation for the immediates of
selects, preferring to use the same value as the icmp if available.
Differential Revision: https://reviews.llvm.org/D71516
|
|
|
|
|
|
|
|
| |
The inconsistency caused uops mode to fail on an older version of libpfm
since the dispatched_port was added as an alias for executed_port only
after v4.6.0 of libpfm.
Differential revision: https://reviews.llvm.org/D71665
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.
These improvements cover architectures implementing ARMv5TE or Thumb-2.
Reviewers: dmgreen, efriedma, john.brawn
Reviewed By: efriedma
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70072
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: sdesmalen, efriedma, dancgr, mgudim, cameron.mcinally
Reviewed By: cameron.mcinally
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl,
llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71472
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: sdesmalen, eli.friedman, dancgr, mgudim, cameron.mcinally,
huntergr, efriedma
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl,
llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71457
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D71676
|
|
|
|
|
|
| |
This patch is mainly for custom lowering the vector operation.
Differential Revision: https://reviews.llvm.org/D71592
|
|
|
|
|
|
|
|
| |
Fix a FIXME in ppcloopinstrformprep pass.
Reviewed by: nemanjai
Differential Revision: https://reviews.llvm.org/D71346
|
|
|
|
|
|
|
|
|
| |
Since the address pool doesn't get populated in this case (due to the
lack of inlining, no child DIEs are added to the CU - so no addresses
are needed for the DIEs themselves) until the range list is emitted - at
the time the attributes are added to the CU, the address pool is empty.
So check whether the address pool will be used for the range lists & add
an addr_base if that's the case.
|
|
|
|
| |
(found when LLVM fails to emit addr_base for gmlt+DWARFv5)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DebugLocStream::List"
Move these data structures closer together so their emission code can
eventually share more of its implementation.
Was an egregious bug (completely untested, evidently) where I hadn't
inverted a DWARFv5 test as needed, so it was doing the exact opposite of
what was required & thus tried to emit a DWARFv5 range list header in
DWARFv4.
Reapply 8e04896288d22ed8bef7ac367923374f96b753d6 which was
reverted in a8154e5e0c83d2f0f65f3b4fb1a0bc68785bd975.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The vector pattern `(a + b + 1) / 2` was previously selected to an
avgr_u instruction regardless of nuw flags, but this is incorrect in
the case where either addition may have an unsigned wrap. This CL
changes the existing pattern to require both adds to have nuw flags
and adds builtin functions and intrinsics for the avgr_u instructions
because the corrected pattern is not representable in C.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71648
|
|
|
|
|
|
|
|
| |
supplied."
This reverts commit 298e183e813c884dd22816383405bae3ef9ef278.
This commit caused some build failures -- reverting while I investigate.
|
|
|
|
|
|
|
|
| |
operations from being folded into masked instructions.
We really need to update the isel patterns to prevent this, but
that requires some tablegen de-tangling. So this hack will work
for correctness in the short term.
|
|
|
|
|
|
| |
LLJITBuilder will now use JITLink on supported platforms even if a custom
JITTargetMachineBuilder is supplied, provided that neither the code model,
nor the relocation model, nor the ObjectLinkingLayerCreator is set.
|
|
|
|
|
|
| |
Revert D70315, as it breaks gfx8 for some reason.
This reverts commit 65f94b33808d7d69539961a6f5a2168f0a1eef41.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add new intrinsics
llvm.experimental.constrained.minimum
llvm.experimental.constrained.maximum
as strict versions of llvm.minimum and llvm.maximum.
Includes SystemZ back-end support.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D71624
|
|
|
|
|
|
| |
Loop fusion previously had a method to check whether a loop was in rotated form. This method has
been moved into the LoopInfo class. This patch removes the old isRotated method from loop fusion,
in favour of the new one in LoopInfo.
|