| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix several several additional problems with the int <-> FP conversion
logic both in common code and in the X86 target. In particular:
- The STRICT_FP_TO_UINT expansion emits a floating-point compare. This
compare can raise exceptions and therefore needs to be a strict compare.
I've made it signaling (even though quiet would also be correct) as
signaling is the more usual default for an LT. This code exists both
in common code and in the X86 target.
- The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode:
it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one
of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP
that ends up not chosen. I've fixed the algorithm to use only a single
STRICT_SINT_TO_FP instead.
- The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do
the wrong thing because it calls getOperationAction using the result VT.
But for some opcodes, incuding [SU]INT_TO_FP, getOperationAction needs to
be called using the operand VT.
- Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate
STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily.
Reviewed by: craig.topper
Differential Revision: https://reviews.llvm.org/D71840
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The only useful information the UndefValue conveys is the address space,
which MachinePointerInfo can represent directly without referring to an
IR value.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71838
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
extract_subvector(bitcast()) support
This moves the X86 specific transform from rL364407
into DAGCombiner to generically handle 'little to big' cases
(for example: extract_subvector(v2i64 bitcast(v16i8))). This
allows us to remove both the x86 implementation and the aarch64
bitcast(extract_subvector(bitcast())) combine.
Earlier patches that dealt with regressions initially exposed
by this patch:
rG5e5e99c041e4
rG0b38af89e2c0
Patch by: @RKSimon (Simon Pilgrim)
Differential Revision: https://reviews.llvm.org/D63815
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As the extern_weak target might be missing, resolving to the absolute
address zero, we can't use the normal direct PC-relative branch
instructions (as that would result in relocations out of range).
Improve the classifyGlobalFunctionReference method to set
MO_DLLIMPORT/MO_COFFSTUB, and simplify the existing code in
AArch64TargetLowering::LowerCall to use the return value from
classifyGlobalFunctionReference for these cases.
Add code in both AArch64FastISel and GlobalISel/IRTranslator to
bail out for function calls to extern weak functions on windows,
to let SelectionDAG handle them.
This matches what was done for X86 in 6bf108d77a3c.
Differential Revision: https://reviews.llvm.org/D71721
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
As the extern_weak target might be missing, resolving to the absolute
address zero, we can't use the normal direct PC-relative branch
instructions (as that would result in relocations out of range).
Instead check the shouldAssumeDSOLocal method and load the address
from a COFF stub.
This matches what was done for X86 in 6bf108d77a3c.
Differential Revision: https://reviews.llvm.org/D71720
|
| |
|
|
|
|
| |
1. Remove duplicate function for class name at the beginning of the
comment.
2. Use auto where the type is already obvious from the context.
|
| |
|
|
|
|
|
|
|
| |
semantics of ISD::BSWAP
The custom node PPCISD::XXREVERSE has completely the same semantics of generic node ISD::BSWAP.
We need to clean up it as we have the combine rules for bswap in the base class, while nothing for xxreverse.
Differential Revision: https://reviews.llvm.org/D70657
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch introduces the ROLBRd and RORBRd pseudo-instructions,
which implemenent the "traditional" rotate operations; instead of
the AVR rotate instructions that use the carry bit.
The code is not optimized at all. Especially when dealing with
loops of rotate instructions, this codegen should be improved some
day.
Related bug: 41358 <https://bugs.llvm.org/show_bug.cgi?id=41358>
//Note//: This is my first submitted patch.
Reviewers: dylanmckay, Jim
Reviewed By: dylanmckay
Subscribers: hiraditya, llvm-commits, dylanmckay, dsprenkels
Tags: #llvm
Patched by dsprenkels (Daan Sprenkels)
Differential Revision: https://reviews.llvm.org/D60365
|
| |
|
|
|
|
|
|
|
| |
Summary:
Currently, we set legalization action of `ISD::ROTL` vectors as
`Expand` in `PPCISelLowering`. However, we can exploit `vrl(b|h|w|d)`
to lower `ISD::ROTL` directly.
Differential Revision: https://reviews.llvm.org/D71324
|
| |
|
|
|
|
| |
This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall.
Differential Revision: https://reviews.llvm.org/D71815
|
| |
|
|
|
|
| |
This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall.
Differential Revision: https://reviews.llvm.org/D71814
|
| |
|
|
|
|
|
|
|
| |
This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall.
Also removed the top-level const as requested by Aaron Ballman in similar
patches.
Differential Revision: https://reviews.llvm.org/D71812
|
| |
|
|
|
|
| |
This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall.
Differential Revision: https://reviews.llvm.org/D71811
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
assembly.
Summary:
This is documented as the appropriate template modifier for call operands.
Fixes PR44272, and adds a regression test.
Also adds support for operand modifiers in Intel-style inline assembly.
Reviewers: rnk
Reviewed By: rnk
Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71677
|
| |
|
|
|
|
|
|
|
|
| |
This is another potential regression exposed by D63815.
Here we peek through a bitcast to find an extract subvector and
scale the splat offset based on that:
splat (bitcast (extract X, C)), LaneC --> duplane (bitcast X), LaneC'
Differential Revision: https://reviews.llvm.org/D71672
|
| | |
|
| |
|
|
|
|
| |
We cannot pick reserved registers as rename registers.
Fixes https://bugs.llvm.org/show_bug.cgi?id=44358
|
| |
|
|
|
|
|
|
|
|
|
| |
Confusingly, the intrinsic operands do not match the
instruction/custom node. The order is shuffled, and the 3rd operand is
an immediate to select operands.
I'm not 100% sure I did this right, but fdiv still doesn't select end
to end and it will be easier to tell when it does. This at least
avoids an assertion in RegBankSelect and allows hitting the fallback
on selection.
|
| | |
|
| |
|
|
|
| |
This can directly access the register bank, and doesn't need to get it
through the ID.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
operands locations
Extends DWARF expression language to express locals/globals locations. (via
target-index operands atm) (possible variants are: non-virtual registers
or address spaces)
The WebAssemblyExplicitLocals can replace virtual registers to targertindex
operand type at the time when WebAssembly backend introduces
{get,set,tee}_local instead of corresponding virtual registers.
Reviewed By: aprantl, dschuff
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D52634
|
| | |
|
| |
|
|
|
|
|
| |
Demote member functions to static functions where possible
Use early continue/early return to reduce nesting
Clarify comments slightly.
Reuse previously define expression in one case.
|
| |
|
|
| |
Should have caught this in review, but only noticed when addressing post commit style items. We were creating a new instance of the X86MCInstrInfo class, and then never reclaiming the memory. This wasn't even conditional on the new off by default flags, so it was an unconditional leak.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
WARNING: If you're looking at this patch because you're looking for a full
performace mitigation of the Intel JCC Erratum, this is not it!
This is a preliminary patch on the patch towards mitigating the performance
regressions caused by Intel's microcode update for Jump Conditional Code
Erratum. For context, see:
https://www.intel.com/content/www/us/en/support/articles/000055650.html
The patch adds the required assembler infrastructure and command line options
needed to exercise the logic for INTERNAL TESTING. These are NOT public flags,
and should not be used for anything other than LLVM's own testing/debugging
purposes. They are likely to change both in spelling and meaning.
WARNING: This patch is knowingly incorrect in some cornercases. We need, and
do not yet provide, a mechanism to selective enable/disable the padding.
Conversation on this will continue in parellel with work on extending this
infrastructure to support prefix padding.
The goal here is to have the assembler align specific instructions such that
they neither cross or end at a 32 byte boundary. The impacted instructions are:
a. Conditional jump.
b. Fused conditional jump.
c. Unconditional jump.
d. Indirect jump.
e. Ret.
f. Call.
The new options for llvm-mc are:
-x86-align-branch-boundary=NUM aligns branches within NUM byte boundary.
-x86-align-branch=TYPE[+TYPE...] specifies types of branches to align.
A new MCFragment type, MCBoundaryAlignFragment, is added, which may emit
NOP to align the fused/unfused branch.
alignBranchesBegin inserts MCBoundaryAlignFragment before instructions,
alignBranchesEnd marks the end of the branch to be aligned,
relaxBoundaryAlign grows or shrinks sizes of NOP to align the target branch.
Nop padding is disabled when the instruction may be rewritten by the linker,
such as TLS Call.
Process Note: I am landing a patch by skan as it has been LGTMed, and
continuing to iterate on the review is simply slowing us down at this point.
We can and will continue to iterate in tree.
Patch By: skan
Differential Revision: https://reviews.llvm.org/D70157
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
static void *ifunc(void) __attribute__((ifunc("resolver")));
void foo() { ifunc(); }
The relocation produced by the ifunc() call:
1. gcc -msecure-plt -fPIC => R_PPC_PLTREL24 r_addend=0x8000
2. gcc -msecure-plt -PIE => R_PPC_PLTREL24 r_addend=0x8000
3. clang -msecure-plt -fPIC => R_PPC_PLTREL24 r_addend=0x8000
4. clang -msecure-plt -fPIE => R_PPC_REL24
4 is incorrect. The R_PPC_REL24 needs a call stub due to ifunc. If this
relocation is mixed with other R_PPC_PLTREL24(r_addend=0x8000) in a
function, both GNU ld and lld (after D71621 fix) may produce a wrong
result.
This patch fixes 4 to use R_PPC_PLTREL24, which matches GCC.
Both GNU ld and lld (after D71621) will be happy.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D71649
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
before a later use.
The setcc operands are copied into LHS and RHS variables at the top of the function. We also capture the condition code.
A later piece of code swaps the operands and changing the CC variable as part of a canonicalization to make some other checks simpler. But we might not make the transform we canonicalized for. So we continue on through the function where we can use the swapped LHS/RHS variables and access the original condition code operand instead of the modified CC variable. This leads to a setcc being created with the original condition code, but with swapped operands.
To mitigate this, this patch does a couple things. The LHS/RHS/CC variables are made const to keep them from being modified like this again. The transform that needs the swap now uses temporary copies of the variables. And the transform that used the original condition code operand has been altered to use the CC variable we cached originally. Either of these changes are enough to fix the issue, but doing both to make this code very safe.
I also considered rewriting the swap code in some way to check both permutations without explicitly swapping or needing temporary variables, but held off on that.
Differential Revision: https://reviews.llvm.org/D71736
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: Replace the integer immediate intrisics with splat vector variants so they can be applied as optimizations for the C/C++ intrinsics.
Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71614
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SELR(Mux) instructions can be converted to two-address form as LOCR(Mux)
instructions whenever one of the sources are the same reg as dest. By adding
this mapping in getTwoOperandOpcode(), we get:
- Two-address hints in getRegAllocationHints() for select register
instructions.
- No need anymore for special handling in SystemZShortenInst.cpp -
shortenSelect() removed.
The two-address hints are now added before the GRX32 hints, which should be
preferred.
Review: Ulrich Weigand
https://reviews.llvm.org/D68870
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It was recently discovered that the handling of CC values was actually broken
since overflow was not properly handled ('nsw' flag not checked for).
Add and sub instructions now have a new target specific instruction flag
named SystemZII::CCIfNoSignedWrap. It means that the CC result can be used
instead of a compare with 0, but only if the instruction has the 'nsw' flag
set.
This patch also adds the improvements of conversion to logical instructions
and the analyzing of add with immediates, to be able to eliminate more
compares.
Review: Ulrich Weigand
https://reviews.llvm.org/D66868
|
| |
|
|
| |
This reverts commit bbcf1c3496ce2bd1ed87e8fb15ad896e279633ce.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The back-end currently has special DAGCombine code to detect
cases where two floating-point extend or truncate operations
can be combined into a single vector operation.
This patch extends that support to also handle strict FP operations.
Note that currently only the case where both operations have the
same input chain are supported. This already suffices to cover
the common case where the operations result from scalarizing a
non-legal vector type. More general cases can be supported in
the future.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instructions
In general SVE intrinsics are considered predicated and merging
with everything else having suitable decoration. For predicated
zeroing operations (like the predicate logical instructions) we
use the "_z" suffix. After this change all intrinsics use their
expected names (i.e. orr instead of or and eor instead of xor).
I've removed intrinsics and patterns for condition code setting
instructions as that data is not returned as part of the intrinsic.
The expectation is to ask for a cc flag explicitly.
For example:
a = and_z(pg, p1, p2)
cc = ptest_<flag>(pg, a)
With the code generator expected to use "s" variants of instructions
when available.
Differential Revision: https://reviews.llvm.org/D71715
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
E.g.
%0 = tail call i64 @llvm.aarch64.sve.cntw(i32 31)
%mul = mul i64 %0, <const>
Should emit:
cntw x0, all, mul #<const>
For <const> in the range 1-16.
Patch by Kerry McLaughlin
Reviewers: sdesmalen, huntergr, dancgr, rengolin, efriedma
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71014
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The following intrnisics are added:
* @llvm.aarch64.sve.sqdec{b|h|w|d|p}
* @llvm.aarch64.sve.sqinc{b|h|w|d|p}
* @llvm.aarch64.sve.uqdec{b|h|w|d|p}
* @llvm.aarch64.sve.uqinc{b|h|w|d|p}
For every intrnisic there a scalar variants (with n32 or n64 suffix) and
vector variants (no suffix).
Reviewers: sdesmalen, rengolin, efriedma
Reviewed By: sdesmalen, efriedma
Subscribers: eli.friedman, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71252
|
| |
|
|
|
|
|
|
| |
Recommit 23c28c40436143006be740533375c036d11c92cd (reverted in
dcb48f50bdfa0fa47b62d089b6ed999d857fc9f8) with a fix for an assert
"Request for a fixed size on a scalable object" being triggered in
`LowerSVEIntrinsicEXT`. The fix is to call `getKnownMinSize` on the
TypeSize object.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The following intrinsics for binary narrowing shift righ operations are
added:
* @llvm.aarch64.sve.shrnb
* @llvm.aarch64.sve.uqshrnb
* @llvm.aarch64.sve.sqshrnb
* @llvm.aarch64.sve.sqshrunb
* @llvm.aarch64.sve.uqrshrnb
* @llvm.aarch64.sve.sqrshrnb
* @llvm.aarch64.sve.sqrshrunb
* @llvm.aarch64.sve.shrnt
* @llvm.aarch64.sve.uqshrnt
* @llvm.aarch64.sve.sqshrnt
* @llvm.aarch64.sve.sqshrunt
* @llvm.aarch64.sve.uqrshrnt
* @llvm.aarch64.sve.sqrshrnt
* @llvm.aarch64.sve.sqrshrunt
Reviewers: sdesmalen, rengolin, efriedma
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71552
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) Fix an issue with the incorrect value being used for the number of
elements being passed to [d|w]lstp. We were trying to check that
the value was available at LoopStart, but this doesn't consider
that the last instruction in the block could also define the
register. Two helpers have been added to RDA for this.
2) Insert some code to now try to move the element count def or the
insertion point so that we can perform more tail predication.
3) Related to (1), the same off-by-one could prevent us from
generating a low-overhead loop when a mov lr could have been
the last instruction in the block.
4) Fix up some instruction attributes so that not all the
low-overhead loop instructions are labelled as branches and
terminators - as this is not true for dls/dlstp.
Differential Revision: https://reviews.llvm.org/D71609
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Record the discovered VPT blocks while checking for validity and, for
now, only handle blocks that begin with VPST and not VPT. We're now
allowing more than one instruction to define vpr, but each block must
somehow be predicated using the vctp. This leaves us with several
scenarios which need fixing up:
1) A VPT block with is only predicated by the vctp and has no
internal vpr defs.
2) A VPT block which is only predicated by the vctp but has an
internal vpr def.
3) A VPT block which is predicated upon the vctp as well as another
vpr def.
4) A VPT block which is not predicated upon a vctp, but contains it
and all instructions within the block are predicated upon in.
The changes needed are, for:
1) The easy one, just remove the vpst and unpredicate the
instructions in the block.
2) Remove the vpst and unpredicate the instructions up to the
internal vpr def. Need insert a new vpst to predicate the
remaining instructions.
3) No nothing.
4) The vctp will be inside a vpt and the instruction will be removed,
so adjust the size of the mask on the vpst.
Differential Revision: https://reviews.llvm.org/D71107
|
| |
|
|
|
|
| |
Add VMULL and VQDMULL variants to our tail predication white list.
Differential Revision: https://reviews.llvm.org/D71465
|
| |
|
|
|
|
|
|
|
|
|
|
| |
for STRICT_FCMP. NFCI
The only thing its getting from the X86TargetLowering class is
the subtarget which we can easily pass. This function only has
one call site now since this might help the compiler inline it.
Explicitly return both the flag result and the chain result for
STRICT_FCMP nodes. This removes an assumption in the caller that
getValue(1) is the right way to get the chain.
|
| |
|
|
|
|
|
|
|
| |
constant and calling EmitCmp. NFCI
EmitCmp will just immediately call EmitTest and discard the null
constant only to have EmitTest create it again if it doesn't fold.
So just skip all that and go directly to EmitTest.
|
| |
|
|
|
|
| |
Recommit after making the same API change in non-x86 targets. This has been build for all targets, and tested for effected ones. Why the difference? Because my disk filled up when I tried make check for all.
For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.
|
| |
|
|
|
|
| |
as it broke the aarch64 build.
This reverts commit bc7595d934b958ab481288d7b8e768fe5310be8f.
|
| |
|
|
| |
For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.
|
| |
|
|
| |
This is in advance of assembler padding directives support where we'll need to bundle the label w/the corresponding faulting instruction to avoid padding being inserted between.
|
| |
|
|
|
|
|
|
|
|
| |
Summary: Instead of crashing due to the `llvm_unreachable`, provide a proper
error when invalid fixups/relocations are encountered.
Reviewers: asb, lenary
Reviewed By: asb
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71536
|
| |
|
|
|
|
|
| |
Emit the __mcount_loc section for all fentry calls.
Review: Ulrich Weigand
https://reviews.llvm.org/D71629
|
| |
|
|
|
|
|
|
|
|
| |
This patch enables the machine outliner for RISC-V and adds the
necessary logic for checking whether sequences can be safely outlined,
and describing how they should be outlined. Outlined functions are
called using the register t0 (x5) as the return address register, which
must be available for an occurrence of a sequence to be safely outlined.
Differential Revision: https://reviews.llvm.org/D66210
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The default static (non-PIC, non-PIE) model for 32-bit powerpc does not
use @PLT annotations and relocations in GCC. LLVM shouldn't use @PLT
annotations either, because it breaks secure-PLT linking with (some
versions of?) GNU LD.
Update the available-externally.ll test to reflect that default mode should be
the same as the static relocation, by using the same check prefix.
Reviewed by: sfertile
Differential Revision: https://reviews.llvm.org/D70570
|