| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
| |
These 3 variables cause quite a few warnings in the scan-build report on llvm.
llvm-svn: 361731
|
| |
|
|
|
|
|
| |
This was found/reduced from a fuzzer report:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14956
llvm-svn: 361729
|
| |
|
|
|
|
| |
doesn't trigger
llvm-svn: 361728
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than gating on "isSwitchDense" (resulting in necessesarily
sparse lookup tables even when they were generated), always run
this quite cheap transform.
This transform is useful not just for generating tables.
LowerSwitch also wants this: read LowerSwitch.cpp:257.
Be careful to not generate worse code, by introducing a
SubThreshold heuristic.
Instead of just sorting by signed, generalize the finding of the
best base.
And now that it is run unconditionally, do not replicate its
functionality in SwitchToLookupTable (which could use a Sub
when having a hole is smaller, hence the SubThreshold
heuristic located in a single place).
This simplifies SwitchToLookupTable, and fixes
some ugly corner cases due to the use of signed numbers,
such as a table containing i16 32768 and 32769, of which
32769 would be interpreted as -32768, and now the code thinks
the table is size 65536.
(We still use unconditional subtraction when building a single-register mask,
but I think this whole block should go when the more general sparse
map is added, which doesn't leave empty holes in the table.)
And the reason test4 and test5 did not trigger was documented wrong:
it was because they were not considered sufficiently "dense".
Also, fix generation of invalid LLVM-IR: shl by bit-width.
llvm-svn: 361727
|
| |
|
|
|
|
|
|
|
|
| |
and replace with an equilivent countTrailingZeros.
GCD is much more expensive than this, with repeated division.
This depends on D60823
llvm-svn: 361726
|
| |
|
|
|
|
|
|
|
| |
This matches countLeadingOnes() and countTrailingOnes(), and
APInt's countLeadingZeros() and countTrailingZeros().
(as well as __builtin_clzll())
llvm-svn: 361724
|
| |
|
|
|
|
|
|
| |
The implementation in ValueTracking and ConstantRange are equally
powerful, reuse the one in ConstantRange, which will make this easier
to extend.
llvm-svn: 361723
|
| |
|
|
|
|
|
|
|
| |
Extract method to compute overflow based on binop and signedness,
and then make the result handling code generic. This extends the
always-overflow handling to signed muls, but has currently no effect,
as we don't compute always overflow for them (thus NFC).
llvm-svn: 361721
|
| |
|
|
|
|
|
| |
Instead pass binary op and signedness. The extra enum only makes
things more complicated in this case.
llvm-svn: 361720
|
| |
|
|
|
|
|
|
| |
This adds a pattern for fma, similar to the float and double patterns.
Differential Revision: https://reviews.llvm.org/D62330
llvm-svn: 361719
|
| |
|
|
|
|
|
|
|
| |
This add patterns for fp16 round and ceil etc. Same as the float and double
patterns.
Differential Revision: https://reviews.llvm.org/D62326
llvm-svn: 361718
|
| |
|
|
|
|
|
|
|
| |
Promote a number of fp16 math intrinsics to float, so that the relevant float
math routines can be used. Copysign is expanded so as to be handled in-place.
Differential Revision: https://reviews.llvm.org/D62325
llvm-svn: 361717
|
| |
|
|
|
|
|
|
|
|
| |
original vector
We were only testing for direct SETCC results - this allows us to peek through AND/OR/XOR combinations of the comparison results as well.
There's a missing SEXT(PACKSS) fold that I need to investigate for v8i1 cases before I can enable it there as well.
llvm-svn: 361716
|
| |
|
|
|
|
|
|
| |
This adds a pattern for the fabs intrinsic, the same as float and double.
Differential Revision: https://reviews.llvm.org/D62324
llvm-svn: 361715
|
| |
|
|
|
|
|
|
| |
This adds a pattern for the sqrt intrinsic, the same as float and double.
Differential Revision: https://reviews.llvm.org/D62322
llvm-svn: 361714
|
| |
|
|
|
|
|
|
| |
Promote fp16 frem operations on ARM to floats so they call fmodf.
Differential Revision: https://reviews.llvm.org/D62321
llvm-svn: 361713
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: PR41688
Reviewers: spatel, efriedma, craig.topper, hfinkel, reames
Reviewed By: hfinkel
Subscribers: javed.absar, dmgreen, fhahn, hfinkel, reames, nikic, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61409
llvm-svn: 361707
|
| |
|
|
|
|
|
|
| |
shift(build_vector(),C)
Commonly occurs in sign-extension cases
llvm-svn: 361706
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Allow for retrieving an object file corresponding to an architecture-specific slice in a Mach-O universal binary file.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60378
llvm-svn: 361705
|
| |
|
|
|
|
|
|
|
| |
If we have a known non-nan operand, place it in the second operand
of fmin/fmax that is returned if either operand is nan.
Differential Revision: https://reviews.llvm.org/D62448
llvm-svn: 361704
|
| |
|
|
|
|
|
|
|
| |
Adds support for the uadd.sat family of intrinsics in LVI, based on
ConstantRange methods from D60946.
Differential Revision: https://reviews.llvm.org/D62447
llvm-svn: 361703
|
| |
|
|
|
|
|
|
|
|
| |
The guaranteed no-wrap region is never empty, it always contains at
least zero, so these optimizations don't ever apply.
To make this more obviously true, replace the conversative return
in makeGNWR with an assertion.
llvm-svn: 361698
|
| |
|
|
|
|
|
|
|
|
| |
The test based on PR42010:
https://bugs.llvm.org/show_bug.cgi?id=42010
...may show an inaccuracy for PPC's target defs, but we should not
be so aggressive with an assert here. There's no telling what out-of-tree
targets look like.
llvm-svn: 361696
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
In LVI, calculate the range of extractvalue(op.with.overflow(%x, %y), 0)
as the range of op(%x, %y). This is mainly useful in conjunction with
D60650: If the result of the operation is extracted in a branch guarded
against overflow, then the value of %x will be appropriately constrained
and the result range of the operation will be calculated taking that
into account.
Differential Revision: https://reviews.llvm.org/D60656
llvm-svn: 361693
|
| |
|
|
| |
llvm-svn: 361692
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Support LEA64_32r properly.
INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags.
This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg.
One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve it in the add reg/reg case if we think its important. Not sure which operand its supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input.
Differential Revision: https://reviews.llvm.org/D61472
llvm-svn: 361691
|
| |
|
|
|
|
|
|
|
|
| |
models. Add 256-bit fp xor to sandybridge zero idioms
This copies the Sandy Bridge zero idiom support to later CPUs. Adding the AVX2 and AVX512F/VL instructions as appropriate.
Differential Revision: https://reviews.llvm.org/D62360
llvm-svn: 361690
|
| |
|
|
|
|
|
|
|
|
| |
cross block values according to the divergence."
Broke sanitizer bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio
llvm-svn: 361688
|
| |
|
|
|
|
|
|
|
|
|
| |
This lead to errors when dumping binaries with v4 and v5 units linked
together (but could've also errored on v5 units that did/didn't use
str_offsets).
Also improves error handling and messages around invalid str_offsets
contributions.
llvm-svn: 361683
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a few places in getInstrMapping, we check if use/def instructions for the
instruction we're mapping have floating point constraints.
We can improve this check and reduce the number of copies in GISel-compiled code
if we make a couple observations:
- For a def instruction, it only matters if the def instruction must always
output a value stored on a FPR
- For a use instruction, it only matters if the use instruction must always
only take in values stored in FPRs
This adds two new functions:
- onlyUsesFP
- onlyDefinesFP
Then we can use those when we're checking the uses/defs instead.
Without this patch, the load, unmerge, store, and select in the added test
would have unnecessary copies.
Differential Revision: https://reviews.llvm.org/D62426
llvm-svn: 361679
|
| |
|
|
|
|
|
|
|
| |
Factor it out into a function, and replace places where we had the same check
with the new function.
Differential Revision: https://reviews.llvm.org/D62421
llvm-svn: 361677
|
| |
|
|
|
|
|
|
|
| |
This adds `-parent-recurse-depth` which limits the number of parent DIEs
being dumped.
Differential revision: https://reviews.llvm.org/D62359
llvm-svn: 361671
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary:dd
This patch implements call lowering for calls without parameters
on AIX as initial support.
Reviewers: sfertile, hubert.reinterpretcast, aheejin, efriedma
Differential Revision: https://reviews.llvm.org/D61948
llvm-svn: 361669
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The fcsel and csel instructions differ in only the register banks they work on.
So, they're entirely interchangeable otherwise.
With this in mind, this does two things:
- Teach AArch64RegisterBankInfo to consider the inputs to G_SELECT as well as
the outputs.
- Teach it to choose the best register bank mapping based off the constraints
of the inputs and outputs.
The "best" in this case means the one that requires the smallest number of
copies to properly emit a fcsel/csel.
For example, if the inputs are all already going to be on FPRs, we should
emit a fcsel, even if the output is a GPR. This costs one copy to produce the
result, but saves us from copying the inputs into GPRs.
Also update the regbank-select.mir to check that we end up with the right
select instruction.
Differential Revision: https://reviews.llvm.org/D62267
llvm-svn: 361665
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It looks like since INLINEASM_BR was created off of INLINEASM, a few
checks for INLINEASM needed to be updated to check for either case.
pr/41999
Reviewers: t.p.northover, peter.smith
Reviewed By: peter.smith
Subscribers: craig.topper, javed.absar, kristof.beyls, hiraditya, llvm-commits, peter.smith, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62402
llvm-svn: 361661
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We were observing failures for arm32 allyesconfigs of the Linux kernel
with the asm goto Clang patch, where ldr's were being generated to
offsets too far away to encode in imm12.
It looks like since INLINEASM_BR was created off of INLINEASM, a few
checks for INLINEASM needed to be updated to check for either case.
pr/41999
Link: https://github.com/ClangBuiltLinux/linux/issues/490
Reviewers: peter.smith, kristof.beyls, ostannard, rengolin, t.p.northover
Reviewed By: peter.smith
Subscribers: jyu2, javed.absar, hiraditya, llvm-commits, nathanchance, craig.topper, kees, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62400
llvm-svn: 361659
|
| |
|
|
|
|
|
| |
If some lanes weren't active on entry to the function, this could
clobber their VGPR values.
llvm-svn: 361655
|
| |
|
|
|
|
|
| |
This was skipping GetUnderlyingObject for nonprivate addresses, but an
alloca could also be found through an addrspacecast if it's flat.
llvm-svn: 361649
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross block values beforehand. For the divergent targets
same value type requires different register classes dependent on the value divergence.
Reviewers: rampitec, nhaehnle
Differential Revision: https://reviews.llvm.org/D59990
llvm-svn: 361644
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the situation, where we generate the following code:
crxor 8, 8, 8
< Some instructions>
.LBB0_1:
< Some instructions>
cror 1, 8, 8
cror (COPY of CRbit) depends on the result of the crxor instruction.
CR8 is known to be zero as crxor is equivalent to CRUNSET. We can simply use
crxor 1, 1, 1 instead to zero out CR1, which does not have any dependency on
any previous instruction.
This patch will optimize it to:
< Some instructions>
.LBB0_1:
< Some instructions>
cror 1, 1, 1
Patch By: Victor Huang (NeHuang)
Differential Revision: https://reviews.llvm.org/D62044
llvm-svn: 361632
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch adds support for the SVE2 character match instructions MATCH and NMATCH.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62206
llvm-svn: 361627
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch adds support for the following instructions:
SVE2 bitwise shift right narrow:
* SQSHRUNB, SQSHRUNT, SQRSHRUNB, SQRSHRUNT, SHRNB, SHRNT, RSHRNB, RSHRNT,
SQSHRNB, SQSHRNT, SQRSHRNB, SQRSHRNT, UQSHRNB, UQSHRNT, UQRSHRNB,
UQRSHRNT
SVE2 integer add/subtract narrow high part:
* ADDHNB, ADDHNT, RADDHNB, RADDHNT, SUBHNB, SUBHNT, RSUBHNB, RSUBHNT
SVE2 saturating extract narrow:
* SQXTNB, SQXTNT, UQXTNB, UQXTNT, SQXTUNB, SQXTUNT
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62205
llvm-svn: 361624
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch adds support for the following instructions:
SVE2 bitwise shift and insert:
* SRI, SLI
SVE2 bitwise shift right and accumulate:
* SSRA, USRA, SRSRA, URSRA
SVE2 complex integer add:
* CADD, SQCADD
SVE2 integer absolute difference and accumulate:
* SABA, UABA
SVE2 integer absolute difference and accumulate long:
* SABALB, SABALT, UABALB, UABALT
SVE2 integer add/subtract long with carry:
* ADCLB, ADCLT, SBCLB, SBCLT
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62204
llvm-svn: 361622
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets.
computeKnownBits then uses this function to improve codegen, notably vector code after legalization.
A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit.
This required a couple of fixes:
* SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR).
* Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets.
Differential Revision: https://reviews.llvm.org/D61887
llvm-svn: 361620
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds support for the polynomial multiplication instructions
PMULLB/PMULLT. The 64-bit source and 128-bit destination element
variants are enabled with crypto extensions (+sve2-aes), similar to the
NEON PMULL2 instruction. All other variants are enabled with +sve2.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62145
llvm-svn: 361619
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch adds support for the following instructions:
SVE2 integer add/subtract long:
* SADDLB, SADDLT, UADDLB, UADDLT, SSUBLB, SSUBLT, USUBLB, USUBLT,
SABDLB, SABDLT, UABDLB, UABDLT
SVE2 integer add/subtract wide:
* SADDWB, SADDWT, UADDWB, UADDWT, SSUBWB, SSUBWT, USUBWB, USUBWT
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62142
llvm-svn: 361615
|
| |
|
|
|
|
|
|
|
| |
Just a minor refactoring to use the new helper method
DataLayout::typeSizeEqualsStoreSize(). This is done when
checking if getTypeSizeInBits is equal/non-equal to
getTypeStoreSizeInBits.
llvm-svn: 361613
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds support for the SVE2 saturating/rounding bitwise shift
left (predicated) group of instructions:
* SRSHL, URSHL, SRSHLR, URSHLR, SQSHL, UQSHL, SQRSHL, UQRSHL,
SQSHLR, UQSHLR, SQRSHLR, UQRSHLR
Immediate forms of the SQSHL and UQSHL instructions are also added to
the existing SVE bitwise shift by immediate (predicated) group, as well
as three new instructions SRSHR/URSHR/SQSHLU. The new instructions in
this group are encoded similarly and are implemented using the same
TableGen class with a minimal change (1 bit in encoding).
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62140
llvm-svn: 361612
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch adds support for the following instructions:
* SQADD, UQADD, SUQADD, USQADD
* SQSUB, UQSUB, SQSUBR, UQSUBR
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D62130
llvm-svn: 361611
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change relaxes the checks for hasOnlyUniformBranches such that our
region is uniform if:
1. All conditional branches that are direct children are uniform.
2. And either:
a. All sub-regions are uniform.
b. There is one or less conditional branches among the direct
children.
Differential Revision: https://reviews.llvm.org/D62198
llvm-svn: 361610
|