| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
| |
This reverts commit d5a8637072f4c556b88156bd2f6237a2ead47d31.
Most likely suspect for this bot failure:
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9684
llvm-svn: 361850
|
| |
|
|
|
|
|
|
| |
If the only VGPRs used for SGPR spilling were not CSRs, this was
enabling all laness and immediately restoring exec. This is the usual
situation in leaf functions.
llvm-svn: 361848
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- Don't treat the use of a scalar register as `vreg_1` an VGPR usage.
Otherwise, that promotes that scalar register into vector one, which
breaks the assumption that scalar register holds the lane mask.
- The issue is triggered in a complicated case, where if the uses of
that (lane mask) scalar register is legalized firstly before its
definition, e.g., due to the mismatch block placement and its
topological order or loop. In that cases, the legalization of PHI
introduces the use of that scalar register as `vreg_1`.
Reviewers: rampitec, nhaehnle, arsenm, alex-t
Subscribers: kzhuravl, jvesely, wdng, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62492
llvm-svn: 361847
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Those two subtarget features were awkward because their semantics are
reversed: each one indicates the _lack_ of support for something in
the architecture, rather than the presence. As a consequence, you
don't get the behavior you want if you combine two sets of feature
bits.
Each SubtargetFeature for an FP architecture version now comes in four
versions, one for each combination of those options. So you can still
say (for example) '+vfp2' in a feature string and it will mean what
it's always meant, but there's a new string '+vfp2d16sp' meaning the
version without those extra options.
A lot of this change is just mechanically replacing positive checks
for the old features with negative checks for the new ones. But one
more interesting change is that I've rearranged getFPUFeatures() so
that the main FPU feature is appended to the output list *before*
rather than after the features derived from the Restriction field, so
that -fp64 and -d32 can override defaults added by the main feature.
Reviewers: dmgreen, samparker, SjoerdMeijer
Subscribers: srhines, javed.absar, eraman, kristof.beyls, hiraditya, zzheng, Petar.Avramovic, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D60691
llvm-svn: 361845
|
| |
|
|
| |
llvm-svn: 361844
|
| |
|
|
|
|
|
|
|
|
| |
If we don't have VLX then 256-bit SET0 should be lowered
to VPXOR with ZMM registers. This restores functionality
accidentally removed by r309926.
Differential Revision: https://reviews.llvm.org/D62415
llvm-svn: 361843
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3
But as we can see in the pile of existing test diffs, it's actually a widespread problem
that affects any AVX or later target. Apart from a couple of oddballs, I think these are
all improvements for the reasons stated in the code comment: we do not want to enable YMM
unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit
stores anyway.
We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but that won't solve the problem completely. But that is the reason I'm proposing this as
a lowering rather than a combine: we will infinite loop fighting the merge code if we try
this earlier.
Differential Revision: https://reviews.llvm.org/D62498
llvm-svn: 361822
|
| |
|
|
|
|
|
|
|
|
|
| |
Forking this out of the discussion in D62498
(and assuming that will be committed later, so adding the helper function here).
The LangRef says:
"the backend should never split or merge target-legal volatile load/store instructions."
Differential Revision: https://reviews.llvm.org/D62506
llvm-svn: 361815
|
| |
|
|
|
|
|
| |
The generic legalizer cannot handle this. Add an assert instead of
silently miscompiling vectors with elements smaller than 8 bits.
llvm-svn: 361814
|
| |
|
|
| |
llvm-svn: 361813
|
| |
|
|
|
|
| |
variable warnings. NFCI.
llvm-svn: 361804
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch adds support for the following intructions:
SVE2 floating-point convert precision:
* FCVTXNT, FCVTNT, FCVTLT
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D62382
llvm-svn: 361801
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch adds support for the following instructions:
SVE2 crypto constructive binary operations:
* SM4EKEY, RAX1
SVE2 crypto destructive binary operations:
* AESE, AESD, SM4E
SVE2 crypto unary operations:
* AESMC, AESIMC
AESE, AESD, AESMC and AESIMC are enabled with +sve2-aes. SM4E and
SM4EKEY are enabled with +sve2-sm4. RAX1 is enabled with +sve2-sha3.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62307
llvm-svn: 361797
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch adds support for the following instructions:
SVE2 histogram generation (segment):
* HISTSEG
SVE2 histogram generation (vector):
* HISTCNT
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D62306
llvm-svn: 361796
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch adds support for the following instructions:
SVE2 bitwise exclusive-or interleaved:
* EORBT, EORTB
SVE2 bitwise permute:
* BEXT, BDEP, BGRP
SVE2 bitwise shift left long:
* SSHLLB, SSHLLT, USHLLB, USHLLT
SVE2 integer add/subtract interleaved long:
* SADDLBT, SSUBLBT, SSUBLTB
BDEP, BEXT and BGRP are enabled with SVE2 feature +bitperm, all other
instructions in this group are enabled with +sve2.
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D62304
llvm-svn: 361795
|
| |
|
|
| |
llvm-svn: 361776
|
| |
|
|
|
|
|
|
|
|
| |
AArch64AsmBackend.cpp was not using any APIs from AArch64.h, and was
only including it for transitive dependencies. Doing so is problematic
from include-what-you-use perspective, but it is also a layering issue
(it creates a dependency cycle between the primary AArch64 target
library and the MCTargetDesc library).
llvm-svn: 361774
|
| |
|
|
|
|
|
|
| |
commit:
1a8b2ea611cf4ca7cb09562e0238cfefa27c05b5 Divergence driven ISel. Assign register class for cross block values according to the divergence.
llvm-svn: 361770
|
| |
|
|
|
|
|
|
|
|
| |
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820
Reviewers: artem.tamazov, arsenm
Differential Revision: https://reviews.llvm.org/D61017
llvm-svn: 361763
|
| |
|
|
|
|
|
| |
We never actually use the Offsets produced by ComputeValueVTs, so remove
them until we need them.
llvm-svn: 361755
|
| |
|
|
|
|
|
|
|
|
|
| |
The variables in BTF DataSec type encode in-section offset.
R_BPF_NONE should be generated instead of R_BPF_64_32.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D62460
llvm-svn: 361742
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross block values beforehand. For the divergent targets
same value type requires different register classes dependent on the value divergence.
Reviewers: rampitec, nhaehnle
Differential Revision: https://reviews.llvm.org/D59990
This commit was reverted because of the build failure.
The reason was mlformed patch.
Build failure fixed.
llvm-svn: 361741
|
| |
|
|
|
|
|
|
| |
They caused the sanitizer builds to fail.
My suspicion is the change the countLeadingZeros().
llvm-svn: 361736
|
| |
|
|
|
|
| |
Reuses what we already have in place for ISD::ZERO_EXTEND_VECTOR_INREG just with a different sentinel
llvm-svn: 361734
|
| |
|
|
|
|
|
|
|
| |
This matches countLeadingOnes() and countTrailingOnes(), and
APInt's countLeadingZeros() and countTrailingZeros().
(as well as __builtin_clzll())
llvm-svn: 361724
|
| |
|
|
|
|
|
|
| |
This adds a pattern for fma, similar to the float and double patterns.
Differential Revision: https://reviews.llvm.org/D62330
llvm-svn: 361719
|
| |
|
|
|
|
|
|
|
| |
This add patterns for fp16 round and ceil etc. Same as the float and double
patterns.
Differential Revision: https://reviews.llvm.org/D62326
llvm-svn: 361718
|
| |
|
|
|
|
|
|
|
| |
Promote a number of fp16 math intrinsics to float, so that the relevant float
math routines can be used. Copysign is expanded so as to be handled in-place.
Differential Revision: https://reviews.llvm.org/D62325
llvm-svn: 361717
|
| |
|
|
|
|
|
|
|
|
| |
original vector
We were only testing for direct SETCC results - this allows us to peek through AND/OR/XOR combinations of the comparison results as well.
There's a missing SEXT(PACKSS) fold that I need to investigate for v8i1 cases before I can enable it there as well.
llvm-svn: 361716
|
| |
|
|
|
|
|
|
| |
This adds a pattern for the fabs intrinsic, the same as float and double.
Differential Revision: https://reviews.llvm.org/D62324
llvm-svn: 361715
|
| |
|
|
|
|
|
|
| |
This adds a pattern for the sqrt intrinsic, the same as float and double.
Differential Revision: https://reviews.llvm.org/D62322
llvm-svn: 361714
|
| |
|
|
|
|
|
|
| |
Promote fp16 frem operations on ARM to floats so they call fmodf.
Differential Revision: https://reviews.llvm.org/D62321
llvm-svn: 361713
|
| |
|
|
|
|
|
|
| |
shift(build_vector(),C)
Commonly occurs in sign-extension cases
llvm-svn: 361706
|
| |
|
|
|
|
|
|
|
| |
If we have a known non-nan operand, place it in the second operand
of fmin/fmax that is returned if either operand is nan.
Differential Revision: https://reviews.llvm.org/D62448
llvm-svn: 361704
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Support LEA64_32r properly.
INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags.
This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg.
One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve it in the add reg/reg case if we think its important. Not sure which operand its supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input.
Differential Revision: https://reviews.llvm.org/D61472
llvm-svn: 361691
|
| |
|
|
|
|
|
|
|
|
| |
models. Add 256-bit fp xor to sandybridge zero idioms
This copies the Sandy Bridge zero idiom support to later CPUs. Adding the AVX2 and AVX512F/VL instructions as appropriate.
Differential Revision: https://reviews.llvm.org/D62360
llvm-svn: 361690
|
| |
|
|
|
|
|
|
|
|
| |
cross block values according to the divergence."
Broke sanitizer bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio
llvm-svn: 361688
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a few places in getInstrMapping, we check if use/def instructions for the
instruction we're mapping have floating point constraints.
We can improve this check and reduce the number of copies in GISel-compiled code
if we make a couple observations:
- For a def instruction, it only matters if the def instruction must always
output a value stored on a FPR
- For a use instruction, it only matters if the use instruction must always
only take in values stored in FPRs
This adds two new functions:
- onlyUsesFP
- onlyDefinesFP
Then we can use those when we're checking the uses/defs instead.
Without this patch, the load, unmerge, store, and select in the added test
would have unnecessary copies.
Differential Revision: https://reviews.llvm.org/D62426
llvm-svn: 361679
|
| |
|
|
|
|
|
|
|
| |
Factor it out into a function, and replace places where we had the same check
with the new function.
Differential Revision: https://reviews.llvm.org/D62421
llvm-svn: 361677
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary:dd
This patch implements call lowering for calls without parameters
on AIX as initial support.
Reviewers: sfertile, hubert.reinterpretcast, aheejin, efriedma
Differential Revision: https://reviews.llvm.org/D61948
llvm-svn: 361669
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The fcsel and csel instructions differ in only the register banks they work on.
So, they're entirely interchangeable otherwise.
With this in mind, this does two things:
- Teach AArch64RegisterBankInfo to consider the inputs to G_SELECT as well as
the outputs.
- Teach it to choose the best register bank mapping based off the constraints
of the inputs and outputs.
The "best" in this case means the one that requires the smallest number of
copies to properly emit a fcsel/csel.
For example, if the inputs are all already going to be on FPRs, we should
emit a fcsel, even if the output is a GPR. This costs one copy to produce the
result, but saves us from copying the inputs into GPRs.
Also update the regbank-select.mir to check that we end up with the right
select instruction.
Differential Revision: https://reviews.llvm.org/D62267
llvm-svn: 361665
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It looks like since INLINEASM_BR was created off of INLINEASM, a few
checks for INLINEASM needed to be updated to check for either case.
pr/41999
Reviewers: t.p.northover, peter.smith
Reviewed By: peter.smith
Subscribers: craig.topper, javed.absar, kristof.beyls, hiraditya, llvm-commits, peter.smith, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62402
llvm-svn: 361661
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We were observing failures for arm32 allyesconfigs of the Linux kernel
with the asm goto Clang patch, where ldr's were being generated to
offsets too far away to encode in imm12.
It looks like since INLINEASM_BR was created off of INLINEASM, a few
checks for INLINEASM needed to be updated to check for either case.
pr/41999
Link: https://github.com/ClangBuiltLinux/linux/issues/490
Reviewers: peter.smith, kristof.beyls, ostannard, rengolin, t.p.northover
Reviewed By: peter.smith
Subscribers: jyu2, javed.absar, hiraditya, llvm-commits, nathanchance, craig.topper, kees, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62400
llvm-svn: 361659
|
| |
|
|
|
|
|
| |
If some lanes weren't active on entry to the function, this could
clobber their VGPR values.
llvm-svn: 361655
|
| |
|
|
|
|
|
| |
This was skipping GetUnderlyingObject for nonprivate addresses, but an
alloca could also be found through an addrspacecast if it's flat.
llvm-svn: 361649
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross block values beforehand. For the divergent targets
same value type requires different register classes dependent on the value divergence.
Reviewers: rampitec, nhaehnle
Differential Revision: https://reviews.llvm.org/D59990
llvm-svn: 361644
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the situation, where we generate the following code:
crxor 8, 8, 8
< Some instructions>
.LBB0_1:
< Some instructions>
cror 1, 8, 8
cror (COPY of CRbit) depends on the result of the crxor instruction.
CR8 is known to be zero as crxor is equivalent to CRUNSET. We can simply use
crxor 1, 1, 1 instead to zero out CR1, which does not have any dependency on
any previous instruction.
This patch will optimize it to:
< Some instructions>
.LBB0_1:
< Some instructions>
cror 1, 1, 1
Patch By: Victor Huang (NeHuang)
Differential Revision: https://reviews.llvm.org/D62044
llvm-svn: 361632
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch adds support for the SVE2 character match instructions MATCH and NMATCH.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62206
llvm-svn: 361627
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch adds support for the following instructions:
SVE2 bitwise shift right narrow:
* SQSHRUNB, SQSHRUNT, SQRSHRUNB, SQRSHRUNT, SHRNB, SHRNT, RSHRNB, RSHRNT,
SQSHRNB, SQSHRNT, SQRSHRNB, SQRSHRNT, UQSHRNB, UQSHRNT, UQRSHRNB,
UQRSHRNT
SVE2 integer add/subtract narrow high part:
* ADDHNB, ADDHNT, RADDHNB, RADDHNT, SUBHNB, SUBHNT, RSUBHNB, RSUBHNT
SVE2 saturating extract narrow:
* SQXTNB, SQXTNT, UQXTNB, UQXTNT, SQXTUNB, SQXTUNT
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62205
llvm-svn: 361624
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch adds support for the following instructions:
SVE2 bitwise shift and insert:
* SRI, SLI
SVE2 bitwise shift right and accumulate:
* SSRA, USRA, SRSRA, URSRA
SVE2 complex integer add:
* CADD, SQCADD
SVE2 integer absolute difference and accumulate:
* SABA, UABA
SVE2 integer absolute difference and accumulate long:
* SABALB, SABALT, UABALB, UABALT
SVE2 integer add/subtract long with carry:
* ADCLB, ADCLT, SBCLB, SBCLT
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62204
llvm-svn: 361622
|