| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
| |
NFCI.
llvm-svn: 374878
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add vector MSA register classes to fprb, they are 128 bit wide.
MSA instructions use the same registers for both integer and floating
point operations. Therefore we only need to check for vector element
size during legalization or instruction selection.
Add helper function in MipsLegalizerInfo and switch to legalIf
LegalizeRuleSet to keep legalization rules compact since they depend
on MipsSubtarget and presence of MSA.
fprb is assigned to all vector operands.
Move selectLoadStoreOpCode to MipsInstructionSelector in order to
reduce number of arguments.
Differential Revision: https://reviews.llvm.org/D68867
llvm-svn: 374872
|
| |
|
|
|
|
|
|
|
|
| |
Check if size of operand LLT matches sizes of available register banks
before inspecting the opcode in order to reduce number of checks.
Factor commonly used pieces of code into functions.
Differential Revision: https://reviews.llvm.org/D68866
llvm-svn: 374870
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VBROADCAST when trying to share broadcasts.
The only things VBROADCAST_LOAD uses is an address and a chain
node. It has no vector inputs.
So if its a user of the source of another broadcast that could
only mean one of two things. The other broadcast is broadcasting
the address of the broadcast_load. Or the source is a load and
the use we're seeing is the chain result from that load. Neither
of these cases make sense to combine here.
This issue was reported post-commit r373871. Test case has not
been reduced yet.
llvm-svn: 374862
|
| |
|
|
|
|
|
|
|
|
|
|
| |
LLVM may annotate the function with fastcc if there has only one caller
and there're no other caller out of the module and the function is not
naked or contain variable arguments.
The fastcc functions could pass the arguments by the caller saved registers.
Differential Revision: https://reviews.llvm.org/D68559
llvm-svn: 374857
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The WebAssembly backend lowers fptoint instructions to a code sequence
that checks for overflow to avoid traps because fptoint is supposed to
be speculatable. These new builtins and intrinsics give users a way to
depend on the trapping semantics of the underlying instructions and
avoid the extra code generated normally.
Patch by coffee and tlively.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D68902
llvm-svn: 374856
|
| |
|
|
|
|
|
|
|
|
|
|
| |
to vgatherpf/vscatterpf.
We need to encode bit 4 into the EVEX.V' bit. We do this right
for regular gather/scatter which use either MRMSrcMem or MRMDestMem
formats. The prefetches use MRM*m formats.
Fixes an issue recently added to PR36202.
llvm-svn: 374849
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Integrated assembler does not accept offset expressions surrounded by
parenthesis. Handle this case for GAS compability.
https://bugs.llvm.org/show_bug.cgi?id=43631
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68764
llvm-svn: 374832
|
| |
|
|
|
|
|
| |
Atomic load/store would have their setting of m0 handled twice, which
happened to be optimized out later.
llvm-svn: 374801
|
| |
|
|
|
|
|
|
|
| |
Also, amend constraints for non-sync variants that are no longer
available on sm_70+ with PTX6.4+.
Differential Revision: https://reviews.llvm.org/D68892
llvm-svn: 374790
|
| |
|
|
|
|
|
|
| |
Add specific scalar costs for CTLZ instructions, we can't discriminate between CTLZ and CTLZ_ZERO_UNDEF so we have to assume the worst. Given how BSR is often a microcoded nightmare on some older targets we might still be underestimating it.
For targets supporting LZCNT (Intel Haswell+ or AMD Fam10+), we provide overrides that assume 1cy costs.
llvm-svn: 374786
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The adds both VMOVNt and VMOVNb instruction selection from the appropriate
shuffles. We detect shuffle masks of the form:
0, N, 2, N+2, 4, N+4, ...
or
0, N+1, 2, N+3, 4, N+5, ...
ISel will also try the opposite patterns, with inputs reversed. These are
selected to VMOVNt and VMOVNb respectively.
Differential Revision: https://reviews.llvm.org/D68283
llvm-svn: 374781
|
| |
|
|
|
|
|
|
| |
Add specific scalar costs for ctpop instructions, these are based on the llvm-mca's SLM throughput numbers (the oldest model we have).
For targets supporting POPCNT, we provide overrides that assume 1cy costs.
llvm-svn: 374775
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Materialize accesses to SVE frame objects from SP or FP, whichever is
available and beneficial.
This patch still assumes the objects are pre-allocated. The automatic
layout of SVE objects within the stackframe will be added in a separate
patch.
Reviewers: greened, cameron.mcinally, efriedma, rengolin, thegameg, rovka
Reviewed By: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D67749
llvm-svn: 374772
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
values according to the divergence.'
Detailed description:
After https://reviews.llvm.org/D59990 submit several issues were discovered.
Changes in common code were preserved but AMDGPU specific part was reverted to keep the backend working correctly.
Discovered issues were addressed in the following commits:
https://reviews.llvm.org/D67662
https://reviews.llvm.org/D67101
https://reviews.llvm.org/D63953
https://reviews.llvm.org/D63731
This change brings back AMDGPU specific changes.
Reviewed by: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D68635
llvm-svn: 374767
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces the following changes to the btver2 scheduling model:
- The number of micro opcodes for YMM loads and stores is now 2 (it was
incorrectly set to 1 for both aligned and misaligned loads/stores).
- Increased the number of AGU resource cycles for YMM loads and stores
to 2cy (instead of 1cy).
- Removed JFPU01 and JFPX from the list of resources consumed by pure
float/vector loads (no MMX).
I verified with llvm-exegesis that pure XMM/YMM loads are no-pipe. Those
are dispatched to the FPU but not really issues on JFPU01.
Differential Revision: https://reviews.llvm.org/D68871
llvm-svn: 374765
|
| |
|
|
|
|
|
|
|
| |
Add an extra parameter so the backend can take the alignment into
consideration.
Differential Revision: https://reviews.llvm.org/D68400
llvm-svn: 374763
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
from the subtract directly during isel.
This prevents isel from emitting a TEST instruction that
optimizeCompareInstr will need to remove later.
In some of the modified tests, the SUB gets duplicated due to
the flags being needed in two places and being clobbered in
between. optimizeCompareInstr was able to optimize away the TEST
that was using the result of one of them, but optimizeCompareInstr
doesn't know to turn SUB into CMP after removing the TEST. It
only knows how to turn SUB into CMP if the result was already
dead.
With this change the TEST never exists, so optimizeCompareInstr
doesn't have to remove it. Then it can just turn the SUB into
CMP immediately.
Fixes PR43649.
llvm-svn: 374755
|
| |
|
|
|
|
|
|
| |
well as KnownZero.
We were already controlling whether the KnownZero elements were being written to the target mask, this extends it to the KnownUndef elements as well so we can prevent the target shuffle mask being manipulated at all.
llvm-svn: 374732
|
| |
|
|
|
|
|
|
|
|
| |
This enables use of the saturating truncate instructions when the
result type is less than 128 bits. It also enables the use of
saturating truncate instructions on KNL when the input is less
than 512 bits. We can do this by widening the input and then
extracting the result.
llvm-svn: 374731
|
| |
|
|
|
|
| |
getTargetShuffleInputs with KnownUndef/Zero results.
llvm-svn: 374725
|
| |
|
|
|
|
| |
Adjust SimplifyDemandedVectorEltsForTargetNode to use the known elts masks instead of recomputing it locally.
llvm-svn: 374724
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in combineSelect.
This seems to improve std::midpoint code where we have a min and
a max with the same condition. If we split the setcc we can end
up with two compares if the one of the operands is a constant.
Since we aggressively canonicalize compares with constants.
For non-constants it can interfere with our ability to share
control flow if we need to expand cmovs into control flow.
I'm also not sure I understand this min/max canonicalization code.
The motivating case talks about comparing with 0. But we don't
check for 0 explicitly.
Removes one instruction from the codegen for PR43658.
llvm-svn: 374706
|
| |
|
|
|
|
| |
instructions with avx512.
llvm-svn: 374705
|
| |
|
|
| |
llvm-svn: 374674
|
| |
|
|
| |
llvm-svn: 374669
|
| |
|
|
| |
llvm-svn: 374668
|
| |
|
|
| |
llvm-svn: 374667
|
| |
|
|
|
|
| |
This should go away once D66004 has landed and we can simplify shuffle chains using demanded elts.
llvm-svn: 374658
|
| |
|
|
|
|
|
|
| |
I can't see any notable differences in costs between SSE2 and SSE42 arches for FADD/ADD reduction, so I've lowered the target to just SSE2.
I've also added vXi8 sum reduction costs in line with the PSADBW codegen and discussions on PR42674.
llvm-svn: 374655
|
| |
|
|
|
|
|
|
|
|
|
| |
is the largest legal vector and the result type is at least 256 bits.
Since the input type is larger than 256-bits we'll need to some
concatenating to reassemble the results. The pack instructions
ability to concatenate while packing make this a shorter/faster
sequence.
llvm-svn: 374643
|
| |
|
|
|
|
| |
register
llvm-svn: 374641
|
| |
|
|
| |
llvm-svn: 374640
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The commit of rL374420 had various formatting issues, including lines
that exceed 80 columns. This patch applies `git clang-format` on the
changes from commit 13bd3ef40d8b1586f26a022e01b21e56c91e05bd.
It further adjusts a comment to clarify the domain of inputs upon which
a newly added function is meant to operate. The adjustment to the
comment was suggested in a post-commit comment on D68721 and discussed
off-list with @sfertile.
llvm-svn: 374635
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
separately in loop-vectorize
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.
So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.
For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.
It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.
Differential revision: https://reviews.llvm.org/D67148
llvm-svn: 374634
|
| |
|
|
|
|
|
|
| |
We already did this for VTRUNCUS with a specific combination of
types. This extends this to VTRUNCS and handles any types where
a truncating store is legal.
llvm-svn: 374615
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This defaults to zero fi operand, but we do not expose it
anyway. Should we expose it later it needs to be added to
the pseudo.
This enables dpp combining on gfx10.
Differential Revision: https://reviews.llvm.org/D68888
llvm-svn: 374604
|
| |
|
|
| |
llvm-svn: 374599
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now assembler generates two consecutive `.4byte` directives to store
64-bit `li.d' operand. The first directive stores high 4-byte of the
value. The second directive stores low 4-byte of the value. But on
64-bit system we load this value at once and get wrong result if the
system is little-endian.
This patch fixes the bug. It stores the `li.d' operand as a single
8-byte value.
Differential Revision: https://reviews.llvm.org/D68778
llvm-svn: 374598
|
| |
|
|
|
|
|
|
|
|
| |
If `li.s` or `li.d` loads zero into a FPR, it's not necessary to load
zero into `at` GPR register and then move its value into a floating
point register. We can use as a source register the `zero / $0` one.
Differential Revision: https://reviews.llvm.org/D68777
llvm-svn: 374597
|
| |
|
|
|
|
|
|
|
|
|
| |
The target does just enough to be able to run llvm-exegesis in latency
mode for at least some opcodes.
Patch by Miloš Stojanović.
Differential Revision: https://reviews.llvm.org/D68649
llvm-svn: 374590
|
| |
|
|
| |
llvm-svn: 374579
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Implements the following arithmetic intrinsics:
- int_aarch64_sve_sdot
- int_aarch64_sve_sdot_lane
- int_aarch64_sve_udot
- int_aarch64_sve_udot_lane
This patch includes tests for the Subdivide4Argument type added by D67549
Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D67551
llvm-svn: 374566
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The AIX system assembler does not understand .zero, so we should prefer
emitting .space.
Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68815
llvm-svn: 374564
|
| |
|
|
|
|
|
|
|
|
|
|
| |
ds_[read/write]_addtid_b32
See https://bugs.llvm.org/show_bug.cgi?id=37941
Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D68787
llvm-svn: 374561
|
| |
|
|
|
|
|
|
|
|
|
|
| |
buffer_atomic_[fcmpswap/fmin/fmax]*
See https://bugs.llvm.org/show_bug.cgi?id=28232
Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D68788
llvm-svn: 374559
|
| |
|
|
|
|
|
|
|
|
| |
See https://bugs.llvm.org/show_bug.cgi?id=43524
Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D68785
llvm-svn: 374557
|
| |
|
|
|
|
|
|
|
|
| |
See https://bugs.llvm.org/show_bug.cgi?id=43486
Reviewers: artem.tamazov, arsenm
Differential Revision: https://reviews.llvm.org/D68350
llvm-svn: 374553
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a "double" (64-bit) value has zero low 32-bits, it's possible to load
such value into a GP/FP registers as an instruction immediate. But now
assembler loads only high 32-bits of the value.
For example, if a target register is GPR the `li.d $4, 1.0` instruction
converts into the `lui $4, 16368` one. As a result, we get `0x3FF00000`
in the register. While a correct representation of the `1.0` value is
`0x3FF0000000000000`. The patch fixes that.
Differential Revision: https://reviews.llvm.org/D68776
llvm-svn: 374544
|
| |
|
|
|
|
| |
Now that its used by isNegatibleForFree we should try to avoid costly deep recursion
llvm-svn: 374534
|