| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
return instruction.
Function return instruction lowering, currently uses the fixed register pair s[30:31] for holding
the return address. It can be any SGPR pair other than the CSRs. Created an SGPR pair sub-register class
exclusive of the CSRs, and used this regclass while lowering the return instruction.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D63924
llvm-svn: 365512
|
|
|
|
|
|
|
|
|
|
|
| |
Mostsly these would fail due to trying to use SI with a flat
operation. Implementing global loads with MUBUF is more work than
flat, so these won't be handled in the initial load selection.
Others fail because store of s64 won't initially work, as the current
set of patterns expect everything to be turned into v2i32.
llvm-svn: 365493
|
|
|
|
| |
llvm-svn: 365491
|
|
|
|
| |
llvm-svn: 365488
|
|
|
|
|
|
| |
Account for 64-bit scalar eq/ne when available.
llvm-svn: 365487
|
|
|
|
| |
llvm-svn: 365486
|
|
|
|
| |
llvm-svn: 365484
|
|
|
|
| |
llvm-svn: 365483
|
|
|
|
| |
llvm-svn: 365482
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D64369
llvm-svn: 365431
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make the FP register callee saved.
This is tricky because now the FP needs to be spilled in the prolog
relative to the incoming SP register, rather than the frame register
used throughout the rest of the function. I don't like how this
bypassess the standard mechanism for CSR spills just to get the
correct insert point. I may look for a better solution, since all CSR
VGPRs may also need to have all lanes activated. Another option might
be to make getFrameIndexReference change the base register if the
frame index is a CSR, and then try to figure out the right insertion
point in emitProlog.
If there is a free VGPR lane available for SGPR spilling, try to use
it for the FP. If that would require intrtoducing a new VGPR spill,
try to use a free call clobbered SGPR. Only fallback to introducing a
new VGPR spill as a last resort.
This also doesn't attempt to handle SGPR spilling with scalar stores.
llvm-svn: 365372
|
|
|
|
|
|
|
|
|
|
| |
This is extremly slow on AMDGPU, which has a lot of physical register
and a lot of register classes.
determineCalleeSaves, via MachineRegisterInfo::isPhysRegUsed already
added all of the super registers to the saved set.
llvm-svn: 365370
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a function attribute, nofree, to indicate that a function does
not, directly or indirectly, call a memory-deallocation function (e.g., free,
C++'s operator delete).
Reviewers: jdoerfert
Differential Revision: https://reviews.llvm.org/D49165
llvm-svn: 365336
|
|
|
|
|
|
|
|
| |
Patch by Christudasan Devadasan.
Differential Revision: https://reviews.llvm.org/D63886
llvm-svn: 365217
|
|
|
|
|
|
|
|
|
|
| |
When looking for uses/defs to add kill flags, the iterator was double
incremented, skipping the first instruction in the bundle. The use
register in the first bundle instruction was then incorrectly killed.
The "First" instruction should be the BUNDLE itself as the proper
reverse iterator endpoint.
llvm-svn: 365216
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This allows the DPP combiner to kick in more often. For example the
exclusive scan generated by the atomic optimizer for a divergent atomic
add used to look like this:
v_mov_b32_e32 v3, v1
v_mov_b32_e32 v5, v1
v_mov_b32_e32 v6, v1
v_mov_b32_dpp v3, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
s_nop 1
v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf
v_mov_b32_dpp v6, v3 row_shr:3 row_mask:0xf bank_mask:0xf
v_add3_u32 v3, v4, v5, v6
v_mov_b32_e32 v4, v1
s_nop 1
v_mov_b32_dpp v4, v3 row_shr:4 row_mask:0xf bank_mask:0xe
v_add_u32_e32 v3, v3, v4
v_mov_b32_e32 v4, v1
s_nop 1
v_mov_b32_dpp v4, v3 row_shr:8 row_mask:0xf bank_mask:0xc
v_add_u32_e32 v3, v3, v4
v_mov_b32_e32 v4, v1
s_nop 1
v_mov_b32_dpp v4, v3 row_bcast:15 row_mask:0xa bank_mask:0xf
v_add_u32_e32 v3, v3, v4
s_nop 1
v_mov_b32_dpp v1, v3 row_bcast:31 row_mask:0xc bank_mask:0xf
v_add_u32_e32 v1, v3, v1
v_add_u32_e32 v1, v2, v1
v_readlane_b32 s0, v1, 63
But now most of the dpp movs are combined into adds:
v_mov_b32_e32 v3, v1
v_mov_b32_e32 v5, v1
s_nop 0
v_mov_b32_dpp v3, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
s_nop 1
v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf
v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf
v_add3_u32 v1, v4, v5, v1
s_nop 1
v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe
s_nop 1
v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc
s_nop 1
v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf
s_nop 1
v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf
v_add_u32_e32 v1, v2, v1
v_readlane_b32 s0, v1, 63
Reviewers: arsenm, vpykhtin
Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64207
llvm-svn: 365211
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Since the changes to introduce vec3 and vec5, INSERT_VECTOR for these
sizes has been marked "expand", which made LegalizeDAG lower it to loads
and stores via a stack slot. The code got optimized a bit later, but the
now-unused stack slot was never deleted.
This commit avoids that problem by custom lowering INSERT_SUBVECTOR into
an EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT for each element in the
subvector to insert.
V2: Addressed review comments re test.
Differential Revision: https://reviews.llvm.org/D63160
Change-Id: I9e3c13e36f68cfa3431bb9814851cc1f673274e1
llvm-svn: 365148
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: - That flag setting should skip spilling stack slot.
Reviewers: arsenm, rampitec
Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64143
llvm-svn: 365137
|
|
|
|
| |
llvm-svn: 365093
|
|
|
|
|
|
|
|
| |
This reverts commit r365073.
This is crashing, and is improperly relying on IR type names.
llvm-svn: 365087
|
|
|
|
|
|
|
|
|
| |
Summary:
Hip texture type is equivalent to OpenCL image. So, we need to set the Image type for kernel arguments with __hip_texture type.
Differential revision: https://reviews.llvm.org/D63850
llvm-svn: 365073
|
|
|
|
|
|
|
|
| |
When a target intrinsic has been determined to touch memory, we construct a MachineMemOperand during SDAG construction. In this case, we should propagate AAMDNodes metadata to the MachineMemOperand where available.
Differential revision: https://reviews.llvm.org/D64131
llvm-svn: 365043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- Support serialization of all arguments in machine function info. This
enables fabricating MIR tests depending on argument info.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64096
llvm-svn: 364995
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
immediate forms.
There are two main issues preventing us from generating immediate form shifts:
1) We have partial SelectionDAG imported support for G_ASHR and G_LSHR shift
immediate forms, but they currently don't work because the amount type is
expected to be an s64 constant, but we only legalize them to have homogenous
types.
To deal with this, first we introduce a custom legalizer to *only* custom legalize
s32 shifts which have a constant operand into a s64.
There is also an additional artifact combiner to fold zexts(g_constant) to a
larger G_CONSTANT if it's legal, a counterpart to the anyext version committed
in an earlier patch.
2) For G_SHL the importer can't cope with the pattern. For this I introduced an
early selection phase in the arm64 selector to select these forms manually
before the tablegen selector pessimizes it to a register-register variant.
Differential Revision: https://reviews.llvm.org/D63910
llvm-svn: 364994
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The BUNDLE itself should not have side effects, and this is a property
of instructions inside the bundle. The hasProperty check already
searches for any member instructions, which was pointless since it was
overridden by this bit.
Allows me to distinguish bundles that have side effects vs. do not in
a future patch. Also fixes an unnecessary scheduling barrier in the
bundle AMDGPU uses to get PC relative addresses.
llvm-svn: 364984
|
|
|
|
|
|
| |
These aren't produced now, but will be in a future patch.
llvm-svn: 364983
|
|
|
|
|
|
|
|
|
| |
Ordinarily it is lowered as a build_vector of each extract_vector_elt,
which in turn get lowered to bitcasts and bit shifts. Very little
understand the lowered extract pattern, resulting in much worse
code. We treat concat_vectors of v2i16 as legal, so prefer that.
llvm-svn: 364959
|
|
|
|
|
|
|
|
|
| |
divergent loop and used outside
Differential Revision: https://reviews.llvm.org/D63953
Reviewers: rampitec, nhaehnle, arsenm
llvm-svn: 364950
|
|
|
|
| |
llvm-svn: 364935
|
|
|
|
| |
llvm-svn: 364933
|
|
|
|
| |
llvm-svn: 364932
|
|
|
|
| |
llvm-svn: 364931
|
|
|
|
|
|
|
|
| |
The register bank for the destination of the sample argument copy was
wrong. We shouldn't be constraining each source to the result register
bank. Allow constraining the original register to the right size.
llvm-svn: 364928
|
|
|
|
|
|
|
| |
Manually select to workaround tablegen emitter emitting checks for
G_CONSTANT.
llvm-svn: 364927
|
|
|
|
|
|
|
| |
The pattern importer is for some reason emitting checks for G_CONSTANT
for the immediate operands.
llvm-svn: 364926
|
|
|
|
|
|
|
| |
These should be SALU writes, and these are lowered to instructions
that def SCC.
llvm-svn: 364859
|
|
|
|
|
|
|
|
| |
If the requested source type an be used as a merge source type, create
a merge of merges. This avoids creating large, illegal extensions and
bit-ops directly to the result type.
llvm-svn: 364841
|
|
|
|
| |
llvm-svn: 364839
|
|
|
|
| |
llvm-svn: 364836
|
|
|
|
| |
llvm-svn: 364835
|
|
|
|
| |
llvm-svn: 364834
|
|
|
|
|
|
|
|
|
| |
Tests don't cover the masked input path since non-kernel arguments
aren't lowered yet.
Test is copied directly from the existing test, with 2 additions.
llvm-svn: 364833
|
|
|
|
|
|
|
|
| |
Replace the brcond for the 2 cases that act as branches. For now
follow how the current system works, although I think we can
eventually get rid of the pseudos.
llvm-svn: 364832
|
|
|
|
|
|
|
|
|
| |
This needs to be extended to s32, and expanded into cmp+select. This
is relying on the fact that widenScalar happens to leave the
instruction in place, but this isn't a guaranteed property of
LegalizerHelper.
llvm-svn: 364831
|
|
|
|
|
|
|
| |
Use a change observer to apply a register bank to the newly created
intermediate result register.
llvm-svn: 364830
|
|
|
|
| |
llvm-svn: 364828
|
|
|
|
|
|
|
| |
If this is scalar, promote to s32. Use a new observer class to assign
the register bank of newly created registers.
llvm-svn: 364827
|
|
|
|
|
|
|
|
|
| |
The condition register bank must be scc or vcc so that a copy will be
inserted, which will be lowered to a compare.
Currently greedy unnecessarily forces using a VCC select.
llvm-svn: 364825
|
|
|
|
| |
llvm-svn: 364819
|
|
|
|
| |
llvm-svn: 364817
|