| Commit message | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
| |
Also make it an ImmLeaf, so it should work with global isel as well,
which was part of the point of moving it in the first place.
llvm-svn: 365842
|
|
|
|
|
|
| |
Instruction was used after it was erased.
llvm-svn: 365837
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D64594
llvm-svn: 365833
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D64593
llvm-svn: 365829
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D64590
llvm-svn: 365826
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D64584
llvm-svn: 365824
|
|
|
|
| |
llvm-svn: 365782
|
|
|
|
| |
llvm-svn: 365741
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D64435
llvm-svn: 365717
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D64431
llvm-svn: 365715
|
|
|
|
| |
llvm-svn: 365658
|
|
|
|
| |
llvm-svn: 365653
|
| |
Summary:
D59191 added support for these modifiers in the assembler and
disassembler. This patch just teaches instruction selection that it can
use them.
Reviewers: arsenm, tstellar
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64497
llvm-svn: 365640
|
| |
Summary:
This adds support for the most commonly used wide load types:
<8xi32>, <16xi32>, <4xi64>, and <8xi64>
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57399
llvm-svn: 365586
|
|
|
|
|
|
|
|
|
| |
In SelectionDAG, AMDGPU treated these as legal, but this was mostly
because the bitcasts required for FP types were painful. In theory
the bit pattern should eventually match to bfi, so don't bother trying
to get the patterns to import.
llvm-svn: 365583
|
|
|
|
| |
llvm-svn: 365575
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D64433
llvm-svn: 365573
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D64446
llvm-svn: 365563
|
| |
isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it
The problem is that X86 doesn't set the Fast flag from
allowsMemoryAccess on certain CPUs due to slow unaligned memory
subtarget features. This prevents bitcasts from being folded into
loads and stores, even though all vector loads and stores of the same
width have the same cost on X86.
This patch merges the allowsMemoryAccess call into
isLoadBitCastBeneficial to allow X86 to skip it.
Differential Revision: https://reviews.llvm.org/D64295
llvm-svn: 365549
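The hook arrangement described in this message can be modelled roughly as below. This is a simplified sketch, not the LLVM API: the real hooks take value types, address spaces, alignments and more, and the class names `TargetModel`, `SlowUnalignedModel` and `X86LikeModel` are hypothetical. It only shows why moving the allowsMemoryAccess check inside the overridable hook lets a target bypass it.

```cpp
#include <cassert>

// Hypothetical, heavily simplified model of the target hooks.
struct TargetModel {
  virtual ~TargetModel() = default;

  // Returns whether the access is allowed; reports its speed through Fast.
  virtual bool allowsMemoryAccess(bool &Fast) const = 0;

  // Default behaviour after the merge: the Fast check lives inside the
  // hook, so a target that overrides the hook also skips the check.
  virtual bool isLoadBitCastBeneficial() const {
    bool Fast = false;
    return allowsMemoryAccess(Fast) && Fast;
  }
};

// A target whose subtarget features mark unaligned accesses as slow.
struct SlowUnalignedModel : TargetModel {
  bool allowsMemoryAccess(bool &Fast) const override {
    Fast = false; // allowed, but not reported as fast
    return true;
  }
};

// An X86-like target: same-width vector loads cost the same, so the
// bitcast fold is treated as beneficial regardless of the Fast flag.
struct X86LikeModel : SlowUnalignedModel {
  bool isLoadBitCastBeneficial() const override { return true; }
};
```

With the default hook, `SlowUnalignedModel` rejects the fold because Fast is never set; `X86LikeModel` accepts it because its override never consults allowsMemoryAccess.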
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D64438
llvm-svn: 365546
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D64429
llvm-svn: 365525
|
| |
return instruction.
Function return instruction lowering currently uses the fixed register pair s[30:31] to hold
the return address, but any SGPR pair other than the CSRs would do. This patch creates an SGPR pair
sub-register class that excludes the CSRs and uses that register class when lowering the return instruction.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D63924
llvm-svn: 365512
|
|
|
|
| |
llvm-svn: 365488
|
|
|
|
|
|
| |
Account for 64-bit scalar eq/ne when available.
llvm-svn: 365487
|
|
|
|
| |
llvm-svn: 365486
|
|
|
|
| |
llvm-svn: 365484
|
|
|
|
| |
llvm-svn: 365483
|
|
|
|
| |
llvm-svn: 365482
|
|
|
|
|
|
|
|
| |
Infrastructure work for future commit. NFC.
Differential Revision: https://reviews.llvm.org/D64370
llvm-svn: 365432
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D64369
llvm-svn: 365431
|
|
|
|
|
|
|
| |
This will help remove the custom load predicates, allowing the
global isel emitter to handle them.
llvm-svn: 365398
|
|
|
|
| |
llvm-svn: 365394
|
|
|
|
| |
llvm-svn: 365378
|
|
|
|
| |
llvm-svn: 365373
|
| |
Make the FP register callee saved.
This is tricky because now the FP needs to be spilled in the prolog
relative to the incoming SP register, rather than the frame register
used throughout the rest of the function. I don't like how this
bypasses the standard mechanism for CSR spills just to get the
correct insert point. I may look for a better solution, since all CSR
VGPRs may also need to have all lanes activated. Another option might
be to make getFrameIndexReference change the base register if the
frame index is a CSR, and then try to figure out the right insertion
point in emitProlog.
If there is a free VGPR lane available for SGPR spilling, try to use
it for the FP. If that would require introducing a new VGPR spill,
try to use a free call clobbered SGPR. Only fall back to introducing a
new VGPR spill as a last resort.
This also doesn't attempt to handle SGPR spilling with scalar stores.
llvm-svn: 365372
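The preference order for where to save the FP, as described in this message, can be sketched as a small decision function. The type and function names (`SpillResources`, `FPSaveKind`, `chooseFPSave`) are hypothetical and are not taken from the patch; the sketch only captures the stated priority, not the actual frame lowering code.

```cpp
#include <cassert>

// Hypothetical summary of what is available when the prolog is emitted.
struct SpillResources {
  bool HasFreeVGPRLane;          // a lane in a VGPR already used for SGPR spills
  bool HasFreeCallClobberedSGPR; // an unused call clobbered SGPR
};

// Where the frame pointer ends up being saved.
enum class FPSaveKind { FreeVGPRLane, FreeCallClobberedSGPR, NewVGPRSpill };

// Priority from the commit message: free VGPR lane first, then a free
// call clobbered SGPR, and a new VGPR spill only as a last resort.
FPSaveKind chooseFPSave(const SpillResources &R) {
  if (R.HasFreeVGPRLane)
    return FPSaveKind::FreeVGPRLane;
  if (R.HasFreeCallClobberedSGPR)
    return FPSaveKind::FreeCallClobberedSGPR;
  return FPSaveKind::NewVGPRSpill;
}
```

For example, `chooseFPSave({false, true})` picks the call clobbered SGPR, so no new VGPR has to be spilled just to hold the FP.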
|
|
|
|
| |
llvm-svn: 365369
|
|
|
|
|
|
|
| |
These are identical to the *_global PatFrag, and will only create more
work to get the GlobalISel importer to handle them.
llvm-svn: 365350
|
|
|
|
| |
llvm-svn: 365349
|
| |
Summary of changes:
- simplified handling of FLAT offset: offset_s13 and offset_u12 have been replaced with flat_offset;
- provided information about error position for pre-gfx9 targets;
- improved error handling.
Reviewers: artem.tamazov, arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D64244
llvm-svn: 365321
|
| |
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64201
llvm-svn: 365294
|
|
|
|
| |
llvm-svn: 365245
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a string attribute instead of directly setting
MachineFunctionInfo. This avoids trying to get the analysis in the
MachineFunctionInfo in a way that doesn't work with the new pass
manager.
This will also avoid re-visiting the call graph for every single
function.
llvm-svn: 365241
|
| |
Summary:
- Explicitly specify the parent MBB to allow the end iterator to be
used.
Reviewers: aprantl, MatzeB, craig.topper, qcolombet
Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64261
llvm-svn: 365240
|
|
|
|
| |
llvm-svn: 365223
|
|
|
|
|
|
|
|
| |
Patch by Christudasan Devadasan.
Differential Revision: https://reviews.llvm.org/D63886
llvm-svn: 365217
|
| |
Summary:
This allows the DPP combiner to kick in more often. For example the
exclusive scan generated by the atomic optimizer for a divergent atomic
add used to look like this:
    v_mov_b32_e32 v3, v1
    v_mov_b32_e32 v5, v1
    v_mov_b32_e32 v6, v1
    v_mov_b32_dpp v3, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
    s_nop 1
    v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
    v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf
    v_mov_b32_dpp v6, v3 row_shr:3 row_mask:0xf bank_mask:0xf
    v_add3_u32 v3, v4, v5, v6
    v_mov_b32_e32 v4, v1
    s_nop 1
    v_mov_b32_dpp v4, v3 row_shr:4 row_mask:0xf bank_mask:0xe
    v_add_u32_e32 v3, v3, v4
    v_mov_b32_e32 v4, v1
    s_nop 1
    v_mov_b32_dpp v4, v3 row_shr:8 row_mask:0xf bank_mask:0xc
    v_add_u32_e32 v3, v3, v4
    v_mov_b32_e32 v4, v1
    s_nop 1
    v_mov_b32_dpp v4, v3 row_bcast:15 row_mask:0xa bank_mask:0xf
    v_add_u32_e32 v3, v3, v4
    s_nop 1
    v_mov_b32_dpp v1, v3 row_bcast:31 row_mask:0xc bank_mask:0xf
    v_add_u32_e32 v1, v3, v1
    v_add_u32_e32 v1, v2, v1
    v_readlane_b32 s0, v1, 63
But now most of the dpp movs are combined into adds:
    v_mov_b32_e32 v3, v1
    v_mov_b32_e32 v5, v1
    s_nop 0
    v_mov_b32_dpp v3, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
    s_nop 1
    v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
    v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf
    v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf
    v_add3_u32 v1, v4, v5, v1
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf
    v_add_u32_e32 v1, v2, v1
    v_readlane_b32 s0, v1, 63
Reviewers: arsenm, vpykhtin
Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64207
llvm-svn: 365211
|
| |
Summary:
Since the changes to introduce vec3 and vec5, INSERT_VECTOR for these
sizes has been marked "expand", which made LegalizeDAG lower it to loads
and stores via a stack slot. The code got optimized a bit later, but the
now-unused stack slot was never deleted.
This commit avoids that problem by custom lowering INSERT_SUBVECTOR into
an EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT for each element in the
subvector to insert.
V2: Addressed review comments re test.
Differential Revision: https://reviews.llvm.org/D63160
Change-Id: I9e3c13e36f68cfa3431bb9814851cc1f673274e1
llvm-svn: 365148
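The element-wise lowering described in this message can be illustrated with a toy model in which a vector value is a plain `std::vector<int>`. The helpers `extractElt`, `insertElt` and `lowerInsertSubvector` are hypothetical stand-ins for EXTRACT_VECTOR_ELT, INSERT_VECTOR_ELT and the custom lowering; this is not the SelectionDAG code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for EXTRACT_VECTOR_ELT: read one element of a vector value.
static int extractElt(const std::vector<int> &Vec, std::size_t Idx) {
  return Vec.at(Idx);
}

// Stand-in for INSERT_VECTOR_ELT: produce a new vector value with one
// element replaced.
static std::vector<int> insertElt(std::vector<int> Vec, std::size_t Idx,
                                  int Elt) {
  Vec.at(Idx) = Elt;
  return Vec;
}

// Model of lowering INSERT_SUBVECTOR(Vec, Sub, Idx): one extract and one
// insert per subvector element, with no round trip through memory.
std::vector<int> lowerInsertSubvector(std::vector<int> Vec,
                                      const std::vector<int> &Sub,
                                      std::size_t Idx) {
  for (std::size_t I = 0; I != Sub.size(); ++I)
    Vec = insertElt(std::move(Vec), Idx + I, extractElt(Sub, I));
  return Vec;
}
```

Inserting a three-element subvector `{7, 8, 9}` at index 2 of a five-element zero vector yields `{0, 0, 7, 8, 9}`, using three extract/insert pairs and no stack slot.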
|
|
|
|
| |
llvm-svn: 365146
|
| |
Summary: - That flag setting should skip spilling stack slot.
Reviewers: arsenm, rampitec
Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64143
llvm-svn: 365137
|
| |
This is split out from my patches to split register allocation into
separate SGPR and VGPR phases, and has some parts that aren't yet used
(like maintaining LiveIntervals).
This simplifies making the frame pointer register callee saved. As it
is now, the code to determine callee saves needs to predict all the
possible SGPR spills and how many callee saved VGPRs are needed. By
handling this before PrologEpilogInserter, it's possible to just check
the spill objects that already exist.
Change-Id: I29e6df4034afcf949e06f8ef44206acb94696f04
llvm-svn: 365095
|