summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: s_waitcnt field should be treated as unsignedMatt Arsenault2019-07-112-3/+7
| | | | | | | Also make it an ImmLeaf, so it should work with global isel as well, which was part of the point of moving it in the first place. llvm-svn: 365842
* [AMDGPU] Fixed asan error with agpr spillingStanislav Mekhanoshin2019-07-111-1/+4
| | | | | | Instruction was used after it was erased. llvm-svn: 365837
* [AMDGPU] gfx908 agpr spillingStanislav Mekhanoshin2019-07-117-45/+367
| | | | | | Differential Revision: https://reviews.llvm.org/D64594 llvm-svn: 365833
* [AMDGPU] gfx908 hazard recognizerStanislav Mekhanoshin2019-07-112-1/+233
| | | | | | Differential Revision: https://reviews.llvm.org/D64593 llvm-svn: 365829
* [AMDGPU] gfx908 schedulingStanislav Mekhanoshin2019-07-113-0/+163
| | | | | | Differential Revision: https://reviews.llvm.org/D64590 llvm-svn: 365826
* [AMDGPU] gfx908 mfma supportStanislav Mekhanoshin2019-07-1116-62/+548
| | | | | | Differential Revision: https://reviews.llvm.org/D64584 llvm-svn: 365824
* AMDGPU/GlobalISel: Move kernel argument handling to separate functionMatt Arsenault2019-07-112-42/+61
| | | | llvm-svn: 365782
* Remove some redundant code from r290372 and improve a comment.Jay Foad2019-07-111-5/+3
| | | | llvm-svn: 365741
* [AMDGPU] gfx908 atomic fadd and atomic pk_faddStanislav Mekhanoshin2019-07-117-4/+195
| | | | | | Differential Revision: https://reviews.llvm.org/D64435 llvm-svn: 365717
* [AMDGPU] gfx908 dot instruction supportStanislav Mekhanoshin2019-07-111-0/+30
| | | | | | Differential Revision: https://reviews.llvm.org/D64431 llvm-svn: 365715
* GlobalISel: Legalization for G_FMINNUM/G_FMAXNUMMatt Arsenault2019-07-102-1/+57
| | | | llvm-svn: 365658
* AMDGPU: Serialize mode from MachineFunctionInfoMatt Arsenault2019-07-103-1/+32
| | | | llvm-svn: 365653
* [AMDGPU] Allow abs/neg source modifiers on v_cndmask_b32Jay Foad2019-07-101-7/+8
| | | | | | | | | | | | | | | | | Summary: D59191 added support for these modifiers in the assembler and disassembler. This patch just teaches instruction selection that it can use them. Reviewers: arsenm, tstellar Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64497 llvm-svn: 365640
* AMDGPU/GlobalISel: Add support for wide loads >= 256-bitsTom Stellard2019-07-104-37/+219
| | | | | | | | | | | | | | | | | | Summary: This adds support for the most commonly used wide load types: <8xi32>, <16xi32>, <4xi64>, and <8xi64> Reviewers: arsenm Reviewed By: arsenm Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57399 llvm-svn: 365586
* GlobalISel: Implement lower for G_FCOPYSIGNMatt Arsenault2019-07-091-3/+2
| | | | | | | | | In SelectionDAG AMDGPU treated these as legal, but this was mostly because the bitcasts required for FP types were painful. Theoretically the bitpattern should eventually match to bfi, so don't bother trying to get the patterns to import. llvm-svn: 365583
* AMDGPU/GlobalISel: Fix legality for G_BUILD_VECTORMatt Arsenault2019-07-091-7/+4
| | | | llvm-svn: 365575
* [AMDGPU] gfx908 v_pk_fmac_f16 supportStanislav Mekhanoshin2019-07-092-4/+10
| | | | | | Differential Revision: https://reviews.llvm.org/D64433 llvm-svn: 365573
* [AMDGPU] gfx908 mAI instructions, MC partStanislav Mekhanoshin2019-07-0919-18/+674
| | | | | | Differential Revision: https://reviews.llvm.org/D64446 llvm-svn: 365563
* [X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into ↵Craig Topper2019-07-092-5/+11
| | | | | | | | | | | | | | | | isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it Basically the problem is that X86 doesn't set the Fast flag from allowsMemoryAccess on certain CPUs due to slow unaligned memory subtarget features. This prevents bitcasts from being folded into loads and stores. But all vector loads and stores of the same width are the same cost on X86. This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it. Differential Revision: https://reviews.llvm.org/D64295 llvm-svn: 365549
* [AMDGPU] gfx908 register file changesStanislav Mekhanoshin2019-07-096-50/+621
| | | | | | Differential Revision: https://reviews.llvm.org/D64438 llvm-svn: 365546
* [AMDGPU] gfx908 targetStanislav Mekhanoshin2019-07-095-0/+100
| | | | | | Differential Revision: https://reviews.llvm.org/D64429 llvm-svn: 365525
* [AMDGPU] Created a sub-register class for the return address operand in the ↵Christudasan Devadasan2019-07-093-12/+15
| | | | | | | | | | | | | | return instruction. Function return instruction lowering, currently uses the fixed register pair s[30:31] for holding the return address. It can be any SGPR pair other than the CSRs. Created an SGPR pair sub-register class exclusive of the CSRs, and used this regclass while lowering the return instruction. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D63924 llvm-svn: 365512
* AMDGPU/GlobalISel: Legalize more concat_vectorsMatt Arsenault2019-07-091-13/+16
| | | | llvm-svn: 365488
* AMDGPU/GlobalISel: Improve regbankselect for icmp s16Matt Arsenault2019-07-091-5/+10
| | | | | | Account for 64-bit scalar eq/ne when available. llvm-svn: 365487
* AMDGPU/GlobalISel: Make s16 G_ICMP legalMatt Arsenault2019-07-091-2/+8
| | | | llvm-svn: 365486
* AMDGPU/GlobalISel: Select G_SUBMatt Arsenault2019-07-092-7/+15
| | | | llvm-svn: 365484
* AMDGPU/GlobalISel: Select G_UNMERGE_VALUESMatt Arsenault2019-07-092-0/+47
| | | | llvm-svn: 365483
* AMDGPU/GlobalISel: Select G_MERGE_VALUESMatt Arsenault2019-07-094-18/+77
| | | | llvm-svn: 365482
* [AMDGPU] Added td definitions for HW regsStanislav Mekhanoshin2019-07-091-0/+25
| | | | | | | | Infrastructure work for future commit. NFC. Differential Revision: https://reviews.llvm.org/D64370 llvm-svn: 365432
* [AMDGPU] Always use s_memtime for readcyclecounterStanislav Mekhanoshin2019-07-091-11/+0
| | | | | | Differential Revision: https://reviews.llvm.org/D64369 llvm-svn: 365431
* AMDGPU: Split extload/zextload local load patternsMatt Arsenault2019-07-084-32/+57
| | | | | | | This will help removing the custom load predicates, allowing the global isel emitter to handle them. llvm-svn: 365398
* Add parentheses to silence warning.Bill Wendling2019-07-081-6/+6
| | | | llvm-svn: 365394
* AMDGPU: Fix unused variable in release buildMatt Arsenault2019-07-081-3/+3
| | | | llvm-svn: 365378
* AMDGPU: Fix stray typingMatt Arsenault2019-07-081-1/+1
| | | | llvm-svn: 365373
* AMDGPU: Make s34 the FP registerMatt Arsenault2019-07-086-142/+436
| | | | | | | | | | | | | | | | | | | | | | | Make the FP register callee saved. This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame register used throughout the rest of the function. I don't like how this bypassess the standard mechanism for CSR spills just to get the correct insert point. I may look for a better solution, since all CSR VGPRs may also need to have all lanes activated. Another option might be to make getFrameIndexReference change the base register if the frame index is a CSR, and then try to figure out the right insertion point in emitProlog. If there is a free VGPR lane available for SGPR spilling, try to use it for the FP. If that would require intrtoducing a new VGPR spill, try to use a free call clobbered SGPR. Only fallback to introducing a new VGPR spill as a last resort. This also doesn't attempt to handle SGPR spilling with scalar stores. llvm-svn: 365372
* AMDGPU: Move DEBUG_TYPE definition below includesMatt Arsenault2019-07-081-2/+2
| | | | llvm-svn: 365369
* AMDGPU: Remove mubuf specific PatFragsMatt Arsenault2019-07-081-27/+13
| | | | | | | These are identical to the *_global PatFrag, and will only create more work to get the GlobalISel importer to handle them. llvm-svn: 365350
* AMDGPU: Move waitcnt intrinsic to instruction definition patternMatt Arsenault2019-07-082-12/+3
| | | | llvm-svn: 365349
* [AMDGPU][MC] Corrected parsing of FLAT offset modifierDmitry Preobrazhensky2019-07-087-98/+114
| | | | | | | | | | | | | | Summary of changes: - simplified handling of FLAT offset: offset_s13 and offset_u12 have been replaced with flat_offset; - provided information about error position for pre-gfx9 targets; - improved errors handling. Reviewers: artem.tamazov, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D64244 llvm-svn: 365321
* [AMDGPU] Use a named predicate instead of a magic number.Jay Foad2019-07-081-4/+3
| | | | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64201 llvm-svn: 365294
* AMDGPU: Fix assert in clang testMatt Arsenault2019-07-051-1/+2
| | | | llvm-svn: 365245
* AMDGPU: Make AMDGPUPerfHintAnalysis an SCC passMatt Arsenault2019-07-055-41/+53
| | | | | | | | | | | | Add a string attribute instead of directly setting MachineFunctionInfo. This avoids trying to get the analysis in the MachineFunctionInfo in a way that doesn't work with the new pass manager. This will also avoid re-visiting the call graph for every single function. llvm-svn: 365241
* [CodeGen] Enhance `MachineInstrSpan` to allow the end of MBB to be used.Michael Liao2019-07-051-1/+1
| | | | | | | | | | | | | | | | Summary: - Explicitly specify the parent MBB to allow the end iterator to be used. Reviewers: aprantl, MatzeB, craig.topper, qcolombet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64261 llvm-svn: 365240
* [NFC] A test commit to check the access permission. Removed a blank line.Christudasan Devadasan2019-07-051-1/+0
| | | | llvm-svn: 365223
* [AMDGPU] Added a new metadata for multi grid sync implicit argumentYaxun Liu2019-07-051-0/+8
| | | | | | | | Patch by Christudasan Devadasan. Differential Revision: https://reviews.llvm.org/D63886 llvm-svn: 365217
* [AMDGPU] DPP combiner: recognize identities for more opcodesJay Foad2019-07-051-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This allows the DPP combiner to kick in more often. For example the exclusive scan generated by the atomic optimizer for a divergent atomic add used to look like this: v_mov_b32_e32 v3, v1 v_mov_b32_e32 v5, v1 v_mov_b32_e32 v6, v1 v_mov_b32_dpp v3, v2 wave_shr:1 row_mask:0xf bank_mask:0xf s_nop 1 v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf v_mov_b32_dpp v6, v3 row_shr:3 row_mask:0xf bank_mask:0xf v_add3_u32 v3, v4, v5, v6 v_mov_b32_e32 v4, v1 s_nop 1 v_mov_b32_dpp v4, v3 row_shr:4 row_mask:0xf bank_mask:0xe v_add_u32_e32 v3, v3, v4 v_mov_b32_e32 v4, v1 s_nop 1 v_mov_b32_dpp v4, v3 row_shr:8 row_mask:0xf bank_mask:0xc v_add_u32_e32 v3, v3, v4 v_mov_b32_e32 v4, v1 s_nop 1 v_mov_b32_dpp v4, v3 row_bcast:15 row_mask:0xa bank_mask:0xf v_add_u32_e32 v3, v3, v4 s_nop 1 v_mov_b32_dpp v1, v3 row_bcast:31 row_mask:0xc bank_mask:0xf v_add_u32_e32 v1, v3, v1 v_add_u32_e32 v1, v2, v1 v_readlane_b32 s0, v1, 63 But now most of the dpp movs are combined into adds: v_mov_b32_e32 v3, v1 v_mov_b32_e32 v5, v1 s_nop 0 v_mov_b32_dpp v3, v2 wave_shr:1 row_mask:0xf bank_mask:0xf s_nop 1 v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf v_add3_u32 v1, v4, v5, v1 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf v_add_u32_e32 v1, v2, v1 v_readlane_b32 s0, v1, 63 Reviewers: arsenm, vpykhtin Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64207 llvm-svn: 365211
* [AMDGPU] Custom lower INSERT_SUBVECTOR v3, v4, v5, v8Tim Renouf2019-07-042-8/+37
| | | | | | | | | | | | | | | | | | | Summary: Since the changes to introduce vec3 and vec5, INSERT_VECTOR for these sizes has been marked "expand", which made LegalizeDAG lower it to loads and stores via a stack slot. The code got optimized a bit later, but the now-unused stack slot was never deleted. This commit avoids that problem by custom lowering INSERT_SUBVECTOR into an EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT for each element in the subvector to insert. V2: Addressed review comments re test. Differential Revision: https://reviews.llvm.org/D63160 Change-Id: I9e3c13e36f68cfa3431bb9814851cc1f673274e1 llvm-svn: 365148
* Fix typos in comments and debug output.Jay Foad2019-07-041-3/+3
| | | | llvm-svn: 365146
* [AMDGPU] Correct the setting of `FlatScratchInit`.Michael Liao2019-07-041-1/+12
| | | | | | | | | | | | | | Summary: - That flag setting should skip spilling stack slot. Reviewers: arsenm, rampitec Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64143 llvm-svn: 365137
* AMDGPU: Add pass to lower SGPR spillsMatt Arsenault2019-07-039-33/+346
| | | | | | | | | | | | | | | This is split out from my patches to split register allocation into a separate SGPR and VGPR phase, and has some parts that aren't yet used (like maintaining LiveIntervals). This simplifies making the frame pointer register callee saved. As it is now, the code to determine callee saves needs to predict all the possible SGPR spills and how many callee saved VGPRs are needed. By handling this before PrologEpilogInserter, it's possible to just check the spill objects that already exist. Change-Id: I29e6df4034afcf949e06f8ef44206acb94696f04 llvm-svn: 365095
OpenPOWER on IntegriCloud