path: root/llvm/test/CodeGen/AMDGPU
Commit message | Author | Age | Files | Lines
...
* [AMDGPU] Created a sub-register class for the return address operand in the return instruction. | Christudasan Devadasan | 2019-07-09 | 11 files | -102/+104
  Function return instruction lowering currently uses the fixed register pair s[30:31] for holding the return address. It can be any SGPR pair other than the CSRs. Created an SGPR pair sub-register class exclusive of the CSRs, and used this regclass while lowering the return instruction.
  Reviewed By: arsenm
  Differential Revision: https://reviews.llvm.org/D63924
  llvm-svn: 365512
* AMDGPU/GlobalISel: Prepare some tests for store selection | Matt Arsenault | 2019-07-09 | 17 files | -124/+92
  Mostly these would fail due to trying to use SI with a flat operation. Implementing global loads with MUBUF is more work than flat, so these won't be handled in the initial load selection. Others fail because store of s64 won't initially work, as the current set of patterns expects everything to be turned into v2i32.
  llvm-svn: 365493
* AMDGPU/GlobalISel: Fix test | Matt Arsenault | 2019-07-09 | 1 file | -4/+1
  llvm-svn: 365491
* AMDGPU/GlobalISel: Legalize more concat_vectors | Matt Arsenault | 2019-07-09 | 2 files | -14/+99
  llvm-svn: 365488
* AMDGPU/GlobalISel: Improve regbankselect for icmp s16 | Matt Arsenault | 2019-07-09 | 2 files | -25/+366
  Account for 64-bit scalar eq/ne when available.
  llvm-svn: 365487
* AMDGPU/GlobalISel: Make s16 G_ICMP legal | Matt Arsenault | 2019-07-09 | 1 file | -151/+474
  llvm-svn: 365486
* AMDGPU/GlobalISel: Select G_SUB | Matt Arsenault | 2019-07-09 | 1 file | -0/+61
  llvm-svn: 365484
* AMDGPU/GlobalISel: Select G_UNMERGE_VALUES | Matt Arsenault | 2019-07-09 | 1 file | -0/+231
  llvm-svn: 365483
* AMDGPU/GlobalISel: Select G_MERGE_VALUES | Matt Arsenault | 2019-07-09 | 2 files | -0/+1303
  llvm-svn: 365482
* [AMDGPU] Always use s_memtime for readcyclecounter | Stanislav Mekhanoshin | 2019-07-09 | 1 file | -14/+13
  Differential Revision: https://reviews.llvm.org/D64369
  llvm-svn: 365431
* AMDGPU: Make s34 the FP register | Matt Arsenault | 2019-07-08 | 22 files | -701/+764
  Make the FP register callee saved. This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame register used throughout the rest of the function. I don't like how this bypasses the standard mechanism for CSR spills just to get the correct insert point. I may look for a better solution, since all CSR VGPRs may also need to have all lanes activated. Another option might be to make getFrameIndexReference change the base register if the frame index is a CSR, and then try to figure out the right insertion point in emitProlog.
  If there is a free VGPR lane available for SGPR spilling, try to use it for the FP. If that would require introducing a new VGPR spill, try to use a free call-clobbered SGPR. Only fall back to introducing a new VGPR spill as a last resort.
  This also doesn't attempt to handle SGPR spilling with scalar stores.
  llvm-svn: 365372
* RegUsageInfoCollector: Don't iterate all regs for every reg class | Matt Arsenault | 2019-07-08 | 1 file | -0/+46
  This is extremely slow on AMDGPU, which has a lot of physical registers and a lot of register classes. determineCalleeSaves, via MachineRegisterInfo::isPhysRegUsed, already added all of the super registers to the saved set.
  llvm-svn: 365370
* Add, and infer, a nofree function attribute | Brian Homerding | 2019-07-08 | 1 file | -3/+3
  This patch adds a function attribute, nofree, to indicate that a function does not, directly or indirectly, call a memory-deallocation function (e.g., free, C++'s operator delete).
  Reviewers: jdoerfert
  Differential Revision: https://reviews.llvm.org/D49165
  llvm-svn: 365336
* [AMDGPU] Added a new metadata for multi grid sync implicit argument | Yaxun Liu | 2019-07-05 | 4 files | -6/+1094
  Patch by Christudasan Devadasan.
  Differential Revision: https://reviews.llvm.org/D63886
  llvm-svn: 365217
* ScheduleDAG: Fix incorrectly killing registers in bundles | Matt Arsenault | 2019-07-05 | 1 file | -0/+42
  When looking for uses/defs to add kill flags, the iterator was double incremented, skipping the first instruction in the bundle. The use register in the first bundle instruction was then incorrectly killed. The "First" instruction should be the BUNDLE itself as the proper reverse iterator endpoint.
  llvm-svn: 365216
* [AMDGPU] DPP combiner: recognize identities for more opcodes | Jay Foad | 2019-07-05 | 1 file | -1/+25
  Summary: This allows the DPP combiner to kick in more often. For example the exclusive scan generated by the atomic optimizer for a divergent atomic add used to look like this:

      v_mov_b32_e32 v3, v1
      v_mov_b32_e32 v5, v1
      v_mov_b32_e32 v6, v1
      v_mov_b32_dpp v3, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
      s_nop 1
      v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
      v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf
      v_mov_b32_dpp v6, v3 row_shr:3 row_mask:0xf bank_mask:0xf
      v_add3_u32 v3, v4, v5, v6
      v_mov_b32_e32 v4, v1
      s_nop 1
      v_mov_b32_dpp v4, v3 row_shr:4 row_mask:0xf bank_mask:0xe
      v_add_u32_e32 v3, v3, v4
      v_mov_b32_e32 v4, v1
      s_nop 1
      v_mov_b32_dpp v4, v3 row_shr:8 row_mask:0xf bank_mask:0xc
      v_add_u32_e32 v3, v3, v4
      v_mov_b32_e32 v4, v1
      s_nop 1
      v_mov_b32_dpp v4, v3 row_bcast:15 row_mask:0xa bank_mask:0xf
      v_add_u32_e32 v3, v3, v4
      s_nop 1
      v_mov_b32_dpp v1, v3 row_bcast:31 row_mask:0xc bank_mask:0xf
      v_add_u32_e32 v1, v3, v1
      v_add_u32_e32 v1, v2, v1
      v_readlane_b32 s0, v1, 63

  But now most of the dpp movs are combined into adds:

      v_mov_b32_e32 v3, v1
      v_mov_b32_e32 v5, v1
      s_nop 0
      v_mov_b32_dpp v3, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
      s_nop 1
      v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
      v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf
      v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf
      v_add3_u32 v1, v4, v5, v1
      s_nop 1
      v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe
      s_nop 1
      v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc
      s_nop 1
      v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf
      s_nop 1
      v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf
      v_add_u32_e32 v1, v2, v1
      v_readlane_b32 s0, v1, 63

  Reviewers: arsenm, vpykhtin
  Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64207
  llvm-svn: 365211
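  The idea behind the DPP combine in the commit above can be sketched in a simplified lane model. This is only an illustration, not the combiner's code: the helper names `mov_dpp`, `add_dpp`, `uncombined`, and `combined` are hypothetical, and the model assumes a single row with a plain shift-right and ignores row_mask, bank_mask, and bound_ctrl. It shows why a DPP mov whose "old" value is the identity of the following operation (0 for add) can be folded into that operation.

```python
def mov_dpp(old, src, shift):
    """Simplified DPP mov: lane i reads src[i - shift]; lanes with no
    valid source lane keep their 'old' value (assumed model)."""
    return [src[i - shift] if i - shift >= 0 else old[i]
            for i in range(len(src))]


def add_dpp(old, src0, src1, shift):
    """Simplified DPP add: lane i computes src0[i - shift] + src1[i];
    lanes with no valid source lane keep their 'old' value."""
    return [src0[i - shift] + src1[i] if i - shift >= 0 else old[i]
            for i in range(len(src0))]


def uncombined(v, shift, identity=0):
    # mov with old = identity, followed by a plain per-lane add.
    tmp = mov_dpp([identity] * len(v), v, shift)
    return [a + b for a, b in zip(v, tmp)]


def combined(v, shift):
    # Single DPP add; invalid lanes fall back to the unshifted operand,
    # which equals v[i] + identity in the uncombined form.
    return add_dpp(v, v, v, shift)


lanes = [1, 2, 3, 4]
print(uncombined(lanes, 1), combined(lanes, 1))
```

  In this model both forms produce the same per-lane values because adding the identity leaves the unshifted operand unchanged; real hardware semantics have more cases than the sketch covers.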
* [AMDGPU] Custom lower INSERT_SUBVECTOR v3, v4, v5, v8 | Tim Renouf | 2019-07-04 | 1 file | -0/+32
  Summary: Since the changes to introduce vec3 and vec5, INSERT_VECTOR for these sizes has been marked "expand", which made LegalizeDAG lower it to loads and stores via a stack slot. The code got optimized a bit later, but the now-unused stack slot was never deleted. This commit avoids that problem by custom lowering INSERT_SUBVECTOR into an EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT for each element in the subvector to insert.
  V2: Addressed review comments re test.
  Differential Revision: https://reviews.llvm.org/D63160
  Change-Id: I9e3c13e36f68cfa3431bb9814851cc1f673274e1
  llvm-svn: 365148
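  The element-wise lowering described in the commit above can be pictured with plain lists standing in for vectors. This is a minimal sketch of the shape of the transformation, not the SelectionDAG code; `insert_subvector` is a hypothetical helper name, and the list indexing only models what EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT do.

```python
def insert_subvector(vec, sub, idx):
    """Insert 'sub' into 'vec' starting at element 'idx', one element at
    a time, without going through a temporary memory slot."""
    result = list(vec)
    for i in range(len(sub)):
        elt = sub[i]            # models EXTRACT_VECTOR_ELT sub, i
        result[idx + i] = elt   # models INSERT_VECTOR_ELT result, elt, idx + i
    return result


print(insert_subvector([0, 1, 2, 3, 4], [9, 8], 2))
```

  Each loop iteration corresponds to one extract/insert pair, so a two-element subvector yields two pairs and no stack traffic.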
* [AMDGPU] Correct the setting of `FlatScratchInit`. | Michael Liao | 2019-07-04 | 1 file | -2/+4
  Summary:
  - That flag setting should skip spilling stack slot.
  Reviewers: arsenm, rampitec
  Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64143
  llvm-svn: 365137
* GlobalISel: Fix widenScalar for pointer typed G_MERGE_VALUES | Matt Arsenault | 2019-07-03 | 1 file | -0/+83
  llvm-svn: 365093
* Revert "[AMDGPU] Kernel arg metadata: added support for "__hip_texture" type." | Matt Arsenault | 2019-07-03 | 2 files | -24/+0
  This reverts commit r365073. This is crashing, and is improperly relying on IR type names.
  llvm-svn: 365087
* [AMDGPU] Kernel arg metadata: added support for "__hip_texture" type. | Konstantin Pyzhov | 2019-07-03 | 2 files | -0/+24
  Summary: Hip texture type is equivalent to OpenCL image. So, we need to set the Image type for kernel arguments with __hip_texture type.
  Differential revision: https://reviews.llvm.org/D63850
  llvm-svn: 365073
* [SelectionDAG] Propagate alias metadata to target intrinsic nodes | James Molloy | 2019-07-03 | 1 file | -1/+6
  When a target intrinsic has been determined to touch memory, we construct a MachineMemOperand during SDAG construction. In this case, we should propagate AAMDNodes metadata to the MachineMemOperand where available.
  Differential revision: https://reviews.llvm.org/D64131
  llvm-svn: 365043
* [AMDGPU] Enable serializing of argument info. | Michael Liao | 2019-07-03 | 1 file | -0/+1
  Summary:
  - Support serialization of all arguments in machine function info. This enables fabricating MIR tests depending on argument info.
  Reviewers: arsenm, rampitec
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64096
  llvm-svn: 364995
* [AArch64][GlobalISel] Overhaul legalization & isel of shifts to select immediate forms. | Amara Emerson | 2019-07-03 | 1 file | -6/+4
  There are two main issues preventing us from generating immediate form shifts:
  1) We have partial SelectionDAG imported support for G_ASHR and G_LSHR shift immediate forms, but they currently don't work because the amount type is expected to be an s64 constant, but we only legalize them to have homogeneous types. To deal with this, first we introduce a custom legalizer to *only* custom legalize s32 shifts which have a constant operand into a s64. There is also an additional artifact combiner to fold zexts(g_constant) to a larger G_CONSTANT if it's legal, a counterpart to the anyext version committed in an earlier patch.
  2) For G_SHL the importer can't cope with the pattern. For this I introduced an early selection phase in the arm64 selector to select these forms manually before the tablegen selector pessimizes it to a register-register variant.
  Differential Revision: https://reviews.llvm.org/D63910
  llvm-svn: 364994
* CodeGen: Set hasSideEffects = 0 on BUNDLE | Matt Arsenault | 2019-07-03 | 5 files | -44/+52
  The BUNDLE itself should not have side effects, and this is a property of instructions inside the bundle. The hasProperty check already searches for any member instructions, which was pointless since it was overridden by this bit.
  Allows me to distinguish bundles that have side effects vs. do not in a future patch. Also fixes an unnecessary scheduling barrier in the bundle AMDGPU uses to get PC relative addresses.
  llvm-svn: 364984
* AMDGPU: Look through bundles for existing waitcnts | Matt Arsenault | 2019-07-03 | 1 file | -0/+166
  These aren't produced now, but will be in a future patch.
  llvm-svn: 364983
* AMDGPU: Custom lower vector_shuffle for v4i16/v4f16 | Matt Arsenault | 2019-07-02 | 1 file | -189/+83
  Ordinarily it is lowered as a build_vector of each extract_vector_elt, which in turn get lowered to bitcasts and bit shifts. Very little understands the lowered extract pattern, resulting in much worse code. We treat concat_vectors of v2i16 as legal, so prefer that.
  llvm-svn: 364959
* [AMDGPU] LCSSA pass added in preISel. Uniform values defined in the divergent loop and used outside | Alexander Timofeev | 2019-07-02 | 1 file | -22/+25
  Differential Revision: https://reviews.llvm.org/D63953
  Reviewers: rampitec, nhaehnle, arsenm
  llvm-svn: 364950
* AMDGPU: Fix broken test | Matt Arsenault | 2019-07-02 | 1 file | -2/+2
  llvm-svn: 364935
* AMDGPU/GlobalISel: Try generated matcher with intrinsics | Matt Arsenault | 2019-07-02 | 2 files | -0/+93
  llvm-svn: 364933
* AMDGPU/GlobalISel: Select mul | Matt Arsenault | 2019-07-02 | 1 file | -0/+78
  llvm-svn: 364932
* GlobalISel: Define GINodeEquiv for G_UMULH/G_SMULH | Matt Arsenault | 2019-07-02 | 2 files | -0/+170
  llvm-svn: 364931
* AMDGPU/GlobalISel: Fix G_GEP with mixed SGPR/VGPR operands | Matt Arsenault | 2019-07-02 | 2 files | -13/+13
  The register bank for the destination of the sample argument copy was wrong. We shouldn't be constraining each source to the result register bank. Allow constraining the original register to the right size.
  llvm-svn: 364928
* AMDGPU/GlobalISel: Select G_FENCE | Matt Arsenault | 2019-07-02 | 1 file | -0/+719
  Manually select to work around the tablegen emitter emitting checks for G_CONSTANT.
  llvm-svn: 364927
* GlobalISel: Add G_FENCE | Matt Arsenault | 2019-07-02 | 1 file | -0/+361
  The pattern importer is for some reason emitting checks for G_CONSTANT for the immediate operands.
  llvm-svn: 364926
* AMDGPU: Correct properties for adjcallstack* pseudos | Matt Arsenault | 2019-07-01 | 3 files | -13/+13
  These should be SALU writes, and these are lowered to instructions that def SCC.
  llvm-svn: 364859
* GlobalISel: Try to widen merges with other merges | Matt Arsenault | 2019-07-01 | 1 file | -18/+327
  If the requested source type can be used as a merge source type, create a merge of merges. This avoids creating large, illegal extensions and bit-ops directly to the result type.
  llvm-svn: 364841
* AMDGPU: Revert accidental change to test | Matt Arsenault | 2019-07-01 | 1 file | -1/+1
  llvm-svn: 364839
* AMDGPU/GlobalISel: Handle more input argument intrinsics | Matt Arsenault | 2019-07-01 | 7 files | -11/+83
  llvm-svn: 364836
* AMDGPU/GlobalISel: Lower kernarg segment ptr intrinsics | Matt Arsenault | 2019-07-01 | 2 files | -19/+125
  llvm-svn: 364835
* AMDGPU/GlobalISel: Legalize workgroup ID intrinsics | Matt Arsenault | 2019-07-01 | 2 files | -0/+221
  llvm-svn: 364834
* AMDGPU/GlobalISel: Legalize workitem ID intrinsics | Matt Arsenault | 2019-07-01 | 3 files | -2/+95
  Tests don't cover the masked input path since non-kernel arguments aren't lowered yet.
  Test is copied directly from the existing test, with 2 additions.
  llvm-svn: 364833
* AMDGPU/GlobalISel: Custom lower control flow intrinsics | Matt Arsenault | 2019-07-01 | 2 files | -11/+188
  Replace the brcond for the 2 cases that act as branches. For now follow how the current system works, although I think we can eventually get rid of the pseudos.
  llvm-svn: 364832
* AMDGPU/GlobalISel: Handle 16-bit SALU min/max | Matt Arsenault | 2019-07-01 | 4 files | -60/+372
  This needs to be extended to s32, and expanded into cmp+select. This is relying on the fact that widenScalar happens to leave the instruction in place, but this isn't a guaranteed property of LegalizerHelper.
  llvm-svn: 364831
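  The "extend to s32, then cmp+select" expansion mentioned in the commit above can be sketched on plain integers. This is an illustration under assumptions, not the legalizer's code: the helper names are hypothetical, and the choice of sign extension for the signed case and zero extension for the unsigned case is the standard way to preserve ordering, not something the commit message spells out.

```python
def sext16(x):
    """Sign-extend a 16-bit pattern to a Python int (models a signed widen)."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def zext16(x):
    """Zero-extend a 16-bit pattern (models an unsigned widen)."""
    return x & 0xFFFF


def smin_s16(a, b):
    wa, wb = sext16(a), sext16(b)
    picked = wa if wa < wb else wb   # compare, then select
    return picked & 0xFFFF           # truncate back to 16 bits


def umin_s16(a, b):
    wa, wb = zext16(a), zext16(b)
    picked = wa if wa < wb else wb   # compare, then select
    return picked & 0xFFFF


print(hex(smin_s16(0xFFFF, 1)), hex(umin_s16(0xFFFF, 1)))
```

  The same bit pattern 0xFFFF is the smaller operand in the signed case (it is -1) and the larger one in the unsigned case, which is why the extension kind has to match the comparison.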
* AMDGPU/GlobalISel: Lower SALU min/max to cmp+select | Matt Arsenault | 2019-07-01 | 4 files | -80/+256
  Use a change observer to apply a register bank to the newly created intermediate result register.
  llvm-svn: 364830
* AMDGPU/GlobalISel: Add tests for add legalization | Matt Arsenault | 2019-07-01 | 1 file | -0/+87
  llvm-svn: 364828
* AMDGPU/GlobalISel: Legalize s16 add/sub/mul | Matt Arsenault | 2019-07-01 | 3 files | -58/+519
  If this is scalar, promote to s32. Use a new observer class to assign the register bank of newly created registers.
  llvm-svn: 364827
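  Promoting a 16-bit add to 32 bits, as in the commit above, is safe because the low 16 bits of a sum do not depend on the high bits of the operands. The sketch below illustrates that property only; `add_s16_via_s32` is a hypothetical helper, and the `junk_a`/`junk_b` parameters are an assumption used to model unspecified high bits after an any-extend (the commit message does not say which extension is used).

```python
def add_s16_via_s32(a, b, junk_a=0, junk_b=0):
    """Add two 16-bit values by widening to 32 bits and truncating.

    junk_a/junk_b stand in for arbitrary high halves of the widened
    operands; they must not affect the truncated result.
    """
    wide_a = ((junk_a & 0xFFFF) << 16) | (a & 0xFFFF)
    wide_b = ((junk_b & 0xFFFF) << 16) | (b & 0xFFFF)
    wide_sum = (wide_a + wide_b) & 0xFFFFFFFF   # 32-bit add with wraparound
    return wide_sum & 0xFFFF                    # truncate back to 16 bits


print(hex(add_s16_via_s32(0xFFFF, 1)))
print(hex(add_s16_via_s32(0x1234, 0x1111, junk_a=0xDEAD, junk_b=0xBEEF)))
```

  The same argument holds for sub and mul, since carries and partial products only propagate toward higher bits.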
* AMDGPU/GlobalISel: Fix allowing non-boolean conditions for G_SELECT | Matt Arsenault | 2019-07-01 | 2 files | -1068/+2232
  The condition register bank must be scc or vcc so that a copy will be inserted, which will be lowered to a compare.
  Currently greedy unnecessarily forces using a VCC select.
  llvm-svn: 364825
* AMDGPU/GlobalISel: RegBankSelect for sendmsg/sendmsghalt | Matt Arsenault | 2019-07-01 | 2 files | -0/+64
  llvm-svn: 364819
* AMDGPU/GlobalISel: Legalize s16 fcmp | Matt Arsenault | 2019-07-01 | 1 file | -69/+252
  llvm-svn: 364817