summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* Re-commit: [AMDGPU] Use S_DENORM_MODE for gfx10Austin Kerbow2019-08-061-14/+47
| | | | | | | | | | | | | | | | Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10. Reviewers: arsenm, rampitec Reviewed By: arsenm, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65620 llvm-svn: 367969
* Revert "[AMDGPU] Use S_DENORM_MODE for gfx10"Dmitri Gribenko2019-08-051-47/+14
| | | | | | | This reverts commit r367882. It broke the test MC/Disassembler/AMDGPU/gfx10_dasm_all.txt. llvm-svn: 367904
* [AMDGPU] Use S_DENORM_MODE for gfx10Austin Kerbow2019-08-051-14/+47
| | | | | | | | | | | | | | | | Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10. Reviewers: arsenm, rampitec Reviewed By: arsenm, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65620 llvm-svn: 367882
* AMDGPU: Correct behavior of f16 buffer loadsMatt Arsenault2019-08-051-43/+56
| | | | | | | Don't assume format loads for f16. Also fixes support for targets without i16. llvm-svn: 367879
* AMDGPU: Correct behavior of f16/i16 non-format store intrinsicsMatt Arsenault2019-08-051-30/+59
| | | | | | | | | This was switching to use a format store for a non-format store for f16 types. Also fixes i16/f16 stores on targets without legal f16. The corresponding loads also need to be fixed. llvm-svn: 367872
* [LLVM][Alignment] Introduce Alignment TypeGuillaume Chatelet2019-08-051-6/+6
| | | | | | | | | | | | | | | | | | | Summary: This is patch is part of a serie to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jfb, jakehehrlich Reviewed By: jfb Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65514 llvm-svn: 367828
* AMDGPU: add missing llvm.amdgcn.{raw,struct}.buffer.atomic.{inc,dec}Nicolai Haehnle2019-08-051-2/+18
| | | | | | | | | | | | | | | | | Summary: Wrapping increment/decrement. These aren't exposed by many APIs... Change-Id: I1df25c7889de5a5ba76468ad8e8a2597efa9af6c Reviewers: arsenm, tpr, dstuttard Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65283 llvm-svn: 367821
* Finish moving TargetRegisterInfo::isVirtualRegister() and friends to ↵Daniel Sanders2019-08-011-3/+3
| | | | | | llvm::Register as started by r367614. NFC llvm-svn: 367633
* AMDGPU: Remove v0 workaround for DS_GWS_* instructionsMatt Arsenault2019-08-011-15/+3
| | | | | | | Any register should work for the src field since r366067, since the used value is not pulled from the expected encoding field. llvm-svn: 367598
* AMDGPU: Use tablegen pattern for sendmsg intrinsicsMatt Arsenault2019-08-011-9/+0
| | | | | | | Since this now emits a direct copy to m0, SIFixSGPRCopies has to handle a physical register. llvm-svn: 367593
* [AMDGPU] Reserve all AGPRs on targets which do not have themStanislav Mekhanoshin2019-07-301-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D65471 llvm-svn: 367347
* [AMDGPU] Enable v4f16 and above for v_pk_fma instructionsDavid Stuttard2019-07-291-0/+27
| | | | | | | | | | | | | | | | | | | | Summary: If isel is presented with <2 x half> vectors then it will correctly select v_pk_fma style instructions. If isel is presented with e.g. <4 x half> vectors it will scalarize, unlike for other instruction types (such as fadd, fmul etc.) Added extra support to enable this. Updated one of the tests to include a test for this (as well as extending the test to GFX9) Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65325 Change-Id: I50a4577a3f8223fb53992af3b7d26121f65b71ee llvm-svn: 367206
* [AMDGPU] Move WQM/WWM intrinsic instruction selection to AMDGPUISelDAGToDAGCarl Ritson2019-07-261-10/+0
| | | | | | | | | | | | | | Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65328 llvm-svn: 367105
* AMDGPU: Don't assert on v4f16 arguments to shader calling conventionsMatt Arsenault2019-07-251-1/+2
| | | | llvm-svn: 367018
* AMDGPU: Force s_waitcnt after GWS instructionsMatt Arsenault2019-07-191-3/+20
| | | | | | | This is apparently required to be the immediately following instruction, so force it into a bundle with a waitcnt. llvm-svn: 366607
* AMDGPU/GlobalISel: Rewrite lowerFormalArgumentsMatt Arsenault2019-07-191-29/+36
| | | | | | | | | | | | | | | | | This should now handle everything except structs passed as multiple registers. I think most of the packing logic should be handled by handleAssignments, but I'm unclear on what the contract is for multiple registers. This is copying how x86 handles this. This does change the behavior of the test_sgpr_alignment0 amdgpu_vs test. I don't think shader arguments should try to follow the alignment, and registers need to be repacked. I also don't think it matters, since I think the pointers are packed to the beginning of the argument list anyway. llvm-svn: 366582
* AMDGPU: Decompose all values to 32-bit pieces for calling conventionsMatt Arsenault2019-07-191-11/+18
| | | | | | | | | | This is the more natural lowering, and presents more opportunities to reduce 64-bit ops to 32-bit. This should also help avoid issues graphics shaders have had with 64-bit values, and simplify argument lowering in globalisel. llvm-svn: 366578
* [AMDGPU] Change register type for v32 vectorsStanislav Mekhanoshin2019-07-161-2/+2
| | | | | | | | | | When it is AReg_1024 this results in unnecessary copying into AGPRs of a 32 element vectors even though they are not intended for an mfma instruction. Differential Revision: https://reviews.llvm.org/D64815 llvm-svn: 366252
* AMDGPU: Add 24-bit mul intrinsicsMatt Arsenault2019-07-151-0/+5
| | | | | | | | | | | Insert these during codegenprepare. This works around a DAG issue where generic combines eliminate the and asserting the high bits are zero, which then exposes an unknown read source to the mul combine. It doesn't worth the hassle of trying to insert an AssertZext or something to try to deal with it. llvm-svn: 366094
* [AMDGPU] use v32f32 for 3 mfma intrinsicsStanislav Mekhanoshin2019-07-121-2/+4
| | | | | | | | | These should really use v32f32, but were defined as v32i32 due to the lack of the v32f32 type. Differential Revision: https://reviews.llvm.org/D64667 llvm-svn: 365972
* AMDGPU: Drop remnants of byval support for shadersMatt Arsenault2019-07-121-2/+1
| | | | | | | | Before 2018, mesa used to use byval interchangably with inreg, which didn't really make sense. Fix tests still using it to avoid breaking in a future commit. llvm-svn: 365953
* [AMDGPU] gfx908 mfma supportStanislav Mekhanoshin2019-07-111-0/+34
| | | | | | Differential Revision: https://reviews.llvm.org/D64584 llvm-svn: 365824
* [AMDGPU] gfx908 atomic fadd and atomic pk_faddStanislav Mekhanoshin2019-07-111-0/+70
| | | | | | Differential Revision: https://reviews.llvm.org/D64435 llvm-svn: 365717
* [AMDGPU] gfx908 mAI instructions, MC partStanislav Mekhanoshin2019-07-091-0/+26
| | | | | | Differential Revision: https://reviews.llvm.org/D64446 llvm-svn: 365563
* [AMDGPU] Created a sub-register class for the return address operand in the ↵Christudasan Devadasan2019-07-091-9/+6
| | | | | | | | | | | | | | return instruction. Function return instruction lowering, currently uses the fixed register pair s[30:31] for holding the return address. It can be any SGPR pair other than the CSRs. Created an SGPR pair sub-register class exclusive of the CSRs, and used this regclass while lowering the return instruction. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D63924 llvm-svn: 365512
* AMDGPU: Make s34 the FP registerMatt Arsenault2019-07-081-17/+0
| | | | | | | | | | | | | | | | | | | | | | | Make the FP register callee saved. This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame register used throughout the rest of the function. I don't like how this bypassess the standard mechanism for CSR spills just to get the correct insert point. I may look for a better solution, since all CSR VGPRs may also need to have all lanes activated. Another option might be to make getFrameIndexReference change the base register if the frame index is a CSR, and then try to figure out the right insertion point in emitProlog. If there is a free VGPR lane available for SGPR spilling, try to use it for the FP. If that would require intrtoducing a new VGPR spill, try to use a free call clobbered SGPR. Only fallback to introducing a new VGPR spill as a last resort. This also doesn't attempt to handle SGPR spilling with scalar stores. llvm-svn: 365372
* [AMDGPU] Custom lower INSERT_SUBVECTOR v3, v4, v5, v8Tim Renouf2019-07-041-8/+36
| | | | | | | | | | | | | | | | | | | Summary: Since the changes to introduce vec3 and vec5, INSERT_VECTOR for these sizes has been marked "expand", which made LegalizeDAG lower it to loads and stores via a stack slot. The code got optimized a bit later, but the now-unused stack slot was never deleted. This commit avoids that problem by custom lowering INSERT_SUBVECTOR into an EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT for each element in the subvector to insert. V2: Addressed review comments re test. Differential Revision: https://reviews.llvm.org/D63160 Change-Id: I9e3c13e36f68cfa3431bb9814851cc1f673274e1 llvm-svn: 365148
* AMDGPU: Custom lower vector_shuffle for v4i16/v4f16Matt Arsenault2019-07-021-0/+62
| | | | | | | | | Ordinarily it is lowered as a build_vector of each extract_vector_elt, which in turn get lowered to bitcasts and bit shifts. Very little understand the lowered extract pattern, resulting in much worse code. We treat concat_vectors of v2i16 as legal, so prefer that. llvm-svn: 364959
* AMDGPU/GFX10: implement ds_ordered_count changesNicolai Haehnle2019-07-011-1/+22
| | | | | | | | | | | | | | | | | | | Summary: ds_ordered_count can now simultaneously operate on up to 4 dwords in a single instruction, which are taken from (and returned to) lanes 0..3 of a single VGPR. Change-Id: I19b6e7b0732b617c10a779a7f9c0303eec7dd276 Reviewers: mareko, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63716 llvm-svn: 364815
* AMDGPU: Support GDS atomicsNicolai Haehnle2019-07-011-3/+3
| | | | | | | | | | | | | | | | | Summary: Original patch by Marek Olšák Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab Reviewers: mareko, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63452 llvm-svn: 364814
* [AMDGPU] Packed thread ids in function call ABIStanislav Mekhanoshin2019-06-281-11/+86
| | | | | | Differential Revision: https://reviews.llvm.org/D63851 llvm-svn: 364619
* AMDGPU: Write LDS objects out as global symbols in code generationNicolai Haehnle2019-06-251-2/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section index introduced with a previous change. The linker is then expected to resolve relocations, which are also emitted. Initially disabled for HSA and PAL environments until they have caught up in terms of linker and runtime loader. Some notes: - The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered to a constant at compile times, which means some tests can no longer be applied. The current "solution" is a terrible hack, but the intrinsic isn't used by Mesa, so we can keep it for now. - We no longer know the full LDS size per kernel at compile time, which means that we can no longer generate a relevant error message at compile time. It would be possible to add a check for the size of individual variables, but ultimately the linker will have to perform the final check. Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275 Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61494 llvm-svn: 364297
* AMDGPU: Fix not using s33 for scratch wave offset in kernelsMatt Arsenault2019-06-211-7/+11
| | | | | | Fixes missing piece from r363990. llvm-svn: 364099
* AMDGPU: Always use s33 for global scratch wave offsetMatt Arsenault2019-06-201-8/+0
| | | | | | | | | Every called function could possibly need this to calculate the absolute address of stack objectst, and this avoids inserting a copy around every call site in the kernel. It's also somewhat cleaner to keep this in a callee saved SGPR. llvm-svn: 363990
* AMDGPU: Add intrinsics for DS GWS semaphore instructionsMatt Arsenault2019-06-201-4/+7
| | | | llvm-svn: 363983
* AMDGPU: Insert mem_viol check loop around GWS pre-GFX9Matt Arsenault2019-06-201-18/+114
| | | | | | | It is necessary to emit this loop around GWS operations in case the wave is preempted pre-GFX9. llvm-svn: 363979
* AMDGPU: Consolidate some getGeneration checksMatt Arsenault2019-06-191-9/+6
| | | | | | | | This is incomplete, and ideally these would all be removed, but it's better to localize them to the subtarget first with comments about what they're for. llvm-svn: 363902
* Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics"Matt Arsenault2019-06-191-0/+18
| | | | | | | | This reapplies r363678, using the correct chain for the CopyToReg for v0. glueCopyToM0 counterintuitively changes the operands of the original node. llvm-svn: 363870
* Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsicsSimon Pilgrim2019-06-191-18/+0
| | | | | | | | | There may or may not be additional work to handle this correctly on SI/CI. ........ Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/ llvm-svn: 363797
* AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsicsMatt Arsenault2019-06-181-0/+18
| | | | | | | There may or may not be additional work to handle this correctly on SI/CI. llvm-svn: 363678
* AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0Nicolai Haehnle2019-06-161-5/+12
| | | | | | | | | | | | | | | | | | | | | Summary: Instead of encoding a high-word of 0 using a fake TargetGlobalAddress, just use a literal target constant. This simplifies some subsequent changes. The generated assembly is now more explicit about the kind of relocation that is to be used. Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61491 llvm-svn: 363516
* AMDGPU/GFX10: Support DLC bit in llvm.amdgcn.s.buffer.load intrinsicNicolai Haehnle2019-06-161-5/+11
| | | | | | | | | | | | | | | Summary: Change-Id: Ie4c971462a7749740938c687144e77441dac2539 Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62486 Change-Id: Iae59523edd75c74918d2118df6571a7b671717a0 llvm-svn: 363514
* [AMDGPU] gfx10 conditional registers handlingStanislav Mekhanoshin2019-06-161-24/+78
| | | | | | | | | This is cpp source part of wave32 support, excluding overriden getRegClass(). Differential Revision: https://reviews.llvm.org/D63351 llvm-svn: 363513
* [AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32Stanislav Mekhanoshin2019-06-131-5/+27
| | | | | | Differential Revision: https://reviews.llvm.org/D63301 llvm-svn: 363339
* [AMDGPU] gfx1010 premlane instructionsStanislav Mekhanoshin2019-06-121-0/+18
| | | | | | Differential Revision: https://reviews.llvm.org/D63202 llvm-svn: 363185
* [TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests ↵Simon Pilgrim2019-06-121-4/+3
| | | | | | | | | | | | | | (PR42123) As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179
* [TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI.Simon Pilgrim2019-06-111-5/+6
| | | | | | As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075. llvm-svn: 363048
* [AMDGPU] Optimize image_[load|store]_mipPiotr Sobczak2019-06-101-0/+13
| | | | | | | | | | | | | | | | | | Summary: Replace image_load_mip/image_store_mip with image_load/image_store if lod is 0. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63073 llvm-svn: 362957
* [AMDGPU] Partial revert for the ba447bae7448435c9986eece0811da1423972fddAlexander Timofeev2019-06-061-87/+0
| | | | | | | | | | | | "Divergence driven ISel. Assign register class for cross block values according to the divergence." that discovered the design flaw leading to several issues that required to be solved before. This change reverts AMDGPU specific changes and keeps common part unaffected. llvm-svn: 362749
* AMDGPU: Don't fix emergency stack slot at offset 0Matt Arsenault2019-06-051-10/+0
| | | | | | | | | | | | | | | | | | | | | This forced the caller to be aware of this, which is an ugly ABI feature. Partially reverts r295877. The original reasons for doing this are mostly fixed. Alloca is now in a non-0 address space, so it should be OK to have 0 as a valid pointer. Since we treat the absolute address as the pointer value, this part only really needed to apply to kernels. Since r357093, we avoid the need to increment/decrement the offset register in more cases, and since r354816 the scavenger can fail without spilling, so it's less critical that we try to avoid an offset that fits in the MUBUF offset. Restrict to callable functions for now to split this into 2 steps to limit thte number of test updates and in case anything breaks. llvm-svn: 362665
OpenPOWER on IntegriCloud