path: root/llvm/test/CodeGen/AMDGPU
Commit message | Author | Age | Files | Lines
...
* AMDGPU/GFX10: implement ds_ordered_count changes | Nicolai Haehnle | 2019-07-01 | 1 | -0/+23
  Summary: ds_ordered_count can now simultaneously operate on up to 4 dwords in a single instruction, which are taken from (and returned to) lanes 0..3 of a single VGPR.
  Change-Id: I19b6e7b0732b617c10a779a7f9c0303eec7dd276
  Reviewers: mareko, arsenm, rampitec
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63716
  llvm-svn: 364815
* AMDGPU: Support GDS atomics | Nicolai Haehnle | 2019-07-01 | 1 | -0/+128
  Summary: Original patch by Marek Olšák
  Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab
  Reviewers: mareko, arsenm, rampitec
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63452
  llvm-svn: 364814
* AMDGPU/GlobalISel: RegBankSelect for DS ordered add/swap | Matt Arsenault | 2019-07-01 | 2 | -0/+142
  llvm-svn: 364811
* AMDGPU/GlobalISel: RegBankSelect for amdgcn.writelane | Matt Arsenault | 2019-07-01 | 1 | -0/+98
  llvm-svn: 364808
* AMDGPU/GlobalISel: Complete implementation of G_GEP | Matt Arsenault | 2019-07-01 | 3 | -30/+384
  Also works around tablegen defect in selecting add with unused carry, but if we have to manually select GEP, might as well handle add manually.
  llvm-svn: 364806
* AMDGPU/GlobalISel: Select G_PHI | Matt Arsenault | 2019-07-01 | 2 | -0/+416
  llvm-svn: 364805
* AMDGPU/GlobalISel: Try to select VOP3 form of add | Matt Arsenault | 2019-07-01 | 1 | -13/+26
  There are several things broken, but at least emit the right thing for gfx9.
  The import of the pattern with the unused carry out seems to not work. Needs a special class for clamp, because OperandWithDefaultOps doesn't really work.
  llvm-svn: 364804
* AMDGPU/GlobalISel: RegBankSelect for readlane/readfirstlane | Matt Arsenault | 2019-07-01 | 2 | -0/+103
  llvm-svn: 364801
* AMDGPU/GlobalISel: Implement select for 32-bit G_ADD | Tom Stellard | 2019-07-01 | 1 | -0/+43
  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D58804
  llvm-svn: 364797
* AMDGPU/GlobalISel: Select G_BRCOND for vcc | Matt Arsenault | 2019-07-01 | 1 | -11/+36
  llvm-svn: 364795
* AMDGPU/GlobalISel: Select G_FRAME_INDEX | Matt Arsenault | 2019-07-01 | 1 | -0/+38
  llvm-svn: 364789
* AMDGPU/GFX10: fix scratch resource descriptor | Nicolai Haehnle | 2019-07-01 | 1 | -28/+37
  Summary: The stride should depend on the wave size, not the hardware generation.
  Also, the 32_FLOAT format is 0x16, not 16; though that shouldn't be relevant.
  Change-Id: I088f93bf6708974d085d1c50967f119061da6dc6
  Reviewers: arsenm, rampitec, mareko
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63808
  llvm-svn: 364788
* AMDGPU/GlobalISel: Make s16 select legal | Matt Arsenault | 2019-07-01 | 4 | -70/+244
  This is easy to handle and avoids legalization artifacts which are likely to obscure combines.
  llvm-svn: 364787
* AMDGPU/GlobalISel: Select G_BRCOND for scc conditions | Matt Arsenault | 2019-07-01 | 2 | -0/+194
  llvm-svn: 364786
* AMDGPU/GlobalISel: Tolerate copies with no type set | Matt Arsenault | 2019-07-01 | 1 | -0/+56
  isVCC has the same bug, but isn't used in a context where it can cause a problem.
  llvm-svn: 364784
* AMDGPU: Fix tests using the default alloca address space | Matt Arsenault | 2019-07-01 | 2 | -14/+16
  llvm-svn: 364783
* AMDGPU/GlobalISel: Select src modifiers | Matt Arsenault | 2019-07-01 | 2 | -30/+191
  llvm-svn: 364782
* AMDGPU/GlobalISel: Fix RegBankSelect for G_FCANONICALIZE | Matt Arsenault | 2019-07-01 | 1 | -0/+35
  llvm-svn: 364768
* AMDGPU/GlobalISel: Fix RegBankSelect for G_BUILD_VECTOR | Matt Arsenault | 2019-07-01 | 1 | -0/+69
  llvm-svn: 364767
* AMDGPU/GlobalISel: Fail on store to 32-bit address space | Matt Arsenault | 2019-07-01 | 1 | -3/+3
  llvm-svn: 364766
* AMDGPU/GlobalISel: Improve icmp selection coverage. | Matt Arsenault | 2019-07-01 | 1 | -0/+595
  Select s64 eq/ne scalar icmp.
  llvm-svn: 364765
* AMDGPU/GlobalISel: RegBankSelect for WWM/WQM | Matt Arsenault | 2019-07-01 | 2 | -0/+62
  llvm-svn: 364763
* AMDGPU/GlobalISel: Use vcc reg bank for amdgcn.wqm.vote | Matt Arsenault | 2019-07-01 | 1 | -5/+5
  llvm-svn: 364762
* AMDGPU/GlobalISel: Fix scc->vcc copy handling | Matt Arsenault | 2019-07-01 | 1 | -26/+88
  This was checking the size of the register with the value of the size, which happens to be exec. Also fix assuming VCC is 64-bit to fix wave32.
  Also remove some untested handling for physical registers which is skipped. This doesn't insert the V_CNDMASK_B32 if SCC is the physical copy source. I'm not sure if this should be trying to handle this special case instead of dealing with this in copyPhysReg.
  llvm-svn: 364761
* AMDGPU/GlobalISel: Use and instead of BFE with inline immediate | Matt Arsenault | 2019-07-01 | 3 | -4/+119
  Zext from s1 is the only case where this should do anything with the current legal extensions.
  llvm-svn: 364760
* GlobalISel: Add GINodeEquiv for min/max | Matt Arsenault | 2019-07-01 | 4 | -0/+332
  llvm-svn: 364759
* GlobalISel: Add DAG compat for G_FCANONICALIZE | Matt Arsenault | 2019-07-01 | 1 | -0/+169
  llvm-svn: 364758
* AMDGPU/GlobalISel: Add some more tests for icmp select | Matt Arsenault | 2019-06-29 | 1 | -32/+80
  llvm-svn: 364703
* AMDGPU/GlobalISel: RegBankSelect for update.dpp | Matt Arsenault | 2019-06-29 | 1 | -0/+82
  llvm-svn: 364701
* AMDGPU/GlobalISel: RegBankSelect for atomic.inc/atomic.dec | Matt Arsenault | 2019-06-29 | 2 | -0/+160
  llvm-svn: 364699
* AMDGPU/GlobalISel: RegBankSelect for some DS intrinsics | Matt Arsenault | 2019-06-29 | 6 | -0/+286
  llvm-svn: 364698
* AMDGPU/GlobalISel: RegBankSelect for icmp/fcmp intrinsics | Matt Arsenault | 2019-06-29 | 2 | -0/+134
  llvm-svn: 364696
* AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.fmas | Matt Arsenault | 2019-06-29 | 1 | -0/+106
  llvm-svn: 364695
* AMDGPU/GlobalISel: RegBankSelect for some simple leaf intrinsics | Matt Arsenault | 2019-06-29 | 6 | -0/+84
  llvm-svn: 364694
* AMDGPU: Add baseline test for packed shufflevector | Matt Arsenault | 2019-06-28 | 1 | -0/+928
  llvm-svn: 364691
* [AMDGPU][MC] Enabled constant expressions as operands of sendmsg | Dmitry Preobrazhensky | 2019-06-28 | 1 | -1/+1
  See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820
  Reviewers: artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D62735
  llvm-svn: 364645
* [AMDGPU] Packed thread ids in function call ABI | Stanislav Mekhanoshin | 2019-06-28 | 3 | -53/+152
  Differential Revision: https://reviews.llvm.org/D63851
  llvm-svn: 364619
* AMDGPU: Make fixing i1 copies robust against re-ordering | Nicolai Haehnle | 2019-06-27 | 1 | -0/+51
  Summary: The new test case led to incorrect code.
  Change-Id: Ief48b227e97aa662dd3535c9bafb27d4a184efca
  Reviewers: arsenm, david-salinas
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63871
  llvm-svn: 364566
* [GlobalISel] Accept multiple vregs in lowerFormalArgs | Diana Picus | 2019-06-27 | 1 | -4/+4
  Change the interface of CallLowering::lowerFormalArguments to accept several virtual registers for each formal argument, instead of just one. This is a follow-up to D46018.

  CallLowering::lowerReturn was similarly refactored in D49660. lowerCall will be refactored in the same way in follow-up patches.

  With this change, we forward the virtual registers generated for aggregates to CallLowering. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. We also copy the pack/unpackRegs helpers to CallLowering to facilitate this.

  ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions.

  AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was put into a s64 instead of a p0. Added a test-case which illustrates the problem more clearly (it crashes without this patch) and fixed the existing test-case to expect p0.

  AMDGPU has been updated to unpack into the virtual registers for kernels. I think the other code paths fall back for aggregates, so this should be NFC.

  Mips doesn't support aggregates yet, so it's also NFC.

  x86 seems to have code for dealing with aggregates, but I couldn't find the tests for it, so I just added a fallback to DAGISel if we get more than one virtual register for an argument.

  Differential Revision: https://reviews.llvm.org/D63549
  llvm-svn: 364510
* [AMDGPU] Fix +DumpCode to print an entry label for the first function | Jay Foad | 2019-06-27 | 1 | -0/+2
  Summary: The +DumpCode attribute is a horrible hack in AMDGPU to embed the disassembly of the generated code into the elf file. It is used by LLPC to implement an extension that allows the application to read back the disassembly of the code. It tries to print an entry label at the start of every function, but that didn't work for the first function in the module because DumpCodeInstEmitter wasn't initialised until EmitFunctionBodyStart which is too late.
  Change-Id: I790d73ddf4f51fd02ab32529380c7cb7c607c4ee
  Reviewers: arsenm, tpr, kzhuravl
  Reviewed By: arsenm
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63712
  llvm-svn: 364508
* [AMDGPU] Fix Livereg computation during epilogue insertion | Matt Arsenault | 2019-06-26 | 1 | -0/+1
  The LivePhysRegs calculated in order to find a scratch register in the epilogue code wrongly uses 'LiveIns'. Instead, it should use the 'Liveout' sets. For liveness, also consider the operands of the terminator (return) instruction, which is the insertion point for the scratch-exec-copy instruction.
  Patch by Christudasan Devadasan
  llvm-svn: 364470
* Update phis in AMDGPUUnifyDivergentExitNodes | Diego Novillo | 2019-06-25 | 1 | -0/+39
  Original patch https://reviews.llvm.org/D63659 from Steven Perron <stevenperron@google.com>
  The pass AMDGPUUnifyDivergentExitNodes does not update the phi nodes in the successors of blocks that it splits. This is fixed by calling BasicBlock::splitBasicBlock to split the block instead of doing it manually. This does extra work because a new conditional branch is created in BB which is immediately replaced, but I think the simplicity is worth it. It also helps make the code more future proof in case other things need to be updated.
  llvm-svn: 364342
* AMDGPU/GlobalISel: Fix broken test | Matt Arsenault | 2019-06-25 | 1 | -3/+3
  llvm-svn: 364316
* AMDGPU/GlobalISel: Fix duplicated test | Matt Arsenault | 2019-06-25 | 2 | -187/+0
  Somehow ended up with copies of the same tests in AMDGPU and AMDGPU/GlobalISel
  llvm-svn: 364309
* AMDGPU: Select G_SEXT/G_ZEXT/G_ANYEXT | Matt Arsenault | 2019-06-25 | 3 | -0/+545
  llvm-svn: 364308
* AMDGPU: Write LDS objects out as global symbols in code generation | Nicolai Haehnle | 2019-06-25 | 19 | -87/+179
  Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section index introduced with a previous change. The linker is then expected to resolve relocations, which are also emitted.
  Initially disabled for HSA and PAL environments until they have caught up in terms of linker and runtime loader.
  Some notes:
  - The llvm.amdgcn.groupstaticsize intrinsic can no longer be lowered to a constant at compile time, which means some tests can no longer be applied. The current "solution" is a terrible hack, but the intrinsic isn't used by Mesa, so we can keep it for now.
  - We no longer know the full LDS size per kernel at compile time, which means that we can no longer generate a relevant error message at compile time. It would be possible to add a check for the size of individual variables, but ultimately the linker will have to perform the final check.
  Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275
  Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin
  Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61494
  llvm-svn: 364297
* AMDGPU/GlobalISel: Fix regbankselect for amdgcn.class | Matt Arsenault | 2019-06-25 | 1 | -14/+51
  llvm-svn: 364262
* AMDGPU/GlobalISel: Add tests for regbankselect of v2s16 and/or/xor | Matt Arsenault | 2019-06-24 | 3 | -0/+195
  llvm-svn: 364244
* AMDGPU/GlobalISel: Select G_TRUNC | Matt Arsenault | 2019-06-24 | 1 | -0/+373
  llvm-svn: 364215
* AMDGPU/GlobalISel: RegBankSelect for amdgcn.class | Matt Arsenault | 2019-06-24 | 1 | -0/+31
  llvm-svn: 364214