path: root/llvm/lib/Target/AMDGPU
Commit message (Author, Age, Files, Lines)
...
* Revert "[AMDGPU] Kernel arg metadata: added support for "__hip_texture" type." (Matt Arsenault, 2019-07-03, files: 1, lines: -10/+0)
| | | | | | | | This reverts commit r365073. This is crashing, and is improperly relying on IR type names. llvm-svn: 365087
* [AMDGPU] Kernel arg metadata: added support for "__hip_texture" type. (Konstantin Pyzhov, 2019-07-03, files: 1, lines: -0/+10)
| | | | | | | | | Summary: Hip texture type is equivalent to OpenCL image. So, we need to set the Image type for kernel arguments with __hip_texture type. Differential revision: https://reviews.llvm.org/D63850 llvm-svn: 365073
* [AMDGPU] Enable serializing of argument info. (Michael Liao, 2019-07-03, files: 3, lines: -1/+253)
| | | | | | | | | | | | | | | | Summary: - Support serialization of all arguments in machine function info. This enables fabricating MIR tests depending on argument info. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64096 llvm-svn: 364995
* AMDGPU: Look through bundles for existing waitcnts (Matt Arsenault, 2019-07-03, files: 1, lines: -1/+2)
| | | | | | These aren't produced now, but will be in a future patch. llvm-svn: 364983
* AMDGPU: Custom lower vector_shuffle for v4i16/v4f16 (Matt Arsenault, 2019-07-02, files: 2, lines: -0/+63)
| | | | | | | | | Ordinarily it is lowered as a build_vector of each extract_vector_elt, which in turn get lowered to bitcasts and bit shifts. Very little understands the lowered extract pattern, resulting in much worse code. We treat concat_vectors of v2i16 as legal, so prefer that. llvm-svn: 364959
* [AMDGPU] LCSSA pass added in preISel. Fixing typo in previous commit (Alexander Timofeev, 2019-07-02, files: 1, lines: -1/+1)
| | | | llvm-svn: 364952
* [AMDGPU] LCSSA pass added in preISel. Uniform values defined in the divergent loop and used outside (Alexander Timofeev, 2019-07-02, files: 2, lines: -0/+19)
| | | | | | | | | Differential Revision: https://reviews.llvm.org/D63953 Reviewers: rampitec, nhaehnle, arsenm llvm-svn: 364950
* AMDGPU/GlobalISel: Try generated matcher with intrinsics (Matt Arsenault, 2019-07-02, files: 1, lines: -8/+7)
| | | | llvm-svn: 364933
* AMDGPU/GlobalISel: Select mul (Matt Arsenault, 2019-07-02, files: 1, lines: -1/+1)
| | | | llvm-svn: 364932
* AMDGPU/GlobalISel: Fix G_GEP with mixed SGPR/VGPR operands (Matt Arsenault, 2019-07-02, files: 1, lines: -4/+6)
| | | | | | | | The register bank for the destination of the sample argument copy was wrong. We shouldn't be constraining each source to the result register bank. Allow constraining the original register to the right size. llvm-svn: 364928
* AMDGPU/GlobalISel: Select G_FENCE (Matt Arsenault, 2019-07-02, files: 1, lines: -0/+5)
| | | | | | | Manually select to work around tablegen emitter emitting checks for G_CONSTANT. llvm-svn: 364927
* AMDGPU: Correct properties for adjcallstack* pseudos (Matt Arsenault, 2019-07-01, files: 1, lines: -0/+4)
| | | | | | | These should be SALU writes, and these are lowered to instructions that def SCC. llvm-svn: 364859
* AMDGPU/GlobalISel: Handle more input argument intrinsics (Matt Arsenault, 2019-07-01, files: 2, lines: -41/+72)
| | | | llvm-svn: 364836
* AMDGPU/GlobalISel: Lower kernarg segment ptr intrinsics (Matt Arsenault, 2019-07-01, files: 3, lines: -24/+48)
| | | | llvm-svn: 364835
* AMDGPU/GlobalISel: Legalize workgroup ID intrinsics (Matt Arsenault, 2019-07-01, files: 2, lines: -0/+36)
| | | | llvm-svn: 364834
* AMDGPU/GlobalISel: Legalize workitem ID intrinsics (Matt Arsenault, 2019-07-01, files: 3, lines: -0/+127)
| | | | | | | | | Tests don't cover the masked input path since non-kernel arguments aren't lowered yet. Test is copied directly from the existing test, with 2 additions. llvm-svn: 364833
* AMDGPU/GlobalISel: Custom lower control flow intrinsics (Matt Arsenault, 2019-07-01, files: 2, lines: -0/+68)
| | | | | | | | Replace the brcond for the 2 cases that act as branches. For now follow how the current system works, although I think we can eventually get rid of the pseudos. llvm-svn: 364832
* AMDGPU/GlobalISel: Handle 16-bit SALU min/max (Matt Arsenault, 2019-07-01, files: 1, lines: -5/+19)
| | | | | | | | | This needs to be extended to s32, and expanded into cmp+select. This is relying on the fact that widenScalar happens to leave the instruction in place, but this isn't a guaranteed property of LegalizerHelper. llvm-svn: 364831
* AMDGPU/GlobalISel: Lower SALU min/max to cmp+select (Matt Arsenault, 2019-07-01, files: 1, lines: -6/+41)
| | | | | | | Use a change observer to apply a register bank to the newly created intermediate result register. llvm-svn: 364830
* AMDGPU/GlobalISel: Legalize s16 add/sub/mul (Matt Arsenault, 2019-07-01, files: 2, lines: -2/+85)
| | | | | | | If this is scalar, promote to s32. Use a new observer class to assign the register bank of newly created registers. llvm-svn: 364827
* AMDGPU/GlobalISel: Fix allowing non-boolean conditions for G_SELECT (Matt Arsenault, 2019-07-01, files: 1, lines: -9/+20)
| | | | | | | | | The condition register bank must be scc or vcc so that a copy will be inserted, which will be lowered to a compare. Currently greedy unnecessarily forces using a VCC select. llvm-svn: 364825
* AMDGPU/GlobalISel: RegBankSelect for sendmsg/sendmsghalt (Matt Arsenault, 2019-07-01, files: 1, lines: -3/+29)
| | | | llvm-svn: 364819
* AMDGPU/GlobalISel: Legalize s16 fcmp (Matt Arsenault, 2019-07-01, files: 1, lines: -1/+9)
| | | | llvm-svn: 364817
* AMDGPU/GFX10: implement ds_ordered_count changes (Nicolai Haehnle, 2019-07-01, files: 1, lines: -1/+22)
| | | | | | | | | | | | | | | | | | | Summary: ds_ordered_count can now simultaneously operate on up to 4 dwords in a single instruction, which are taken from (and returned to) lanes 0..3 of a single VGPR. Change-Id: I19b6e7b0732b617c10a779a7f9c0303eec7dd276 Reviewers: mareko, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63716 llvm-svn: 364815
* AMDGPU: Support GDS atomics (Nicolai Haehnle, 2019-07-01, files: 8, lines: -54/+97)
| | | | | | | | | | | | | | | | | Summary: Original patch by Marek Olšák Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab Reviewers: mareko, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63452 llvm-svn: 364814
* AMDGPU/GlobalISel: RegBankSelect for DS ordered add/swap (Matt Arsenault, 2019-07-01, files: 1, lines: -2/+31)
| | | | llvm-svn: 364811
* AMDGPU/GlobalISel: RegBankSelect for amdgcn.writelane (Matt Arsenault, 2019-07-01, files: 1, lines: -5/+58)
| | | | llvm-svn: 364808
* AMDGPU/GlobalISel: Fail instead of assert when selecting loads (Matt Arsenault, 2019-07-01, files: 1, lines: -5/+11)
| | | | llvm-svn: 364807
* AMDGPU/GlobalISel: Complete implementation of G_GEP (Matt Arsenault, 2019-07-01, files: 3, lines: -53/+79)
| | | | | | | | Also works around tablegen defect in selecting add with unused carry, but if we have to manually select GEP, might as well handle add manually. llvm-svn: 364806
* AMDGPU/GlobalISel: Select G_PHI (Matt Arsenault, 2019-07-01, files: 2, lines: -0/+41)
| | | | llvm-svn: 364805
* AMDGPU/GlobalISel: Try to select VOP3 form of add (Matt Arsenault, 2019-07-01, files: 1, lines: -0/+20)
| | | | | | | | | | | There are several things broken, but at least emit the right thing for gfx9. The import of the pattern with the unused carry out seems to not work. Needs a special class for clamp, because OperandWithDefaultOps doesn't really work. llvm-svn: 364804
* AMDGPU/GlobalISel: RegBankSelect for readlane/readfirstlane (Matt Arsenault, 2019-07-01, files: 2, lines: -0/+82)
| | | | llvm-svn: 364801
* AMDGPU/GlobalISel: Implement select for 32-bit G_ADD (Tom Stellard, 2019-07-01, files: 2, lines: -2/+7)
| | | | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58804 llvm-svn: 364797
* AMDGPU/GlobalISel: Select G_BRCOND for vcc (Matt Arsenault, 2019-07-01, files: 2, lines: -25/+44)
| | | | llvm-svn: 364795
* AMDGPU/GlobalISel: Select G_FRAME_INDEX (Matt Arsenault, 2019-07-01, files: 2, lines: -0/+19)
| | | | llvm-svn: 364789
* AMDGPU/GFX10: fix scratch resource descriptor (Nicolai Haehnle, 2019-07-01, files: 1, lines: -2/+2)
| | | | | | | | | | | | | | | | | | | | Summary: The stride should depend on the wave size, not the hardware generation. Also, the 32_FLOAT format is 0x16, not 16; though that shouldn't be relevant. Change-Id: I088f93bf6708974d085d1c50967f119061da6dc6 Reviewers: arsenm, rampitec, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63808 llvm-svn: 364788
* AMDGPU/GlobalISel: Make s16 select legal (Matt Arsenault, 2019-07-01, files: 2, lines: -7/+9)
| | | | | | | This is easy to handle and avoids legalization artifacts which are likely to obscure combines. llvm-svn: 364787
* AMDGPU/GlobalISel: Select G_BRCOND for scc conditions (Matt Arsenault, 2019-07-01, files: 2, lines: -0/+34)
| | | | llvm-svn: 364786
* AMDGPU/GlobalISel: Tolerate copies with no type set (Matt Arsenault, 2019-07-01, files: 1, lines: -3/+6)
| | | | | | | isVCC has the same bug, but isn't used in a context where it can cause a problem. llvm-svn: 364784
* AMDGPU/GlobalISel: Select src modifiers (Matt Arsenault, 2019-07-01, files: 2, lines: -6/+43)
| | | | llvm-svn: 364782
* AMDGPU: Convert some places to Register (Matt Arsenault, 2019-07-01, files: 2, lines: -9/+10)
| | | | llvm-svn: 364769
* AMDGPU/GlobalISel: Fix RegBankSelect for G_FCANONICALIZE (Matt Arsenault, 2019-07-01, files: 1, lines: -0/+1)
| | | | llvm-svn: 364768
* AMDGPU/GlobalISel: Fix RegBankSelect for G_BUILD_VECTOR (Matt Arsenault, 2019-07-01, files: 1, lines: -1/+2)
| | | | llvm-svn: 364767
* AMDGPU/GlobalISel: Fail on store to 32-bit address space (Matt Arsenault, 2019-07-01, files: 1, lines: -0/+6)
| | | | llvm-svn: 364766
* AMDGPU/GlobalISel: Improve icmp selection coverage. (Matt Arsenault, 2019-07-01, files: 2, lines: -13/+38)
| | | | | | Select s64 eq/ne scalar icmp. llvm-svn: 364765
* AMDGPU/GlobalISel: RegBankSelect for WWM/WQM (Matt Arsenault, 2019-07-01, files: 1, lines: -0/+2)
| | | | llvm-svn: 364763
* AMDGPU/GlobalISel: Use vcc reg bank for amdgcn.wqm.vote (Matt Arsenault, 2019-07-01, files: 1, lines: -1/+1)
| | | | llvm-svn: 364762
* AMDGPU/GlobalISel: Fix scc->vcc copy handling (Matt Arsenault, 2019-07-01, files: 2, lines: -13/+23)
| | | | | | | | | | | | | This was checking the size of the register with the value of the size, which happens to be exec. Also fix assuming VCC is 64-bit to fix wave32. Also remove some untested handling for physical registers which is skipped. This doesn't insert the V_CNDMASK_B32 if SCC is the physical copy source. I'm not sure if this should be trying to handle this special case instead of dealing with this in copyPhysReg. llvm-svn: 364761
* AMDGPU/GlobalISel: Use and instead of BFE with inline immediate (Matt Arsenault, 2019-07-01, files: 1, lines: -6/+29)
| | | | | | | Zext from s1 is the only case where this should do anything with the current legal extensions. llvm-svn: 364760
* [AMDGPU] Call isLoopExiting for blocks in the loop. (Florian Hahn, 2019-07-01, files: 1, lines: -2/+4)
| | | | | | | | | | | | | | | | isLoopExiting should only be called for blocks in the loop. A follow up patch makes this requirement an assertion. I've updated the usage here, to only match for actual exit blocks. Previously, it would also match blocks not in the loop. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D63980 llvm-svn: 364750