summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU/GlobalISel
Commit message (Collapse)AuthorAgeFilesLines
...
* [MIPS GlobalISel] Select bitreversePetar Avramovic2019-12-301-6/+15
| | | | | | | | | | G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics, clang genrates these intrinsics from __builtin_bitreverse32 and __builtin_bitreverse64. Add lower and narrowscalar for G_BITREVERSE. Lower G_BITREVERSE on MIPS32. Differential Revision: https://reviews.llvm.org/D71363
* AMDGPU/GlobalISel: Use SReg_32 for readfirstlane constrainingMatt Arsenault2019-12-278-15/+15
| | | | | This matches the DAG behavior where we don't use SReg_32_XM0 everywhere anymore, and fixes not coalescing the copies into m0.
* AMDGPU/GlobalISel: Fix extra result register in fdiv64 loweringMatt Arsenault2019-12-271-36/+36
| | | | | | There ended up being two result registers, which would fail on select. It was really defing a new temp register in the correct def position, instead of the correct result register.
* AMDGPU/GlobalISel: Select some 128-bit load/storesMatt Arsenault2019-12-274-48/+52
|
* AMDGPU/GlobalISel: Legalize some 16-bit round instructionsMatt Arsenault2019-12-246-16/+345
|
* GlobalISel: Define equivalent node for G_INTRINSIC_TRUNCMatt Arsenault2019-12-241-0/+82
|
* AMDGPU/GlobalISel: Lower llvm.amdgcn.elseMatt Arsenault2019-12-241-0/+31
|
* [AMDGPU] Don't create MachinePointerInfos with an UndefValue pointerJay Foad2019-12-232-80/+80
| | | | | | | | | | | | | | | Summary: The only useful information the UndefValue conveys is the address space, which MachinePointerInfo can represent directly without referring to an IR value. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71838
* AMDGPU/GlobalISel: Fix misuse of div_scale intrinsicsMatt Arsenault2019-12-211-152/+152
| | | | | | | | | | | Confusingly, the intrinsic operands do not match the instruction/custom node. The order is shuffled, and the 3rd operand is an immediate to select operands. I'm not 100% sure I did this right, but fdiv still doesn't select end to end and it will be easier to tell when it does. This at least avoids an assertion in RegBankSelect and allows hitting the fallback on selection.
* AMDGPU/GlobalISel: Fix missing scc imp-def on scalar and/or/xorMatt Arsenault2019-12-213-56/+56
|
* [Legalizer] Making artifact combining order-independentRoman Tereshin2019-12-132-65/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Legalization algorithm is complicated by two facts: 1) While regular instructions should be possible to legalize in an isolated, per-instruction, context-free manner, legalization artifacts can only be eliminated in pairs, which could be deeply, and ultimately arbitrary nested: { [ () ] }, where which paranthesis kind depicts an artifact kind, like extend, unmerge, etc. Such structure can only be fully eliminated by simple local combines if they are attempted in a particular order (inside out), or alternatively by repeated scans each eliminating only one innermost pair, resulting in O(n^2) complexity. 2) Some artifacts might in fact be regular instructions that could (and sometimes should) be legalized by the target-specific rules. Which means failure to eliminate all artifacts on the first iteration is not a failure, they need to be tried as instructions, which may produce more artifacts, including the ones that are in fact regular instructions, resulting in a non-constant number of iterations required to finish the process. I trust the recently introduced termination condition (no new artifacts were created during as-a-regular-instruction-retrial of artifacts not eliminated on the previous iteration) to be efficient in providing termination, but only performing the legalization in full if and only if at each step such chains of artifacts are successfully eliminated in full as well. Which is currently not guaranteed, as the artifact combines are applied only once and in an arbitrary order that has to do with the order of creation or insertion of artifacts into their worklist, which is a no particular order. In this patch I make a small change to the artifact combiner, making it to re-insert into the worklist immediate (modulo a look-through copies) artifact users of each vreg that changes its definition due to an artifact combine. Here the first scan through the artifacts worklist, while not being done in any guaranteed order, only needs to find the innermost pair(s) of artifacts that could be immediately combined out. After that the process follows def-use chains, making them shorter at each step, thus combining everything that can be combined in O(n) time. Reviewers: volkan, aditya_nandakumar, qcolombet, paquette, aemerson, dsanders Reviewed By: aditya_nandakumar, paquette Tags: #llvm Differential Revision: https://reviews.llvm.org/D71448
* AMDGPU/GlobalISel: Add AGPR bank and RegBankSelect mfma intrinsicsAustin Kerbow2019-12-011-0/+943
| | | | Differential Revision: https://reviews.llvm.org/D70871
* AMDGPU/GlobalISel: Legalize FDIV64Austin Kerbow2019-11-191-41/+468
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70403
* AMDGPU: Refactor treatment of denormal modeMatt Arsenault2019-11-191-12/+12
| | | | | | | | | | | Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute. This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.
* AMDGPU: Be explicit about denormal mode in MIR testsMatt Arsenault2019-11-196-863/+1946
| | | | | | | Start checking the machine function in GlobalISel instead of the target directly. This temporarily breaks fcanonicalize selection in GlobalISel.
* AMDGPU: Change boolean content type to 0 or 1Matt Arsenault2019-11-152-2/+2
| | | | | | | | The usage of target boolean checks is overly inflexible, since sext and zext of a compare are equally cheap. The choice is arbitrary, but using 0/1 to some degree is the choice of lower resistance since that's what most targets use. This enables a few combines that don't bother to support ZeroOrNegativeOneBooleanContent.
* GlobalISel: Lower s1 source G_SITOFP/G_UITOFPMatt Arsenault2019-11-156-492/+162
|
* [GISel][ArtifactCombiner] Relax the constraint to combine unmerge with ↵Quentin Colombet2019-11-062-47/+78
| | | | | | | | | | | | | | | concat_vectors The combine G_UNMERGE_VALUES with G_CONCAT_VECTORS used to only be performed when the result type of the G_UNMERGE_VALUES was a vector type. In other words, we were expecting that the G_UNMERGE_VALUES was effectively the exact opposite of the G_CONCAT_VECTORS. Lift that constraint by allowing any G_UNMERGE_VALUES to be combined with any G_CONCAT_VECTORS (as long as the size of the different pieces that we merge/unmerge match). Differential Revision: https://reviews.llvm.org/D69288
* [globalisel] Rename G_GEP to G_PTR_ADDDaniel Sanders2019-11-0532-6783/+6783
| | | | | | | | | | | | | | | | Summary: G_GEP is rather poorly named. It's a simple pointer+scalar addition and doesn't support any of the complexities of getelementptr. I therefore propose that we rename it. There's a G_PTR_MASK so let's follow that convention and go with G_PTR_ADD Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69734
* AMDGPU/GlobalISel: Legalize FDIV32Austin Kerbow2019-10-291-73/+961
| | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69581
* GlobalISel: Implement widenScalar for G_INSERT_VECTOR_ELTMatt Arsenault2019-10-251-0/+48
|
* AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHGMatt Arsenault2019-10-253-37/+454
| | | | | | | | Custom lower this to a target instruction with the merge operands. I think it might be better to directly select this and emit a REG_SEQUENCE, but this would be more work since it would require splitting the tablegen patterns for these cases from the other atomics.
* AMDGPU/GlobalISel: Legalize FDIV16Austin Kerbow2019-10-251-133/+335
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69347
* [GlobalISel][AArch64][AMDGPU][X86] Teach LegalizationArtifactCombiner to ↵Craig Topper2019-10-2410-375/+333
| | | | | | | | combine trunc(g_constant). This allows X86 to properly form shift by immediate instructions since we require an 8-bit constant to match the imported SelectionDAG patterns.
* AMDGPU/GlobalISel: Legalize fast unsafe FDIVAustin Kerbow2019-10-211-0/+798
| | | | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69231 llvm-svn: 375460
* AMDGPU: Relax 32-bit SGPR register classMatt Arsenault2019-10-1891-840/+902
| | | | | | | | | | | Mostly use SReg_32 instead of SReg_32_XM0 for arbitrary values. This will allow the register coalescer to do a better job eliminating copies to m0. For GlobalISel, as a terrible hack, use SGPR_32 for things that should use SCC until booleans are solved. llvm-svn: 375267
* GlobalISel: Implement lower for G_SADDO/G_SSUBOMatt Arsenault2019-10-164-137/+278
| | | | | | | Port directly from SelectionDAG, minus the path using ISD::SADDSAT/ISD::SSUBSAT. llvm-svn: 375042
* [update_mir_test_checks] Handle MI flags properlyRoman Tereshin2019-10-1411-165/+165
| | | | | | | | | | | | | | previously we would generate literal check lines w/ no reg-exps for vregs as MI flags (nsw, ninf, etc.) won't be recognized as a part of MI. Fixing that. Includes updating the MIR tests that suffered from the problem. Reviewed By: bogner Differential Revision: https://reviews.llvm.org/D68905 llvm-svn: 374829
* [GISel] Allow getConstantVRegVal() to return G_FCONSTANT values.Marcello Maggioni2019-10-101-22/+14
| | | | | | | | | | | | | | | | | | | | | In GISel we have both G_CONSTANT and G_FCONSTANT, but because in GISel we don't really have a concept of Float vs Int value the only difference between the two is where the data originates from. What both G_CONSTANT and G_FCONSTANT return is just a bag of bits with the constant representation in it. By making getConstantVRegVal() return G_FCONSTANTs bit representation as well we allow ConstantFold and other things to operate with G_FCONSTANT. Adding tests that show ConstantFolding to work on mixed G_CONSTANT and G_FCONSTANT sources. Differential Revision: https://reviews.llvm.org/D68739 llvm-svn: 374458
* AMDGPU: Use SGPR_128 instead of SReg_128 for vregsMatt Arsenault2019-10-1010-119/+119
| | | | | | | | | SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good. llvm-svn: 374284
* AMDGPU/GlobalISel: Fix crash on wide constant load with VGPR pointerMatt Arsenault2019-10-091-0/+46
| | | | | | | | | | This was ignoring the register bank of the input pointer, and isUniformMMO seems overly aggressive. This will now conservatively assume a VGPR in cases where the incoming bank hasn't been determined yet (i.e. is from a loop phi). llvm-svn: 374255
* GlobalISel: Implement fewerElementsVector for G_BUILD_VECTORMatt Arsenault2019-10-0928-672/+1284
| | | | | | Turn it into a G_CONCAT_VECTORS of G_BUILD_VECTOR. llvm-svn: 374252
* AMDGPU: Fix i16 arithmetic pattern redundancyMatt Arsenault2019-10-083-36/+12
| | | | | | | | | | | | | | There were 2 problems here. First, these patterns were duplicated to handle the inverted shift operands instead of using the commuted PatFrags. Second, the point of the zext folding patterns don't apply to the non-0ing high subtargets. They should be skipped instead of inserting the extension. The zeroing high code would be emitted when necessary anyway. This was also emitting unnecessary zexts in cases where the high bits were undefined. llvm-svn: 374092
* AMDGPU/GlobalISel: Clamp G_SITOFP/G_UITOFP sourcesMatt Arsenault2019-10-072-146/+575
| | | | llvm-svn: 373989
* AMDGPU/GlobalISel: Handle more G_INSERT casesMatt Arsenault2019-10-071-20/+130
| | | | | | | | | Start manually writing a table to get the subreg index. TableGen should probably generate this, but I'm not sure what it looks like in the arbitrary case where subregisters are allowed to not fully cover the super-registers. llvm-svn: 373947
* GlobalISel: Partially implement lower for G_INSERTMatt Arsenault2019-10-071-6/+148
| | | | llvm-svn: 373946
* AMDGPU/GlobalISel: Fix selection of 16-bit shiftsMatt Arsenault2019-10-073-294/+810
| | | | llvm-svn: 373945
* AMDGPU/GlobalISel: Select VALU G_AMDGPU_FFBH_U32Matt Arsenault2019-10-071-7/+7
| | | | llvm-svn: 373944
* AMDGPU/GlobalISel: Use S_MOV_B64 for inline constantsMatt Arsenault2019-10-072-11/+12
| | | | | | | This hides some defects in SIFoldOperands when the immediates are split. llvm-svn: 373943
* AMDGPU/GlobalISel: Widen 16-bit G_MERGE_VALUEs sourcesMatt Arsenault2019-10-077-5451/+11837
| | | | | | Continue making a mess of merge/unmerge legality. llvm-svn: 373942
* AMDGPU/GlobalISel: Select more G_INSERT casesMatt Arsenault2019-10-071-22/+425
| | | | | | | | | | At minimum handle the s64 insert type, which are emitted in real cases during legalization. We really need TableGen to emit something to emit something like the inverse of composeSubRegIndices do determine the subreg index to use. llvm-svn: 373938
* GlobalISel: Add target pre-isel instructionsMatt Arsenault2019-10-072-0/+100
| | | | | | | | | | | | | | Allows targets to introduce regbankselectable pseudo-instructions. Currently the closet feature to this is an intrinsic. However this requires creating a public intrinsic declaration. This litters the public intrinsic namespace with operations we don't necessarily want to expose to IR producers, and would rather leave as private to the backend. Use a new instruction bit. A previous attempt tried to keep using enum value ranges, but it turned into a mess. llvm-svn: 373937
* AMDGPU/GlobalISel: RegBankSelect DS GWS intrinsicsMatt Arsenault2019-10-062-0/+116
| | | | llvm-svn: 373840
* AMDGPU/GlobalISel: Lower G_ATOMIC_CMPXCHG_WITH_SUCCESSMatt Arsenault2019-10-061-0/+107
| | | | llvm-svn: 373839
* GlobalISel: Partially implement lower for G_EXTRACTMatt Arsenault2019-10-064-24/+213
| | | | | | Turn into shift and truncate. Doesn't yet handle pointers. llvm-svn: 373838
* AMDGPU/GlobalISel: Fix RegBankSelect for sendmsg intrinsicsMatt Arsenault2019-10-063-25/+11
| | | | | | This wasn't updated for the immarg handling change. llvm-svn: 373837
* AMDGPU/GlobalISel: Fix using wrong addrspace for apertureMatt Arsenault2019-10-041-8/+8
| | | | | | | This was always passing the destination flat address space, when it should be picking between the two valid source options. llvm-svn: 373716
* AMDGPU/GlobalISel: Select G_PTRTOINTMatt Arsenault2019-10-041-0/+101
| | | | llvm-svn: 373715
* AMDGPU/GlobalISel: Support wave32 waterfall loopsMatt Arsenault2019-10-0412-389/+704
| | | | llvm-svn: 373714
* AMDGPU/GlobalISel: Handle RegBankSelect of G_INSERT_VECTOR_ELTMatt Arsenault2019-10-031-12/+383
| | | | llvm-svn: 373639
OpenPOWER on IntegriCloud