path: root/llvm/test/CodeGen/AMDGPU/GlobalISel
Commit message | Author | Age | Files | Lines
...
* AMDGPU/GlobalISel: Improve icmp selection coverage. | Matt Arsenault | 2019-07-01 | 1 | -0/+595
    Select s64 eq/ne scalar icmp.
    llvm-svn: 364765
* AMDGPU/GlobalISel: RegBankSelect for WWM/WQM | Matt Arsenault | 2019-07-01 | 2 | -0/+62
    llvm-svn: 364763
* AMDGPU/GlobalISel: Use vcc reg bank for amdgcn.wqm.vote | Matt Arsenault | 2019-07-01 | 1 | -5/+5
    llvm-svn: 364762
* AMDGPU/GlobalISel: Fix scc->vcc copy handling | Matt Arsenault | 2019-07-01 | 1 | -26/+88
    This was checking the size of the register with the value of the size, which happens to be exec. Also fix assuming VCC is 64-bit to fix wave32. Also remove some untested handling for physical registers which is skipped.
    This doesn't insert the V_CNDMASK_B32 if SCC is the physical copy source. I'm not sure if this should be trying to handle this special case instead of dealing with this in copyPhysReg.
    llvm-svn: 364761
* AMDGPU/GlobalISel: Use and instead of BFE with inline immediate | Matt Arsenault | 2019-07-01 | 3 | -4/+119
    Zext from s1 is the only case where this should do anything with the current legal extensions.
    llvm-svn: 364760
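The equivalence behind the AND-instead-of-BFE change can be sketched in plain C++. This is only an illustration, not the selector's code: `bfeU32` is a hypothetical model of an unsigned bitfield extract, and `zextFromS1` is a hypothetical model of zero-extending a 1-bit value held in the low bit of a 32-bit register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an unsigned bitfield extract: take `width` bits
// starting at bit `offset`. Assumes offset < 32 and 0 < width < 32.
inline uint32_t bfeU32(uint32_t x, unsigned offset, unsigned width) {
  return (x >> offset) & ((1u << width) - 1u);
}

// Zero-extension from a 1-bit value: only bit 0 of the source is meaningful,
// so a single AND with the constant 1 produces the extended value.
inline uint32_t zextFromS1(uint32_t x) { return x & 1u; }

// Both forms agree whatever the upper 31 bits of the source contain.
inline void checkZextFromS1() {
  assert(zextFromS1(0xFFFFFFFFu) == bfeU32(0xFFFFFFFFu, 0, 1));
  assert(zextFromS1(0xFFFFFFFEu) == 0u);
}
```

The mask for a 1-bit source is the constant 1, which is small enough to be encoded directly in the instruction; that is presumably why the AND form is preferred here.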
* GlobalISel: Add GINodeEquiv for min/max | Matt Arsenault | 2019-07-01 | 4 | -0/+332
    llvm-svn: 364759
* GlobalISel: Add DAG compat for G_FCANONICALIZE | Matt Arsenault | 2019-07-01 | 1 | -0/+169
    llvm-svn: 364758
* AMDGPU/GlobalISel: Add some more tests for icmp select | Matt Arsenault | 2019-06-29 | 1 | -32/+80
    llvm-svn: 364703
* AMDGPU/GlobalISel: RegBankSelect for update.dpp | Matt Arsenault | 2019-06-29 | 1 | -0/+82
    llvm-svn: 364701
* AMDGPU/GlobalISel: RegBankSelect for atomic.inc/atomic.dec | Matt Arsenault | 2019-06-29 | 2 | -0/+160
    llvm-svn: 364699
* AMDGPU/GlobalISel: RegBankSelect for some DS intrinsics | Matt Arsenault | 2019-06-29 | 6 | -0/+286
    llvm-svn: 364698
* AMDGPU/GlobalISel: RegBankSelect for icmp/fcmp intrinsics | Matt Arsenault | 2019-06-29 | 2 | -0/+134
    llvm-svn: 364696
* AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.fmas | Matt Arsenault | 2019-06-29 | 1 | -0/+106
    llvm-svn: 364695
* AMDGPU/GlobalISel: RegBankSelect for some simple leaf intrinsics | Matt Arsenault | 2019-06-29 | 6 | -0/+84
    llvm-svn: 364694
* [GlobalISel] Accept multiple vregs in lowerFormalArgs | Diana Picus | 2019-06-27 | 1 | -4/+4
    Change the interface of CallLowering::lowerFormalArguments to accept several virtual registers for each formal argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660. lowerCall will be refactored in the same way in follow-up patches.
    With this change, we forward the virtual registers generated for aggregates to CallLowering. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. We also copy the pack/unpackRegs helpers to CallLowering to facilitate this.
    ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions.
    AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was put into a s64 instead of a p0. Added a test-case which illustrates the problem more clearly (it crashes without this patch) and fixed the existing test-case to expect p0.
    AMDGPU has been updated to unpack into the virtual registers for kernels. I think the other code paths fall back for aggregates, so this should be NFC.
    Mips doesn't support aggregates yet, so it's also NFC.
    x86 seems to have code for dealing with aggregates, but I couldn't find the tests for it, so I just added a fallback to DAGISel if we get more than one virtual register for an argument.
    Differential Revision: https://reviews.llvm.org/D63549
    llvm-svn: 364510
* AMDGPU/GlobalISel: Fix broken test | Matt Arsenault | 2019-06-25 | 1 | -3/+3
    llvm-svn: 364316
* AMDGPU: Select G_SEXT/G_ZEXT/G_ANYEXT | Matt Arsenault | 2019-06-25 | 3 | -0/+545
    llvm-svn: 364308
* AMDGPU/GlobalISel: Fix regbankselect for amdgcn.class | Matt Arsenault | 2019-06-25 | 1 | -14/+51
    llvm-svn: 364262
* AMDGPU/GlobalISel: Add tests for regbankselect of v2s16 and/or/xor | Matt Arsenault | 2019-06-24 | 3 | -0/+195
    llvm-svn: 364244
* AMDGPU/GlobalISel: Select G_TRUNC | Matt Arsenault | 2019-06-24 | 1 | -0/+373
    llvm-svn: 364215
* AMDGPU/GlobalISel: RegBankSelect for amdgcn.class | Matt Arsenault | 2019-06-24 | 1 | -0/+31
    llvm-svn: 364214
* AMDGPU/GlobalISel: Split VALU s64 G_ZEXT/G_SEXT in RegBankSelect | Matt Arsenault | 2019-06-24 | 2 | -12/+87
    Scalar extends to s64 can use S_BFE_{I64|U64}, but vector extends need to extend to the 32-bit half, and then to 64. I'm not sure what the line should be between what RegBankSelect handles, and what instruction select does, but for now I'm erring on the side of RegBankSelect for future post-RBS combines.
    llvm-svn: 364212
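The two-step vector extend described in this commit can be modelled arithmetically. The sketch below is plain C++, not LLVM code, and every name in it (`Halves`, `sext16to32`, `sext16to64`, `zext16to64`, `join`) is hypothetical; it assumes a 16-bit source purely for illustration. The value is first extended into the low 32-bit half, then the high half is derived from it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 64-bit value kept as two 32-bit halves,
// the way a 64-bit value occupies a pair of 32-bit vector registers.
struct Halves {
  uint32_t lo;
  uint32_t hi;
};

// Step 1: extend the narrow source into the low 32-bit half.
inline uint32_t sext16to32(uint16_t x) {
  return (x & 0x8000u) ? (0xFFFF0000u | x) : static_cast<uint32_t>(x);
}

// Step 2 (signed): the high half is the sign bit of the low half, replicated.
inline Halves sext16to64(uint16_t x) {
  uint32_t lo = sext16to32(x);
  uint32_t hi = (lo >> 31) ? 0xFFFFFFFFu : 0u;
  return {lo, hi};
}

// Step 2 (unsigned): the high half is simply zero.
inline Halves zext16to64(uint16_t x) {
  return {static_cast<uint32_t>(x), 0u};
}

// Reassemble the halves so results can be compared with 64-bit constants.
inline uint64_t join(Halves h) {
  return (static_cast<uint64_t>(h.hi) << 32) | h.lo;
}

inline void checkSplitExtends() {
  assert(join(sext16to64(0x8000u)) == 0xFFFFFFFFFFFF8000ull);
  assert(join(zext16to64(0x8000u)) == 0x0000000000008000ull);
}
```

A scalar extend would not need the split, since the commit notes that a single S_BFE_I64 or S_BFE_U64 covers it.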
* AMDGPU/GlobalISel: Fix selecting G_IMPLICIT_DEF for s1 | Matt Arsenault | 2019-06-24 | 1 | -27/+133
    Try to fail for scc, since I don't think that should ever be produced.
    llvm-svn: 364199
* AMDGPU/GlobalISel: Fix RegBankSelect for s1 sext/zext/anyext | Matt Arsenault | 2019-06-24 | 3 | -14/+692
    This needs different handling if the source is known to be a valid condition or not. Handle turning it into shifts or a select during regbankselect.
    llvm-svn: 364186
* [AArch64][GlobalISel] Make s8 and s16 G_CONSTANTs legal. | Amara Emerson | 2019-06-21 | 1 | -6/+4
    We sometimes get poor code size because constants of types < 32b are legalized as 32 bit G_CONSTANTs with a truncate to fit. This works but means that the localizer can no longer sink them (although it's possible to extend it to do so).
    On AArch64 however s8 and s16 constants can be selected in the same way as s32 constants, with a mov pseudo into a W register. If we make s8 and s16 constants legal then we can avoid unnecessary truncates, they can be CSE'd, and the localizer can sink them as normal.
    There is a caveat: if the user of a smaller constant has to widen the sources, we end up with an anyext of the smaller typed G_CONSTANT. This can cause regressions because of the additional extend and missed pattern matching. To remedy this, there's a new artifact combiner to generate the wider G_CONSTANT if it's legal for the target.
    Differential Revision: https://reviews.llvm.org/D63587
    llvm-svn: 364075
* AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.scale | Matt Arsenault | 2019-06-18 | 1 | -0/+67
    llvm-svn: 363667
* GlobalISel: Use the original flags when lowering fneg to fsub | Matt Arsenault | 2019-06-17 | 2 | -0/+61
    This was ignoring the flag on fneg, and using the source instruction's flags. Also fixes tests missing from r358702. Note the expansion itself isn't correct without nnan, but that should be fixed separately.
    llvm-svn: 363637
* GlobalISel: Ignore callsite attributes when picking intrinsic type | Matt Arsenault | 2019-06-17 | 1 | -0/+21
    A target intrinsic may be defined as possibly reading memory, but the call site may have additional knowledge that it doesn't read memory. The intrinsic lowering will expect the pessimistic assumption of the intrinsic definition, so the chain should still be used.
    I fixed the same bug in SelectionDAG in r287593.
    llvm-svn: 363580
* GlobalISel: Verify intrinsics | Matt Arsenault | 2019-06-17 | 1 | -10/+10
    I keep using the wrong instruction when manually writing tests. This really needs to check the number of operands, but I don't see an easy way to do that right now.
    llvm-svn: 363579
* AMDGPU/GlobalISel: Implement select for G_ICMP and G_SELECT | Tom Stellard | 2019-06-17 | 2 | -4/+364
    Reviewers: arsenm
    Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D60640
    llvm-svn: 363576
* Reapply "GlobalISel: Avoid producing Illegal copies in RegBankSelect" | Matt Arsenault | 2019-06-15 | 3 | -28/+160
    This reapplies r363410, avoiding null dereference if there is no AltRegBank.
    llvm-svn: 363478
* Revert "GlobalISel: Avoid producing Illegal copies in RegBankSelect" | Mitch Phillips | 2019-06-14 | 3 | -160/+28
    This patch breaks UBSan build bots. See https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild for a guide as to how to reproduce the error.
    This reverts commit c2864c0de07efb5451d32d27a7d4ff2984830929.
    This reverts rL363410.
    llvm-svn: 363476
* GlobalISel: Avoid producing Illegal copies in RegBankSelect | Matt Arsenault | 2019-06-14 | 3 | -28/+160
    Avoid producing illegal register bank copies for reg_sequence and phi. The default implementation assumes it is possible to pick any operand's bank and use that for the result, introducing a copy for operands with a different bank. This does not check for illegal copies. It is not legal to introduce a VGPR->SGPR copy, so any VGPR operand requires the result to be a VGPR.
    The changes in getInstrMappingImpl aren't strictly necessary, since AMDGPU now just bypasses this for reg_sequence/phi. This could be replaced with an assert in case other targets run into this. It is currently responsible for producing the error for unsatisfiable copies, but this will be better served with a verifier check.
    For phis, for now assume any undetermined operands must be VGPRs. Eventually, this needs to be able to defer mapping these operations. This also does not yet have a way to check for whether the block is in a divergent region.
    llvm-svn: 363410
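The bank-selection rule stated in this commit message can be summarised in a few lines. This is a minimal sketch under stated assumptions, not the RegBankSelect implementation: the `Bank` enum and `pickResultBank` are hypothetical names, and `Unknown` stands for an operand whose bank has not been determined yet.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the register banks involved.
enum class Bank { Unknown, SGPR, VGPR };

// Pick the result bank for a reg_sequence or phi from its operand banks.
// A VGPR->SGPR copy is not legal, so one VGPR operand forces a VGPR result.
// Undetermined operands are treated pessimistically as VGPRs, matching the
// "for now" rule the commit message gives for phis.
inline Bank pickResultBank(const std::vector<Bank>& operands) {
  for (Bank b : operands) {
    if (b == Bank::VGPR || b == Bank::Unknown)
      return Bank::VGPR;
  }
  return Bank::SGPR;
}

inline void checkPickResultBank() {
  assert(pickResultBank({Bank::SGPR, Bank::SGPR}) == Bank::SGPR);
  assert(pickResultBank({Bank::SGPR, Bank::VGPR}) == Bank::VGPR);
}
```

With this rule the only copies that ever need to be inserted go from SGPR to VGPR, which is the legal direction.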
* [AMDGPU] more gfx1010 tests. NFC. | Stanislav Mekhanoshin | 2019-06-12 | 2 | -10/+10
    llvm-svn: 363190
* AMDGPU/GlobalISel: Fix using illegal situations in tests | Matt Arsenault | 2019-06-12 | 2 | -28/+25
    These were using illegal copies as the side effecting use, so make them legal.
    llvm-svn: 363168
* AMDGPU/GlobalISel: Add wave scratch offset argument | Matt Arsenault | 2019-05-30 | 1 | -0/+10
    Avoids crashing in PEI in a future change.
    llvm-svn: 362136
* AMDGPU/GlobalISel: Remove unnecessary REQUIREs | Matt Arsenault | 2019-05-29 | 9 | -18/+3
    This has been a mandatory part of the build for a while.
    llvm-svn: 361956
* AMDGPU/GlobalISel: Legality for integer min/max | Matt Arsenault | 2019-05-23 | 8 | -0/+1964
    llvm-svn: 361519
* AMDGPU/GlobalISel: Implement s64->s64 [SU]ITOFP | Matt Arsenault | 2019-05-17 | 2 | -0/+40
    llvm-svn: 361082
* GlobalISel: Implement lower for S64->S32 [SU]ITOFP | Matt Arsenault | 2019-05-17 | 2 | -0/+115
    This is ported from the custom AMDGPU DAG implementation. I think this is a better default expansion than what the DAG currently uses, at least if the target has CTLZ.
    This implements the signed version in terms of the unsigned conversion, which is implemented with bit operations. SelectionDAG has several other implementations that should eventually be ported depending on what instructions are legal.
    llvm-svn: 361081
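One way to express a signed conversion in terms of an unsigned one, as this commit message describes, is sketched below. It is an illustration only: `uitofp64` and `sitofp64` are hypothetical names, and the unsigned step is a native cast standing in for the CTLZ and bit-operation expansion, which is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the unsigned 64-bit to float conversion. The real lowering
// builds this from bit operations; a native cast is used here for brevity.
inline float uitofp64(uint64_t x) { return static_cast<float>(x); }

// Signed conversion built on the unsigned one:
// take the magnitude, convert it, then restore the sign.
inline float sitofp64(int64_t x) {
  uint64_t bits = static_cast<uint64_t>(x);
  uint64_t sign = (x < 0) ? ~0ull : 0ull;     // all ones when negative
  uint64_t magnitude = (bits + sign) ^ sign;  // |x|, well defined for INT64_MIN
  float r = uitofp64(magnitude);
  return (x < 0) ? -r : r;
}

inline void checkSitofp64() {
  assert(sitofp64(-5) == -5.0f);
  assert(sitofp64(INT64_MIN) == -9223372036854775808.0f);
}
```

Doing the add-and-xor in unsigned arithmetic keeps the `INT64_MIN` case free of signed overflow: its magnitude, 2^63, is representable as a `uint64_t` even though it is not as an `int64_t`.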
* AMDGPU/GlobalISel: Legalize G_FCEIL | Matt Arsenault | 2019-05-17 | 1 | -0/+275
    llvm-svn: 361028
* AMDGPU/GlobalISel: Legalize G_INTRINSIC_TRUNC | Matt Arsenault | 2019-05-17 | 1 | -22/+208
    llvm-svn: 361027
* AMDGPU/GlobalISel: Legalize G_FRINT | Matt Arsenault | 2019-05-17 | 1 | -0/+160
    llvm-svn: 361026
* AMDGPU/GlobalISel: Legalize G_FCOPYSIGN | Matt Arsenault | 2019-05-17 | 1 | -0/+479
    llvm-svn: 361025
* AMDGPU/GlobalISel: RegBankSelect for llvm.amdgcn.s.buffer.load | Matt Arsenault | 2019-05-17 | 1 | -0/+151
    llvm-svn: 361023
* AMDGPU/GlobalISel: Use subreg index instead of extra unmerge | Matt Arsenault | 2019-05-17 | 1 | -24/+16
    This saves instructions and extra steps, but I'm not sure about introducing subregister indexes at this point.
    llvm-svn: 361022
* AMDGPU/GlobalISel: Use waterfall loop for buffer_load | Matt Arsenault | 2019-05-17 | 2 | -6/+295
    This adds support for more complex waterfall loops that need to handle operands > 32-bits, and multiple operands.
    llvm-svn: 361021
* AMDGPU/GlobalISel: Correct regbank for 1-bit and/or/xor | Matt Arsenault | 2019-05-16 | 3 | -42/+42
    Bool values should use the scc/vcc regbank since r350611.
    llvm-svn: 360877
* [AMDGPU] gfx1010 VMEM and SMEM implementation | Stanislav Mekhanoshin | 2019-04-30 | 2 | -6/+6
    Differential Revision: https://reviews.llvm.org/D61330
    llvm-svn: 359621
* AMDGPU/GlobalISel: Fix non-power-of-2 G_EXTRACT sources | Matt Arsenault | 2019-04-22 | 1 | -0/+54
    llvm-svn: 358894