path: root/llvm/test/CodeGen/AMDGPU
Commit message | Author | Age | Files | Lines
...
* AMDGPU: Propagate undef flag during pre-RA exec mask optimizations | Nicolai Haehnle | 2019-10-08 | 1 | -1/+24
Summary: Issue: https://github.com/GPUOpen-Drivers/llpc/issues/204 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68184 llvm-svn: 374041
* MachineSSAUpdater: insert IMPLICIT_DEF at top of basic block | Nicolai Haehnle | 2019-10-08 | 1 | -0/+28
Summary: When getValueInMiddleOfBlock happens to be called for a basic block that has no incoming value at all, an IMPLICIT_DEF is inserted in that block via GetValueAtEndOfBlockInternal. This IMPLICIT_DEF must be at the top of its basic block or it will likely not reach the use that the caller intends to insert. Issue: https://github.com/GPUOpen-Drivers/llpc/issues/204 Reviewers: arsenm, rampitec Subscribers: jvesely, wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68183 llvm-svn: 374040
* AMDGPU/GlobalISel: Clamp G_SITOFP/G_UITOFP sources | Matt Arsenault | 2019-10-07 | 2 | -146/+575
llvm-svn: 373989
* AMDGPU/GlobalISel: Handle more G_INSERT cases | Matt Arsenault | 2019-10-07 | 1 | -20/+130
Start manually writing a table to get the subreg index. TableGen should probably generate this, but I'm not sure what it looks like in the arbitrary case where subregisters are allowed to not fully cover the super-registers. llvm-svn: 373947
* GlobalISel: Partially implement lower for G_INSERT | Matt Arsenault | 2019-10-07 | 1 | -6/+148
llvm-svn: 373946
* AMDGPU/GlobalISel: Fix selection of 16-bit shifts | Matt Arsenault | 2019-10-07 | 3 | -294/+810
llvm-svn: 373945
* AMDGPU/GlobalISel: Select VALU G_AMDGPU_FFBH_U32 | Matt Arsenault | 2019-10-07 | 1 | -7/+7
llvm-svn: 373944
* AMDGPU/GlobalISel: Use S_MOV_B64 for inline constants | Matt Arsenault | 2019-10-07 | 2 | -11/+12
This hides some defects in SIFoldOperands when the immediates are split. llvm-svn: 373943
* AMDGPU/GlobalISel: Widen 16-bit G_MERGE_VALUEs sources | Matt Arsenault | 2019-10-07 | 7 | -5451/+11837
Continue making a mess of merge/unmerge legality. llvm-svn: 373942
* AMDGPU/GlobalISel: Select more G_INSERT cases | Matt Arsenault | 2019-10-07 | 1 | -22/+425
At minimum handle the s64 insert type, which is emitted in real cases during legalization. We really need TableGen to emit something like the inverse of composeSubRegIndices to determine the subreg index to use. llvm-svn: 373938
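The subreg-index lookup these two G_INSERT commits describe (map an insert's bit offset and size to a subregister index) can be pictured with a small sketch. It assumes 32-bit channels and AMDGPU-style index names such as sub0 and sub2_sub3; the helper subRegNameFor is hypothetical and builds a name string, whereas the real code uses a hand-written table of subregister index values.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not an LLVM API): name the AMDGPU-style subregister
// index covering the 32-bit channels [offset/32, (offset+size)/32).
// Returns an empty string when the range is not 32-bit aligned, i.e. when no
// whole-channel subregister index can describe it.
std::string subRegNameFor(unsigned offsetBits, unsigned sizeBits) {
  if (sizeBits == 0 || offsetBits % 32 != 0 || sizeBits % 32 != 0)
    return "";
  unsigned first = offsetBits / 32;
  unsigned count = sizeBits / 32;
  std::string name;
  for (unsigned i = first; i < first + count; ++i) {
    if (!name.empty())
      name += '_';
    name += "sub" + std::to_string(i);  // one component per 32-bit channel
  }
  return name;
}
```

For example, inserting an s64 value at bit offset 64 would correspond to sub2_sub3 under these assumptions.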
* GlobalISel: Add target pre-isel instructions | Matt Arsenault | 2019-10-07 | 2 | -0/+100
Allows targets to introduce regbankselectable pseudo-instructions. Currently the closest feature to this is an intrinsic. However this requires creating a public intrinsic declaration. This litters the public intrinsic namespace with operations we don't necessarily want to expose to IR producers, and would rather leave as private to the backend. Use a new instruction bit. A previous attempt tried to keep using enum value ranges, but it turned into a mess. llvm-svn: 373937
* [AMDGPU] Fix test checks | Jay Foad | 2019-10-07 | 1 | -2/+4
The GFX10-DENORM-STRICT checks were only passing by accident. Fix them to make the test more robust in the face of scheduling or register allocation changes. llvm-svn: 373893
* AMDGPU/GlobalISel: RegBankSelect DS GWS intrinsics | Matt Arsenault | 2019-10-06 | 2 | -0/+116
llvm-svn: 373840
* AMDGPU/GlobalISel: Lower G_ATOMIC_CMPXCHG_WITH_SUCCESS | Matt Arsenault | 2019-10-06 | 1 | -0/+107
llvm-svn: 373839
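What a lowering of G_ATOMIC_CMPXCHG_WITH_SUCCESS has to compute can be sketched in plain C++, assuming the common approach of emitting a compare-exchange that returns only the old value, followed by an equality compare against the expected value to rebuild the success flag. The helpers cmpxchgOld and cmpxchgWithSuccess are hypothetical stand-ins for the two generic opcodes; std::atomic only models the memory semantics.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Stand-in for a plain compare-exchange that yields only the old value.
int cmpxchgOld(std::atomic<int>& mem, int expected, int desired) {
  // compare_exchange_strong leaves `expected` unchanged on success and
  // overwrites it with the current value on failure, so afterwards it always
  // holds the value memory had before the operation.
  mem.compare_exchange_strong(expected, desired);
  return expected;
}

// Stand-in for the lowered form: {old value, old == expected}.
std::pair<int, bool> cmpxchgWithSuccess(std::atomic<int>& mem, int expected,
                                        int desired) {
  int old = cmpxchgOld(mem, expected, desired);
  return {old, old == expected};  // the equality compare recovers "success"
}
```

The compare is sufficient because a strong compare-exchange stores exactly when the old value equals the expected one.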
* GlobalISel: Partially implement lower for G_EXTRACT | Matt Arsenault | 2019-10-06 | 4 | -24/+213
Turn into shift and truncate. Doesn't yet handle pointers. llvm-svn: 373838
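The shift-and-truncate idea can be shown on a 64-bit scalar. This is a minimal sketch of the arithmetic only, not the LegalizerHelper code: extractBits is a hypothetical name, and the truncation to `width` bits is modeled by masking the low bits of a uint64_t. As in the commit, pointers are not covered.

```cpp
#include <cassert>
#include <cstdint>

// Extract `width` bits starting at bit `offset` from a 64-bit scalar:
// logical shift right by the offset, then keep only the low `width` bits.
uint64_t extractBits(uint64_t src, unsigned offset, unsigned width) {
  assert(width > 0 && offset + width <= 64);
  uint64_t shifted = src >> offset;               // the shift step
  if (width == 64)
    return shifted;                               // nothing to truncate
  return shifted & ((uint64_t{1} << width) - 1);  // the truncate step
}
```

For instance, extracting the high s32 of an s64 is a shift right by 32 followed by a truncate to 32 bits.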
* AMDGPU/GlobalISel: Fix RegBankSelect for sendmsg intrinsics | Matt Arsenault | 2019-10-06 | 3 | -25/+11
This wasn't updated for the immarg handling change. llvm-svn: 373837
* AMDGPU/GlobalISel: Fix using wrong addrspace for aperture | Matt Arsenault | 2019-10-04 | 1 | -8/+8
This was always passing the destination flat address space, when it should be picking between the two valid source options. llvm-svn: 373716
* AMDGPU/GlobalISel: Select G_PTRTOINT | Matt Arsenault | 2019-10-04 | 1 | -0/+101
llvm-svn: 373715
* AMDGPU/GlobalISel: Support wave32 waterfall loops | Matt Arsenault | 2019-10-04 | 12 | -389/+704
llvm-svn: 373714
* AMDGPU/GlobalISel: Handle RegBankSelect of G_INSERT_VECTOR_ELT | Matt Arsenault | 2019-10-03 | 1 | -12/+383
llvm-svn: 373639
* AMDGPU/GlobalISel: Split 64-bit vector extracts during RegBankSelect | Matt Arsenault | 2019-10-03 | 1 | -6/+114
Register indexing 64-bit elements is possible on the SALU, but not the VALU. Handle splitting this into two 32-bit indexes. Extend waterfall loop handling to allow moving a range of instructions. llvm-svn: 373638
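Splitting one 64-bit indexed read into two 32-bit indexed reads works as follows: view the vector of 64-bit elements as twice as many 32-bit halves, read halves 2*i and 2*i+1, and recombine them. The host-side model below is only a sketch. The function names are hypothetical, and it assumes the low half of each element is stored first, which is made explicit here rather than relying on host endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reinterpret a vector of 64-bit elements as 32-bit halves, low half first.
std::vector<uint32_t> bitcastToHalves(const std::vector<uint64_t>& v) {
  std::vector<uint32_t> halves;
  halves.reserve(v.size() * 2);
  for (uint64_t e : v) {
    halves.push_back(static_cast<uint32_t>(e));        // low 32 bits
    halves.push_back(static_cast<uint32_t>(e >> 32));  // high 32 bits
  }
  return halves;
}

// Extract element `idx` using two 32-bit indexed reads instead of one 64-bit read.
uint64_t extract64Via32(const std::vector<uint64_t>& v, unsigned idx) {
  assert(idx < v.size());
  std::vector<uint32_t> halves = bitcastToHalves(v);
  uint32_t lo = halves[2 * idx];      // first 32-bit index: 2*i
  uint32_t hi = halves[2 * idx + 1];  // second 32-bit index: 2*i + 1
  return (static_cast<uint64_t>(hi) << 32) | lo;
}
```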
* AMDGPU/GlobalISel: Allow VGPR to index SGPR register | Matt Arsenault | 2019-10-03 | 1 | -3/+2
We can still do a waterfall loop over the index if using a VGPR to index an SGPR. The result will still be a VGPR, but we can avoid the wide copy of the source register to a VGPR. llvm-svn: 373637
* AMDGPU/GlobalISel: Add some more tests for G_INSERT legalization | Matt Arsenault | 2019-10-03 | 1 | -0/+168
llvm-svn: 373636
* AMDGPU/GlobalISel: Fix mutationIsSane assert v8s8 and | Matt Arsenault | 2019-10-03 | 1 | -0/+166
This would try to do FewerElements to v9s8. llvm-svn: 373635
* AMDGPU/GlobalISel: Expand G_BITCAST legality | Matt Arsenault | 2019-10-03 | 1 | -0/+102
llvm-svn: 373567
* [AMDGPU] Fix illegal agpr use by VALU | Stanislav Mekhanoshin | 2019-10-02 | 2 | -3/+21
When SIFixSGPRCopies attempts to fix an illegal copy from a vector to a scalar register, it calls moveToVALU(). A copy from an agpr to an sgpr becomes a copy from agpr to agpr, which may result in an illegal register class at a use of this copy. The solution is to always copy it into a vgpr. This may result in a subsequent copy into an agpr if that is what is really needed; however, this should not happen often and will likely be folded later. The opposite situation cannot happen, because an sgpr is always illegal where an agpr is legal, so such user instructions do not exist. Differential Revision: https://reviews.llvm.org/D68358 llvm-svn: 373544
* [AMDGPU] Extend buffer intrinsics with swizzling | Piotr Sobczak | 2019-10-02 | 56 | -454/+525
Summary: Extend the cachepolicy operand in the new VMEM buffer intrinsics to supply information on whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store Furthermore, disable merging of VMEM buffer instructions in the SI Load/Store optimizer if the "swizzled" bit on the instruction is set. The default value of the bit is 0, meaning that the data in the buffer is linear and buffer instructions can be merged. This commit makes no difference to the generated code. However, in the future front-ends will be expected to use buffer intrinsics with the correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 llvm-svn: 373491
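The merge guard described for the SI Load/Store optimizer amounts to this: two adjacent buffer accesses may be combined only when neither has the "swizzled" bit set, because swizzled data is not laid out linearly. The sketch below is hypothetical. The bit position (bit 3 of the cachepolicy immediate, after glc/slc/dlc) is an assumption to verify against the intrinsic definitions, BufferAccess is an invented stand-in for a buffer instruction, and the real pass checks further conditions that are omitted here.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: bit 0 = glc, bit 1 = slc, bit 2 = dlc, bit 3 = swz.
constexpr uint32_t kSwizzleBit = 1u << 3;

struct BufferAccess {    // hypothetical stand-in for a buffer instruction
  uint32_t cachePolicy;  // cachepolicy immediate including the swizzle bit
  uint32_t offset;       // byte offset within the buffer
  uint32_t size;         // access size in bytes
};

bool isSwizzled(const BufferAccess& a) {
  return (a.cachePolicy & kSwizzleBit) != 0;
}

// Simplified guard: refuse to merge swizzled accesses; otherwise require the
// two accesses to be contiguous in either order.
bool canMergeBufferAccesses(const BufferAccess& a, const BufferAccess& b) {
  if (isSwizzled(a) || isSwizzled(b))
    return false;
  return a.offset + a.size == b.offset || b.offset + b.size == a.offset;
}
```

Because the default value of the bit is 0, existing callers keep the current merging behaviour, which matches the commit's note that the generated code does not change.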
* AMDGPU/GlobalISel: Assume VGPR for G_FRAME_INDEX | Matt Arsenault | 2019-10-02 | 1 | -1/+1
In principle this should behave as any other constant. However eliminateFrameIndex currently assumes a VALU use and uses a vector shift. Work around this by selecting to VGPR for now until eliminateFrameIndex is fixed. llvm-svn: 373415
* AMDGPU/GlobalISel: Private loads always use VGPRs | Matt Arsenault | 2019-10-02 | 1 | -0/+17
llvm-svn: 373414
* AMDGPU/GlobalISel: Legalize 1024-bit G_BUILD_VECTOR | Matt Arsenault | 2019-10-02 | 2 | -40/+155
This will be needed to support AGPR operations. llvm-svn: 373413
* AMDGPU/GlobalISel: Fix RegBankSelect for 1024-bit values | Matt Arsenault | 2019-10-02 | 1 | -0/+28
llvm-svn: 373412
* [AMDGPU] separate accounting for agprs | Stanislav Mekhanoshin | 2019-10-02 | 1 | -10/+129
Account and report agprs separately on gfx908. Other targets do not change the reporting. Differential Revision: https://reviews.llvm.org/D68307 llvm-svn: 373411
* AMDGPU: Fix an out of date assert in addressing FrameIndex | Changpeng Fang | 2019-10-01 | 1 | -0/+66
Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D67574 llvm-svn: 373404
* AMDGPU/GlobalISel: Increase max legal size to 1024 | Matt Arsenault | 2019-10-01 | 8 | -84/+440
There are 1024-bit register classes defined for AGPRs. Additionally, OpenCL defines vectors up to 16 x i64, and this helps those tests legalize. llvm-svn: 373350
* Revert "GlobalISel: Handle llvm.read_register" | Dmitri Gribenko | 2019-10-01 | 2 | -10/+8
This reverts commit r373294. It broke Clang's CodeGen/arm64-microsoft-status-reg.cpp: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/18483 llvm-svn: 373310
* AMDGPU/GlobalISel: Select s1 src G_SITOFP/G_UITOFP | Matt Arsenault | 2019-10-01 | 2 | -44/+515
llvm-svn: 373298
* AMDGPU/GlobalISel: Add support for init.exec intrinsics | Matt Arsenault | 2019-10-01 | 5 | -32/+39
The existing wave32 behavior seems broken and incomplete, but this reproduces it. llvm-svn: 373296
* GlobalISel: Handle llvm.read_register | Matt Arsenault | 2019-10-01 | 2 | -8/+10
SelectionDAG has a bunch of machinery to defer this to selection time for some reason. Just directly emit a copy during IRTranslator. The x86 usage does somewhat questionably check hasFP, which could depend on the whole function being at minimum translated. This does lose the convergent bit if the callsite had it, which may be a problem. We also lose that in general for intrinsics, which may also be a problem. llvm-svn: 373294
* AMDGPU/GlobalISel: Avoid creating shift of 0 in arg lowering | Matt Arsenault | 2019-10-01 | 1 | -1/+1
This is sort of papering over the fact that we don't run a combiner anywhere, but avoiding creating 2 instructions in the first place is easy. llvm-svn: 373293
* AMDGPU/GlobalISel: Select G_UADDO/G_USUBO | Matt Arsenault | 2019-10-01 | 2 | -0/+394
llvm-svn: 373288
* GlobalISel: Implement widenScalar for G_SITOFP/G_UITOFP sources | Matt Arsenault | 2019-10-01 | 2 | -46/+232
Legalize 16-bit G_SITOFP/G_UITOFP for AMDGPU. llvm-svn: 373287
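Widening the integer source of an int-to-FP conversion is only value-preserving if the extension matches the signedness of the conversion: sign-extend for the signed conversion, zero-extend for the unsigned one. The minimal C++ model below shows why a 16-bit source can be widened to 32 bits before converting. Function names are hypothetical, and this models the semantics, not the LegalizerHelper implementation.

```cpp
#include <cassert>
#include <cstdint>

// Signed conversion: sign-extend the 16-bit pattern to 32 bits, then convert.
float sitofpFrom16(uint16_t bits) {
  int32_t wide = static_cast<int16_t>(bits);  // sign extension
  return static_cast<float>(wide);
}

// Unsigned conversion: zero-extend the 16-bit pattern to 32 bits, then convert.
float uitofpFrom16(uint16_t bits) {
  uint32_t wide = bits;                       // zero extension
  return static_cast<float>(wide);
}
```

Using the wrong extension would, for example, turn the signed pattern 0xFFFF (-1) into 65535.0.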
* AMDGPU/GlobalISel: Legalize G_GLOBAL_VALUE | Matt Arsenault | 2019-10-01 | 1 | -0/+156
Handle other cases besides LDS. Mostly a straight port of the existing handling, without the intermediate custom nodes. llvm-svn: 373286
* [AMDGPU] SIFoldOperands should not fold register across the EXEC definition | Alexander Timofeev | 2019-09-30 | 2 | -167/+200
Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D67662 llvm-svn: 373221
* [TargetLowering] Simplify expansion of S{ADD,SUB}O | Roger Ferrer Ibanez | 2019-09-30 | 1 | -296/+182
ISD::SADDO uses the suggested sequence described in §2.4 of the RISC-V Spec v2.2. ISD::SSUBO uses the dual approach but checks for a (non-zero) positive operand. Differential Revision: https://reviews.llvm.org/D47927 llvm-svn: 373187
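The overflow checks behind these expansions can be written directly on 32-bit values. For addition, signed overflow occurred exactly when "b is negative" disagrees with "the wrapped sum is less than a". For subtraction, the dual check compares "b is positive" with "the wrapped difference is less than a". This is a sketch of the arithmetic the expanded sequence computes, not the TargetLowering code. The helper names are hypothetical, and the wrapping add/sub is done in unsigned arithmetic to avoid signed-overflow undefined behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Two's-complement wrapping add/sub via unsigned arithmetic.
int32_t wrapAdd(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) + static_cast<uint32_t>(b));
}
int32_t wrapSub(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) - static_cast<uint32_t>(b));
}

// SADDO-style check: overflow iff (b < 0) != (sum < a).
bool saddOverflows(int32_t a, int32_t b, int32_t* sum) {
  *sum = wrapAdd(a, b);
  return (b < 0) != (*sum < a);
}

// SSUBO-style dual check: overflow iff (b > 0) != (diff < a).
bool ssubOverflows(int32_t a, int32_t b, int32_t* diff) {
  *diff = wrapSub(a, b);
  return (b > 0) != (*diff < a);
}
```

Without overflow, adding a negative b always yields a result below a and adding a non-negative b never does, so any disagreement between the two comparisons signals a wrap. The subtraction case mirrors this with b > 0.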
* AMDGPU/GlobalISel: Fix select for v2s16 and/or/xor | Matt Arsenault | 2019-09-30 | 3 | -45/+45
llvm-svn: 373180
* [AMDGPU] Improve fma.f64 test. NFC. | Stanislav Mekhanoshin | 2019-09-25 | 1 | -1/+154
llvm-svn: 372908
* [AMDGPU] gfx10 v_fmac_f16 operand folding | Stanislav Mekhanoshin | 2019-09-25 | 1 | -5/+5
Fold immediates into v_fmac_f16. Differential Revision: https://reviews.llvm.org/D68037 llvm-svn: 372906
* AMDGPU/GlobalISel: Allow selection of scalar min/max | Matt Arsenault | 2019-09-21 | 4 | -40/+20
I believe all of the uniform/divergent pattern predicates are redundant and can be removed. The uniformity bit already influences the register class, and nothing has broken when I've removed this and others. llvm-svn: 372450
* Remove assert from MachineLoop::getLoopPredecessor() | Stanislav Mekhanoshin | 2019-09-20 | 1 | -0/+92
According to the documentation, the method returns the predecessor if the given loop's header has exactly one unique predecessor outside the loop, and null otherwise. In reality it asserts if there is no predecessor outside of the loop. The testcase has a loop whose predecessors outside of the loop were not identified, because analyzeBranch() was unable to process the mask branch and returned true. It is also not correct to assert for truly dead loops. Differential Revision: https://reviews.llvm.org/D67634 llvm-svn: 372405
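The documented contract (return the header's single unique predecessor outside the loop, otherwise null, with no assert when there is none) can be modeled on a toy CFG. The sketch below is hypothetical and does not use the MachineLoop API. Blocks are integers, preds[b] lists the predecessors of block b, and -1 stands in for a null result.

```cpp
#include <cassert>
#include <set>
#include <vector>

using Preds = std::vector<std::vector<int>>;  // preds[b]: predecessors of block b

// Returns the unique predecessor of `header` outside `loopBlocks`, or -1 when
// there is none (e.g. a dead loop) or more than one distinct such block.
int loopPredecessor(const Preds& preds, int header,
                    const std::set<int>& loopBlocks) {
  int out = -1;
  for (int p : preds[header]) {
    if (loopBlocks.count(p))
      continue;             // edge from inside the loop (back edge)
    if (out != -1 && out != p)
      return -1;            // two distinct outside predecessors
    out = p;                // duplicates of the same block are still "unique"
  }
  return out;               // -1 if nothing outside: no assert
}
```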
* Revert r372366 "Use getTargetConstant for BLENDI, and add a test to catch it." | Nico Weber | 2019-09-20 | 1 | -14/+0
This reverts commit 52621307bcab2013e8833f3317cebd63a6db3885. Tests have been failing all night with:
[0/2] ACTION //llvm/test:check-llvm(//llvm/utils/gn/build/toolchain:unix)
-- Testing: 33647 tests, 64 threads --
Testing: 0 .. 10..
UNRESOLVED: LLVM :: CodeGen/AMDGPU/GlobalISel/isel-blendi-gettargetconstant.ll (6943 of 33647)
******************** TEST 'LLVM :: CodeGen/AMDGPU/GlobalISel/isel-blendi-gettargetconstant.ll' FAILED ********************
Test has no run line!
********************
Since there were other concerns on https://reviews.llvm.org/D67785, I'm just reverting for now. llvm-svn: 372383
OpenPOWER on IntegriCloud