path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
...
* AMDGPU/GlobalISel: RegBankSelect for update.dpp | Matt Arsenault | 2019-06-29 | 1 | -0/+1
  llvm-svn: 364701
* AMDGPU/GlobalISel: RegBankSelect for atomic.inc/atomic.dec | Matt Arsenault | 2019-06-29 | 1 | -0/+2
  llvm-svn: 364699
* AMDGPU/GlobalISel: RegBankSelect for some DS intrinsics | Matt Arsenault | 2019-06-29 | 1 | -1/+17
  llvm-svn: 364698
* AMDGPU/GlobalISel: RegBankSelect for some easy intrinsics | Matt Arsenault | 2019-06-29 | 1 | -1/+48
  llvm-svn: 364697
* AMDGPU/GlobalISel: RegBankSelect for icmp/fcmp intrinsics | Matt Arsenault | 2019-06-29 | 1 | -0/+12
  llvm-svn: 364696
* AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.fmas | Matt Arsenault | 2019-06-29 | 1 | -0/+1
  llvm-svn: 364695
* AMDGPU/GlobalISel: RegBankSelect for some simple leaf intrinsics | Matt Arsenault | 2019-06-29 | 1 | -1/+11
  llvm-svn: 364694
* [AMDGPU][MC] Fix 2 for sanitizer failure in 364645 | Dmitry Preobrazhensky | 2019-06-28 | 2 | -6/+6
  llvm-svn: 364656
* [AMDGPU][MC] Fix for sanitizer failure in 364645 | Dmitry Preobrazhensky | 2019-06-28 | 1 | -4/+10
  llvm-svn: 364651
* [AMDGPU][MC] Enabled constant expressions as operands of sendmsg | Dmitry Preobrazhensky | 2019-06-28 | 5 | -210/+266
  See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820
  Reviewers: artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D62735
  llvm-svn: 364645
* [AMDGPU] Packed thread ids in function call ABI | Stanislav Mekhanoshin | 2019-06-28 | 4 | -22/+132
  Differential Revision: https://reviews.llvm.org/D63851
  llvm-svn: 364619
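The commit message does not describe the packed layout, so the following is only a sketch of the general idea of passing three work-item IDs in one 32-bit register. The 10-bits-per-dimension layout (X in bits 9:0, Y in 19:10, Z in 29:20) is an assumption, as are the helper names; none of this is LLVM API.

```cpp
#include <cstdint>

// Hypothetical helpers (not LLVM API). Assumed layout: 10 bits per
// dimension, X in bits [9:0], Y in [19:10], Z in [29:20]; bits [31:30] unused.
constexpr uint32_t kTidBits = 10;
constexpr uint32_t kTidMask = (1u << kTidBits) - 1; // 0x3ff

// Packs three work-item IDs into one 32-bit value (one VGPR).
constexpr uint32_t packWorkItemIds(uint32_t X, uint32_t Y, uint32_t Z) {
  return (X & kTidMask) | ((Y & kTidMask) << kTidBits) |
         ((Z & kTidMask) << (2 * kTidBits));
}

// The callee recovers each ID with a shift and a mask.
constexpr uint32_t unpackX(uint32_t Packed) { return Packed & kTidMask; }
constexpr uint32_t unpackY(uint32_t Packed) {
  return (Packed >> kTidBits) & kTidMask;
}
constexpr uint32_t unpackZ(uint32_t Packed) {
  return (Packed >> (2 * kTidBits)) & kTidMask;
}
```

With 10-bit fields, IDs up to 1023 per dimension survive the round trip; the benefit is that a call passes one register instead of three.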
* AMDGPU/GlobalISel: Convert to using Register | Matt Arsenault | 2019-06-28 | 4 | -44/+44
  llvm-svn: 364616
* AMDGPU: Make fixing i1 copies robust against re-ordering | Nicolai Haehnle | 2019-06-27 | 1 | -10/+11
  Summary: Without this change, the new test case produced incorrect code.
  Change-Id: Ief48b227e97aa662dd3535c9bafb27d4a184efca
  Reviewers: arsenm, david-salinas
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63871
  llvm-svn: 364566
* [GlobalISel] Accept multiple vregs in lowerFormalArgs | Diana Picus | 2019-06-27 | 2 | -10/+20
  Change the interface of CallLowering::lowerFormalArguments to accept several virtual registers for each formal argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660; lowerCall will be refactored in the same way in follow-up patches.

  With this change, we forward the virtual registers generated for aggregates to CallLowering, so the target can decide for itself whether to handle them as separate pieces or as one big register. We also copy the pack/unpackRegs helpers to CallLowering to facilitate this.

  ARM and AArch64 have been updated to use the passed-in virtual registers directly, which means we no longer need to generate so many merge/extract instructions.

  AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was put into an s64 instead of a p0. Added a test case that illustrates the problem more clearly (it crashes without this patch) and fixed the existing test case to expect p0.

  AMDGPU has been updated to unpack into the virtual registers for kernels. I think the other code paths fall back for aggregates, so this should be NFC.

  Mips doesn't support aggregates yet, so it's also NFC.

  x86 seems to have code for dealing with aggregates, but I couldn't find the tests for it, so I just added a fallback to DAGISel if we get more than one virtual register for an argument.

  Differential Revision: https://reviews.llvm.org/D63549
  llvm-svn: 364510
* [AMDGPU] Fix +DumpCode to print an entry label for the first function | Jay Foad | 2019-06-27 | 1 | -12/+12
  Summary: The +DumpCode attribute is a horrible hack in AMDGPU to embed the disassembly of the generated code into the ELF file. It is used by LLPC to implement an extension that allows the application to read back the disassembly of the code.

  It tries to print an entry label at the start of every function, but that didn't work for the first function in the module because DumpCodeInstEmitter wasn't initialised until EmitFunctionBodyStart, which is too late.

  Change-Id: I790d73ddf4f51fd02ab32529380c7cb7c607c4ee
  Reviewers: arsenm, tpr, kzhuravl
  Reviewed By: arsenm
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63712
  llvm-svn: 364508
* AMDGPU: Assert SPAdj is 0 | Matt Arsenault | 2019-06-26 | 1 | -0/+2
  llvm-svn: 364473
* [AMDGPU] Fix Livereg computation during epilogue insertion | Matt Arsenault | 2019-06-26 | 1 | -1/+2
  The LivePhysRegs set computed to find a scratch register in the epilogue code wrongly used 'LiveIns'; it should use the 'Liveout' sets instead. The liveness must also account for the operands of the terminator (return) instruction, which is the insertion point for the scratch-exec-copy instruction.

  Patch by Christudasan Devadasan
  llvm-svn: 364470
* [AMDGPU] Fix for branch offset hardware workaround | Ryan Taylor | 2019-06-26 | 7 | -24/+111
  Summary: This works around a hardware bug that makes a branch offset of 0x3f unsafe. A 32-bit branch with offset 0x3f is replaced by a 64-bit instruction consisting of the same 32-bit branch followed by the encoding of an s_nop 0. The relaxer then adjusts the offsets accordingly.

  Change-Id: I10b7aed99d651f8159401b01bb421f105fa6288e
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63494
  llvm-svn: 364451
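Because appending an s_nop shifts every later address, padding one branch can give another branch an offset of exactly 0x3f, so the adjustment has to be repeated until nothing changes. The toy model below is a sketch of that fixed-point loop, not the actual MC-layer implementation. It assumes 4-byte instructions, offsets counted in dwords from the dword after the branch, and hypothetical names (`Inst`, `layout`, `applyBranchOffsetWorkaround`).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model: every instruction is 4 bytes; a padded branch is 8 bytes.
struct Inst {
  bool IsBranch = false;
  std::size_t Target = 0; // index of the branch target (branches only)
  bool Padded = false;    // true once an s_nop 0 follows the branch
};

// Byte address of every instruction, plus the end address.
std::vector<std::int64_t> layout(const std::vector<Inst> &Prog) {
  std::vector<std::int64_t> Addr(Prog.size() + 1, 0);
  for (std::size_t I = 0; I < Prog.size(); ++I)
    Addr[I + 1] = Addr[I] + (Prog[I].Padded ? 8 : 4);
  return Addr;
}

// Branch offset in dwords, relative to the dword after the 32-bit branch.
std::int64_t branchOffset(const std::vector<std::int64_t> &Addr, std::size_t I,
                          std::size_t Target) {
  return (Addr[Target] - (Addr[I] + 4)) / 4;
}

// Pads branches whose offset is 0x3f. Padding shifts later addresses, which
// can give another branch a 0x3f offset, so the layout is recomputed until
// stable. Terminates because each pass either pads one more branch or stops.
int applyBranchOffsetWorkaround(std::vector<Inst> &Prog) {
  int NumPadded = 0;
  for (bool Changed = true; Changed;) {
    Changed = false;
    const std::vector<std::int64_t> Addr = layout(Prog);
    for (std::size_t I = 0; I < Prog.size(); ++I) {
      Inst &In = Prog[I];
      if (In.IsBranch && !In.Padded &&
          branchOffset(Addr, I, In.Target) == 0x3f) {
        In.Padded = true; // branch + s_nop 0: a forward offset becomes 0x40
        ++NumPadded;
        Changed = true;
        break; // addresses are stale now; recompute the layout
      }
    }
  }
  return NumPadded;
}
```

For example, a branch at index 1 with offset 0x3f gets padded, which pushes an earlier branch at index 0 from offset 0x3e to 0x3f, so that one is padded on the next pass.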
* AMDGPU: Fix unused variable | Matt Arsenault | 2019-06-26 | 1 | -1/+0
  llvm-svn: 364426
* AMDGPU: Check MRI for callee saved regs instead of TRI | Matt Arsenault | 2019-06-26 | 4 | -7/+5
  This should be the same, but MRI does allow the CSR set to be changed dynamically, although that is not currently used.
  llvm-svn: 364425
* Don't look for the TargetFrameLowering in the implementation | Matt Arsenault | 2019-06-25 | 1 | -2/+1
  The same oddity was apparently copy-pasted between multiple targets.
  llvm-svn: 364349
* Update phis in AMDGPUUnifyDivergentExitNodes | Diego Novillo | 2019-06-25 | 1 | -7/+4
  Original patch https://reviews.llvm.org/D63659 from Steven Perron <stevenperron@google.com>

  The pass AMDGPUUnifyDivergentExitNodes does not update the phi nodes in the successors of the blocks that it splits. This is fixed by calling BasicBlock::splitBasicBlock to split the block instead of doing it manually.

  This does extra work because a new branch is created in BB which is immediately replaced, but I think the simplicity is worth it. It also helps make the code more future-proof in case other things need to be updated.

  llvm-svn: 364342
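The bug is easiest to see on a toy CFG: after a block is split, its old successors have a new predecessor, so every phi entry that named the old block must be renamed to the new one. The sketch below models only that bookkeeping (instructions are not modeled); `Block`, `Phi` and `splitBlock` are hypothetical and are not LLVM's `BasicBlock` API, whose `splitBasicBlock` already performs this update.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy CFG (not LLVM's API): each phi maps a predecessor block name to the
// incoming value name.
using Phi = std::map<std::string, std::string>;

struct Block {
  std::string Name;
  std::vector<Block *> Succs;
  std::vector<Phi> Phis;
};

// Splits Old so that New takes over Old's successors and Old branches to
// New. Successor phis that listed Old as a predecessor must now list New;
// skipping this step leaves stale phi entries.
void splitBlock(Block &Old, Block &New) {
  New.Succs = Old.Succs;
  Old.Succs = {&New};
  for (Block *S : New.Succs)
    for (Phi &P : S->Phis) {
      auto It = P.find(Old.Name);
      if (It == P.end())
        continue; // no entry, or already rewritten (duplicate successor)
      std::string Value = It->second;
      P.erase(It);
      P[New.Name] = Value;
    }
}
```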
* [AMDGPU] Removed dead SIMachineFunctionInfo::getWorkItemIDVGPR() | Stanislav Mekhanoshin | 2019-06-25 | 2 | -20/+0
  Differential Revision: https://reviews.llvm.org/D63780
  llvm-svn: 364339
* [AMDGPU] Null checking on TS to avoid crashing in clang tests. | Michael Liao | 2019-06-25 | 1 | -1/+2
  - `test/Misc/backend-resource-limit-diagnostics.cl` crashes because a null streamer is used.
  llvm-svn: 364318
* AMDGPU: Select G_SEXT/G_ZEXT/G_ANYEXT | Matt Arsenault | 2019-06-25 | 3 | -5/+135
  llvm-svn: 364308
* AMDGPU: Write LDS objects out as global symbols in code generation | Nicolai Haehnle | 2019-06-25 | 9 | -14/+102
  Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section index introduced with a previous change. The linker is then expected to resolve relocations, which are also emitted.

  Initially disabled for HSA and PAL environments until they have caught up in terms of linker and runtime loader.

  Some notes:

  - The llvm.amdgcn.groupstaticsize intrinsic can no longer be lowered to a constant at compile time, which means some tests can no longer be applied. The current "solution" is a terrible hack, but the intrinsic isn't used by Mesa, so we can keep it for now.

  - We no longer know the full LDS size per kernel at compile time, which means that we can no longer generate a relevant error message at compile time. It would be possible to add a check for the size of individual variables, but ultimately the linker will have to perform the final check.

  Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275
  Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin
  Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61494
  llvm-svn: 364297
* AMDGPU/MC: Add .amdgpu_lds directive | Nicolai Haehnle | 2019-06-25 | 3 | -0/+92
  Summary: The directive defines a symbol as a group/local memory (LDS) symbol. LDS symbols behave similarly to common symbols for the purposes of ELF, using the processor-specific SHN_AMDGPU_LDS as section index.

  It is the linker and/or runtime loader's job to "instantiate" LDS symbols and resolve relocations that reference them.

  It is not possible to initialize LDS memory (not even zero-initialize as for .bss).

  We want to be able to link together objects -- starting with relocatable objects, but possibly expanding to shared objects in the future -- that access LDS memory in a flexible way.

  LDS memory is in an address space that is entirely separate from the address space that contains the program image (code and normal data), so having program segments for it doesn't really make sense.

  Furthermore, we want to be able to compile multiple kernels in a compilation unit which have disjoint use of LDS memory. In that case, we may want to place LDS symbols differently for different kernels to save memory (LDS memory is very limited and physically private to each kernel invocation), so we can't simply place LDS symbols in a .lds section.

  Hence this solution where LDS symbols always stay undefined.

  Change-Id: I08cbc37a7c0c32f53f7b6123aa0afc91dbc1748f
  Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, rupprecht, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61493
  llvm-svn: 364296
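These two commits leave the placement of LDS symbols, and the final size check, to the linker or runtime loader. What such a per-kernel placement could look like is sketched below: the symbols one kernel uses are laid out back to back from offset 0 with alignment, and the result is rejected if it exceeds a limit. The struct names, the in-order greedy layout and the `LimitBytes` parameter are all assumptions for illustration, not the behaviour of any real linker.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of an LDS symbol, as a linker might see it.
struct LdsSymbol {
  std::string Name;
  std::uint32_t Size;
  std::uint32_t Align; // must be a power of two
};

struct LdsLayout {
  std::vector<std::uint32_t> Offsets; // one per symbol, in input order
  std::uint32_t TotalSize = 0;
};

// Places the LDS symbols used by a single kernel back to back, rounding each
// offset up to the symbol's alignment, and fails if the total exceeds the
// (assumed) per-kernel limit. Offsets start at 0 for every kernel, so two
// kernels with disjoint LDS use can reuse the same memory.
std::optional<LdsLayout> layoutKernelLds(const std::vector<LdsSymbol> &Syms,
                                         std::uint32_t LimitBytes) {
  LdsLayout L;
  std::uint64_t Offset = 0;
  for (const LdsSymbol &S : Syms) {
    const std::uint64_t Mask = std::uint64_t(S.Align) - 1;
    Offset = (Offset + Mask) & ~Mask; // align up
    if (Offset + S.Size > LimitBytes)
      return std::nullopt; // this kernel would not fit
    L.Offsets.push_back(static_cast<std::uint32_t>(Offset));
    Offset += S.Size;
  }
  L.TotalSize = static_cast<std::uint32_t>(Offset);
  return L;
}
```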
* AMDGPU/GlobalISel: Fix regbankselect for amdgcn.class | Matt Arsenault | 2019-06-25 | 1 | -4/+8
  llvm-svn: 364262
* AMDGPU/GlobalISel: Select G_TRUNC | Matt Arsenault | 2019-06-24 | 4 | -24/+115
  llvm-svn: 364215
* AMDGPU/GlobalISel: RegBankSelect for amdgcn.class | Matt Arsenault | 2019-06-24 | 1 | -0/+9
  llvm-svn: 364214
* AMDGPU/GlobalISel: Split VALU s64 G_ZEXT/G_SEXT in RegBankSelect | Matt Arsenault | 2019-06-24 | 1 | -13/+57
  Scalar extends to s64 can use S_BFE_{I64|U64}, but vector extends need to extend to the 32-bit half first, and then to 64 bits. I'm not sure where the line should be between what RegBankSelect handles and what instruction selection does, but for now I'm erring on the side of RegBankSelect for future post-RBS combines.
  llvm-svn: 364212
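The two-step VALU extend can be written with 32-bit operations only: extend the source into the low half, then derive the high half (the replicated sign bit for a sign extend, zero for a zero extend). The sketch below models the arithmetic on plain integers; `Halves`, `sext64`, `zext64` and the `Bits` parameter are hypothetical and do not correspond to LLVM functions.

```cpp
#include <cstdint>

// Low and high 32-bit halves of a 64-bit result.
struct Halves {
  std::uint32_t Lo, Hi;
};

// Sign-extends the low Bits bits of Src (1 <= Bits <= 32) to 64 bits using
// only 32-bit operations: first extend into the low half, then derive the
// high half from the low half's sign bit.
Halves sext64(std::uint32_t Src, unsigned Bits) {
  const unsigned Shift = 32 - Bits;
  // Move the sign bit to bit 31, then arithmetic-shift it back down.
  // (Conversion and >> on negative values are two's complement in C++20 and
  // on mainstream compilers before that.)
  const std::int32_t Lo = static_cast<std::int32_t>(Src << Shift) >> Shift;
  const std::int32_t Hi = Lo >> 31; // all ones or all zeros
  return {static_cast<std::uint32_t>(Lo), static_cast<std::uint32_t>(Hi)};
}

// Zero-extends the low Bits bits of Src to 64 bits: the high half is just 0.
Halves zext64(std::uint32_t Src, unsigned Bits) {
  const std::uint32_t Mask = Bits == 32 ? ~0u : ((1u << Bits) - 1);
  return {Src & Mask, 0};
}
```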
* [AMDGPU] Allow any value in unused src0 field in v_nop | Tim Renouf | 2019-06-24 | 1 | -1/+1
  Summary: The LLVM disassembler assumes that the unused src0 operand of v_nop is zero. Other tools can put another value in that field, which is still valid. This commit fixes the LLVM disassembler to recognize such an encoding as v_nop, in the same way as we already do for s_getpc.

  Differential Revision: https://reviews.llvm.org/D63724
  Change-Id: Iaf0363eae26ff92fc4ebc716216476adbff37a6f
  llvm-svn: 364208
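At the bit level, the change amounts to masking out the unused src0 field before comparing against the v_nop encoding. The sketch assumes the GCN VOP1 layout (bits 31:25 = 0x3f, vdst in 24:17, opcode in 16:9, src0 in 8:0), which makes v_nop 0x7e000000. The function names are hypothetical; the real fix was made in the disassembler's instruction definitions, not in a function like this.

```cpp
#include <cstdint>

// Assumed GCN VOP1 layout: [31:25] encoding 0x3f, [24:17] vdst,
// [16:9] opcode, [8:0] src0. v_nop is opcode 0, giving 0x7e000000.
constexpr std::uint32_t kVNop = 0x7e000000u;
constexpr std::uint32_t kSrc0Mask = 0x1ffu; // bits [8:0]

// Old behaviour: only the exact all-zero-operand encoding is v_nop.
constexpr bool isVNopStrict(std::uint32_t Word) { return Word == kVNop; }

// New behaviour: ignore whatever the unused src0 field contains.
constexpr bool isVNop(std::uint32_t Word) {
  return (Word & ~kSrc0Mask) == kVNop;
}
```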
* AMDGPU/GlobalISel: Fix selecting G_IMPLICIT_DEF for s1 | Matt Arsenault | 2019-06-24 | 2 | -9/+27
  Try to fail for scc, since I don't think that should ever be produced.
  llvm-svn: 364199
* GlobalISel: Remove unsigned variant of SrcOp | Matt Arsenault | 2019-06-24 | 4 | -29/+29
  Force using Register. One downside is that the generated register enums require explicit conversion.
  llvm-svn: 364194
* CodeGen: Introduce a class for registers | Matt Arsenault | 2019-06-24 | 12 | -42/+43
  Avoids using a plain unsigned for registers throughout codegen. Doesn't attempt to change every register use, just something a little more than the set needed to build after changing the return type of MachineOperand::getReg().
  llvm-svn: 364191
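The advantage of a dedicated register type over a plain `unsigned` is that the queries (virtual or physical, index) live with the value, and unrelated integers cannot silently pass for registers. The class below is a minimal hypothetical sketch, not `llvm::Register`; tagging virtual registers with the top bit is an assumption of the sketch.

```cpp
// Sketch of a strongly typed register handle (hypothetical class, not
// llvm::Register). Assumption: virtual registers are tagged with the top bit.
class Reg {
public:
  constexpr Reg() = default;
  constexpr explicit Reg(unsigned Val) : Id(Val) {}

  static constexpr Reg fromVirtualIndex(unsigned Index) {
    return Reg(Index | VirtualFlag);
  }
  constexpr bool isValid() const { return Id != 0; }
  constexpr bool isVirtual() const { return (Id & VirtualFlag) != 0; }
  constexpr bool isPhysical() const { return isValid() && !isVirtual(); }
  constexpr unsigned virtualIndex() const { return Id & ~VirtualFlag; }
  constexpr unsigned id() const { return Id; }
  friend constexpr bool operator==(Reg A, Reg B) { return A.Id == B.Id; }

private:
  static constexpr unsigned VirtualFlag = 1u << 31;
  unsigned Id = 0; // 0 means "no register"
};
```

Making the constructor `explicit` is a choice of this sketch: `Reg R = 3;` does not compile, so every conversion from a raw number is visible at the call site.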
* [AMDGPU] Remove unused variable AllSGPRSpilledToVGPRs. NFC | Bjorn Pettersson | 2019-06-24 | 1 | -5/+1
  Summary: Remove the unused variable AllSGPRSpilledToVGPRs in SIFrameLowering::processFunctionBeforeFrameFinalized to avoid the error:
    variable 'AllSGPRSpilledToVGPRs' set but not used [-Werror=unused-but-set-variable]

  Reviewers: arsenm, nhaehnle
  Reviewed By: nhaehnle
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63721
  llvm-svn: 364190
* AMDGPU/GlobalISel: Fix RegBankSelect for s1 sext/zext/anyext | Matt Arsenault | 2019-06-24 | 1 | -10/+76
  This needs different handling if the source is known to be a valid condition or not. Handle turning it into shifts or a select during regbankselect.
  llvm-svn: 364186
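One plausible reading of "shifts or a select" is sketched below on plain integers: when the s1 value is a known-valid condition (exactly 0 or 1), a select between two constants suffices; when only bit 0 of a 32-bit register is meaningful, bit 0 has to be isolated or replicated explicitly. Which form the commit picks in which case is an assumption here, and all function names are hypothetical.

```cpp
#include <cstdint>

// If the s1 source is a known-valid condition (exactly 0 or 1), a select
// between two constants is enough.
constexpr std::int32_t sextCondSelect(bool C) { return C ? -1 : 0; }
constexpr std::uint32_t zextCondSelect(bool C) { return C ? 1u : 0u; }

// If only bit 0 of a 32-bit register is meaningful (the upper bits may hold
// anything), clear or replicate bit 0 explicitly with masks and shifts.
constexpr std::uint32_t zextBit0(std::uint32_t X) { return X & 1u; }
inline std::int32_t sextBit0(std::uint32_t X) {
  // Move bit 0 to bit 31, then arithmetic-shift right to copy it everywhere
  // (two's complement conversion and shift: C++20, or mainstream compilers).
  return static_cast<std::int32_t>(X << 31) >> 31;
}
```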
* AMDGPU: Fold frame index into MUBUF | Matt Arsenault | 2019-06-24 | 2 | -10/+49
  This matters for byval uses outside of the entry block, which appear as copies. Previously, the only folding done was during selection, which could not see the underlying frame index. For any uses outside the entry block, the frame index was materialized in the entry block relative to the global scratch wave offset.

  This may produce worse code in cases where the offset ends up not fitting in the MUBUF offset field. A better heuristic would be helpful for extreme frames.
  llvm-svn: 364185
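The caveat about offsets that do not fit comes down to a range check on the immediate field. The sketch assumes the MUBUF immediate offset is a 12-bit unsigned field (0 to 4095); `tryFoldFrameIndex` and its parameters are hypothetical and do not model the real folding code.

```cpp
#include <cstdint>
#include <optional>

// Assumption: the MUBUF immediate offset field is 12 bits, unsigned.
constexpr bool isLegalMubufImmOffset(std::int64_t Off) {
  return Off >= 0 && Off <= 4095;
}

// Hypothetical fold decision: the frame object's offset plus any constant
// already added to it either fits the immediate field (fold it), or the
// address has to be materialized in a register instead.
std::optional<std::uint32_t> tryFoldFrameIndex(std::int64_t ObjectOffset,
                                               std::int64_t ConstOffset) {
  const std::int64_t Off = ObjectOffset + ConstOffset;
  if (!isLegalMubufImmOffset(Off))
    return std::nullopt; // does not fit: keep the materialized address
  return static_cast<std::uint32_t>(Off);
}
```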
* AMDGPU: Cleanup checking when spills need emergency slots | Matt Arsenault | 2019-06-24 | 1 | -7/+6
  Address a FIXME, which should no longer be a problem since r363757.
  llvm-svn: 364182
* AMDGPU: Fix not using s33 for scratch wave offset in kernels | Matt Arsenault | 2019-06-21 | 1 | -7/+11
  Fixes missing piece from r363990.
  llvm-svn: 364099
* [AMDGPU] hazard recognizer for fp atomic to s_denorm_mode | Stanislav Mekhanoshin | 2019-06-21 | 9 | -28/+112
  This requires 3 wait states unless there is a wait or VALU in between.

  Differential Revision: https://reviews.llvm.org/D63619
  llvm-svn: 364074
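The rule "3 wait states unless there is a wait or VALU in between" can be checked by scanning backwards from the s_denorm_mode. The sketch below is a simplified model rather than the GCNHazardRecognizer code: every instruction counts as one wait state, and the instruction classes in `Kind` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Simplified instruction classes for the check (hypothetical).
enum class Kind { FpAtomic, Valu, Wait, Other, DenormMode };

constexpr int kFpAtomicToDenormWaitStates = 3;

// Returns how many wait states (e.g. s_nop slots) must be inserted before the
// s_denorm_mode at index I. Simplification: every instruction counts as one
// wait state. A VALU or a wait between the fp atomic and s_denorm_mode
// resolves the hazard.
int denormModeWaitStatesNeeded(const std::vector<Kind> &Prog, std::size_t I) {
  int Seen = 0; // instructions already between the hazard and I
  for (std::size_t J = I; J-- > 0 && Seen < kFpAtomicToDenormWaitStates;) {
    switch (Prog[J]) {
    case Kind::FpAtomic:
      return kFpAtomicToDenormWaitStates - Seen;
    case Kind::Valu:
    case Kind::Wait:
      return 0; // hazard already resolved
    default:
      ++Seen;
      break;
    }
  }
  return 0; // no fp atomic within the window
}
```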
* AMDGPU: Always use s33 for global scratch wave offset | Matt Arsenault | 2019-06-20 | 2 | -9/+1
  Every called function could possibly need this to calculate the absolute address of stack objects, and this avoids inserting a copy around every call site in the kernel. It's also somewhat cleaner to keep this in a callee-saved SGPR.
  llvm-svn: 363990
* AMDGPU: Add intrinsics for DS GWS semaphore instructions | Matt Arsenault | 2019-06-20 | 5 | -25/+72
  llvm-svn: 363983
* AMDGPU: Insert mem_viol check loop around GWS pre-GFX9 | Matt Arsenault | 2019-06-20 | 5 | -19/+129
  It is necessary to emit this loop around GWS operations in case the wave is preempted pre-GFX9.
  llvm-svn: 363979
* AMDGPU: Fix ignoring DisableFramePointerElim in leaf functions | Matt Arsenault | 2019-06-20 | 1 | -11/+7
  The attribute can specify elimination for leaf or non-leaf, so it should always be considered. I copied this bug from AArch64, which probably should also be fixed.
  llvm-svn: 363949
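The fix is about which cases consult the frame-pointer setting. The sketch models a three-way policy (keep the frame pointer nowhere, only in non-leaf functions, or everywhere) and contrasts an assumed buggy shape, where leaf functions skip the policy entirely, with a version that always considers it. The enum and both functions are hypothetical and are not the AMDGPU or `TargetOptions` code.

```cpp
// Hypothetical three-way policy carried by the function attribute.
enum class FramePointerPolicy { None, NonLeaf, All };

// Assumed shape of the bug: leaf functions never consult the policy, so
// "keep the frame pointer everywhere" is ignored for them.
bool keepFramePointerBuggy(FramePointerPolicy P, bool HasCalls) {
  if (!HasCalls)
    return false;
  return P != FramePointerPolicy::None;
}

// Fixed: the policy is always considered; only NonLeaf depends on calls.
bool keepFramePointer(FramePointerPolicy P, bool HasCalls) {
  switch (P) {
  case FramePointerPolicy::All:
    return true;
  case FramePointerPolicy::NonLeaf:
    return HasCalls;
  case FramePointerPolicy::None:
    return false;
  }
  return false; // unreachable; silences missing-return warnings
}
```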
* AMDGPU: Treat undef as an inline immediate | Matt Arsenault | 2019-06-20 | 2 | -5/+19
  This should only matter in vectors with an undef component, since a full undef vector would have been folded out.
  llvm-svn: 363941
* [AMDGPU] gfx1010 core wave32 changes | Stanislav Mekhanoshin | 2019-06-20 | 10 | -40/+56
  Differential Revision: https://reviews.llvm.org/D63204
  llvm-svn: 363934
* AMDGPU: Don't clobber VCC in MUBUF addr64 emulation | Matt Arsenault | 2019-06-20 | 1 | -9/+16
  Introducing VCC defs during SIFixSGPRCopies is generally problematic. Avoid it by starting with the VOP3 form with the general condition register.

  This is the easiest instance to fix, but it doesn't solve any specific problems I'm looking at.
  llvm-svn: 363904
* AMDGPU: Consolidate some getGeneration checks | Matt Arsenault | 2019-06-19 | 9 | -31/+82
  This is incomplete, and ideally these would all be removed, but it's better to localize them to the subtarget first with comments about what they're for.
  llvm-svn: 363902
* AMDGPU: Undo sub x, c canonicalization for v2i16 | Matt Arsenault | 2019-06-19 | 3 | -26/+87
  This should avoid a regression from D62341.
  llvm-svn: 363899
OpenPOWER on IntegriCloud