bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	AMDGPU/GFX10: implement ds_ordered_count changes	Nicolai Haehnle	2019-07-01	1	-0/+23
	Summary: ds_ordered_count can now simultaneously operate on up to 4 dwords in a single instruction, which are taken from (and returned to) lanes 0..3 of a single VGPR. Change-Id: I19b6e7b0732b617c10a779a7f9c0303eec7dd276 Reviewers: mareko, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63716 llvm-svn: 364815
*	AMDGPU: Support GDS atomics	Nicolai Haehnle	2019-07-01	1	-0/+128
	Summary: Original patch by Marek Olšák Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab Reviewers: mareko, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63452 llvm-svn: 364814
*	AMDGPU/GlobalISel: RegBankSelect for DS ordered add/swap	Matt Arsenault	2019-07-01	2	-0/+142
	llvm-svn: 364811
*	AMDGPU/GlobalISel: RegBankSelect for amdgcn.writelane	Matt Arsenault	2019-07-01	1	-0/+98
	llvm-svn: 364808
*	AMDGPU/GlobalISel: Complete implementation of G_GEP	Matt Arsenault	2019-07-01	3	-30/+384
	Also works around tablegen defect in selecting add with unused carry, but if we have to manually select GEP, might as well handle add manually. llvm-svn: 364806
*	AMDGPU/GlobalISel: Select G_PHI	Matt Arsenault	2019-07-01	2	-0/+416
	llvm-svn: 364805
*	AMDGPU/GlobalISel: Try to select VOP3 form of add	Matt Arsenault	2019-07-01	1	-13/+26
	There are several things broken, but at least emit the right thing for gfx9. The import of the pattern with the unused carry out seems to not work. Needs a special class for clamp, because OperandWithDefaultOps doesn't really work. llvm-svn: 364804
*	AMDGPU/GlobalISel: RegBankSelect for readlane/readfirstlane	Matt Arsenault	2019-07-01	2	-0/+103
	llvm-svn: 364801
*	AMDGPU/GlobalISel: Implement select for 32-bit G_ADD	Tom Stellard	2019-07-01	1	-0/+43
	Reviewers: arsenm Reviewed By: arsenm Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58804 llvm-svn: 364797
*	AMDGPU/GlobalISel: Select G_BRCOND for vcc	Matt Arsenault	2019-07-01	1	-11/+36
	llvm-svn: 364795
*	AMDGPU/GlobalISel: Select G_FRAME_INDEX	Matt Arsenault	2019-07-01	1	-0/+38
	llvm-svn: 364789
*	AMDGPU/GFX10: fix scratch resource descriptor	Nicolai Haehnle	2019-07-01	1	-28/+37
	Summary: The stride should depend on the wave size, not the hardware generation. Also, the 32_FLOAT format is 0x16, not 16; though that shouldn't be relevant. Change-Id: I088f93bf6708974d085d1c50967f119061da6dc6 Reviewers: arsenm, rampitec, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63808 llvm-svn: 364788
*	AMDGPU/GlobalISel: Make s16 select legal	Matt Arsenault	2019-07-01	4	-70/+244
	This is easy to handle and avoids legalization artifacts which are likely to obscure combines. llvm-svn: 364787
*	AMDGPU/GlobalISel: Select G_BRCOND for scc conditions	Matt Arsenault	2019-07-01	2	-0/+194
	llvm-svn: 364786
*	AMDGPU/GlobalISel: Tolerate copies with no type set	Matt Arsenault	2019-07-01	1	-0/+56
	isVCC has the same bug, but isn't used in a context where it can cause a problem. llvm-svn: 364784
*	AMDGPU: Fix tests using the default alloca address space	Matt Arsenault	2019-07-01	2	-14/+16
	llvm-svn: 364783
*	AMDGPU/GlobalISel: Select src modifiers	Matt Arsenault	2019-07-01	2	-30/+191
	llvm-svn: 364782
*	AMDGPU/GlobalISel: Fix RegBankSelect for G_FCANONICALIZE	Matt Arsenault	2019-07-01	1	-0/+35
	llvm-svn: 364768
*	AMDGPU/GlobalISel: Fix RegBankSelect for G_BUILD_VECTOR	Matt Arsenault	2019-07-01	1	-0/+69
	llvm-svn: 364767
*	AMDGPU/GlobalISel: Fail on store to 32-bit address space	Matt Arsenault	2019-07-01	1	-3/+3
	llvm-svn: 364766
*	AMDGPU/GlobalISel: Improve icmp selection coverage.	Matt Arsenault	2019-07-01	1	-0/+595
	Select s64 eq/ne scalar icmp. llvm-svn: 364765
*	AMDGPU/GlobalISel: RegBankSelect for WWM/WQM	Matt Arsenault	2019-07-01	2	-0/+62
	llvm-svn: 364763
*	AMDGPU/GlobalISel: Use vcc reg bank for amdgcn.wqm.vote	Matt Arsenault	2019-07-01	1	-5/+5
	llvm-svn: 364762
*	AMDGPU/GlobalISel: Fix scc->vcc copy handling	Matt Arsenault	2019-07-01	1	-26/+88
	This was checking the size of the register with the value of the size, which happens to be exec. Also fix assuming VCC is 64-bit to fix wave32. Also remove some untested handling for physical registers which is skipped. This doesn't insert the V_CNDMASK_B32 if SCC is the physical copy source. I'm not sure if this should be trying to handle this special case instead of dealing with this in copyPhysReg. llvm-svn: 364761
*	AMDGPU/GlobalISel: Use and instead of BFE with inline immediate	Matt Arsenault	2019-07-01	3	-4/+119
	Zext from s1 is the only case where this should do anything with the current legal extensions. llvm-svn: 364760
*	GlobalISel: Add GINodeEquiv for min/max	Matt Arsenault	2019-07-01	4	-0/+332
	llvm-svn: 364759
*	GlobalISel: Add DAG compat for G_FCANONICALIZE	Matt Arsenault	2019-07-01	1	-0/+169
	llvm-svn: 364758
*	AMDGPU/GlobalISel: Add some more tests for icmp select	Matt Arsenault	2019-06-29	1	-32/+80
	llvm-svn: 364703
*	AMDGPU/GlobalISel: RegBankSelect for update.dpp	Matt Arsenault	2019-06-29	1	-0/+82
	llvm-svn: 364701
*	AMDGPU/GlobalISel: RegBankSelect for atomic.inc/atomic.dec	Matt Arsenault	2019-06-29	2	-0/+160
	llvm-svn: 364699
*	AMDGPU/GlobalISel: RegBankSelect for some DS intrinsics	Matt Arsenault	2019-06-29	6	-0/+286
	llvm-svn: 364698
*	AMDGPU/GlobalISel: RegBankSelect for icmp/fcmp intrinsics	Matt Arsenault	2019-06-29	2	-0/+134
	llvm-svn: 364696
*	AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.fmas	Matt Arsenault	2019-06-29	1	-0/+106
	llvm-svn: 364695
*	AMDGPU/GlobalISel: RegBankSelect for some simple leaf intrinsics	Matt Arsenault	2019-06-29	6	-0/+84
	llvm-svn: 364694
*	AMDGPU: Add baseline test for packed shufflevector	Matt Arsenault	2019-06-28	1	-0/+928
	llvm-svn: 364691
*	[AMDGPU][MC] Enabled constant expressions as operands of sendmsg	Dmitry Preobrazhensky	2019-06-28	1	-1/+1
	See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D62735 llvm-svn: 364645
*	[AMDGPU] Packed thread ids in function call ABI	Stanislav Mekhanoshin	2019-06-28	3	-53/+152
	Differential Revision: https://reviews.llvm.org/D63851 llvm-svn: 364619
*	AMDGPU: Make fixing i1 copies robust against re-ordering	Nicolai Haehnle	2019-06-27	1	-0/+51
	Summary: The new test case led to incorrect code. Change-Id: Ief48b227e97aa662dd3535c9bafb27d4a184efca Reviewers: arsenm, david-salinas Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63871 llvm-svn: 364566
*	[GlobalISel] Accept multiple vregs in lowerFormalArgs	Diana Picus	2019-06-27	1	-4/+4
	Change the interface of CallLowering::lowerFormalArguments to accept several virtual registers for each formal argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660. lowerCall will be refactored in the same way in follow-up patches. With this change, we forward the virtual registers generated for aggregates to CallLowering. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. We also copy the pack/unpackRegs helpers to CallLowering to facilitate this. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was put into a s64 instead of a p0. Added a test-case which illustrates the problem more clearly (it crashes without this patch) and fixed the existing test-case to expect p0. AMDGPU has been updated to unpack into the virtual registers for kernels. I think the other code paths fall back for aggregates, so this should be NFC. Mips doesn't support aggregates yet, so it's also NFC. x86 seems to have code for dealing with aggregates, but I couldn't find the tests for it, so I just added a fallback to DAGISel if we get more than one virtual register for an argument. Differential Revision: https://reviews.llvm.org/D63549 llvm-svn: 364510
*	[AMDGPU] Fix +DumpCode to print an entry label for the first function	Jay Foad	2019-06-27	1	-0/+2
	Summary: The +DumpCode attribute is a horrible hack in AMDGPU to embed the disassembly of the generated code into the elf file. It is used by LLPC to implement an extension that allows the application to read back the disassembly of the code. It tries to print an entry label at the start of every function, but that didn't work for the first function in the module because DumpCodeInstEmitter wasn't initialised until EmitFunctionBodyStart which is too late. Change-Id: I790d73ddf4f51fd02ab32529380c7cb7c607c4ee Reviewers: arsenm, tpr, kzhuravl Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63712 llvm-svn: 364508
*	[AMDGPU] Fix Livereg computation during epilogue insertion	Matt Arsenault	2019-06-26	1	-0/+1
	The LivePhysRegs calculated in order to find a scratch register in the epilogue code wrongly uses 'LiveIns'. Instead, it should use the 'Liveout' sets. For the liveness, it should also consider the operands of the terminator (return) instruction, which is the insertion point for the scratch-exec-copy instruction. Patch by Christudasan Devadasan llvm-svn: 364470
*	Update phis in AMDGPUUnifyDivergentExitNodes	Diego Novillo	2019-06-25	1	-0/+39
	Original patch https://reviews.llvm.org/D63659 from Steven Perron <stevenperron@google.com> The pass AMDGPUUnifyDivergentExitNodes does not update the phi nodes in the successors of blocks that it splits. This is fixed by calling BasicBlock::splitBasicBlock to split the block instead of doing it manually. This does extra work because a new conditional branch is created in BB which is immediately replaced, but I think the simplicity is worth it. It also helps make the code more future proof in case other things need to be updated. llvm-svn: 364342
*	AMDGPU/GlobalISel: Fix broken test	Matt Arsenault	2019-06-25	1	-3/+3
	llvm-svn: 364316
*	AMDGPU/GlobalISel: Fix duplicated test	Matt Arsenault	2019-06-25	2	-187/+0
	Somehow ended up with copies of the same tests in AMDGPU and AMDGPU/GlobalISel llvm-svn: 364309
*	AMDGPU: Select G_SEXT/G_ZEXT/G_ANYEXT	Matt Arsenault	2019-06-25	3	-0/+545
	llvm-svn: 364308
*	AMDGPU: Write LDS objects out as global symbols in code generation	Nicolai Haehnle	2019-06-25	19	-87/+179
	Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section index introduced with a previous change. The linker is then expected to resolve relocations, which are also emitted. Initially disabled for HSA and PAL environments until they have caught up in terms of linker and runtime loader. Some notes: - The llvm.amdgcn.groupstaticsize intrinsic can no longer be lowered to a constant at compile time, which means some tests can no longer be applied. The current "solution" is a terrible hack, but the intrinsic isn't used by Mesa, so we can keep it for now. - We no longer know the full LDS size per kernel at compile time, which means that we can no longer generate a relevant error message at compile time. It would be possible to add a check for the size of individual variables, but ultimately the linker will have to perform the final check. Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275 Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61494 llvm-svn: 364297
*	AMDGPU/GlobalISel: Fix regbankselect for amdgcn.class	Matt Arsenault	2019-06-25	1	-14/+51
	llvm-svn: 364262
*	AMDGPU/GlobalISel: Add tests for regbankselect of v2s16 and/or/xor	Matt Arsenault	2019-06-24	3	-0/+195
	llvm-svn: 364244
*	AMDGPU/GlobalISel: Select G_TRUNC	Matt Arsenault	2019-06-24	1	-0/+373
	llvm-svn: 364215
*	AMDGPU/GlobalISel: RegBankSelect for amdgcn.class	Matt Arsenault	2019-06-24	1	-0/+31
	llvm-svn: 364214