bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	AMDGPU: Propagate undef flag during pre-RA exec mask optimizations	Nicolai Haehnle	2019-10-08	1	-1/+24
	Summary: Issue: https://github.com/GPUOpen-Drivers/llpc/issues/204 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68184 llvm-svn: 374041
*	MachineSSAUpdater: insert IMPLICIT_DEF at top of basic block	Nicolai Haehnle	2019-10-08	1	-0/+28
	Summary: When getValueInMiddleOfBlock happens to be called for a basic block that has no incoming value at all, an IMPLICIT_DEF is inserted in that block via GetValueAtEndOfBlockInternal. This IMPLICIT_DEF must be at the top of its basic block or it will likely not reach the use that the caller intends to insert. Issue: https://github.com/GPUOpen-Drivers/llpc/issues/204 Reviewers: arsenm, rampitec Subscribers: jvesely, wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68183 llvm-svn: 374040
*	AMDGPU/GlobalISel: Clamp G_SITOFP/G_UITOFP sources	Matt Arsenault	2019-10-07	2	-146/+575
	llvm-svn: 373989
*	AMDGPU/GlobalISel: Handle more G_INSERT cases	Matt Arsenault	2019-10-07	1	-20/+130
	Start manually writing a table to get the subreg index. TableGen should probably generate this, but I'm not sure what it looks like in the arbitrary case where subregisters are allowed to not fully cover the super-registers. llvm-svn: 373947
*	GlobalISel: Partially implement lower for G_INSERT	Matt Arsenault	2019-10-07	1	-6/+148
	llvm-svn: 373946
*	AMDGPU/GlobalISel: Fix selection of 16-bit shifts	Matt Arsenault	2019-10-07	3	-294/+810
	llvm-svn: 373945
*	AMDGPU/GlobalISel: Select VALU G_AMDGPU_FFBH_U32	Matt Arsenault	2019-10-07	1	-7/+7
	llvm-svn: 373944
*	AMDGPU/GlobalISel: Use S_MOV_B64 for inline constants	Matt Arsenault	2019-10-07	2	-11/+12
	This hides some defects in SIFoldOperands when the immediates are split. llvm-svn: 373943
*	AMDGPU/GlobalISel: Widen 16-bit G_MERGE_VALUEs sources	Matt Arsenault	2019-10-07	7	-5451/+11837
	Continue making a mess of merge/unmerge legality. llvm-svn: 373942
*	AMDGPU/GlobalISel: Select more G_INSERT cases	Matt Arsenault	2019-10-07	1	-22/+425
	At minimum, handle the s64 insert type, which is emitted in real cases during legalization. We really need TableGen to emit something like the inverse of composeSubRegIndices to determine the subreg index to use. llvm-svn: 373938
*	GlobalISel: Add target pre-isel instructions	Matt Arsenault	2019-10-07	2	-0/+100
	Allows targets to introduce regbankselectable pseudo-instructions. Currently the closest feature to this is an intrinsic. However, this requires creating a public intrinsic declaration. This litters the public intrinsic namespace with operations we don't necessarily want to expose to IR producers, and would rather leave as private to the backend. Use a new instruction bit. A previous attempt tried to keep using enum value ranges, but it turned into a mess. llvm-svn: 373937
*	[AMDGPU] Fix test checks	Jay Foad	2019-10-07	1	-2/+4
	The GFX10-DENORM-STRICT checks were only passing by accident. Fix them to make the test more robust in the face of scheduling or register allocation changes. llvm-svn: 373893
*	AMDGPU/GlobalISel: RegBankSelect DS GWS intrinsics	Matt Arsenault	2019-10-06	2	-0/+116
	llvm-svn: 373840
*	AMDGPU/GlobalISel: Lower G_ATOMIC_CMPXCHG_WITH_SUCCESS	Matt Arsenault	2019-10-06	1	-0/+107
	llvm-svn: 373839
*	GlobalISel: Partially implement lower for G_EXTRACT	Matt Arsenault	2019-10-06	4	-24/+213
	Turn into shift and truncate. Doesn't yet handle pointers. llvm-svn: 373838
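The "shift and truncate" lowering mentioned above can be illustrated on plain integers. This is only a sketch: `extractViaShiftTrunc` is a hypothetical helper, not an LLVM API, and it assumes a 64-bit scalar source and a 32-bit destination.

```cpp
#include <cstdint>

// Hypothetical helper (not part of LLVM): extract the 32-bit field that
// starts at bit `offset` of a 64-bit scalar, using the same two steps the
// commit message names: a logical shift right, then a truncate.
// Assumption: the caller keeps offset in [0, 32] so the field lies inside src.
constexpr uint32_t extractViaShiftTrunc(uint64_t src, unsigned offset) {
  return static_cast<uint32_t>(src >> offset);  // shift, then truncate
}

// Bits [16, 48) of 0x1122334455667788 are 0x33445566.
static_assert(extractViaShiftTrunc(0x1122334455667788ULL, 16) == 0x33445566u,
              "shift + truncate selects the requested field");
```

Pointer-typed values are left out here, matching the note that the lowering does not yet handle them.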
*	AMDGPU/GlobalISel: Fix RegBankSelect for sendmsg intrinsics	Matt Arsenault	2019-10-06	3	-25/+11
	This wasn't updated for the immarg handling change. llvm-svn: 373837
*	AMDGPU/GlobalISel: Fix using wrong addrspace for aperture	Matt Arsenault	2019-10-04	1	-8/+8
	This was always passing the destination flat address space, when it should be picking between the two valid source options. llvm-svn: 373716
*	AMDGPU/GlobalISel: Select G_PTRTOINT	Matt Arsenault	2019-10-04	1	-0/+101
	llvm-svn: 373715
*	AMDGPU/GlobalISel: Support wave32 waterfall loops	Matt Arsenault	2019-10-04	12	-389/+704
	llvm-svn: 373714
*	AMDGPU/GlobalISel: Handle RegBankSelect of G_INSERT_VECTOR_ELT	Matt Arsenault	2019-10-03	1	-12/+383
	llvm-svn: 373639
*	AMDGPU/GlobalISel: Split 64-bit vector extracts during RegBankSelect	Matt Arsenault	2019-10-03	1	-6/+114
	Register indexing 64-bit elements is possible on the SALU, but not the VALU. Handle splitting this into two 32-bit indexes. Extend waterfall loop handling to allow moving a range of instructions. llvm-svn: 373638
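The index arithmetic behind "splitting this into two 32-bit indexes" might look like the sketch below. The names `IndexPair`, `splitIndex`, and `combineHalves` are hypothetical, and the lane order (low half at the even 32-bit index) is an assumption, not something the commit message states.

```cpp
#include <cstdint>

// Hypothetical types and helpers for illustration only (not LLVM APIs).
struct IndexPair {
  unsigned lo;  // 32-bit lane holding the low half of the 64-bit element
  unsigned hi;  // 32-bit lane holding the high half
};

// Assumption: a vector of 64-bit elements is reinterpreted as twice as many
// 32-bit lanes, with the low half of element i stored at lane 2*i.
constexpr IndexPair splitIndex(unsigned idx64) {
  return IndexPair{2 * idx64, 2 * idx64 + 1};
}

// Reassemble the 64-bit element from the two 32-bit reads.
constexpr uint64_t combineHalves(uint32_t lo, uint32_t hi) {
  return (static_cast<uint64_t>(hi) << 32) | lo;
}
```

With this layout, reading element 3 of a 64-bit vector becomes two 32-bit indexed reads at lanes 6 and 7, followed by a merge of the halves.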
*	AMDGPU/GlobalISel: Allow VGPR to index SGPR register	Matt Arsenault	2019-10-03	1	-3/+2
	We can still do a waterfall loop over the index if using a VGPR to index an SGPR. The result will still be a VGPR, but we can avoid the wide copy of the source register to a VGPR. llvm-svn: 373637
*	AMDGPU/GlobalISel: Add some more tests for G_INSERT legalization	Matt Arsenault	2019-10-03	1	-0/+168
	llvm-svn: 373636
*	AMDGPU/GlobalISel: Fix mutationIsSane assert v8s8 and	Matt Arsenault	2019-10-03	1	-0/+166
	This would try to do FewerElements to v9s8 llvm-svn: 373635
*	AMDGPU/GlobalISel: Expand G_BITCAST legality	Matt Arsenault	2019-10-03	1	-0/+102
	llvm-svn: 373567
*	[AMDGPU] Fix illegal agpr use by VALU	Stanislav Mekhanoshin	2019-10-02	2	-3/+21
	When SIFixSGPRCopies attempts to fix an illegal copy from a vector to a scalar register, it calls moveToVALU(). A copy from an agpr to an sgpr becomes a copy from agpr to agpr, which may result in an illegal register class at a use of this copy. The solution is to always copy it into a vgpr. This may result in a subsequent copy into an agpr if that is what is really needed; however, this should not happen too often and will likely be folded later. The opposite situation cannot happen because an sgpr is always illegal where an agpr is legal, so such user instructions cannot exist. Differential Revision: https://reviews.llvm.org/D68358 llvm-svn: 373544
*	[AMDGPU] Extend buffer intrinsics with swizzling	Piotr Sobczak	2019-10-02	56	-454/+525
	Summary: Extend the cachepolicy operand in the new VMEM buffer intrinsics to indicate whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load, int_amdgcn_raw_buffer_load_format, int_amdgcn_raw_buffer_store, int_amdgcn_raw_buffer_store_format, int_amdgcn_raw_tbuffer_load, int_amdgcn_raw_tbuffer_store, int_amdgcn_struct_buffer_load, int_amdgcn_struct_buffer_load_format, int_amdgcn_struct_buffer_store, int_amdgcn_struct_buffer_store_format, int_amdgcn_struct_tbuffer_load, int_amdgcn_struct_tbuffer_store. Furthermore, disable merging of VMEM buffer instructions in the SI Load/Store optimizer if the "swizzled" bit on the instruction is on. The default value of the bit is 0, meaning that the data in the buffer is linear and buffer instructions can be merged. There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with the correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 llvm-svn: 373491
*	AMDGPU/GlobalISel: Assume VGPR for G_FRAME_INDEX	Matt Arsenault	2019-10-02	1	-1/+1
	In principle this should behave as any other constant. However, eliminateFrameIndex currently assumes a VALU use and uses a vector shift. Work around this by selecting to VGPR for now until eliminateFrameIndex is fixed. llvm-svn: 373415
*	AMDGPU/GlobalISel: Private loads always use VGPRs	Matt Arsenault	2019-10-02	1	-0/+17
	llvm-svn: 373414
*	AMDGPU/GlobalISel: Legalize 1024-bit G_BUILD_VECTOR	Matt Arsenault	2019-10-02	2	-40/+155
	This will be needed to support AGPR operations. llvm-svn: 373413
*	AMDGPU/GlobalISel: Fix RegBankSelect for 1024-bit values	Matt Arsenault	2019-10-02	1	-0/+28
	llvm-svn: 373412
*	[AMDGPU] separate accounting for agprs	Stanislav Mekhanoshin	2019-10-02	1	-10/+129
	Account and report agprs separately on gfx908. Other targets do not change the reporting. Differential Revision: https://reviews.llvm.org/D68307 llvm-svn: 373411
*	AMDGPU: Fix an out of date assert in addressing FrameIndex	Changpeng Fang	2019-10-01	1	-0/+66
	Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D67574 llvm-svn: 373404
*	AMDGPU/GlobalISel: Increase max legal size to 1024	Matt Arsenault	2019-10-01	8	-84/+440
	There are 1024-bit register classes defined for AGPRs. Additionally, OpenCL defines vectors up to 16 x i64, and this helps those tests legalize. llvm-svn: 373350
*	Revert "GlobalISel: Handle llvm.read_register"	Dmitri Gribenko	2019-10-01	2	-10/+8
	This reverts commit r373294. It broke Clang's CodeGen/arm64-microsoft-status-reg.cpp: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/18483 llvm-svn: 373310
*	AMDGPU/GlobalISel: Select s1 src G_SITOFP/G_UITOFP	Matt Arsenault	2019-10-01	2	-44/+515
	llvm-svn: 373298
*	AMDGPU/GlobalISel: Add support for init.exec intrinsics	Matt Arsenault	2019-10-01	5	-32/+39
	The existing wave32 behavior seems broken and incomplete, but this reproduces it. llvm-svn: 373296
*	GlobalISel: Handle llvm.read_register	Matt Arsenault	2019-10-01	2	-8/+10
	SelectionDAG has a bunch of machinery to defer this to selection time for some reason. Just directly emit a copy during IRTranslator. The x86 usage does somewhat questionably check hasFP, which could depend on the whole function being at minimum translated. This does lose the convergent bit if the callsite had it, which may be a problem. We also lose that in general for intrinsics, which may also be a problem. llvm-svn: 373294
*	AMDGPU/GlobalISel: Avoid creating shift of 0 in arg lowering	Matt Arsenault	2019-10-01	1	-1/+1
	This is sort of papering over the fact that we don't run a combiner anywhere, but avoiding creating 2 instructions in the first place is easy. llvm-svn: 373293
*	AMDGPU/GlobalISel: Select G_UADDO/G_USUBO	Matt Arsenault	2019-10-01	2	-0/+394
	llvm-svn: 373288
*	GlobalISel: Implement widenScalar for G_SITOFP/G_UITOFP sources	Matt Arsenault	2019-10-01	2	-46/+232
	Legalize 16-bit G_SITOFP/G_UITOFP for AMDGPU. llvm-svn: 373287
*	AMDGPU/GlobalISel: Legalize G_GLOBAL_VALUE	Matt Arsenault	2019-10-01	1	-0/+156
	Handle other cases besides LDS. Mostly a straight port of the existing handling, without the intermediate custom nodes. llvm-svn: 373286
*	[AMDGPU] SIFoldOperands should not fold register across the EXEC definition	Alexander Timofeev	2019-09-30	2	-167/+200
	Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D67662 llvm-svn: 373221
*	[TargetLowering] Simplify expansion of S{ADD,SUB}O	Roger Ferrer Ibanez	2019-09-30	1	-296/+182
	ISD::SADDO uses the suggested sequence described in §2.4 of the RISC-V Spec v2.2. ISD::SSUBO uses the dual approach but checks for a (non-zero) positive value. Differential Revision: https://reviews.llvm.org/D47927 llvm-svn: 373187
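The overflow checks described above can be sketched on 32-bit integers. For addition, the RISC-V sequence flags overflow when "the right operand is negative" disagrees with "the sum is less than the left operand"; the subtraction dual tests for a strictly positive right operand instead. The helper names below are hypothetical and this is not the TargetLowering code itself. The sketch also assumes two's-complement wraparound when converting back to a signed type (guaranteed from C++20, implementation-defined but universal in practice before that).

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only. Wrapping arithmetic is done
// on unsigned values to avoid undefined behavior on signed overflow.
constexpr int32_t wrapAdd(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) + static_cast<uint32_t>(b));
}
constexpr int32_t wrapSub(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) - static_cast<uint32_t>(b));
}

// Signed add overflows iff (b < 0) != (a + b < a).
constexpr bool saddOverflows(int32_t a, int32_t b) {
  return (b < 0) != (wrapAdd(a, b) < a);
}

// Dual form for subtraction: overflows iff (b > 0) != (a - b < a).
constexpr bool ssubOverflows(int32_t a, int32_t b) {
  return (b > 0) != (wrapSub(a, b) < a);
}

static_assert(saddOverflows(INT32_MAX, 1), "INT32_MAX + 1 overflows");
static_assert(!ssubOverflows(5, 3), "5 - 3 does not overflow");
```

The "non-zero" qualifier matters for subtraction: with b == 0 the difference equals a, so both comparisons are false and no overflow is reported.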
*	AMDGPU/GlobalISel: Fix select for v2s16 and/or/xor	Matt Arsenault	2019-09-30	3	-45/+45
	llvm-svn: 373180
*	[AMDGPU] Improve fma.f64 test. NFC.	Stanislav Mekhanoshin	2019-09-25	1	-1/+154
	llvm-svn: 372908
*	[AMDGPU] gfx10 v_fmac_f16 operand folding	Stanislav Mekhanoshin	2019-09-25	1	-5/+5
	Fold immediates into v_fmac_f16. Differential Revision: https://reviews.llvm.org/D68037 llvm-svn: 372906
*	AMDGPU/GlobalISel: Allow selection of scalar min/max	Matt Arsenault	2019-09-21	4	-40/+20
	I believe all of the uniform/divergent pattern predicates are redundant and can be removed. The uniformity bit already influences the register class, and nothing has broken when I've removed this and others. llvm-svn: 372450
*	Remove assert from MachineLoop::getLoopPredecessor()	Stanislav Mekhanoshin	2019-09-20	1	-0/+92
	According to the documentation, the method returns the predecessor if the given loop's header has exactly one unique predecessor outside the loop, and otherwise returns null. In reality it asserts if there is no predecessor outside of the loop. The testcase has a loop where predecessors outside of the loop were not identified, as analyzeBranch() was unable to process the mask branch and returned true. It is also incorrect to assert for truly dead loops. Differential Revision: https://reviews.llvm.org/D67634 llvm-svn: 372405
*	Revert r372366 "Use getTargetConstant for BLENDI, and add a test to catch it."	Nico Weber	2019-09-20	1	-14/+0
	This reverts commit 52621307bcab2013e8833f3317cebd63a6db3885. Tests have been failing all night with [0/2] ACTION //llvm/test:check-llvm(//llvm/utils/gn/build/toolchain:unix) -- Testing: 33647 tests, 64 threads -- Testing: 0 .. 10.. UNRESOLVED: LLVM :: CodeGen/AMDGPU/GlobalISel/isel-blendi-gettargetconstant.ll (6943 of 33647) ****************** TEST 'LLVM :: CodeGen/AMDGPU/GlobalISel/isel-blendi-gettargetconstant.ll' FAILED **************** Test has no run line! ****************** Since there were other concerns on https://reviews.llvm.org/D67785, I'm just reverting for now. llvm-svn: 372383