bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	AMDGPU: Use gfx9 carry-less add/sub instructions	Matt Arsenault	2017-11-30	1	-4/+4
	llvm-svn: 319491
*	AMDGPU: Select DS insts without m0 initialization	Matt Arsenault	2017-11-29	1	-69/+108
	GFX9 stopped using m0 for most DS instructions. Select a different instruction without the use. I think this will be less error prone than trying to manually maintain m0 uses as needed. llvm-svn: 319270
*	AMDGPU: Allow SIShrinkInstructions to work in non-SSA	Matt Arsenault	2017-07-10	1	-3/+3
	Immediates can be folded as long as the immediate is a vreg. Also undo commuting instructions if it didn't fold an immediate. llvm-svn: 307575
*	AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel	Matt Arsenault	2017-03-21	1	-13/+13
	Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444
*	[AMDGPU] Remove getBidirectionalReasonRank	Stanislav Mekhanoshin	2017-03-11	1	-2/+2
	This method inverts the Reason field of a scheduling candidate. It does the right comparison between RegCritical and RegExcess, but everything else is broken. In fact it can prefer a weaker reason such as Weak over RegCritical because Weak > -RegCritical. The CandReason enum is properly sorted, so just remove the artificial ranking. Differential Revision: https://reviews.llvm.org/D30557 llvm-svn: 297536
*	AMDGPU/SI: Canonicalize offset order for merged DS instructions	Tom Stellard	2016-08-26	1	-4/+4
	Summary: If the scheduler clusters the loads, then the offsets will be sorted, but it is possible for the scheduler to schedule loads together without explicitly clustering them, which would give us non-sorted offsets. Also, we will want to do this if we move the load/store optimizer before the scheduler. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23776 llvm-svn: 279870
*	AMDGPU: Remove superfluous string attributes from tests	Matt Arsenault	2016-07-11	1	-1/+1
	Also fix v_mac.ll not testing the right thing for fneg. llvm-svn: 275129
*	MachineScheduler: Fully compare top/bottom candidates	Matthias Braun	2016-06-25	1	-1/+1
	In bidirectional scheduling this gives more stable results than just comparing the "reason" fields of the top/bottom node because the reason field may be higher depending on what other nodes are in the queue. Differential Revision: http://reviews.llvm.org/D19401 llvm-svn: 273755
*	AMDGPU/SI: Enable lanemask tracking in misched	Tom Stellard	2016-03-30	1	-3/+3
	Summary: This results in higher register usage, but should make it easier for the compiler to hide latency. This pass is a prerequisite for some more scheduler improvements, and I think the increased register usage with this patch is acceptable, because when combined with the scheduler improvements, the total register usage will decrease. shader-db stats: 2382 shaders in 478 tests Totals: SGPRS: 48672 -> 49088 (0.85 %) VGPRS: 34148 -> 34847 (2.05 %) Code Size: 1285816 -> 1289128 (0.26 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 492544 -> 573440 (16.42 %) bytes per wave Max Waves: 6856 -> 6846 (-0.15 %) Wait states: 0 -> 0 (0.00 %) Depends on D18451 Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18452 llvm-svn: 264876
*	Update test case to appease bots after 263255.	Chad Rosier	2016-03-11	1	-4/+4
	I'll follow up with Matt to confirm this is the correct fix. llvm-svn: 263268
*	AMDGPU: Remove some old intrinsic uses from tests	Matt Arsenault	2016-02-11	1	-21/+15
	llvm-svn: 260493
*	AMDGPU: Switch barrier intrinsics to using convergent	Matt Arsenault	2015-12-19	1	-4/+0
	noduplicate prevents unrolling of small loops that happen to have barriers in them. If a loop has a barrier in it, it is OK to duplicate it for the unroll. llvm-svn: 256075
*	AMDGPU: Add sdst operand to VOP2b instructions	Matt Arsenault	2015-08-29	1	-2/+2
	The VOP3 encoding of these allows any SGPR pair for the i1 output, but this was forced before to always use vcc. This doesn't yet try to use this, but does add the operand to the definitions so the main change is adding vcc to the output of the VOP2 encoding. llvm-svn: 246358
*	AMDGPU/SI: Fix read2 merging into a super register.	Matt Arsenault	2015-07-14	1	-1/+1
	If the read2 produced was supposed to be writing into a super register, it would use the wrong subregister indices. Fix this by inserting copies, so we only ever write to a vreg_64. Run the register coalescer again to clean this up, although this isn't ideal and often does result in an extra move. Also remove the assert that offset1 > offset0. There isn't a real reason to not allow this other than a minor convenience in the compiler, and it doesn't seem worth the effort of avoiding it. llvm-svn: 242174
*	R600 -> AMDGPU rename	Tom Stellard	2015-06-13	1	-0/+272
	llvm-svn: 239657