bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	AMDGPU: Add MachineInstr overloads for instruction format tests	Matt Arsenault	2015-10-20	1	-4/+4
	llvm-svn: 250797
*	AMDGPU: Fix unused variable warning in release build	Matt Arsenault	2015-10-01	1	-2/+2
	llvm-svn: 249091
*	AMDGPU: Make SIInsertWaits about a factor of 4 faster	Matt Arsenault	2015-10-01	1	-23/+22
	This was the slowest custom pass for the target, spending 80% of its time in getMinimalPhysRegClass, which was called for every register operand. Where possible, use the register class that is statically known from the instruction's MCOperandInfo instead. A few pseudo instructions whose operand register classes are unknown still require the expensive physical register class search. There are other ways to make this even faster, such as not inspecting implicit operands. For now those are still checked, because it is technically possible to have a scalar load into exec or vcc, which can then be used implicitly. llvm-svn: 249079
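	The speedup above amounts to a fast path: use a register class that is already known from the instruction description, and run the expensive search only when none is available. The sketch below illustrates that idea under stated assumptions; `Operand`, `RegClassLookup`, `staticRegClass` and the placeholder search are hypothetical stand-ins, not the LLVM types or the actual patch.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a register operand plus what the instruction
// description says about it (in LLVM this information lives in MCOperandInfo).
struct Operand {
  unsigned reg;                       // physical register number
  std::optional<int> staticRegClass;  // class known statically, if any
};

struct RegClassLookup {
  unsigned slowCalls = 0;  // how often the expensive search ran

  // Hypothetical placeholder for an expensive search over all register
  // classes (the role getMinimalPhysRegClass plays in the commit message).
  int minimalPhysRegClass(unsigned reg) {
    ++slowCalls;
    return reg < 100 ? 0 : 1;  // arbitrary mapping, for illustration only
  }

  // Prefer the statically known class; fall back to the slow search only
  // when the operand description does not provide one (e.g. some pseudos).
  int regClassFor(const Operand &op) {
    if (op.staticRegClass)
      return *op.staticRegClass;
    return minimalPhysRegClass(op.reg);
  }
};
```

	With this shape, the cost of the slow search is paid only for the operands that actually lack static information, which is why the commit can keep correctness for the few ill-described pseudo instructions while speeding up the common case.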
*	AMDGPU: Fix recomputing dominator tree unnecessarily	Matt Arsenault	2015-09-25	1	-1/+5
	SIFixSGPRCopies does not modify the CFG, but the dominator tree was being recomputed before running SIFoldOperands. llvm-svn: 248587
*	AMDGPU: Add s_dcache_* instructions	Matt Arsenault	2015-09-24	1	-6/+14
	llvm-svn: 248533
*	AMDGPU/SI: Better handle s_wait insertion	Tom Stellard	2015-08-21	1	-2/+5
	We can wait on VM, EXP or LGKM, and the waits are independent. Without this patch, a wait inserted because of one of them would also wait for all the previous others. This patch makes s_wait only wait for the counters needed by the next instruction.
	
	Here is an example of a subtle performance reduction that this patch fixes. Without the patch:
	
	    buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
	    buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
	    s_load_dwordx4 s[44:47], s[8:9], 0xc
	    s_waitcnt lgkmcnt(0)
	    buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
	    s_load_dwordx4 s[48:51], s[8:9], 0x10
	    s_waitcnt vmcnt(1)
	    buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen
	
	The s_waitcnt vmcnt(1) is useless. It is added because the last buffer_load_format_xyzw needs s[44:47], which was loaded by the first s_load_dwordx4, so it waits for all VM operations issued before that load to finish.
	
	Internally, three counters (for VM, EXP and LGKM) are updated after every instruction. For example, buffer_load_format_xyzw increases the VM counter, and s_load_dwordx4 increases the LGKM one. Without the patch, the current values of all three counters are stored for every defined register and are used to determine how long to wait when an instruction needs that register. As a result, the counters stored for s[44:47] record that using the register requires waiting for the preceding buffer_load_format_xyzw instructions. Instead, this patch stores only the counters that matter for the register and puts zero for the others, since no wait is needed for them.
	
	Patch by: Axel Davy
	Differential Revision: http://reviews.llvm.org/D11883
	llvm-svn: 245755
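	The counter bookkeeping described in this commit can be illustrated with a small, simplified model. This is a hypothetical sketch, not the SIInsertWaits code: `WaitTracker`, `issue`, `waitAll`, `waitFor` and `replayExample` are illustrative names, each instruction is assumed to bump exactly one counter, and a stored value of 0 means "no wait needed on this counter". Replaying the sequence from the commit message shows the old scheme demanding vmcnt(1) before reading s[44:47], while the per-counter scheme demands no wait at all.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified model of per-register wait tracking.
enum Counter { VM = 0, EXP = 1, LGKM = 2, NUM_COUNTERS = 3 };
using Counters = std::array<unsigned, NUM_COUNTERS>;

struct WaitTracker {
  bool onlyRelevant;       // false: old behaviour, true: after the patch
  Counters lastIssued{};   // operations issued so far, per counter
  Counters lastWaited{};   // issue index known to have completed, per counter
  std::map<std::string, Counters> defined;  // register -> issue index it needs

  explicit WaitTracker(bool onlyRelevant) : onlyRelevant(onlyRelevant) {}

  // An instruction that bumps counter `c` and defines register `reg`.
  void issue(Counter c, const std::string &reg) {
    assert(c < NUM_COUNTERS);
    ++lastIssued[c];
    Counters snapshot = lastIssued;  // old: remember all three counters
    if (onlyRelevant)
      for (int i = 0; i < NUM_COUNTERS; ++i)
        if (i != c) snapshot[i] = 0;  // new: 0 means "no wait needed"
    defined[reg] = snapshot;
  }

  // Models an explicit "s_waitcnt <c>cnt(0)": everything issued has completed.
  void waitAll(Counter c) { lastWaited[c] = lastIssued[c]; }

  // Per counter, the N in "s_waitcnt Xcnt(N)" needed before reading `reg`,
  // or -1 if no wait on that counter is required.
  std::array<int, NUM_COUNTERS> waitFor(const std::string &reg) const {
    std::array<int, NUM_COUNTERS> wait{-1, -1, -1};
    auto it = defined.find(reg);
    if (it == defined.end()) return wait;
    for (int i = 0; i < NUM_COUNTERS; ++i) {
      unsigned need = it->second[i];
      if (need != 0 && need > lastWaited[i])
        wait[i] = static_cast<int>(lastIssued[i] - need);
    }
    return wait;
  }
};

// Replays the instruction sequence from the commit message.
WaitTracker replayExample(bool onlyRelevant) {
  WaitTracker t(onlyRelevant);
  t.issue(VM, "v[8:11]");     // buffer_load_format_xyzw v[8:11], ...
  t.issue(VM, "v[12:15]");    // buffer_load_format_xyzw v[12:15], ...
  t.issue(LGKM, "s[44:47]");  // s_load_dwordx4 s[44:47], ...
  t.waitAll(LGKM);            // s_waitcnt lgkmcnt(0)
  t.issue(VM, "v[16:19]");    // buffer_load_format_xyzw v[16:19], ...
  t.issue(LGKM, "s[48:51]");  // s_load_dwordx4 s[48:51], ...
  return t;
}
```

	In this model, `replayExample(false).waitFor("s[44:47]")` asks for a VM wait of 1 (the spurious vmcnt(1)), whereas `replayExample(true).waitFor("s[44:47]")` asks for no wait, because the LGKM dependency was already covered by the earlier lgkmcnt(0). Registers actually produced by VM loads, such as v[12:15], still get their VM wait in both modes.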
*	Fix some comment typos.	Benjamin Kramer	2015-08-08	1	-1/+1
	llvm-svn: 244402
*	R600 -> AMDGPU rename	Tom Stellard	2015-06-13	1	-0/+480
	llvm-svn: 239657