bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions	Tom Stellard	2016-02-12	1	-1/+4
	Reviewers: arsenm Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16603 llvm-svn: 260765
*	AMDGPU: Set flat_scratch from flat_scratch_init reg	Matt Arsenault	2016-02-12	1	-0/+5
	This was hardcoded to the static private size, but this would be missing the offset and additional size for someday when we have dynamic sizing. Also stops always initializing flat_scratch even when unused. In the future we should stop emitting this unless flat instructions are used to access private memory. For example this will initialize it almost always on VI because flat is used for global access. llvm-svn: 260658
*	AMDGPU/SI: Make sure MIMG descriptors and samplers stay in SGPRs	Tom Stellard	2016-02-11	1	-0/+18
	Summary: It's possible to have resource descriptors and samplers stored in VGPRs, either by a VMEM instruction or in the case of samplers, floating-point calculations. When this happens, we need to use v_readfirstlane to copy these values back to sgprs. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17102 llvm-svn: 260599
*	AMDGPU: Release the scavenged offset register during VGPR spill	Nicolai Haehnle	2016-02-10	1	-1/+8
	Summary: This fixes a crash where subsequent spills would be unable to scavenge a register. In particular, it fixes a crash in piglit's spec@glsl-1.50@execution@geometry@max-input-components (the test still has a shader that fails to compile because of too many SGPR spills, but at least it doesn't crash any more). This is a candidate for the release branch. Reviewers: arsenm, tstellarAMD Subscribers: qcolombet, arsenm Differential Revision: http://reviews.llvm.org/D16558 llvm-svn: 260427
*	AMDGPU/SI: Add SI Machine Scheduler	Nicolai Haehnle	2016-01-13	1	-1/+14
	Summary: It is off by default, but can be used with --misched=si Patch by: Axel Davy Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D11885 llvm-svn: 257609
*	AMDGPU/SI: Fold operands with sub-registers	Nicolai Haehnle	2016-01-07	1	-4/+30
	Summary: Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs, increasing the code size and VGPR pressure. These moves are now folded away. Note that this lack of operand folding was not a problem for VMEM loads, because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register coalescer. Some tests are updated, note that the fsub.ll test explicitly checks that the move is elided. With the IR generated by current Mesa, the changes are obviously relatively minor: 7063 shaders in 3531 tests Totals: SGPRS: 351872 -> 352560 (0.20 %) VGPRS: 199984 -> 200732 (0.37 %) Code Size: 9876968 -> 9881112 (0.04 %) bytes LDS: 91 -> 91 (0.00 %) blocks Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave Wait states: 295164 -> 295337 (0.06 %) Totals from affected shaders: SGPRS: 65784 -> 66472 (1.05 %) VGPRS: 38064 -> 38812 (1.97 %) Code Size: 1993828 -> 1997972 (0.21 %) bytes LDS: 42 -> 42 (0.00 %) blocks Scratch: 795648 -> 783360 (-1.54 %) bytes per wave Wait states: 54026 -> 54199 (0.32 %) Reviewers: tstellarAMD, arsenm, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15875 llvm-svn: 257074
*	AMDGPU/SI: xnack_mask is always reserved on VI	Nicolai Haehnle	2016-01-07	1	-31/+16
	Summary: Somehow, I first interpreted the docs as saying space for xnack_mask is only reserved when XNACK is enabled via SH_MEM_CONFIG. I felt uneasy about this and went back to actually test what is happening, and it turns out that xnack_mask is always reserved at least on Tonga and Carrizo, in the sense that flat_scr is always fixed below the SGPRs that are used to implement xnack_mask, whether or not they are actually used. I confirmed this by writing a shader using inline assembly to tease out the aliasing between flat_scratch and regular SGPRs. For example, on Tonga, where we fix the number of SGPRs to 80, s[74:75] aliases flat_scratch (so xnack_mask is s[76:77] and vcc is s[78:79]). This patch changes both the calculation of the total number of SGPRs and the various register reservations to account for this. It ought to be possible to use the gap left by xnack_mask when the feature isn't used, but this patch doesn't try to do that. (Note that the same applies to vcc.) Note that previously, even before my earlier change in r256794, the SGPRs that alias to xnack_mask could end up being used as well when flat_scr was unused and the total number of SGPRs happened to fall on the right alignment (e.g. highest regular SGPR being used s29 and VCC used would lead to number of SGPRs being 32, where s28 and s29 alias with xnack_mask). So if there were some conflict due to such aliasing, we should have noticed that already. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15898 llvm-svn: 257073
*	AMDGPU: add +xnack feature	Nicolai Haehnle	2016-01-04	1	-6/+27
	Summary: Enabling this feature will account for the two SGPRs used by the hardware to store the XNACK_MASK physically. The hardware only requires this reservation when the XNACK feature is explicitly enabled. At some point, HSA will probably want to do that, but it does increase SGPR register pressure, so leave it disabled by default for now (but do add a small test). Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15869 llvm-svn: 256794
*	AMDGPU: Avoid assertions after SGPR spilling failed	Nicolai Haehnle	2016-01-04	1	-10/+0
	Summary: The comment explains it: emitError does not necessarily exit the compilation process, and then using NoRegister leads to assertions later on. This generates incorrect code, of course, but the user should know to not use the result when an error has been emitted. It would be nice to have a test-case for this inside the LLVM repository, but llc exits on error. shader-db tests trigger the underlying issue at least on Tonga. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15826 llvm-svn: 256757
*	AMDGPU: Fix off-by-one in SIRegisterInfo::eliminateFrameIndex	Nicolai Haehnle	2015-12-17	1	-6/+7
	Summary: The method insertNOPs expected the number of wait states to be passed as parameter, while eliminateFrameIndex passed the immediate argument for the S_NOP, leading to an off-by-one error. Rename the method to make the meaning of its parameter clearer. The number of 4 / 5 wait states (which is what the method has always _tried_ to do according to the comment) is correct according to the hardware docs. I stumbled upon this while trying to track down the cause of https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed, this patch unfortunately does not fix that bug... Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15542 llvm-svn: 255906
*	Squelch unused variable warning in SIRegisterInfo.cpp.	Matt Arsenault	2015-12-01	1	-1/+2
	Patch by Justin Lebar llvm-svn: 254362
*	AMDGPU: Rework how private buffer passed for HSA	Matt Arsenault	2015-11-30	1	-18/+62
	If we know we have stack objects, we reserve the registers that the private buffer resource and wave offset are passed in and use them directly. If not, reserve the last 5 SGPRs just in case we need to spill. After register allocation, try to pick the next available registers instead of the last SGPRs, and then insert copies from the inputs to the reserved registers in the prologue. This also only selectively enables all of the input registers which are really required instead of always enabling them. llvm-svn: 254331
*	AMDGPU: Rename enums to be consistent with HSA code object terminology	Matt Arsenault	2015-11-30	1	-14/+14
	llvm-svn: 254330
*	AMDGPU: Remove SIPrepareScratchRegs	Matt Arsenault	2015-11-30	1	-0/+19
	It does not work because of emergency stack slots. This pass was supposed to eliminate dummy registers for the spill instructions, but the register scavenger can introduce more during PrologEpilogInserter, so some would end up left behind if they were needed. The potential for spilling the scratch resource descriptor and offset register makes doing something like this overly complicated. Reserve registers to use for the resource descriptor and use them directly in eliminateFrameIndex. Also removes creating another scratch resource descriptor when directly selecting scratch MUBUF instructions. The choice of which registers are reserved is temporary. For now it attempts to pick the next available registers after the user and system SGPRs. llvm-svn: 254329
*	AMDGPU: Add llvm.amdgcn.dispatch.ptr intrinsic	Tom Stellard	2015-11-26	1	-0/+6
	Summary: This returns a pointer to the dispatch packet, which can be used to load information about the kernel dispatch. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D14898 llvm-svn: 254116
*	Revert "Remove unnecessary call to getAllocatableRegClass"	Tom Stellard	2015-11-12	1	-1/+7
	This reverts commit r252565. This also includes the revert of the commit mentioned below in order to avoid breaking tests in AMDGPU: Revert "AMDGPU: Set isAllocatable = 0 on VS_32/VS_64" This reverts commit r252674. llvm-svn: 252956
*	AMDGPU: Set isAllocatable = 0 on VS_32/VS_64	Matt Arsenault	2015-11-11	1	-7/+1
	llvm-svn: 252674
*	AMDGPU: Hack for VS_32 register pressure	Matt Arsenault	2015-11-06	1	-4/+10
	For some reason VS_32 ends up factoring into the pressure heuristics even though we should never see a virtual register with this class. When SGPRs are reserved for register spilling, this for some reason triggers reg-crit scheduling. Setting isAllocatable = 0 may help with this since that seems to remove it from the default implementation's generated table. llvm-svn: 252321
*	AMDGPU: s[102:103] is unavailable on VI	Matt Arsenault	2015-11-03	1	-1/+10
	llvm-svn: 252000
*	AMDGPU: Define correct number of SGPRs	Matt Arsenault	2015-11-03	1	-0/+4
	There are actually 104 so 2 were missing. More assembler tests with high register number tuples will be included in later patches. llvm-svn: 251999
*	AMDGPU: Stop reserving v[254:255]	Matt Arsenault	2015-10-20	1	-4/+0
	This wasn't doing anything useful. They weren't explicitly used anywhere, and the RegScavenger ignores reserved registers. This for some reason caused a random scheduling change in the test. Getting the check lines to pass is too frustrating, and there's probably not too much value in checking the vector case's operands N times. llvm-svn: 250794
*	Make a bunch of static arrays const.	Craig Topper	2015-10-18	1	-1/+1
	llvm-svn: 250642
*	AMDGPU: Make SIInsertWaits about a factor of 4 faster	Matt Arsenault	2015-10-01	1	-0/+2
	This was the slowest target custom pass and was spending 80% of the time in getMinimalPhysRegClass which was called for every register operand. Try to use the statically known register class when possible from the instruction's MCOperandInfo. There are a few pseudo instructions which are not well behaved with unknown register classes which still require the expensive physical register class search. There are a few other possibilities for making this even faster, such as not inspecting implicit operands. For now those are checked because it is technically possible to have a scalar load into exec or vcc which can be implicitly used. llvm-svn: 249079
*	AMDGPU: Switch over reg class size instead of checking all super classes	Matt Arsenault	2015-09-26	1	-20/+34
	This gets isSGPRClass out of my profile of SIFixSGPRCopies. llvm-svn: 248656
*	Introduce target hook for optimizing register copies	Matt Arsenault	2015-09-24	1	-0/+24
	Allow a target to do something other than search for copies that will avoid cross register bank copies. Implement for SI by only rewriting the most basic copies, so it should look through anything like a subregister extract. I'm not entirely satisfied with this because it seems like eliminating a reg_sequence that isn't fully used should work generically for all targets without them having to override something. However, it seems to be tricky to have a simple implementation of this without rewriting to invalid kinds of subregister copies on some targets. I'm not sure if there is currently a generic way to easily check if a subregister index would be valid for the current use. The current set of TargetRegisterInfo::get*Class functions don't quite behave like I would expect (e.g. getSubClassWithSubReg returns the maximal register class rather than the minimal), so I'm not sure how to make the generic test keep searching if SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making the default implementation to check for simple copies breaks a variety of ARM and x86 tests by producing illegal subregister uses. The ARM tests are not actually changed since it should still be using the same sharesSameRegisterFile implementation, this just relaxes them to not check for specific registers. llvm-svn: 248478
*	Untabify.	NAKAMURA Takumi	2015-09-22	1	-1/+1
	llvm-svn: 248264
*	Reformat blank lines.	NAKAMURA Takumi	2015-09-22	1	-1/+0
	llvm-svn: 248263
*	AMDGPU: Remove dead code	Matt Arsenault	2015-09-19	1	-8/+0
	getCFGStructurizerRegClass is not used for SI, so move it into R600 specific stuff. llvm-svn: 248087
*	AMDGPU: Set mem operands for spill instructions	Matt Arsenault	2015-08-29	1	-8/+9
	llvm-svn: 246357
*	AMDGPU: Make sure to reserve super registers	Matt Arsenault	2015-08-26	1	-16/+15
	I think this could potentially have broken if one of the super registers were allocated that contain v254/v255. llvm-svn: 246051
*	MachineRegisterInfo: Introduce isPhysRegUsed()	Matthias Braun	2015-08-18	1	-6/+3
	This method checks whether a physical register or any of its aliases are used in the function. Using this function in SIRegisterInfo::findUnusedReg() should also fix this reported failure: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150803/292143.html http://reviews.llvm.org/rL242173#inline-533 The report doesn't come with a testcase and I don't know enough about AMDGPU to create one myself. llvm-svn: 245329
*	AMDGPU/SI: Add missing spill class	Tom Stellard	2015-08-14	1	-1/+2
	The compiler was failing to spill for some shaders. Patch By: Axel Davy llvm-svn: 245087
*	AMDGPU: Remove SCCReg.	Matt Arsenault	2015-08-05	1	-2/+0
	These should be handled as a physical register rather than a virtual register class with one member. llvm-svn: 244061
*	MachineRegisterInfo: Remove UsedPhysReg infrastructure	Matthias Braun	2015-07-14	1	-1/+1
	We have detailed def/use lists for every physical register in MachineRegisterInfo anyway, so there is little use in maintaining an additional bitset of which ones are used. Removing it frees us from extra bookkeeping. This simplifies VirtRegMap. Differential Revision: http://reviews.llvm.org/D10911 llvm-svn: 242173
*	R600 -> AMDGPU rename	Tom Stellard	2015-06-13	1	-0/+543
	llvm-svn: 239657
*	Revert "AMDGPU: Add core backend files for R600/SI codegen v6"	Tom Stellard	2012-07-16	1	-51/+0
	This reverts commit 4ea70107c5e51230e9e60f0bf58a0f74aa4885ea. llvm-svn: 160303
*	AMDGPU: Add core backend files for R600/SI codegen v6	Tom Stellard	2012-07-16	1	-0/+51
	llvm-svn: 160270