bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	AMDGPU/SI: Implement a custom MachineSchedStrategy	Tom Stellard	2016-08-29	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: GCNSchedStrategy re-uses most of GenericScheduler, it's just uses a different method to compute the excess and critical register pressure limits. It's not enabled by default, to enable it you need to pass -misched=gcn to llc. Shader DB stats: 32464 shaders in 17874 tests Totals: SGPRS: 1542846 -> 1643125 (6.50 %) VGPRS: 1005595 -> 904653 (-10.04 %) Spilled SGPRs: 29929 -> 27745 (-7.30 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 36688188 -> 37034900 (0.95 %) bytes LDS: 1913 -> 1913 (0.00 %) blocks Max Waves: 254101 -> 265125 (4.34 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 1338220 -> 1438499 (7.49 %) VGPRS: 886221 -> 785279 (-11.39 %) Spilled SGPRs: 29869 -> 27685 (-7.31 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 34315716 -> 34662428 (1.01 %) bytes LDS: 1551 -> 1551 (0.00 %) blocks Max Waves: 188127 -> 199151 (5.86 %) Wait states: 0 -> 0 (0.00 %) Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23688 llvm-svn: 279995
*	XXX	Tom Stellard	2016-08-26	1	-1/+1
\| \| \| \|	llvm-svn: 279868
*	AMDGPU/SI: Use a better method for determining the largest pressure sets	Tom Stellard	2016-08-26	1	-9/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: There are a few different sgpr pressure sets, but we only care about the one which covers all of the sgprs. We were using hard-coded register pressure set names to determine the reg set id for the biggest sgpr set. However, we were using the wrong name, and this method is pretty fragile, since the reg pressure set names may change. The new method just looks for the pressure set that contains the most reg units and sets that set as our SGPR pressure set. We've also adopted the same technique for determining our VGPR pressure set. Reviewers: arsenm Subscribers: MatzeB, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23687 llvm-svn: 279867
*	AMDGPU: Remove custom getSubReg	Matt Arsenault	2016-08-11	1	-72/+10
\| \| \| \| \| \| \|	This was kind of confusing, the subregister class shouldn't really be necessary. llvm-svn: 278362
*	MachineFunction: Return reference for getFrameInfo(); NFC	Matthias Braun	2016-07-28	1	-10/+10
\| \| \| \| \| \| \|	getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017
*	AMDGPU/SI: Don't use reserved VGPRs for SGPR spilling	Tom Stellard	2016-07-28	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: We were using reserved VGPRs for SGPR spilling and this was causing some programs with a workgroup size of 1024 to use more than 64 registers, which is illegal. Reviewers: arsenm, mareko, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22032 llvm-svn: 276980
*	AMDGPU: Add HSA dispatch id intrinsic	Matt Arsenault	2016-07-22	1	-1/+2
\| \| \| \|	llvm-svn: 276437
*	AMDGPU/SI: Emit the number of SGPR and VGPR spills	Marek Olsak	2016-07-13	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: v2: don't count SGPRs spilled to scratch twice I think this is sufficient. It doesn't count private memory usage, which happens often and uses scratch but isn't technically a spill. The private memory usage can be computed by: [scratch_per_thread - vgpr_spills - a random multiple of SGPR spills]. The fact SGPR spills add very high numbers to the scratch size make that computation a guessing game, but I don't have a solution to that. Reviewers: tstellarAMD Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D22197 llvm-svn: 275288
*	AMDGPU: Enable trackLivenessAfterRegAlloc	Matt Arsenault	2016-07-11	1	-0/+5
\| \| \| \| \| \|	This has caught a number of bugs. llvm-svn: 275131
*	AMDGPU: fix local stack slot allocation bugs	Nicolai Haehnle	2016-07-11	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The main bug fix here is using the 32-bit encoding of V_ADD_I32 in materializeFrameBaseRegister and resolveFrameIndex, so that arbitrary immediates work. The second part is that we may now require the SegmentWaveByteOffset even when there are initially no stack objects and VGPR spilling isn't enabled, for stack slots that are allocated later. This means that some bits become effectively dead and can be cleaned up. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96602 Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21551 llvm-svn: 275108
*	CodeGen: Use MachineInstr& in TargetInstrInfo, NFC	Duncan P. N. Exon Smith	2016-06-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is mostly a mechanical change to make TargetInstrInfo API take MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator) when the argument is expected to be a valid MachineInstr. This is a general API improvement. Although it would be possible to do this one function at a time, that would demand a quadratic amount of churn since many of these functions call each other. Instead I've done everything as a block and just updated what was necessary. This is mostly mechanical fixes: adding and removing `` and `&` operators. The only non-mechanical change is to split ARMBaseInstrInfo::getOperandLatencyImpl out from ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a `MachineInstr` which it updated to the instruction bundle leader; now, the latter calls the former either with the same `MachineInstr&` or the bundle leader. As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753. Note: I updated WebAssembly, Lanai, and AVR (despite being off-by-default) since it turned out to be easy. I couldn't run tests for AVR since llc doesn't link with it turned on. llvm-svn: 274189
*	AMDGPU: Cleanup subtarget handling.	Matt Arsenault	2016-06-24	1	-26/+22
\| \| \| \| \| \| \| \| \|	Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target. llvm-svn: 273652
*	AMDGPU/SI: Propagate the Kill flag in storeRegToStackSlot and ↵	Changpeng Fang	2016-06-16	1	-12/+26
\| \| \| \| \| \| \| \| \| \|	eliminateFrameIndex Reviewers: arsenm, tstellarAMD Differential Revision: http://reviews.llvm.org/21438 llvm-svn: 272958
*	AMDGPU: Remove incorrect assertion	Matt Arsenault	2016-06-09	1	-4/+0
\| \| \| \| \| \| \|	I'm still not sure under what circumstances the offset here is non-0, but private memory is not limited to 27-bits. llvm-svn: 272337
*	[AMDGPU][NFC] Rename ReserveTrapVGPRs -> ReserveRegs	Konstantin Zhuravlyov	2016-05-24	1	-3/+3
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D20081 llvm-svn: 270594
*	AMDGPU: Fix verifier error when spilling undef subreg	Matt Arsenault	2016-05-18	1	-3/+11
\| \| \| \|	llvm-svn: 270002
*	AMDGPU/SI: Use v_readfirstlane_b32 when restoring SGPRs spilled to scratch	Tom Stellard	2016-05-02	1	-2/+1
\| \| \| \| \| \| \| \| \|	We were using v_readlane_b32 with the lane set to zero, but this won't work if thread 0 is not active. Differential Revision: http://reviews.llvm.org/D19745 llvm-svn: 268295
*	AMDGPU/SI: Set the kill flag on temp VGPRs used to restore SGPRs from scratch	Tom Stellard	2016-05-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: When we restore an SGPR value from scratch, we first load it into a temporary VGPR and then use v_readlane_b32 to copy the value from the VGPR back into an SGPR. We weren't setting the kill flag on the VGPR in the v_readlane_b32 instruction, so the register scavenger wasn't able to re-use this temp value later. I wasn't able to create a lit test for this. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19744 llvm-svn: 268287
*	AMDGPU/SI: Enable the post-ra scheduler	Tom Stellard	2016-04-30	1	-16/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This includes a hazard recognizer implementation to replace some of the hazard handling we had during frame index elimination. Reviewers: arsenm Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18602 llvm-svn: 268143
*	[AMDGPU] Move reserved vgpr count for trap handler usage to ↵	Konstantin Zhuravlyov	2016-04-26	1	-2/+3
\| \| \| \| \| \| \| \|	SIMachineFunctionInfo + minor commenting changes Differential Revision: http://reviews.llvm.org/D19537 llvm-svn: 267573
*	[AMDGPU] Reserve VGPRs for trap handler usage if instructed	Konstantin Zhuravlyov	2016-04-26	1	-0/+11
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D19235 llvm-svn: 267563
*	AMDGPU: Add queue ptr intrinsic	Matt Arsenault	2016-04-25	1	-1/+2
\| \| \| \|	llvm-svn: 267451
*	Silence some "initialized but unused" warnings from MSVC -- the function ↵	Aaron Ballman	2016-04-18	1	-13/+2
\| \| \| \| \| \|	being called is a static function, so there's no need for an instance variable. NFC. llvm-svn: 266616
*	AMDGPU: Enable LocalStackSlotAllocation pass	Matt Arsenault	2016-04-16	1	-0/+138
\| \| \| \| \| \| \| \| \| \| \|	This resolves more frame indexes early and folds the immediate offsets into the scratch mubuf instructions. This cleans up a lot of the mess that's currently emitted, such as emitting add 0s and repeatedly initializing the same register to 0 when spilling. llvm-svn: 266508
*	AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit	Tom Stellard	2016-04-14	1	-52/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD. This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions. Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug. Reviewers: mareko, arsenm, tstellarAMD, nhaehnle Subscribers: FireBurn, kerberizer, llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18340 Patch By: Bas Nieuwenhuizen llvm-svn: 266337
*	AMDGPU/SI: Add support for spilling VGPRs without having to scavenge registers	Tom Stellard	2016-04-13	1	-10/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: When we are spilling SGPRs to scratch memory, we usually don't have free SGPRs to do the address calculation, so we need to re-use the ScratchOffset register for the calculation. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18917 llvm-svn: 266244
*	[AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and ↵	Artem Tamazov	2016-04-13	1	-1/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TBA/TMA)git status Tests added along with implemented feature. Note that there is a small leftover of unecessary MI sheduling issue (more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix the false regression. TODO: Support for TTMP quads, comma-separated syntax in "[]" and more. Differential Revision: http://reviews.llvm.org/D17825 llvm-svn: 266205
*	AMDGPU/SI: Add MachineBasicBlock parameter to SIInstrInfo::insertWaitStates	Tom Stellard	2016-04-07	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: This makes it possible to insert nops at the end of blocks. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18549 llvm-svn: 265678
*	AMDGPU: Cache information about register pressure sets	Tom Stellard	2016-03-23	1	-24/+33
\| \| \| \| \| \| \| \| \| \|	We can statically decide whether or not a register pressure set is for SGPRs or VGPRs, so we don't need to re-compute this information in SIRegisterInfo::getRegPressureSetLimit(). Differential Revision: http://reviews.llvm.org/D14805 llvm-svn: 264126
*	AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsics	Nicolai Haehnle	2016-03-10	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa to implement the GL_ARB_shader_image_load_store extension. The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling and loads). However, this is not currently implemented. For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller" opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32 is actually implemented since GLSL also only has a vec4 variant of the store instructions, although it's conceivable that Mesa will want to be smarter about this in the future. BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which has a legacy name, pretends not to access memory, and does not capture the full flexibility of the instruction. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17277 llvm-svn: 263140
*	AMDGPU/SI: Add support for spiling SGPRs to scratch buffer	Tom Stellard	2016-03-04	1	-17/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is necessary for when we run out of VGPRs and can no longer use v_{read,write}_lane for spilling SGPRs. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17592 llvm-svn: 262732
*	AMDGPU/SI: Enable frame index scavenging during PrologEpilogueInserter	Tom Stellard	2016-03-04	1	-7/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This allows us to use virtual registers when we need extra registers for inserting spill instructions in SIRegisterInfo:eliminateFrameIndex(). Once all the frame indices have been eliminated, the PrologEpilogueInserter does an extra pass over the program to replace all virtual registers with physical ones. This allows us to make more efficient use of our emergency spill slots, so we only need to create one. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17591 llvm-svn: 262728
*	AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions	Tom Stellard	2016-02-12	1	-1/+4
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16603 llvm-svn: 260765
*	AMDGPU: Set flat_scratch from flat_scratch_init reg	Matt Arsenault	2016-02-12	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This was hardcoded to the static private size, but this would be missing the offset and additional size for someday when we have dynamic sizing. Also stops always initializing flat_scratch even when unused. In the future we should stop emitting this unless flat instructions are used to access private memory. For example this will initialize it almost always on VI because flat is used for global access. llvm-svn: 260658
*	AMDGPU/SI: Make sure MIMG descriptors and samplers stay in SGPRs	Tom Stellard	2016-02-11	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: It's possible to have resource descriptors and samplers stored in VGPRs, either by a VMEM instruction or in the case of samplers, floating-point calculations. When this happens, we need to use v_readfirstlane to copy these values back to sgprs. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17102 llvm-svn: 260599
*	AMDGPU: Release the scavenged offset register during VGPR spill	Nicolai Haehnle	2016-02-10	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This fixes a crash where subsequent spills would be unable to scavenge a register. In particular, it fixes a crash in piglit's spec@glsl-1.50@execution@geometry@max-input-components (the test still has a shader that fails to compile because of too many SGPR spills, but at least it doesn't crash any more). This is a candidate for the release branch. Reviewers: arsenm, tstellarAMD Subscribers: qcolombet, arsenm Differential Revision: http://reviews.llvm.org/D16558 llvm-svn: 260427
*	AMDGPU/SI: Add SI Machine Scheduler	Nicolai Haehnle	2016-01-13	1	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: It is off by default, but can be used with --misched=si Patch by: Axel Davy Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D11885 llvm-svn: 257609
*	AMDGPU/SI: Fold operands with sub-registers	Nicolai Haehnle	2016-01-07	1	-4/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs, increasing the code size and VGPR pressure. These moves are now folded away. Note that this lack of operand folding was not a problem for VMEM loads, because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register coalescer. Some tests are updated, note that the fsub.ll test explicitly checks that the move is elided. With the IR generated by current Mesa, the changes are obviously relatively minor: 7063 shaders in 3531 tests Totals: SGPRS: 351872 -> 352560 (0.20 %) VGPRS: 199984 -> 200732 (0.37 %) Code Size: 9876968 -> 9881112 (0.04 %) bytes LDS: 91 -> 91 (0.00 %) blocks Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave Wait states: 295164 -> 295337 (0.06 %) Totals from affected shaders: SGPRS: 65784 -> 66472 (1.05 %) VGPRS: 38064 -> 38812 (1.97 %) Code Size: 1993828 -> 1997972 (0.21 %) bytes LDS: 42 -> 42 (0.00 %) blocks Scratch: 795648 -> 783360 (-1.54 %) bytes per wave Wait states: 54026 -> 54199 (0.32 %) Reviewers: tstellarAMD, arsenm, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15875 llvm-svn: 257074
*	AMDGPU/SI: xnack_mask is always reserved on VI	Nicolai Haehnle	2016-01-07	1	-31/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Somehow, I first interpreted the docs as saying space for xnack_mask is only reserved when XNACK is enabled via SH_MEM_CONFIG. I felt uneasy about this and went back to actually test what is happening, and it turns out that xnack_mask is always reserved at least on Tonga and Carrizo, in the sense that flat_scr is always fixed below the SGPRs that are used to implement xnack_mask, whether or not they are actually used. I confirmed this by writing a shader using inline assembly to tease out the aliasing between flat_scratch and regular SGPRs. For example, on Tonga, where we fix the number of SGPRs to 80, s[74:75] aliases flat_scratch (so xnack_mask is s[76:77] and vcc is s[78:79]). This patch changes both the calculation of the total number of SGPRs and the various register reservations to account for this. It ought to be possible to use the gap left by xnack_mask when the feature isn't used, but this patch doesn't try to do that. (Note that the same applies to vcc.) Note that previously, even before my earlier change in r256794, the SGPRs that alias to xnack_mask could end up being used as well when flat_scr was unused and the total number of SGPRs happened to fall on the right alignment (e.g. highest regular SGPR being used s29 and VCC used would lead to number of SGPRs being 32, where s28 and s29 alias with xnack_mask). So if there were some conflict due to such aliasing, we should have noticed that already. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15898 llvm-svn: 257073
*	AMDGPU: add +xnack feature	Nicolai Haehnle	2016-01-04	1	-6/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Enabling this feature will account for the two SGPRs used by the hardware to store the XNACK_MASK physically. The hardware only requires this reservation when the XNACK feature is explicitly enabled. At some point, HSA will probably want to do that, but it does increase SGPR register pressure, so leave it disabled by default for now (but do add a small test). Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15869 llvm-svn: 256794
*	AMDGPU: Avoid assertions after SGPR spilling failed	Nicolai Haehnle	2016-01-04	1	-10/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The comment explains it: emitError does not necessarily exit the compilation process, and then using NoRegister leads to assertions later on. This generates incorrect code, of course, but the user should know to not use the result when an error has been emitted. It would be nice to have a test-case for this inside the LLVM repository, but llc exits on error. shader-db tests trigger the underlying issue at least on Tonga. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15826 llvm-svn: 256757
*	AMDGPU: Fix off-by-one in SIRegisterInfo::eliminateFrameIndex	Nicolai Haehnle	2015-12-17	1	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The method insertNOPs expected the number of wait states to be passed as parameter, while eliminateFrameIndex passed the immediate argument for the S_NOP, leading to an off-by-one error. Rename the method to make the meaning of its parameter clearer. The number of 4 / 5 wait states (which is what the method has always _tried_ to do according to the comment) is correct according to the hardware docs. I stumbled upon this while trying to track down the cause of https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed, this patch unfortunately does not fix that bug... Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15542 llvm-svn: 255906
*	Squelch unused variable warning in SIRegisterInfo.cpp.	Matt Arsenault	2015-12-01	1	-1/+2
\| \| \| \| \| \|	Patch by Justin Lebar llvm-svn: 254362
*	AMDGPU: Rework how private buffer passed for HSA	Matt Arsenault	2015-11-30	1	-18/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we know we have stack objects, we reserve the registers that the private buffer resource and wave offset are passed and use them directly. If not, reserve the last 5 SGPRs just in case we need to spill. After register allocation, try to pick the next available registers instead of the last SGPRs, and then insert copies from the inputs to the reserved registers in the progloue. This also only selectively enables all of the input registers which are really required instead of always enabling them. llvm-svn: 254331
*	AMDGPU: Rename enums to be consistent with HSA code object terminology	Matt Arsenault	2015-11-30	1	-14/+14
\| \| \| \|	llvm-svn: 254330
*	AMDGPU: Remove SIPrepareScratchRegs	Matt Arsenault	2015-11-30	1	-0/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It does not work because of emergency stack slots. This pass was supposed to eliminate dummy registers for the spill instructions, but the register scavenger can introduce more during PrologEpilogInserter, so some would end up left behind if they were needed. The potential for spilling the scratch resource descriptor and offset register makes doing something like this overly complicated. Reserve registers to use for the resource descriptor and use them directly in eliminateFrameIndex. Also removes creating another scratch resource descriptor when directly selecting scratch MUBUF instructions. The choice of which registers are reserved is temporary. For now it attempts to pick the next available registers after the user and system SGPRs. llvm-svn: 254329
*	AMDGPU: Add llvm.amdgcn.dispatch.ptr intrinsic	Tom Stellard	2015-11-26	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This returns a pointer to the dispatch packet, which can be used to load information about the kernel dispach. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D14898 llvm-svn: 254116
*	Revert "Remove unnecessary call to getAllocatableRegClass"	Tom Stellard	2015-11-12	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit r252565. This also includes the revert of the commit mentioned below in order to avoid breaking tests in AMDGPU: Revert "AMDGPU: Set isAllocatable = 0 on VS_32/VS_64" This reverts commit r252674. llvm-svn: 252956
*	AMDGPU: Set isAllocatable = 0 on VS_32/VS_64	Matt Arsenault	2015-11-11	1	-7/+1
\| \| \| \|	llvm-svn: 252674
*	AMDGPU: Hack for VS_32 register pressure	Matt Arsenault	2015-11-06	1	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	For some reason VS_32 ends up factoring into the pressure heuristics even though we should never see a virtual register with this class. When SGPRs are reserved for register spilling, this for some reason triggers reg-crit scheduling. Setting isAllocatable = 0 may help with this since that seems to remove it from the default implementation's generated table. llvm-svn: 252321