bcm5719-llvm - Project Ortega BCM5719 LLVM

Commit message	Author	Age	Files	Lines
...
*	AMDGPU: Move cndmask pseudo to be isel pseudo	Matt Arsenault	2016-08-27	1	-23/+0
	There's only one use of this for the convenience of a pattern. I think v_mov_b64_pseudo should also be moved, but SIFoldOperands does currently make use of it. llvm-svn: 279901
*	Replace "fallthrough" comments with LLVM_FALLTHROUGH	Justin Bogner	2016-08-17	1	-1/+1
	This is a mechanical change of comments in switches like fallthrough, fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead. llvm-svn: 278902
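To illustrate the difference, here is a minimal sketch assuming a simplified stand-in macro, `SKETCH_FALLTHROUGH` (hypothetical), that expands to the standard `[[fallthrough]]` attribute when the compiler advertises it. The real `LLVM_FALLTHROUGH` in `llvm/Support/Compiler.h` handles more compilers and attribute spellings. Unlike a `// fallthrough` comment, the attribute is visible to `-Wimplicit-fallthrough`.

```cpp
#include <cassert>

// SKETCH_FALLTHROUGH is a hypothetical, simplified stand-in for
// LLVM_FALLTHROUGH: use the standard attribute when the compiler reports it,
// otherwise expand to nothing (leaving an empty statement).
#if defined(__has_cpp_attribute)
#if __has_cpp_attribute(fallthrough)
#define SKETCH_FALLTHROUGH [[fallthrough]]
#endif
#endif
#ifndef SKETCH_FALLTHROUGH
#define SKETCH_FALLTHROUGH
#endif

// Each level also performs the steps of the levels below it, so the cases
// intentionally fall through. Before the change, this intent lived only in a
// "// fallthrough" comment that the compiler could not check.
int stepsForLevel(int Level) {
  int Steps = 0;
  switch (Level) {
  case 3:
    ++Steps;
    SKETCH_FALLTHROUGH;
  case 2:
    ++Steps;
    SKETCH_FALLTHROUGH;
  case 1:
    ++Steps;
    break;
  default:
    break;
  }
  return Steps;
}
```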
*	AMDGPU: Fix not estimating MBB operand sizes correctly	Matt Arsenault	2016-08-13	1	-2/+20
	llvm-svn: 278590
*	AMDGPU: Remove unnecessary cast	Matt Arsenault	2016-08-10	1	-4/+2
	llvm-svn: 278274
*	MachineFunction: Return reference for getFrameInfo(); NFC	Matthias Braun	2016-07-28	1	-6/+6
	getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017
*	AMDGPU/SI: Don't use reserved VGPRs for SGPR spilling	Tom Stellard	2016-07-28	1	-1/+2
	Summary: We were using reserved VGPRs for SGPR spilling and this was causing some programs with a workgroup size of 1024 to use more than 64 registers, which is illegal. Reviewers: arsenm, mareko, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22032 llvm-svn: 276980
*	AMDGPU: Make AMDGPUMachineFunction fields private	Matt Arsenault	2016-07-26	1	-1/+1
	ABIArgOffset is a problem because properly setting the KernArgSize requires that the reserved area before the real kernel arguments be correctly aligned, which requires fixing clover. llvm-svn: 276766
*	AMDGPU: Expand register indexing pseudos in custom inserter	Matt Arsenault	2016-07-19	1	-0/+51
	This is to help move SILowerControlFlow to before regalloc. There are a couple of tradeoffs with this. The complete CFG is visible to more passes, the loop body avoids an extra copy of m0, vcc isn't required, and immediate offsets can be shrunk into s_movk_i32. The disadvantage is that the register allocator doesn't understand that the single lane's vector is dead within the loop body, so an extra register is used to outlive the loop block when expanding the VGPR -> m0 loop. This also now results in worse waitcnt insertion before the loop instead of after for pending operations at the point of the indexing, but that should be fixed by future improvements to cross-block waitcnt insertion. v_movreld_b32's operands are now modeled more correctly since vdst is not a true output. This is kind of a hack to treat vdst as a use operand. Extra checking is required in the verifier since I can't seem to get tablegen to emit an implicit operand for a virtual register. llvm-svn: 275934
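The VGPR -> m0 loop exists because m0 holds a single scalar index while each lane may want a different one. The following is a CPU-side simulation of that general "waterfall" pattern, not LLVM code: `indirectReadLoop` and its parameters are hypothetical, and the instruction names in the comments only indicate which step each part roughly corresponds to.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical simulation of a VGPR -> m0 indexing loop.
// Regs[r][lane] models register r of the indexed vector for a given lane;
// Idx[lane] is that lane's (possibly divergent) index; Exec is the active-lane
// mask. Each matching lane writes its result to Dst[lane].
// Returns the number of loop iterations executed.
int indirectReadLoop(const std::vector<std::vector<uint32_t>> &Regs,
                     const std::vector<uint32_t> &Idx, uint64_t Exec,
                     std::vector<uint32_t> &Dst) {
  assert(Idx.size() <= 64 && Dst.size() >= Idx.size());
  // Ignore mask bits for lanes that do not exist in this model.
  if (Idx.size() < 64)
    Exec &= (uint64_t(1) << Idx.size()) - 1;

  int Iterations = 0;
  while (Exec != 0) {
    // ~ v_readfirstlane_b32: take the index from the lowest active lane.
    unsigned FirstLane = 0;
    while (((Exec >> FirstLane) & 1) == 0)
      ++FirstLane;
    uint32_t M0 = Idx[FirstLane];
    assert(M0 < Regs.size());

    // ~ v_cmp_eq_u32 + s_and_saveexec_b64: active lanes whose index equals m0.
    uint64_t Match = 0;
    for (unsigned Lane = 0; Lane < Idx.size(); ++Lane)
      if (((Exec >> Lane) & 1) && Idx[Lane] == M0)
        Match |= uint64_t(1) << Lane;

    // ~ v_movrels_b32: each matching lane reads register (base + m0).
    for (unsigned Lane = 0; Lane < Idx.size(); ++Lane)
      if ((Match >> Lane) & 1)
        Dst[Lane] = Regs[M0][Lane];

    // ~ s_xor_b64 exec: retire the lanes just served, loop while any remain.
    Exec &= ~Match;
    ++Iterations;
  }
  return Iterations;
}
```

With a uniform index the loop runs once; with N distinct indices among the active lanes it runs N times, which is why divergent indexing is comparatively expensive.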
*	AMDGPU: Fix verifier error from partially undef copy	Matt Arsenault	2016-07-15	1	-5/+3
	In this situation:
	  %VGPR2<def> = BUFFER_LOAD_DWORD_OFFSET %SGPR8_SGPR9_SGPR10_SGPR11,
	  %VGPR7<def,tied3> = V_MAC_F32_e32 %VGPR0<undef>, %VGPR1<kill>, %VGPR7<kill,tied0>, %EXEC<imp-use>
	  %VGPR3_VGPR4_VGPR5_VGPR6<def> = COPY %VGPR0_VGPR1_VGPR2_VGPR3
	  %VGPR4<def> = COPY %VGPR2
	The copy for VGPR1 -> VGPR4 was an error from reading undefined VGPR1, but VGPR4 is defined immediately after this copy. llvm-svn: 275635
*	Rename AnalyzeBranch* to analyzeBranch*.	Jacques Pienaar	2016-07-15	1	-2/+1
	Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect. Reviewers: tstellarAMD, mcrosier Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai Differential Revision: https://reviews.llvm.org/D22409 llvm-svn: 275564
*	AMDGPU: Cleanup pseudoinstructions	Matt Arsenault	2016-07-12	1	-5/+0
	llvm-svn: 275133
*	AMDGPU: Move R600 only pieces into R600 classes	Matt Arsenault	2016-07-09	1	-8/+0
	llvm-svn: 274979
*	AMDGPU: Improve offset folding for register indexing	Matt Arsenault	2016-07-09	1	-1/+2
	llvm-svn: 274954
*	AMDGPU: Simplify isSchedulingBoundary	Matt Arsenault	2016-07-09	1	-5/+4
	llvm-svn: 274953
*	AMDGPU: Remove implicit iterator conversions, NFC	Duncan P. N. Exon Smith	2016-07-08	1	-6/+6
	Remove remaining implicit conversions from MachineInstrBundleIterator to MachineInstr* from the AMDGPU backend. In most cases, I made them less attractive by preferring MachineInstr& or using a range-based for loop. Once all the backends are fixed I'll make the operator explicit so that this doesn't bitrot back. llvm-svn: 274906
*	AMDGPU: Fix folding SGPRs into madak/madmk src0	Matt Arsenault	2016-07-05	1	-3/+11
	Because of the special immediate operand, the constant bus is already used so SGPRs are never useful. r263212 changed the name of the immediate operand, which broke the verifier check for the restriction. llvm-svn: 274564
*	CodeGen: Use MachineInstr& in TargetInstrInfo, NFC	Duncan P. N. Exon Smith	2016-06-30	1	-430/+420
	This is mostly a mechanical change to make the TargetInstrInfo API take MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator) when the argument is expected to be a valid MachineInstr. This is a general API improvement. Although it would be possible to do this one function at a time, that would demand a quadratic amount of churn since many of these functions call each other. Instead I've done everything as a block and just updated what was necessary. This is mostly mechanical fixes: adding and removing `*` and `&` operators. The only non-mechanical change is to split ARMBaseInstrInfo::getOperandLatencyImpl out from ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a `MachineInstr` which it updated to the instruction bundle leader; now, the latter calls the former either with the same `MachineInstr&` or the bundle leader. As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753. Note: I updated WebAssembly, Lanai, and AVR (despite being off-by-default) since it turned out to be easy. I couldn't run tests for AVR since llc doesn't link with it turned on. llvm-svn: 274189
*	AMDGPU: Remove unused function	Matt Arsenault	2016-06-28	1	-27/+0
	llvm-svn: 274033
*	AMDGPU: Cleanup subtarget handling.	Matt Arsenault	2016-06-24	1	-18/+17
	Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target. llvm-svn: 273652
*	AMDGPU: readlane/writelane do not read exec	Matt Arsenault	2016-06-23	1	-1/+24
	llvm-svn: 273525
*	Reformat blank lines.	NAKAMURA Takumi	2016-06-20	1	-7/+0
	llvm-svn: 273131
*	Untabify.	NAKAMURA Takumi	2016-06-20	1	-2/+2
	llvm-svn: 273129
*	AMDGPU/SI: Propagate the Kill flag in storeRegToStackSlot and eliminateFrameIndex	Changpeng Fang	2016-06-16	1	-2/+2
	Reviewers: arsenm, tstellarAMD Differential Revision: http://reviews.llvm.org/21438 llvm-svn: 272958
*	AMDGPU/SI: Refactor fixup handling for constant addrspace variables	Tom Stellard	2016-06-14	1	-1/+1
	Summary: We now use a standard fixup type applying the pc-relative address of constant address space variables, and we have the GlobalAddress lowering code add the required 4 byte offset to the global address rather than doing it as part of the fixup. This refactoring will make it easier to use the same code for global address space variables and also simplifies the code. Re-commit this after fixing a bug where we were trying to use a reference to a Triple object that had already been destroyed. Reviewers: arsenm, kzhuravl Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21154 llvm-svn: 272705
*	Revert "AMDGPU/SI: Refactor fixup handling for constant addrspace variables"	Tom Stellard	2016-06-14	1	-1/+1
	This reverts commit r272675. llvm-svn: 272677
*	AMDGPU/SI: Refactor fixup handling for constant addrspace variables	Tom Stellard	2016-06-14	1	-1/+1
	Summary: We now use a standard fixup type applying the pc-relative address of constant address space variables, and we have the GlobalAddress lowering code add the required 4 byte offset to the global address rather than doing it as part of the fixup. This refactoring will make it easier to use the same code for global address space variables and also simplifies the code. Reviewers: arsenm, kzhuravl Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21154 llvm-svn: 272675
*	AMDGPU/SI: Set INDEX_STRIDE for scratch coalescing	Marek Olsak	2016-06-13	1	-1/+3
	Summary: Mesa and other users must set this to enable coalescing:
	- STRIDE = 0
	- SWIZZLE_ENABLE = 1
	This makes one particular compute shader 8x faster. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D21136 llvm-svn: 272556
*	AMDGPU: Fix post-RA verifier errors with trackLivenessAfterRegAlloc	Matt Arsenault	2016-06-13	1	-14/+16
	The condition register of the cndmask_b64 expansion can't be killed by the first of the two 32-bit instructions, and the implicit def of the super register is needed. llvm-svn: 272554
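Why the condition must stay live can be seen by modeling one lane of the expansion. A minimal sketch, assuming the 64-bit select is split into two 32-bit selects on the low (`sub0`) and high (`sub1`) halves; `cndmask32` and `cndmask64` are hypothetical helpers, not LLVM functions.

```cpp
#include <cassert>
#include <cstdint>

// One lane of a 32-bit conditional move: pick True when Cond is set.
uint32_t cndmask32(uint32_t False, uint32_t True, bool Cond) {
  return Cond ? True : False;
}

// One lane of a 64-bit conditional move expanded into two 32-bit selects.
// Both halves read the same condition, so the first select must not end the
// condition's live range (i.e. it must not carry a kill flag for it).
uint64_t cndmask64(uint64_t False, uint64_t True, bool Cond) {
  uint32_t Lo = cndmask32(uint32_t(False), uint32_t(True), Cond);  // sub0
  // sub1: Cond is read again here.
  uint32_t Hi = cndmask32(uint32_t(False >> 32), uint32_t(True >> 32), Cond);
  return (uint64_t(Hi) << 32) | Lo;
}
```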
*	Pass DebugLoc and SDLoc by const ref.	Benjamin Kramer	2016-06-12	1	-6/+5
	This used to be free; copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended. llvm-svn: 272512
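A minimal sketch of why by-value passing became costly, assuming a hypothetical `TrackedLoc` type whose every copy and destruction performs one track/untrack step. This is a stand-in for DebugLoc's metadata tracking, not its actual implementation.

```cpp
#include <cassert>

// Hypothetical handle whose copies register with a tracker and whose
// destructions unregister; TrackOps counts both kinds of operation.
struct TrackedLoc {
  static int TrackOps;
  int Line = 0;
  explicit TrackedLoc(int L) : Line(L) {}
  TrackedLoc(const TrackedLoc &Other) : Line(Other.Line) { ++TrackOps; }  // track
  TrackedLoc &operator=(const TrackedLoc &Other) {
    Line = Other.Line;
    ++TrackOps;  // retrack
    return *this;
  }
  ~TrackedLoc() { ++TrackOps; }  // untrack
};
int TrackedLoc::TrackOps = 0;

// By value: one copy (track) plus destroying the parameter (untrack).
int lineByValue(TrackedLoc DL) { return DL.Line; }

// By const reference: no copy, no tracking work.
int lineByConstRef(const TrackedLoc &DL) { return DL.Line; }
```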
*	AMDGPU: Add function for getting instruction size	Matt Arsenault	2016-06-06	1	-0/+49
	llvm-svn: 271936
*	AMDGPU: Handle flat in getMemOpBaseRegImmOfs	Matt Arsenault	2016-06-02	1	-0/+7
	It can still report the base register, and the uses give up when it fails. llvm-svn: 271575
*	AMDGPU: Fix incorrectly setting kill flag when copying register tuples	Matt Arsenault	2016-06-02	1	-1/+1
	This fixes some verifier errors when trackLivenessAfterRegAlloc is enabled. llvm-svn: 271446
*	AMDGPU: Fix verifier error when spilling SGPRs	Matt Arsenault	2016-05-21	1	-0/+13
	The current SGPR spilling test does not stress this because it is using s_buffer_load instructions to increase SGPR pressure and spill, but their output operands have the same SReg_32_XM0 constraint. This fixes an error when the SReg_32 output from most instructions is spilled. llvm-svn: 270301
*	AMDGPU: Handle cbranch vccz/vccnz	Matt Arsenault	2016-05-21	1	-0/+16
	llvm-svn: 270297
*	AMDGPU: Implement ReverseBranchCondition	Matt Arsenault	2016-05-21	1	-0/+7
	llvm-svn: 270296
*	AMDGPU: Implement AnalyzeBranch	Matt Arsenault	2016-05-21	1	-0/+109
	Original patch by Tom Stellard llvm-svn: 270295
*	AMDGPU: Remove verifier check for scc live ins	Matt Arsenault	2016-05-13	1	-10/+0
	We only really need this to be true for SIFixSGPRCopies. I'm not sure there's any way this could happen before that point. Fixes a case where MachineCSE could introduce a cross block scc use. llvm-svn: 269391
*	AMDGPU/SI: Fix bug in SIInstrInfo::insertWaitStates() uncovered by r268260	Tom Stellard	2016-05-02	1	-1/+2
	We can't use MI->getDebugLoc() when MI is an iterator that could be MBB.end(). llvm-svn: 268265
*	AMDGPU/SI: Enable the post-ra scheduler	Tom Stellard	2016-04-30	1	-2/+36
	Summary: This includes a hazard recognizer implementation to replace some of the hazard handling we had during frame index elimination. Reviewers: arsenm Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18602 llvm-svn: 268143
*	AMDGPU/SI: Add offset field to ds_permute/ds_bpermute instructions	Tom Stellard	2016-04-29	1	-4/+0
	Summary: These instructions can add an immediate offset to the address, like other ds instructions. Reviewers: arsenm Subscribers: arsenm, scchan Differential Revision: http://reviews.llvm.org/D19233 llvm-svn: 268043
*	Fix incorrect redundant expression in target AMDGPU.	Etienne Bergeron	2016-04-25	1	-1/+1
	Summary: The expression is detected as a redundant expression. It turns out this is probably a bug.
	```
	/home/etienneb/llvm/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:306:26: warning: both side of operator are equivalent [misc-redundant-expression]
	  if (isSMRD(FirstLdSt) && isSMRD(FirstLdSt)) {
	```
	Reviewers: rnk, tstellarAMD Subscribers: arsenm, cfe-commits Differential Revision: http://reviews.llvm.org/D19460 llvm-svn: 267415
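The flagged condition repeats its first operand, so the second instruction is never checked. A minimal sketch with a hypothetical `MemOp` type; the corrected form assumes the intent was to test both `FirstLdSt` and `SecondLdSt`.

```cpp
#include <cassert>

// Hypothetical, simplified memory-op descriptor; the real code inspects
// MachineInstrs in SIInstrInfo.
struct MemOp {
  bool IsSMRD;
};

bool isSMRD(const MemOp &MI) { return MI.IsSMRD; }

// Buggy form flagged by misc-redundant-expression: SecondLdSt is ignored.
bool bothSMRDBuggy(const MemOp &FirstLdSt, const MemOp &SecondLdSt) {
  (void)SecondLdSt;  // never read, which is exactly the bug
  return isSMRD(FirstLdSt) && isSMRD(FirstLdSt);
}

// Presumed intent (assumption): both instructions must be SMRD loads.
bool bothSMRD(const MemOp &FirstLdSt, const MemOp &SecondLdSt) {
  return isSMRD(FirstLdSt) && isSMRD(SecondLdSt);
}
```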
*	AMDGPU/SI: add llvm.amdgcn.ps.live intrinsic	Nicolai Haehnle	2016-04-22	1	-2/+1
	Summary: This intrinsic returns true if the current thread belongs to a live pixel and false if it belongs to a pixel that we are executing only for derivative computation. It will be used by Mesa to implement gl_HelperInvocation. Note that for pixels that are killed during the shader, this implementation also returns true, but it doesn't matter because those pixels are always disabled in the EXEC mask. This unearthed a corner case in the instruction verifier, which complained about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but correct code, so make the verifier accept it as such. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19191 llvm-svn: 267102
*	AMDGPU: Guard VOPC instructions against incorrect commute	Nicolai Haehnle	2016-04-19	1	-3/+3
	Summary: The added testcase, which triggered this, was derived from a shader-db case via bugpoint. A separate question is why scalar branching wasn't used. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19208 llvm-svn: 266825
*	[MachineScheduler]Add support for store clustering	Jun Bum Lim	2016-04-15	1	-3/+3
	Perform store clustering just like load clustering. This change adds StoreClusterMutation in machine-scheduler. To control StoreClusterMutation, it adds enableClusterStores() in TargetInstrInfo.h. This is enabled only on AArch64 for now. This change also adds support for unscaled stores which were not handled in getMemOpBaseRegImmOfs(). llvm-svn: 266437
*	AMDGPU: Run SIFoldOperands after PeepholeOptimizer	Matt Arsenault	2016-04-14	1	-0/+10
	PeepholeOptimizer cleans up redundant copies, which makes the operand folding more effective. shader-db stats:
	Totals:
	  SGPRS: 34200 -> 34336 (0.40 %)
	  VGPRS: 22118 -> 21655 (-2.09 %)
	  Code Size: 632144 -> 633460 (0.21 %) bytes
	  LDS: 11 -> 11 (0.00 %) blocks
	  Scratch: 10240 -> 11264 (10.00 %) bytes per wave
	  Max Waves: 8822 -> 8918 (1.09 %)
	  Wait states: 0 -> 0 (0.00 %)
	Totals from affected shaders:
	  SGPRS: 7704 -> 7840 (1.77 %)
	  VGPRS: 5169 -> 4706 (-8.96 %)
	  Code Size: 234444 -> 235760 (0.56 %) bytes
	  LDS: 2 -> 2 (0.00 %) blocks
	  Scratch: 0 -> 1024 (0.00 %) bytes per wave
	  Max Waves: 1188 -> 1284 (8.08 %)
	  Wait states: 0 -> 0 (0.00 %)
	Increases:
	  SGPRS: 35 (0.01 %)
	  VGPRS: 1 (0.00 %)
	  Code Size: 59 (0.02 %)
	  LDS: 0 (0.00 %)
	  Scratch: 1 (0.00 %)
	  Max Waves: 48 (0.02 %)
	  Wait states: 0 (0.00 %)
	Decreases:
	  SGPRS: 26 (0.01 %)
	  VGPRS: 54 (0.02 %)
	  Code Size: 68 (0.03 %)
	  LDS: 0 (0.00 %)
	  Scratch: 0 (0.00 %)
	  Max Waves: 4 (0.00 %)
	  Wait states: 0 (0.00 %)
	llvm-svn: 266378
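The percentages in these shader-db reports are relative changes from the old value. A small sketch that reproduces them; `percentChange` and `roundTo2` are hypothetical helpers, and treating a zero baseline as 0.00 % is inferred from the `Scratch: 0 -> 1024 (0.00 %)` line rather than taken from shader-db's own source.

```cpp
#include <cassert>
#include <cmath>

// Relative change in percent between an old and a new total.
// Assumption: a zero baseline is reported as 0.00 %.
double percentChange(double Before, double After) {
  if (Before == 0.0)
    return 0.0;
  return (After - Before) / Before * 100.0;
}

// Round to two decimal places for display.
double roundTo2(double V) { return std::round(V * 100.0) / 100.0; }
```

For example, `VGPRS: 22118 -> 21655` gives (21655 - 22118) / 22118 * 100, which rounds to -2.09.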
*	AMDGPU/SI: Fix spilling of 96-bit registers	Tom Stellard	2016-04-12	1	-0/+4
	Summary: It seems like this was broken in r252327. I thought we had test cases for this, but it's really hard to trigger spills of this exact register size since they aren't used very much. Reviewers: arsenm, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19021 llvm-svn: 266152
*	AMDGPU/SI: Add MachineBasicBlock parameter to SIInstrInfo::insertWaitStates	Tom Stellard	2016-04-07	1	-2/+3
	Summary: This makes it possible to insert nops at the end of blocks. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18549 llvm-svn: 265678
*	AMDGPU: Add a shader calling convention	Nicolai Haehnle	2016-04-06	1	-3/+3
	This makes it possible to distinguish between Mesa shaders and other kernels even in the presence of compute shaders. Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Differential Revision: http://reviews.llvm.org/D18559 llvm-svn: 265589
*	RegisterScavenger: Take a reference as enterBasicBlock() argument.	Matthias Braun	2016-04-06	1	-1/+1
	Make it obvious that the argument cannot be nullptr. Remove an unnecessary nullptr check in initRegState. llvm-svn: 265511
*	AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions	Tom Stellard	2016-03-28	1	-8/+33
	Summary: This helps prevent load clustering from drastically increasing register pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16 bytes was chosen because it seems like that was the original intent of setting the limit to 4 instructions, but more analysis could show that a different limit is better. This fix yields small decreases in register usage with shader-db, but also helps avoid a large increase in register usage when lane mask tracking is enabled in the machine scheduler, because lane mask tracking enables more opportunities for load clustering. shader-db stats:
	2379 shaders in 477 tests
	Totals:
	  SGPRS: 49744 -> 48600 (-2.30 %)
	  VGPRS: 34120 -> 34076 (-0.13 %)
	  Code Size: 1282888 -> 1283184 (0.02 %) bytes
	  LDS: 28 -> 28 (0.00 %) blocks
	  Scratch: 495616 -> 492544 (-0.62 %) bytes per wave
	  Max Waves: 6843 -> 6853 (0.15 %)
	  Wait states: 0 -> 0 (0.00 %)
	Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18451 llvm-svn: 264589
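The policy change in this last commit can be sketched as two predicates. Both are hypothetical simplifications (the real hook inspects the MachineInstrs being clustered) and assume every load in a cluster has the same width. An SMRDx8 load (`s_load_dwordx8`) reads 32 bytes, so four of them pass a 4-instruction limit but far exceed 16 bytes.

```cpp
#include <cassert>

// Old policy as described: at most 4 instructions per cluster, regardless of
// how wide each load is.
bool shouldClusterByCount(unsigned NumLoads) { return NumLoads <= 4; }

// New policy sketch: at most 16 bytes of loaded data per cluster.
// Assumption: every load in the cluster is BytesPerLoad bytes wide.
bool shouldClusterByBytes(unsigned NumLoads, unsigned BytesPerLoad) {
  const unsigned MaxClusterBytes = 16;
  return NumLoads * BytesPerLoad <= MaxClusterBytes;
}
```

Under this sketch, four dword (4-byte) loads still cluster, while four 32-byte SMRDx8 loads no longer do, which is the register-pressure case the commit describes.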