bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	Sort the remaining #include lines in include/... and lib/....	Chandler Carruth	2017-06-06	1	-1/+1
	I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes aren't healthy. These are either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is entirely mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787
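A rough sketch of the include-ordering rule this commit relies on. It assumes the LLVM style's categories as I understand them (the file's main header first, then local headers, then `llvm/` and `clang/` project headers, then system and third-party headers), each group sorted by spelling. `sort_include_block` and the other names are hypothetical; this is an approximation, not clang-format's actual implementation.

```python
import re

# Hypothetical approximation of the LLVM style's IncludeCategories in clang-format.
# Lower priority sorts first; anything unmatched is treated as a local header (1).
CATEGORIES = [
    (re.compile(r'^"(llvm|llvm-c|clang|clang-c)/'), 2),  # LLVM project headers
    (re.compile(r'^(<|"(gtest|gmock|isl|json)/)'), 3),   # system / third-party
]
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*([<"][^>"]+[>"])')


def include_target(line):
    """Return the spelled include target, e.g. '"llvm/ADT/SmallVector.h"'."""
    match = INCLUDE_RE.match(line)
    if match is None:
        raise ValueError(f"not an #include line: {line!r}")
    return match.group(1)


def include_priority(target, main_header=None):
    if main_header is not None and target == f'"{main_header}"':
        return 0  # the file's own header always comes first
    for pattern, priority in CATEGORIES:
        if pattern.search(target):
            return priority
    return 1  # local / private header


def sort_include_block(lines, main_header=None):
    """Sort one contiguous block of #include lines by (category, spelling)."""
    def key(line):
        target = include_target(line)
        return (include_priority(target, main_header), target)
    return sorted(lines, key=key)


block = [
    '#include "llvm/CodeGen/MachineFunctionPass.h"',
    '#include <vector>',
    '#include "SIInstrInfo.h"',
    '#include "AMDGPU.h"',
    '#include "llvm/ADT/SmallVector.h"',
]
for line in sort_include_block(block):
    print(line)
```

Sorting only within one contiguous block mirrors the idea that blank-line-separated groups are left in place, which is one reason hand-formatted files could be reverted without disturbing the rest.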
*	[AMDGPU] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).	Eugene Zelenko	2017-01-20	1	-14/+19
	llvm-svn: 292623
*	[AMDGPU] Add exec copy to LiveIntervals in SILowerControlFlow::emitElse	Stanislav Mekhanoshin	2017-01-19	1	-1/+3
	This instruction is missing from LiveIntervals. I'm not aware of any problems because of this though. Differential Revision: https://reviews.llvm.org/D28879 llvm-svn: 292521
*	[CodeGen] Rename MachineInstrBuilder::addOperand. NFC	Diana Picus	2017-01-13	1	-16/+14
	Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion. Differential Revision: https://reviews.llvm.org/D28556 llvm-svn: 291891
*	[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies	Stanislav Mekhanoshin	2016-11-28	1	-3/+75
	Codegen prepare sinks comparisons close to a user if we have only one register for conditions. For AMDGPU we have many SGPRs capable of holding vector conditions. Changed the BE to report that we have many condition registers. That way the IR LICM pass will hoist an invariant comparison out of a loop and codegen prepare will not sink it. With that done, a condition is calculated in one block and used in another. The current behavior is to store a workitem's condition in a VGPR using v_cndmask_b32 and then restore it with yet another v_cmp instruction from that v_cndmask's result. To mitigate the issue, propagation of the source SGPR pair in place of v_cmp is implemented. An additional side effect is that we may consume fewer VGPRs at the cost of more SGPRs when multiple conditions must be held, which is a clear win in most cases. Differential Revision: https://reviews.llvm.org/D26114 llvm-svn: 288053
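To see why propagating the SGPR pair is safe, here is a toy lane-mask model of the v_cndmask_b32 / v_cmp round trip. It assumes an 8-lane wave instead of 64, the same exec mask at the definition and at the use, and that a compare writes 0 bits for inactive lanes; `v_cndmask_b32` and `v_cmp_ne_u32_zero` here are hypothetical Python stand-ins, not LLVM APIs.

```python
WAVE = 8  # lanes per wave; real GCN waves have 64, reduced here for readability


def lane(mask, i):
    return (mask >> i) & 1


def v_cndmask_b32(cond, exec_mask, old):
    """Per-lane select into a VGPR: active lanes get 1 where cond is set, else 0;
    inactive lanes keep the old VGPR contents."""
    return [lane(cond, i) if lane(exec_mask, i) else old[i] for i in range(WAVE)]


def v_cmp_ne_u32_zero(vgpr, exec_mask):
    """Rebuild a lane mask from a VGPR; inactive lanes produce 0 bits."""
    mask = 0
    for i in range(WAVE):
        if lane(exec_mask, i) and vgpr[i] != 0:
            mask |= 1 << i
    return mask


cond = 0b10110101       # SGPR-pair condition computed before the loop
exec_mask = 0b11110000  # lanes active where the condition is consumed
old = [7] * WAVE        # arbitrary previous VGPR contents
roundtrip = v_cmp_ne_u32_zero(v_cndmask_b32(cond, exec_mask, old), exec_mask)
print(bin(roundtrip), bin(cond & exec_mask))
```

Under these assumptions the round trip reproduces exactly `cond & exec`, so reusing the original SGPR pair (masked with exec where the consumer needs it) yields the same per-lane condition without occupying a VGPR.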
*	[AMDGPU] Fix multiple vreg definitions in si-lower-control-flow	Stanislav Mekhanoshin	2016-11-22	1	-7/+15
	Differential Revision: https://reviews.llvm.org/D26939 llvm-svn: 287608
*	Use StringRef in Pass/PassManager APIs (NFC)	Mehdi Amini	2016-10-01	1	-1/+1
	llvm-svn: 283004
*	AMDGPU: Partially fix control flow at -O0	Matt Arsenault	2016-09-29	1	-13/+61
	Fixes to allow spilling all registers at the end of the block work with exec modifications. Don't emit s_and_saveexec_b64 for if lowering, and instead emit copies. Mark control flow mask instructions as terminators to get correct spill code placement with fast regalloc, and then have a separate optimization pass form the saveexec. This should work if SGPRs are spilled to VGPRs, but will likely fail in the case that an SGPR spills to memory and no workitem takes a divergent branch. llvm-svn: 282667
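The copy-based if lowering works because `s_and_saveexec_b64` is equivalent to copying exec, ANDing it with the condition, and writing the result back to exec; that equivalence is what lets a later pass fuse the pieces again. A minimal model of both forms, with exec as a 64-bit integer; the function names are illustrative only.

```python
MASK64 = (1 << 64) - 1


def s_and_saveexec_b64(exec_mask, src):
    """Model of S_AND_SAVEEXEC_B64: D = EXEC; EXEC = S0 & EXEC; SCC = (EXEC != 0)."""
    saved = exec_mask
    new_exec = src & exec_mask & MASK64
    return saved, new_exec, int(new_exec != 0)


def if_lowering_with_copies(exec_mask, cond):
    """The same effect as separate instructions. Per the commit message, only the
    final exec write is marked as a terminator, so fast regalloc can place
    end-of-block spill code before it."""
    saved = exec_mask            # saved = COPY exec
    tmp = saved & cond & MASK64  # tmp = S_AND_B64 saved, cond
    new_exec = tmp               # exec = copy of tmp, marked as a terminator
    return saved, new_exec, int(new_exec != 0)


for exec_mask, cond in [(MASK64, 0xF0F0), (0xFF, 0x0F00)]:
    print(s_and_saveexec_b64(exec_mask, cond) == if_lowering_with_copies(exec_mask, cond))
```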
*	AMDGPU: Remove register operand from si_mask_branch	Matt Arsenault	2016-08-27	1	-4/+2
	It isn't used for anything, and is also misleading since it could be spilled at the end of the block, so it can't be relied on. There ends up being a verifier error about using an undefined register since the spill kills the register. llvm-svn: 279899
*	AMDGPU: Split SILowerControlFlow into two pieces	Matt Arsenault	2016-08-22	1	-343/+169
	Do most of the lowering in a pre-RA pass. Keep the skip jump insertion late, plus a few other things that require more work to move out. One concern I have is now there may be COPY instructions which do not have the necessary implicit exec uses if they will be lowered to v_mov_b32. This has a positive effect on SGPR usage in shader-db. llvm-svn: 279464
*	AMDGPU: Remove unused tracking of flat instructions	Matt Arsenault	2016-08-11	1	-15/+0
	llvm-svn: 278361
*	AMDGPU: Change insertion point of si_mask_branch	Matt Arsenault	2016-08-10	1	-10/+17
	Insert before the skip branch if one is created. This is a somewhat more natural placement relative to the skip branches, and makes it possible to implement analyzeBranch for skip blocks. The test changes are mostly due to a quirk where the block label is not emitted if there is a terminator that is not also a branch. llvm-svn: 278273
*	AMDGPU: add execfix flag to SI_ELSE	Nicolai Haehnle	2016-07-28	1	-8/+4
	Summary: SI_ELSE is lowered into two parts: s_or_saveexec_b64 dst, src (at the start of the basic block); s_xor_b64 exec, exec, dst (at the end of the basic block). The idea is that dst contains the exec mask of the preceding IF block. It can happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside the basic block that contains SI_ELSE, in which case it introduces an instruction s_and_b64 exec, exec, s[...] which masks out bits that can correspond to both the IF and the ELSE paths. So the resulting sequence must be: s_or_saveexec_b64 dst, src; s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode; s_and_b64 dst, dst, exec <-- added by SILowerControlFlow; s_xor_b64 exec, exec, dst. Whether to add the additional s_and_b64 dst, dst, exec is currently determined via the ExecModified tracking. With this change, it is instead determined by an additional flag on SI_ELSE which is set by SIWholeQuadMode. Finally: It also occurred to me that an alternative approach for the long run is for SILowerControlFlow to unconditionally emit s_or_saveexec_b64 dst, src; ...; s_and_b64 dst, dst, exec; s_xor_b64 exec, exec, dst and have a pass that detects and cleans up the "redundant AND with exec" pattern where possible. This could be useful anyway, because we also add instructions s_and_b64 vcc, exec, vcc before s_cbranch_scc (in moveToALU), and those are often redundant. I have some pending changes to how KILL is lowered that could also benefit from such a cleanup pass. In any case, this current patch could help in the short term with the whole ExecModified business. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22846 llvm-svn: 276972
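The masking problem described above can be reproduced with plain integers. This sketch models the SI_ELSE sequence on an 8-lane wave (an assumption for readability; hardware uses 64-bit masks) and shows that, once SIWholeQuadMode has ANDed the live mask into exec, omitting the extra `s_and_b64 dst, dst, exec` re-enables a helper lane on the ELSE path. `lower_si_else` and its parameters are hypothetical.

```python
def lower_si_else(exec_then, else_lanes, live, wqm_to_exact, and_dst_with_exec):
    """Return (exec for the ELSE block, dst) after the SI_ELSE sequence.

    exec_then  -- lanes still active from the THEN block when the ELSE block starts
    else_lanes -- the `src` mask SI_IF saved for the ELSE path
    live       -- real live mask that SIWholeQuadMode ANDs into exec for Exact mode
    """
    # s_or_saveexec_b64 dst, src
    dst = exec_then
    exec_mask = else_lanes | exec_then
    if wqm_to_exact:
        # s_and_b64 exec, exec, live   <-- added by SIWholeQuadMode
        exec_mask &= live
    if and_dst_with_exec:
        # s_and_b64 dst, dst, exec     <-- added by SILowerControlFlow
        dst &= exec_mask
    # s_xor_b64 exec, exec, dst
    exec_mask ^= dst
    return exec_mask, dst


# 8-lane example: lanes 0-1 took THEN, lanes 2-7 take ELSE;
# lane 0 is only a WQM helper lane (not in `live`).
exec_then, else_lanes, live = 0b00000011, 0b11111100, 0b11111110
bad, _ = lower_si_else(exec_then, else_lanes, live, True, False)
good, dst = lower_si_else(exec_then, else_lanes, live, True, True)
print(bin(bad), bin(good), bin(dst))
```

Without the AND, the XOR flips helper lane 0 back on for the ELSE path; with it, ELSE runs on exactly `else_lanes & live`. When no WQM-to-Exact switch happens, the extra AND is a no-op, which is why a flag set by SIWholeQuadMode is enough to decide whether to emit it.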
*	Remove MCAsmInfo.h include from TargetOptions.h	Reid Kleckner	2016-07-27	1	-0/+1
	TargetOptions wants the ExceptionHandling enum. Move that to MCTargetOptions.h to avoid transitively including Dwarf.h everywhere in clang. Now you can add a DWARF tag without a full rebuild of clang semantic analysis. llvm-svn: 276883
*	AMDGPU: Make AMDGPUMachineFunction fields private	Matt Arsenault	2016-07-26	1	-1/+1
	ABIArgOffset is a problem because properly setting the KernArgSize requires that the reserved area before the real kernel arguments be correctly aligned, which requires fixing clover. llvm-svn: 276766
*	AMDGPU: Make skip threshold an option	Matt Arsenault	2016-07-25	1	-3/+8
	llvm-svn: 276680
*	[AMDGPU] Remove spurious line (should've been removed in r276029).	Davide Italiano	2016-07-19	1	-3/+0
	llvm-svn: 276030
*	[AMDGPU] Remove dead code.	Davide Italiano	2016-07-19	1	-25/+0
	LGTM'd by Matt Arsenault. llvm-svn: 276029
*	AMDGPU: Expand register indexing pseudos in custom inserter	Matt Arsenault	2016-07-19	1	-286/+0
	This is to help move SILowerControlFlow to before regalloc. There are a couple of tradeoffs with this. The complete CFG is visible to more passes, the loop body avoids an extra copy of m0, vcc isn't required, and immediate offsets can be shrunk into s_movk_i32. The disadvantage is the register allocator doesn't understand that the single lane's vector is dead within the loop body, so an extra register is used to outlive the loop block when expanding the VGPR -> m0 loop. This also now results in worse waitcnt insertion before the loop instead of after for pending operations at the point of the indexing, but that should be fixed by future improvements to cross block waitcnt insertion. v_movreld_b32's operands are now modeled more correctly since vdst is not a true output. This is kind of a hack to treat vdst as a use operand. Extra checking is required in the verifier since I can't seem to get tablegen to emit an implicit operand for a virtual register. llvm-svn: 275934
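The "VGPR -> m0 loop" mentioned above follows the usual waterfall pattern: read the index from the first active lane, perform the m0-relative access for every lane that shares that index, retire those lanes, and repeat until none remain. A Python emulation under simplifying assumptions (per-lane register files as lists, an integer exec mask, a hypothetical function name); comments name the GCN instructions each step stands in for.

```python
def waterfall_read(exec_mask, idx, regs):
    """Emulate a per-lane indirect register read when the index lives in a VGPR
    but the hardware only takes a uniform index in m0.

    exec_mask -- active lanes as a bitmask
    idx       -- per-lane index values (the VGPR)
    regs      -- per-lane register lists being indexed
    Returns (per-lane results with None for inactive lanes, loop iteration count).
    """
    result = [None] * len(idx)
    remaining = exec_mask
    iterations = 0
    while remaining:
        iterations += 1
        first = (remaining & -remaining).bit_length() - 1  # v_readfirstlane_b32
        m0 = idx[first]                                     # s_mov_b32 m0, ...
        same = 0                                            # v_cmp_eq_u32: idx == m0
        for lane in range(len(idx)):
            if (remaining >> lane) & 1 and idx[lane] == m0:
                same |= 1 << lane
        for lane in range(len(idx)):                        # exec = same; v_movrels_b32
            if (same >> lane) & 1:
                result[lane] = regs[lane][m0]
        remaining &= ~same                                  # retire the handled lanes
    return result, iterations


idx = [2, 0, 2, 1]
regs = [[10, 11, 12], [20, 21, 22], [30, 31, 32], [40, 41, 42]]
print(waterfall_read(0b1111, idx, regs))
```

The loop runs once per distinct index among the active lanes, so a uniform index costs a single iteration while a fully divergent one costs one iteration per lane.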
*	AMDGPU: Fix not expanding control flow after some kill blocks	Matt Arsenault	2016-07-15	1	-7/+2
	Also stop trying to insert skip blocks at end_cf. This was inserting them at the end of the block which doesn't make sense. The skip should be inserted at the beginning of the block right after the end cf. Just remove this for now since no tests seem to stress this and I think this can be handled more generally later. Fixes bug 28550 llvm-svn: 275510
*	AMDGPU: Fix trying to skip from a block with no successors	Matt Arsenault	2016-07-15	1	-2/+3
	Found while reducing bug 28550 llvm-svn: 275509
*	AMDGPU: Follow up to r275203	Matt Arsenault	2016-07-12	1	-24/+27
	I meant to squash this into it. llvm-svn: 275220
*	AMDGPU: Fix verifier error with kill intrinsic	Matt Arsenault	2016-07-12	1	-65/+122
	Don't create a terminator in the middle of the block. We should probably get rid of this intrinsic. llvm-svn: 275203
*	Revert "AMDGPU: Remove unused control flow intrinsic"	Matt Arsenault	2016-07-09	1	-0/+19
	llvm-svn: 274978
*	AMDGPU: Improve offset folding for register indexing	Matt Arsenault	2016-07-09	1	-22/+40
	llvm-svn: 274954
*	AMDGPU: Remove unused control flow intrinsic	Matt Arsenault	2016-07-08	1	-19/+0
	llvm-svn: 274939
*	AMDGPU: Minor adjustment to r274817	Matt Arsenault	2016-07-08	1	-1/+1
	The commit message is inaccurate; modifiesRegister will check for partial defs of exec. We currently don't ever emit partial defs of exec, so it doesn't really matter. llvm-svn: 274886
*	AMDGPU: Move si_mask_branch register operand to be a use	Matt Arsenault	2016-07-08	1	-4/+6
	llvm-svn: 274818
*	AMDGPU: Cleanup. Use definesRegister instead of manual loop	Matt Arsenault	2016-07-08	1	-6/+2
	Also this will be more precise since it will check exec_lo/exec_hi writes. llvm-svn: 274817
*	AMDGPU: Fix return of non-void-returning shaders	Nicolai Haehnle	2016-07-06	1	-6/+4
	Summary: Since "AMDGPU: Fix verifier errors in SILowerControlFlow", the logic that ensures that a non-void-returning shader falls off the end of the last basic block was effectively disabled, since SI_RETURN is now used. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96731 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21975 llvm-svn: 274612
*	AMDGPU: Add m0 vgpr load loop block as successor	Matt Arsenault	2016-06-30	1	-0/+1
	This shows up as a verifier error when I move this earlier, not sure why it didn't before. llvm-svn: 274275
*	AMDGPU: Fix out of bounds indirect indexing errors	Matt Arsenault	2016-06-28	1	-8/+19
	This was producing accesses to registers beyond the super register's limits, resulting in verifier failures. llvm-svn: 273977
*	AMDGPU: Fix verifier errors with undef vector indices	Matt Arsenault	2016-06-27	1	-27/+37
	Also fix pointlessly adding exec to liveins. llvm-svn: 273916
*	AMDGPU: Cleanup subtarget handling.	Matt Arsenault	2016-06-24	1	-3/+4
	Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target. llvm-svn: 273652
*	AMDGPU: Fix liveness when expanding m0 loop	Matt Arsenault	2016-06-22	1	-17/+60
	llvm-svn: 273514
*	AMDGPU: Fix verifier errors in SILowerControlFlow	Matt Arsenault	2016-06-22	1	-66/+127
	The main sin this was committing was using terminator instructions in the middle of the block, and then not updating the block successors / predecessors. Split the blocks up to avoid this and introduce new pseudo instructions for branches taken with exec masking. Also use a pseudo instead of emitting s_endpgm and erasing it in the special case of a non-void return. llvm-svn: 273467
*	AMDGPU: Also look for s_cbranch_vccz	Matt Arsenault	2016-05-19	1	-1/+2
	llvm-svn: 270091
*	AMDGPU: Fix crash with unreachable terminators.	Matt Arsenault	2016-04-29	1	-12/+27
	If a block has no successors because it ends in unreachable, this was accessing an invalid iterator. Also stop counting instructions that don't emit any real instructions. llvm-svn: 268119
*	AMDGPU: Add a shader calling convention	Nicolai Haehnle	2016-04-06	1	-6/+4
	This makes it possible to distinguish between mesa shaders and other kernels even in the presence of compute shaders. Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Differential Revision: http://reviews.llvm.org/D18559 llvm-svn: 265589
*	AMDGPU: Add SIWholeQuadMode pass	Nicolai Haehnle	2016-03-21	1	-12/+21
	Summary: Whole quad mode is already enabled for pixel shaders that compute derivatives, but it must be suspended for instructions that cause a shader to have side effects (i.e. stores and atomics). This pass addresses the issue by storing the real (initial) live mask in a register, masking EXEC before instructions that require exact execution and (re-)enabling WQM where required. This pass is run before register coalescing so that we can use machine SSA for analysis. The changes in this patch expose a problem with the second machine scheduling pass: target independent instructions like COPY implicitly use EXEC when they operate on VGPRs, but this fact is not encoded in the MIR. This can lead to miscompilation because instructions are moved past changes to EXEC. This patch fixes the problem by adding use-implicit operands to target independent instructions. Some general codegen passes are relaxed to work with such implicit use operands. Reviewers: arsenm, tstellarAMD, mareko Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18162 llvm-svn: 263982
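The two masks this pass switches between can be sketched with integers: whole quad mode enables every lane of a 2x2 quad in which any lane is live, and exact mode ANDs exec with the saved live mask. The helpers below are hypothetical models of `s_wqm_b64` and of that exec masking, using an 8-lane example for readability.

```python
def s_wqm(mask, wave=64):
    """S_WQM_B64-style: if any lane of a quad is set, set all four lanes of it."""
    out = 0
    for quad in range(0, wave, 4):
        if (mask >> quad) & 0xF:
            out |= 0xF << quad
    return out


def exact_exec(exec_mask, live):
    """Exact mode: keep only lanes that are really live (drop helper lanes)."""
    return exec_mask & live


live = 0b0010_0001             # real live mask saved at shader entry (8-lane example)
wqm = s_wqm(live, wave=8)      # mask used for derivative computations
exact = exact_exec(wqm, live)  # mask used around stores and atomics
print(bin(wqm), bin(exact))
```

Because a WQM mask is always a superset of the live mask, ANDing with the saved live mask is enough to get back to exact execution before a store or atomic.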
*	AMDGPU/SI: Fix threshold calculation for branching when exec is zero	Tom Stellard	2016-03-21	1	-3/+5
	Summary: When control flow is implemented using the exec mask, the compiler will insert branch instructions to skip over the masked section when exec is zero if the section contains more than a certain number of instructions. The previous code would only count instructions in successor blocks, and this patch modifies the code to start counting instructions in all blocks between the start and end of the branch. Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18282 llvm-svn: 263969
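A simplified version of the counting rule this patch describes: walk every block between the branch and its join point, count instructions that actually emit code, and insert the execz skip once the count reaches a threshold. The block representation, the set of opcodes treated as emitting nothing, the use of `>=`, and the default of 12 are all assumptions for illustration, not the pass's actual code.

```python
# Opcodes assumed to emit no machine code (illustrative, not an exhaustive list).
NON_EMITTING = {"IMPLICIT_DEF", "KILL", "DBG_VALUE"}


def should_skip(layout, from_idx, to_idx, threshold=12):
    """Return True if the blocks strictly between layout[from_idx] (the block that
    ends in the exec-masked branch) and layout[to_idx] (the join block) contain at
    least `threshold` real instructions, so an s_cbranch_execz over them pays off.

    layout    -- blocks in layout order, each a list of opcode names (hypothetical model)
    threshold -- assumed default of 12 for illustration
    """
    count = 0
    for block in layout[from_idx + 1:to_idx]:
        for opcode in block:
            if opcode in NON_EMITTING:
                continue
            count += 1
            if count >= threshold:
                return True
    return False


layout = [
    ["S_AND_SAVEEXEC_B64"],          # 0: branch block
    ["V_ADD_F32"] * 5,               # 1: first block of the masked region
    ["DBG_VALUE", "V_MUL_F32"] * 4,  # 2: second block (4 real instructions)
    ["S_OR_B64"],                    # 3: join block
]
print(should_skip(layout, 0, 3), should_skip(layout, 0, 3, threshold=9))
```

Counting across every block in the range, rather than only the immediate successor, is what lets the 5 + 4 instructions above reach a threshold of 9.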
*	AMDGPU: add missing braces around multi-line if block	Nicolai Haehnle	2016-03-18	1	-1/+2
	This fixes an issue with rL263658 pointed out by Tom Stellard. llvm-svn: 263823
*	AMDGPU: Prevent uniform loops from becoming infinite	Nicolai Haehnle	2016-03-16	1	-0/+6
	Summary: Uniform loops where the branch leaving the loop is predicated on VCCNZ must be skipped if EXEC = 0, otherwise they will be infinite. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18137 llvm-svn: 263658
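Why such loops hang: with EXEC = 0 no lane executes the VALU compare, so VCC stays 0 and an exit predicated on s_cbranch_vccnz is never taken. A toy model of that behavior, with a bounded iteration count standing in for "forever" and an explicit exec == 0 check standing in for the skip branch; all names are hypothetical.

```python
def run_uniform_loop(exec_mask, lane_values, limit, max_iters=100):
    """Toy loop: each iteration every active lane increments its value and a
    VALU compare sets that lane's VCC bit once value >= limit. The loop exits via
    s_cbranch_vccnz, i.e. as soon as VCC != 0. Returns the iteration count, or
    None if max_iters is reached (standing in for an infinite loop)."""
    values = list(lane_values)
    for iteration in range(1, max_iters + 1):
        vcc = 0
        for lane, value in enumerate(values):
            if (exec_mask >> lane) & 1:  # inactive lanes neither update nor set VCC
                values[lane] = value + 1
                if values[lane] >= limit:
                    vcc |= 1 << lane
        if vcc != 0:  # s_cbranch_vccnz: leave the loop
            return iteration
    return None


def guarded_loop(exec_mask, lane_values, limit):
    """Same loop, but branched around (like s_cbranch_execz) when EXEC == 0."""
    if exec_mask == 0:
        return 0
    return run_uniform_loop(exec_mask, lane_values, limit)


print(run_uniform_loop(0b11, [0, 0], 3))  # both lanes active
print(run_uniform_loop(0, [0, 0], 3))     # EXEC == 0: the exit is never taken
print(guarded_loop(0, [0, 0], 3))         # skipped instead
```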
*	AMDGPU/SI: Incomplete shader binaries need to finish execution at the end	Marek Olsak	2016-03-14	1	-0/+24
	Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D18058 llvm-svn: 263441
*	AMDGPU: Set flat_scratch from flat_scratch_init reg	Matt Arsenault	2016-02-12	1	-35/+3
	This was hardcoded to the static private size, but this would be missing the offset and additional size for someday when we have dynamic sizing. Also stops always initializing flat_scratch even when unused. In the future we should stop emitting this unless flat instructions are used to access private memory. For example this will initialize it almost always on VI because flat is used for global access. llvm-svn: 260658
*	AMDGPU: Initialize SILowerControlFlow	Matt Arsenault	2016-02-12	1	-28/+36
	llvm-svn: 260645
*	AMDGPU: Remove trailing whitespace	Matt Arsenault	2016-02-12	1	-4/+4
	llvm-svn: 260644
*	AMDGPU: Fix adding redundant m0 uses	Matt Arsenault	2015-10-21	1	-2/+0
	BuildMI already adds these since they are defined correctly now. llvm-svn: 250961
*	AMDGPU: Add MachineInstr overloads for instruction format tests	Matt Arsenault	2015-10-20	1	-2/+2
	llvm-svn: 250797
*	AMDGPU: Use explicit register size indirect pseudos	Matt Arsenault	2015-10-07	1	-1/+5
	This stops using an unknown reg class operand. Currently build_vector selection has a broken-looking check where it tries to use a VGPR reg class and an SGPR one if it sees an SGPR use. With the source operand having an explicit VGPR class, illegal copies will be inserted that SIFixSGPRCopies will take care of normally later, which will allow removing the weird check of build_vector users. Without this, if the check were removed, v_movrels_b32 would still be emitted even though all of the values were only stored in SGPRs. llvm-svn: 249494