bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Revert "[AMDGPU] Invert the handling of skip insertion."	Nicolai Hähnle	2020-02-03	1	-17/+25
	This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd. The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Mesa. (cherry picked from commit a80291ce10ba9667352adcc895f9668144f5f616)
*	[AMDGPU] Invert the handling of skip insertion.	cdevadas	2020-01-15	1	-25/+17
	The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a side effect and the skip branch is really necessary. This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass. Differential revision: https://reviews.llvm.org/D68092
*	[AMDGPU] Fix emitIfBreak CF lowering: use temp reg to make register ↵	vpykhtin	2019-11-26	1	-3/+2
	coalescer life easier. Differential revision: https://reviews.llvm.org/D70405
*	[AMDGPU] Come back patch for the 'Assign register class for cross block ↵	Alexander Timofeev	2019-10-14	1	-3/+3
	values according to the divergence.' Detailed description: After https://reviews.llvm.org/D59990 was submitted, several issues were discovered. Changes in common code were preserved but the AMDGPU specific part was reverted to keep the backend working correctly. Discovered issues were addressed in the following commits: https://reviews.llvm.org/D67662 https://reviews.llvm.org/D67101 https://reviews.llvm.org/D63953 https://reviews.llvm.org/D63731 This change brings back AMDGPU specific changes. Reviewed by: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D68635 llvm-svn: 374767
*	[MBP] Move a latch block with conditional exit and multi predecessors to top ↵	Guozhi Wei	2019-06-14	1	-1/+1
	of loop Current findBestLoopTop can find and move one kind of block to top, a latch block that has one successor. Another common case is: * a latch block * it has two successors, one is loop header, another is exit * it has more than one predecessor If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch. Differential Revision: https://reviews.llvm.org/D43256 llvm-svn: 363471
*	[AMDGPU] Partial revert for the ba447bae7448435c9986eece0811da1423972fdd	Alexander Timofeev	2019-06-06	1	-3/+3
	"Divergence driven ISel. Assign register class for cross block values according to the divergence." that discovered the design flaw leading to several issues that required to be solved before. This change reverts AMDGPU specific changes and keeps common part unaffected. llvm-svn: 362749
*	[AMDGPU] Divergence driven ISel. Assign register class for cross block ↵	Alexander Timofeev	2019-05-26	1	-3/+3
	values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 This commit was reverted because of the build failure. The reason was a malformed patch. Build failure fixed. llvm-svn: 361741
*	Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for ↵	Peter Collingbourne	2019-05-25	1	-3/+3
	cross block values according to the divergence." Broke sanitizer bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio llvm-svn: 361688
*	[AMDGPU] Divergence driven ISel. Assign register class for cross block ↵	Alexander Timofeev	2019-05-24	1	-3/+3
	values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 llvm-svn: 361644
*	AMDGPU: Force skip over SMRD, VMEM and s_waitcnt instructions	Rhys Perry	2019-04-17	1	-0/+3
	Summary: This fixes a large Dawn of War 3 performance regression with RADV from Mesa 19.0 to master which was caused by creating less code in some branches. Reviewers: arsen, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60824 llvm-svn: 358592
*	[LowerSwitch][AMDGPU] Do not handle impossible values	Roman Tereshin	2019-02-22	1	-4/+4
	This patch adds LazyValueInfo to LowerSwitch to compute the range of the value being switched over and reduce the size of the tree LowerSwitch builds to lower a switch. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D58096 llvm-svn: 354670
*	AMDGPU: Rewrite SILowerI1Copies to always stay on SALU	Nicolai Haehnle	2018-10-31	1	-9/+9
	Summary: Instead of writing boolean values temporarily into 32-bit VGPRs if they are involved in PHIs or are observed from outside a loop, we use bitwise masking operations to combine lane masks in a way that is consistent with wave control flow. Move SIFixSGPRCopies to before this pass, since that pass incorrectly attempts to move SGPR phis to VGPRs. This should recover most of the code quality that was lost with the bug fix in "AMDGPU: Remove PHI loop condition optimization". There are still some relevant cases where code quality could be improved, in particular: - We often introduce redundant masks with EXEC. Ideally, we'd have a generic computeKnownBits-like analysis to determine whether masks are already masked by EXEC, so we can avoid this masking both here and when lowering uniform control flow. - The criterion we use to determine whether a def is observed from outside a loop is conservative: it doesn't check whether (loop) branch conditions are uniform. Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b Reviewers: arsenm, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D53496 llvm-svn: 345719
*	AMDGPU: Remove PHI loop condition optimization	Nicolai Haehnle	2018-10-31	1	-7/+3
	Summary: The optimization to early break out of loops if all threads are dead was never fully implemented. But the PHI node analyzing is actually causing a number of problems, so remove all the extra code for it. (This does actually regress code quality in a few places because it ends up relying more heavily on phi's of i1, which we don't do a great job with. However, since it fixes real bugs in the wild, we should take this change. I have some prototype changes to improve i1 lowering in general -- not just for control flow -- which should help recover the code quality, I just need to make those changes fit for general consumption. -- Nicolai) Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1 Patch-by: Christian König <christian.koenig@amd.com> Reviewers: arsenm, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53359 llvm-svn: 345718
*	[AMDGPU] Preliminary patch for divergence driven instruction selection. ↵	Alexander Timofeev	2018-09-11	1	-0/+1
	Immediate selection predicate changed Differential revision: https://reviews.llvm.org/D51734 Reviewers: rampitec llvm-svn: 341928
*	[AMDGPU] Avoid using divergent value in mubuf addr64 descriptor	Tim Renouf	2018-08-02	1	-2/+1
	Summary: This fixes a problem where a load from global+idx generated incorrect code on <=gfx7 when the index is divergent. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47383 Change-Id: Ib4d177d6254b1dd3f8ec0203fdddec94bd8bc5ed llvm-svn: 338779
*	AMDGPU: Don't use struct type for argument layout	Matt Arsenault	2018-06-29	1	-1/+1
	This was introducing unnecessary padding after the explicit arguments, depending on the alignment of the total struct type. Also has the side effect of avoiding creating an extra GEP for the offset from the base kernel argument to the explicit kernel argument offset. llvm-svn: 335999
*	AMDGPU: Add pass to lower kernel arguments to loads	Matt Arsenault	2018-06-26	1	-1/+1
	This replaces most argument uses with loads, but for now not all. The code in SelectionDAG for calling convention lowering is actively harmful for amdgpu_kernel. It attempts to split the argument types into register legal types, which results in low quality code for arbitrary types. Since all kernel arguments are passed in memory, we just want the raw types. I've tried a couple of methods of mitigating this in SelectionDAG, but it's easier to just bypass this problem altogether. It's possible to hack around the problem in the initial lowering, but the real problem is the DAG then expects to be able to use CopyToReg/CopyFromReg for uses of the arguments outside the block. Exposing the argument loads in the IR also has the advantage that the LoadStoreVectorizer can merge them. I'm not sure what the best approach to dealing with the IR argument list is. The patch as-is just leaves the IR arguments in place, so all the existing code will still compute the same kernarg size and pointlessly lowers the arguments. Arguably the frontend should emit kernels with an empty argument list in the first place. Alternatively a dummy array could be inserted as a single argument just to reserve space. This does have some disadvantages. Local pointer kernel arguments can no longer have AssertZext placed on them as the equivalent !range metadata is not valid on pointer typed loads. This is mostly bad for SI which needs to know about the known bits in order to use the DS instruction offset, so in this case this is not done. More importantly, this skips noalias arguments since this pass does not yet convert this to the equivalent !alias.scope and !noalias metadata. Producing this metadata correctly seems to be tricky, although this logically is the same as inlining into a function which doesn't exist.
Additionally, exposing these loads to the vectorizer may result in degraded aliasing information if a pointer load is merged with another argument load. I'm also not entirely sure this is preserving the current clover ABI, although I would greatly prefer if it would stop widening arguments and match the HSA ABI. As-is I think it is extending < 4-byte arguments to 4-bytes but doesn't align them to 4-bytes. llvm-svn: 335650
*	[AMDGPU] Fixed incorrect break from loop	Tim Renouf	2018-05-25	1	-1/+3
	Summary: Lower control flow did not correctly handle the case that a loop break in if/else was on a condition that was not guaranteed to be masked by exec. The first test kernel shows an example of this going wrong; after exiting the loop, exec is all ones, even if it was not before the loop. The fix is for lowering of if-break and else-break to insert an S_AND_B64 to mask the break condition with exec. This commit also includes the optimization of not inserting that S_AND_B64 if it is obviously not needed because the break condition is the result of a V_CMP in the same basic block. V2: Addressed some review comments. V3: Test fixes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D44046 Change-Id: I0fc56a01209a9e99d1d5c9b0ffd16f111caf200c llvm-svn: 333258
*	[AMDGPU] Revert b0efc4fd6 (https://reviews.llvm.org/D40556)	Alexander Timofeev	2018-04-25	1	-2/+2
	llvm-svn: 330818
*	[CodeGen] Unify MBB reference format in both MIR and debug output	Francis Visoiu Mistrih	2017-12-04	1	-1/+1
	As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(\1)/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g' * find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665
*	[AMDGPU] SiFixSGPRCopies should not modify non-divergent PHI	Alexander Timofeev	2017-12-01	1	-2/+2
	Differential revision: https://reviews.llvm.org/D40556 llvm-svn: 319534
*	[AMDGPU] Eliminate no effect instructions before s_endpgm	Stanislav Mekhanoshin	2017-08-16	1	-6/+0
	Differential Revision: https://reviews.llvm.org/D36585 llvm-svn: 310987
*	[AMDGPU] Optimize SI_IF lowering for simple if regions	Stanislav Mekhanoshin	2017-07-26	1	-9/+6
	Currently SI_IF results in a s_and_saveexec_b64 followed by s_xor_b64. The xor is used to extract only the changed bits. In case of a simple if region where the only use of that value is in the SI_END_CF to restore the old exec mask, we can omit the xor and perform an or of the exec mask with the original exec value saved by the s_and_saveexec_b64. Differential Revision: https://reviews.llvm.org/D35861 llvm-svn: 309185
*	[AMDGPU] Turn on the new waitcnt insertion pass. Adjust tests.	Mark Searles	2017-06-02	1	-6/+2
	-enable-si-insert-waitcnts=1 becomes the default; -enable-si-insert-waitcnts=0 to use old pass Differential Revision: https://reviews.llvm.org/D33730 llvm-svn: 304551
*	AMDGPU : Fix common dominator of two incoming blocks terminates with uniform ↵	Wei Ding	2017-04-12	1	-2/+2
	branch issue. Differential Revision: http://reviews.llvm.org/D31350 llvm-svn: 300142
*	AMDGPU: Unify divergent function exits.	Matt Arsenault	2017-03-24	1	-6/+77
	StructurizeCFG can't handle cases with multiple returns creating regions with multiple exits. Create a copy of UnifyFunctionExitNodes that skips exit nodes with uniform branch sources. llvm-svn: 298729
*	AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel	Matt Arsenault	2017-03-21	1	-4/+4
	Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444
*	[AMDGPU] Remove getBidirectionalReasonRank	Stanislav Mekhanoshin	2017-03-11	1	-1/+1
	This method inverts the Reason field of a scheduling candidate. It does the right comparison between RegCritical and RegExcess, but everything else is broken. In fact it can prefer a less strong reason such as Weak over RegCritical because Weak > -RegCritical. The CandReason enum is properly sorted, so just remove artificial ranking. Differential Revision: https://reviews.llvm.org/D30557 llvm-svn: 297536
*	AMDGPU/SI: Avoid moving PHIs to VALU when phi values are defined in scalar ↵	Tom Stellard	2016-11-29	1	-3/+2
	branches Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23417 llvm-svn: 288095
*	[AMDGPU] Fix multiple vreg definitions in si-lower-control-flow	Stanislav Mekhanoshin	2016-11-22	1	-7/+7
	Differential Revision: https://reviews.llvm.org/D26939 llvm-svn: 287608
*	AMDGPU: Use unsigned compare for eq/ne	Matt Arsenault	2016-09-30	1	-5/+5
	For some reason there are both of these available, except for scalar 64-bit compares which only have u64. I'm not sure why there are both (I'm guessing it's for the one bit inputs we don't use), but for consistency always use the unsigned one. llvm-svn: 282832
*	AMDGPU: Split SILowerControlFlow into two pieces	Matt Arsenault	2016-08-22	1	-9/+32
	Do most of the lowering in a pre-RA pass. Keep the skip jump insertion late, plus a few other things that require more work to move out. One concern I have is now there may be COPY instructions which do not have the necessary implicit exec uses if they will be lowered to v_mov_b32. This has a positive effect on SGPR usage in shader-db. llvm-svn: 279464
*	AMDGPU: Change insertion point of si_mask_branch	Matt Arsenault	2016-08-10	1	-5/+5
	Insert before the skip branch if one is created. This is a somewhat more natural placement relative to the skip branches, and makes it possible to implement analyzeBranch for skip blocks. The test changes are mostly due to a quirk where the block label is not emitted if there is a terminator that is not also a branch. llvm-svn: 278273
*	AMDGPU: Handle cbranch vccz/vccnz	Matt Arsenault	2016-05-21	1	-2/+1
	llvm-svn: 270297
*	AMDGPU: Implement AnalyzeBranch	Matt Arsenault	2016-05-21	1	-14/+13
	Original patch by Tom Stellard llvm-svn: 270295
*	RegisterPressure: Fix default lanemask for missing regunit intervals	Matthias Braun	2016-04-29	1	-1/+1
	In case of missing live intervals for a physical register getLanesWithProperty() would report 0 which was not a safe default in all situations. Add a parameter to pass in a safe default. No testcase because in-tree targets do not skip computing register unit live intervals. Also cleanup the getXXX() functions to not perform the RequireLiveIntervals checks anymore so we do not even need to return safe defaults. llvm-svn: 267977
*	CodeGen: Correct specification of PHI nodes	Matthias Braun	2016-03-28	1	-2/+2
	They do have a def machine operand. Fixing the definition is necessary for an upcoming patch. Differential Revision: http://reviews.llvm.org/D18384 llvm-svn: 264607
*	AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions	Tom Stellard	2016-02-12	1	-6/+9
	Reviewers: arsenm Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16603 llvm-svn: 260765
*	AMDGPU: Remove some old intrinsic uses from tests	Matt Arsenault	2016-02-11	1	-5/+5
	llvm-svn: 260493
*	AMDGPU: Hack for VS_32 register pressure	Matt Arsenault	2015-11-06	1	-2/+2
	For some reason VS_32 ends up factoring into the pressure heuristics even though we should never see a virtual register with this class. When SGPRs are reserved for register spilling, this for some reason triggers reg-crit scheduling. Setting isAllocatable = 0 may help with this since that seems to remove it from the default implementation's generated table. llvm-svn: 252321
*	AMDGPU: Improve accuracy of instruction rates for VOPC	Matt Arsenault	2015-09-25	1	-6/+6
	These were all using the default 32-bit VALU write class, but the i64/f64 compares are half rate. I'm not sure this is really correct, because they are still using the write to VALU write class, even though they really write to the SALU. llvm-svn: 248582
*	R600 -> AMDGPU rename	Tom Stellard	2015-06-13	1	-0/+188
	llvm-svn: 239657