bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[AMDGPU] Support for v3i32/v3f32	Tim Renouf	2019-03-21	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet. SI (gfx6) does not have dwordx3 instructions, so they are not enabled there. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58902 Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9 llvm-svn: 356659
*	[FileCheck] Add -allow-deprecated-dag-overlap to failing llvm tests	Joel E. Denny	2018-07-11	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	See https://reviews.llvm.org/D47106 for details. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D47171 This commit drops that patch's changes to: llvm/test/CodeGen/NVPTX/f16x2-instructions.ll llvm/test/CodeGen/NVPTX/param-load-store.ll For some reason, the dos line endings there prevent me from commiting via the monorepo. A follow-up commit (not via the monorepo) will finish the patch. llvm-svn: 336843
*	AMDGPU: Add pass to lower kernel arguments to loads	Matt Arsenault	2018-06-26	1	-72/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This replaces most argument uses with loads, but for now not all. The code in SelectionDAG for calling convention lowering is actively harmful for amdgpu_kernel. It attempts to split the argument types into register legal types, which results in low quality code for arbitary types. Since all kernel arguments are passed in memory, we just want the raw types. I've tried a couple of methods of mitigating this in SelectionDAG, but it's easier to just bypass this problem alltogether. It's possible to hack around the problem in the initial lowering, but the real problem is the DAG then expects to be able to use CopyToReg/CopyFromReg for uses of the arguments outside the block. Exposing the argument loads in the IR also has the advantage that the LoadStoreVectorizer can merge them. I'm not sure the best approach to dealing with the IR argument list is. The patch as-is just leaves the IR arguments in place, so all the existing code will still compute the same kernarg size and pointlessly lowers the arguments. Arguably the frontend should emit kernels with an empty argument list in the first place. Alternatively a dummy array could be inserted as a single argument just to reserve space. This does have some disadvantages. Local pointer kernel arguments can no longer have AssertZext placed on them as the equivalent !range metadata is not valid on pointer typed loads. This is mostly bad for SI which needs to know about the known bits in order to use the DS instruction offset, so in this case this is not done. More importantly, this skips noalias arguments since this pass does not yet convert this to the equivalent !alias.scope and !noalias metadata. Producing this metadata correctly seems to be tricky, although this logically is the same as inlining into a function which doesn't exist. Additionally, exposing these loads to the vectorizer may result in degraded aliasing information if a pointer load is merged with another argument load. I'm also not entirely sure this is preserving the current clover ABI, although I would greatly prefer if it would stop widening arguments and match the HSA ABI. As-is I think it is extending < 4-byte arguments to 4-bytes but doesn't align them to 4-bytes. llvm-svn: 335650
*	AMDGPU: Make v4i16/v4f16 legal	Matt Arsenault	2018-06-15	1	-25/+53
\| \| \| \| \| \| \|	Some image loads return these, and it's awkward working around them not being legal. llvm-svn: 334835
*	AMDGPU: Try a lot harder to emit scalar loads	Matt Arsenault	2018-06-07	1	-59/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This has two main components. First, widen widen short constant loads in DAG when they have the correct alignment. This is already done a bit in AMDGPUCodeGenPrepare, since that has access to DivergenceAnalysis. This can't help kernarg loads created in the DAG. Start to use DAG divergence analysis to help this case. The second part is to avoid kernel argument lowering breaking the alignment of short vector elements because calling convention lowering wants to split everything into legal register types. When loading a split type, load the nearest 4-byte aligned segment and shift to get the desired bits. This extra load of the earlier argument piece ends up merging, and the bit extract hopefully folds out. There are a number of improvements and regressions with this, but I think as-is this is a better compromise between several of the worst parts of SelectionDAG. Particularly when i16 is legal, this produces worse code for i8 and i16 element vector kernel arguments. This is partially due to the very weak load merging the DAG does. It only looks for fairly specific combines between pairs of loads which no longer appear. In particular this causes v4i16 loads to be split into 2 components when previously the two halves were merged. Worse, because of the newly introduced shifts, there is a lot more unnecessary vector packing and unpacking code emitted. At least some of this is due to reporting false for isTypeDesirableForOp for i16 as a workaround for the lack of divergence information in the DAG. The cases where this happens it doesn't actually matter, but the relevant code in SimplifyDemandedBits doens't have the context to know to ignore this. The use of the scalar cache is probably more important than the mess of mostly scalar instructions doing this packing and unpacking. Future work can fix this, possibly by making better use of the new DAG divergence information for controlling promotion decisions, or adding another version of shift + trunc + shift combines that doesn't only know about the used types. llvm-svn: 334180
*	AMDGPU: Switch some half using-tests to use amdhsa	Matt Arsenault	2018-06-01	1	-107/+110
\| \| \| \| \| \| \|	The default clover ABI weirdly promotes half to float, which should probably be fixed. llvm-svn: 333730
*	AMDGPU: Use better alignment for kernarg lowering	Matt Arsenault	2018-05-30	1	-23/+11
\| \| \| \| \| \| \| \| \| \|	This was just emitting loads with the ABI alignment for the raw type. The true alignment is often better, especially when an illegal vector type was scalarized. The better alignment allows using a scalar load more often. llvm-svn: 333558
*	AMDGPU: Make v2i16/v2f16 legal on VI	Matt Arsenault	2018-05-22	1	-36/+68
\| \| \| \| \| \| \| \| \| \| \| \|	This usually results in better code. Fixes using inline asm with short2, and also fixes having a different ABI for function parameters between VI and gfx9. Partially cleans up the mess used for lowering of the d16 operations. Making v4f16 legal will help clean this up more, but this requires additional work. llvm-svn: 332953
*	AMDGPU: Cleanup subtarget features	Matt Arsenault	2017-08-07	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Try to avoid mutually exclusive features. Don't use a real default GPU, and use a fake "generic". The goal is to make it easier to see which set of features are incompatible between feature strings. Most of the test changes are due to random scheduling changes from not having a default fullspeed model. llvm-svn: 310258
*	AMDGPU: Allow SIShrinkInstructions to work in non-SSA	Matt Arsenault	2017-07-10	1	-3/+3
\| \| \| \| \| \| \| \|	Immediates can be folded as long as the immediate is a vreg. Also undo commuting instructions if it didn't fold an immediate. llvm-svn: 307575
*	[AMDGPU] Switch scalarize global loads ON by default	Alexander Timofeev	2017-07-04	1	-2/+2
\| \| \| \| \| \|	Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307097
*	Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default"	NAKAMURA Takumi	2017-07-04	1	-2/+2
\| \| \| \| \| \| \| \| \|	It broke a testcase. Failing Tests (1): LLVM :: CodeGen/AMDGPU/alignbit-pat.ll llvm-svn: 307054
*	[AMDGPU] Switch scalarize global loads ON by default	Alexander Timofeev	2017-07-03	1	-2/+2
\| \| \| \| \| \|	Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307026
*	[AMDGPU] Resubmit SDWA peephole: enable by default	Sam Kolton	2017-04-06	1	-47/+72
\| \| \| \| \| \| \| \| \| \|	Reviewers: vpykhtin, rampitec, arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31671 llvm-svn: 299654
*	Revert r299536. [AMDGPU] SDWA peephole: enable by default.	Ivan Krasin	2017-04-05	1	-72/+47
\| \| \| \| \| \| \| \| \| \| \|	Reason: breaks multiple bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/3988 http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1173 Original Review URL: https://reviews.llvm.org/D31671 llvm-svn: 299583
*	[AMDGPU] SDWA peephole: enable by default	Sam Kolton	2017-04-05	1	-47/+72
\| \| \| \| \| \| \| \| \| \|	Reviewers: vpykhtin, rampitec, arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31671 llvm-svn: 299536
*	AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel	Matt Arsenault	2017-03-21	1	-44/+44
\| \| \| \| \| \| \| \| \| \| \| \|	Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444
*	Enable FeatureFlatForGlobal on Volcanic Islands	Matt Arsenault	2017-01-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	This switches to the workaround that HSA defaults to for the mesa path. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 292982
*	[AMDGPU] Do not allow register coalescer to create big superregs	Stanislav Mekhanoshin	2017-01-18	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Limit register coalescer by not allowing it to artificially increase size of registers beyond dword. Such super-registers are in fact register sequences and not distinct HW registers. With more super-regs we would need to allocate adjacent registers and constraint regalloc more than needed. Moreover, our super registers are overlapping. For instance we have VGPR0_VGPR1_VGPR2, VGPR1_VGPR2_VGPR3, VGPR2_VGPR3_VGPR4 etc, which complicates registers allocation even more, resulting in excessive spilling. Differential Revision: https://reviews.llvm.org/D28782 llvm-svn: 292413
*	[AMDGPU] Add f16 support (VI+)	Konstantin Zhuravlyov	2016-11-13	1	-18/+12
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D25975 llvm-svn: 286753
*	AMDGPU: Add VI i16 support	Tom Stellard	2016-11-10	1	-11/+25
\| \| \| \| \| \| \| \|	Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 286464
*	Revert "AMDGPU: Add VI i16 support"	Tom Stellard	2016-11-04	1	-25/+11
\| \| \| \| \| \|	This reverts commit r285939 and r285948. These broke some conformance tests. llvm-svn: 285995
*	AMDGPU: Add VI i16 support	Tom Stellard	2016-11-03	1	-11/+25
\| \| \| \| \| \| \| \|	Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 285939
*	AMDGPU: Fix immediate folding logic when shrinking instructions	Matt Arsenault	2016-09-09	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	If the literal is being folded into src0, it doesn't matter if it's an SGPR because it's being replaced with the literal. Also fixes initially selecting 32-bit versions of some instructions which also confused commuting. llvm-svn: 281117
*	AMDGPU: Improve load/store of illegal types.	Matt Arsenault	2016-07-01	1	-62/+23
\| \| \| \| \| \| \| \| \| \|	There was a combine before to handle the simple copy case. Split this into handling loads and stores separately. We might want to change how this handles some of the vector extloads, since this can result in large code size increases. llvm-svn: 274394
*	AMDGPU: Define priorities for register classes	Matt Arsenault	2016-05-21	1	-3/+2
\| \| \| \| \| \| \| \| \| \|	Allocating larger register classes first should give better allocation results (and more importantly for myself, make the lit tests more stable with respect to scheduler changes). Patch by Matthias Braun llvm-svn: 270312
*	AMDGPU/SI: Enable the post-ra scheduler	Tom Stellard	2016-04-30	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This includes a hazard recognizer implementation to replace some of the hazard handling we had during frame index elimination. Reviewers: arsenm Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18602 llvm-svn: 268143
*	AMDGPU/SI: Assembler: Unify parsing/printing of operands.	Nikolay Haustov	2016-04-29	1	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The goal is for each operand type to have its own parse function and at the same time share common code for tracking state as different instruction types share operand types (e.g. glc/glc_flat, etc). Introduce parseAMDGPUOperand which can parse any optional operand. DPP and Clamp/OMod have custom handling for now. Sam also suggested to have class hierarchy for operand types instead of table. This can be done in separate change. Remove parseVOP3OptionalOps, parseDS*OptionalOps, parseFlatOptionalOps, parseMubufOptionalOps, parseDPPOptionalOps. Reduce number of definitions of AsmOperand's and MatchClasses' by using common base class. Rename AsmMatcher/InstPrinter methods accordingly. Print immediate type when printing parsed immediate operand. Use 'off' if offset/index register is unused instead of skipping it to make it more readable (also agreed with SP3). Update tests. Reviewers: tstellarAMD, SamWot, artem.tamazov Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19584 llvm-svn: 268015
*	DAGCombiner: Turn truncate of a bitcasted vector to an extract	Matt Arsenault	2016-03-01	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \|	On AMDGPU where operations i64 operations are often bitcasted to v2i32 and back, this pattern shows up regularly where it breaks some expected combines on i64, such as load width reducing. This fixes some test failures in a future commit when i64 loads are changed to promote. llvm-svn: 262397
*	AMDGPU: Reduce 64-bit lshr by constant to 32-bit	Matt Arsenault	2016-01-18	1	-2/+0
\| \| \| \| \| \|	64-bit shifts are very slow on some subtargets. llvm-svn: 258090
*	AMDGPU: Make v2i64/v2f64 legal types.	Matt Arsenault	2015-11-25	1	-4/+22
\| \| \| \| \| \| \|	They can be loaded and stored, so count them as legal. This is mostly to fix a number of common cases for load/store merging. llvm-svn: 254086
*	AMDGPU: Split x8 and x16 vector loads instead of scalarize	Matt Arsenault	2015-11-24	1	-24/+66
\| \| \| \| \| \| \| \|	The one regression in the builtin tests is in the read2 test which now (again) has many extra copies, but this should be solved once the pass is replaced with a DAG combine. llvm-svn: 253974
*	Introduce target hook for optimizing register copies	Matt Arsenault	2015-09-24	1	-34/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow a target to do something other than search for copies that will avoid cross register bank copies. Implement for SI by only rewriting the most basic copies, so it should look through anything like a subregister extract. I'm not entirely satisified with this because it seems like eliminating a reg_sequence that isn't fully used should work generically for all targets without them having to override something. However, it seems to be tricky to have a simple implementation of this without rewriting to invalid kinds of subregister copies on some targets. I'm not sure if there is currently a generic way to easily check if a subregister index would be valid for the current use. The current set of TargetRegisterInfo::get*Class functions don't quite behave like I would expect (e.g. getSubClassWithSubReg returns the maximal register class rather than the minimal), so I'm not sure how to make the generic test keep searching if SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making the default implementation to check for simple copies breaks a variety of ARM and x86 tests by producing illegal subregister uses. The ARM tests are not actually changed since it should still be using the same sharesSameRegisterFile implementation, this just relaxes them to not check for specific registers. llvm-svn: 248478
*	SelectionDAG: Support Expand of f16 extloads	Matt Arsenault	2015-09-09	1	-0/+81
\| \| \| \| \| \| \| \| \| \|	Currently this hits an assert that extload should always be supported, which assumes integer extloads. This moves a hack out of SI's argument lowering and is covered by existing tests. llvm-svn: 247113
*	R600 -> AMDGPU rename	Tom Stellard	2015-06-13	1	-0/+525
	llvm-svn: 239657