bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[AMDGPU] Scalarization of global uniform loads.	Alexander Timofeev	2016-12-08	1	-0/+1
	Summary: LC can currently select a scalar load for uniform memory access based on the readonly memory address space only. This restriction originated from the fact that in HW prior to VI, the vector and scalar caches are not coherent. With MemoryDependenceAnalysis we can check that the memory location corresponding to the memory operand of the LOAD is not clobbered along all paths from the function entry. Reviewers: rampitec, tstellarAMD, arsenm Subscribers: wdng, arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D26917 llvm-svn: 289076
*	[AMDGPU] Add f16 support (VI+)	Konstantin Zhuravlyov	2016-11-13	1	-0/+2
	Differential Revision: https://reviews.llvm.org/D25975 llvm-svn: 286753
*	AMDGPU: Use 1/2pi inline imm on VI	Matt Arsenault	2016-10-29	1	-0/+2
	I'm guessing at how it is supposed to be printed. llvm-svn: 285490
*	AMDGPU: Diagnose using too many SGPRs	Matt Arsenault	2016-10-28	1	-0/+10
	This is possible when using inline asm. llvm-svn: 285447
*	AMDGPU/SI: Don't allow unaligned scratch access	Tom Stellard	2016-10-14	1	-0/+1
	Summary: The hardware doesn't support this. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25523 llvm-svn: 284257
*	AMDGPU: Add instruction definitions for VGPR indexing	Matt Arsenault	2016-10-12	1	-0/+2
	VI added a second method of indexing into VGPRs besides using v_movrel*. llvm-svn: 284027
*	AMDGPU/SI: Include implicit arguments in kernarg_segment_byte_size	Tom Stellard	2016-09-23	1	-0/+9
	Reviewers: arsenm Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D24835 llvm-svn: 282223
*	[AMDGPU] Wave and register controls	Konstantin Zhuravlyov	2016-09-06	1	-0/+82
	- Implemented amdgpu-flat-work-group-size attribute
	- Implemented amdgpu-num-active-waves-per-eu attribute
	- Implemented amdgpu-num-sgpr attribute
	- Implemented amdgpu-num-vgpr attribute
	- Dynamic LDS constraints are in a separate patch
	Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747
*	AMDGPU/SI: Implement a custom MachineSchedStrategy	Tom Stellard	2016-08-29	1	-0/+45
	Summary: GCNSchedStrategy reuses most of GenericScheduler; it just uses a different method to compute the excess and critical register pressure limits. It's not enabled by default; to enable it, pass -misched=gcn to llc.
	Shader DB stats: 32464 shaders in 17874 tests
	Totals:
	  SGPRS: 1542846 -> 1643125 (6.50 %)
	  VGPRS: 1005595 -> 904653 (-10.04 %)
	  Spilled SGPRs: 29929 -> 27745 (-7.30 %)
	  Spilled VGPRs: 334 -> 352 (5.39 %)
	  Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
	  Code Size: 36688188 -> 37034900 (0.95 %) bytes
	  LDS: 1913 -> 1913 (0.00 %) blocks
	  Max Waves: 254101 -> 265125 (4.34 %)
	  Wait states: 0 -> 0 (0.00 %)
	Totals from affected shaders:
	  SGPRS: 1338220 -> 1438499 (7.49 %)
	  VGPRS: 886221 -> 785279 (-11.39 %)
	  Spilled SGPRs: 29869 -> 27685 (-7.31 %)
	  Spilled VGPRs: 334 -> 352 (5.39 %)
	  Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
	  Code Size: 34315716 -> 34662428 (1.01 %) bytes
	  LDS: 1551 -> 1551 (0.00 %) blocks
	  Max Waves: 188127 -> 199151 (5.86 %)
	  Wait states: 0 -> 0 (0.00 %)
	Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23688 llvm-svn: 279995
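	The percentages in these shader-db totals are plain relative changes, (after - before) / before * 100, rounded to two decimals. The C++ sketch below reproduces them with integer arithmetic in hundredths of a percent, so the reported figures can be checked at compile time. The function name is hypothetical; it is not shader-db's or LLVM's own code.

```cpp
#include <cstdint>

// Hypothetical helper, not shader-db's own code: relative change from
// `before` to `after`, in hundredths of a percent, rounded half away
// from zero. A result of 650 corresponds to "6.50 %". Requires before > 0.
constexpr std::int64_t percentChangeHundredths(std::int64_t before,
                                               std::int64_t after) {
  const std::int64_t num = (after - before) * 10000;  // ratio * 100 % * 100
  const std::int64_t mag = num < 0 ? -num : num;
  // floor(mag / before + 1/2) using only integer operations.
  const std::int64_t rounded = (2 * mag + before) / (2 * before);
  return num < 0 ? -rounded : rounded;
}

// A few of the totals reported for GCNSchedStrategy.
static_assert(percentChangeHundredths(1542846, 1643125) == 650, "SGPRS");
static_assert(percentChangeHundredths(1005595, 904653) == -1004, "VGPRS");
static_assert(percentChangeHundredths(254101, 265125) == 434, "Max Waves");
```

	Dividing the result by 100 gives the figure as printed in the report, e.g. -1004 corresponds to "-10.04 %".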
*	AMDGPU: Fix crashes on memory functions	Matt Arsenault	2016-08-11	1	-1/+2
	llvm-svn: 278369
*	AMDGPU: Delete dead code	Matt Arsenault	2016-07-25	1	-26/+0
	llvm-svn: 276675
*	AMDGPU: Add feature for unaligned access	Matt Arsenault	2016-07-01	1	-2/+3
	llvm-svn: 274398
*	Target: Remove unused arguments from overrideSchedPolicy, NFC	Duncan P. N. Exon Smith	2016-07-01	1	-2/+0
	TargetSubtargetInfo::overrideSchedPolicy takes two MachineInstr* arguments (begin and end) that invite implicit conversions from MachineInstrBundleIterator. One option would be to change their type to an iterator, but since they don't seem to have been used since the API was added in 2010, I'm deleting the dead code. llvm-svn: 274304
*	AMDGPU: Fix global isel crashes	Matt Arsenault	2016-06-28	1	-2/+2
	llvm-svn: 274039
*	AMDGPU: Fix global isel build	Matt Arsenault	2016-06-28	1	-15/+3
	llvm-svn: 273964
*	AMDGPU: Implement per-function subtargets	Matt Arsenault	2016-06-27	1	-11/+1
	llvm-svn: 273940
*	AMDGPU: Move subtarget feature checks into passes	Matt Arsenault	2016-06-27	1	-1/+0
	llvm-svn: 273937
*	[AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in the kernel code header	Konstantin Zhuravlyov	2016-06-25	1	-0/+1
	Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue. Debugger prologue writes work group IDs and work item IDs to scratch memory at a fixed location in the following format:
	  - offset 0: work group ID x
	  - offset 4: work group ID y
	  - offset 8: work group ID z
	  - offset 16: work item ID x
	  - offset 20: work item ID y
	  - offset 24: work item ID z
	Set
	  - amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
	  - amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
	  - amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled
	Differential Revision: http://reviews.llvm.org/D20335 llvm-svn: 273769
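	The debugger prologue offsets are spaced 4 bytes apart with a gap at offset 12, which suggests each ID is stored as a 32-bit value. Under that assumption, the layout can be written down as a C++ struct with compile-time offset checks. The struct and its field names are hypothetical: the backend emits stores to these offsets rather than declaring such a type, and the commit message does not say what, if anything, lives at offset 12.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the debugger prologue's scratch layout, assuming
// 32-bit IDs. This is not an LLVM type.
struct DebuggerPrologueScratch {
  std::uint32_t WorkGroupIdX;  // offset 0
  std::uint32_t WorkGroupIdY;  // offset 4
  std::uint32_t WorkGroupIdZ;  // offset 8
  std::uint32_t Reserved;      // offset 12: not described in the commit
  std::uint32_t WorkItemIdX;   // offset 16
  std::uint32_t WorkItemIdY;   // offset 20
  std::uint32_t WorkItemIdZ;   // offset 24
};

static_assert(offsetof(DebuggerPrologueScratch, WorkGroupIdZ) == 8,
              "work group ID z at offset 8");
static_assert(offsetof(DebuggerPrologueScratch, WorkItemIdX) == 16,
              "work item ID x at offset 16");
static_assert(offsetof(DebuggerPrologueScratch, WorkItemIdZ) == 24,
              "work item ID z at offset 24");
```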
*	AMDGPU: Remove disable-irstructurizer subtarget feature	Matt Arsenault	2016-06-24	1	-1/+0
	The only real reason to use it is for testing, so replace it with a command line option instead of a potentially function dependent feature. llvm-svn: 273653
*	AMDGPU: Cleanup subtarget handling.	Matt Arsenault	2016-06-24	1	-100/+111
	Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target. llvm-svn: 273652
*	AMDGPU: Make FrameLowering stack alignment 16	Matt Arsenault	2016-06-22	1	-3/+4
	We don't need it to be that high. The natural alignment for a single workitem's stack is 16. llvm-svn: 273448
*	AMDGPU: Fix crashes on unknown processor name	Matt Arsenault	2016-06-02	1	-2/+4
	If the processor name failed to parse for amdgcn, the resulting output would have R600 ISA in it. If the processor name was missing or invalid for R600, the wavefront size would not be set and there would be crashes from missing itinerary data. Fixes crashes in future commit caused by dividing by the unset/0 wavefront size. llvm-svn: 271561
*	AMDGPU/SI: Enable load-store-opt by default.	Changpeng Fang	2016-05-26	1	-1/+1
	Summary: Enable load-store-opt by default, and update LIT tests. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D20694 llvm-svn: 270894
*	[AMDGPU][NFC] Rename ReserveTrapVGPRs -> ReserveRegs	Konstantin Zhuravlyov	2016-05-24	1	-1/+1
	Differential Revision: http://reviews.llvm.org/D20081 llvm-svn: 270594
*	AMDGPU: Fix promote alloca pass creating huge arrays	Matt Arsenault	2016-05-16	1	-0/+58
	Previously this assumed it could use all memory, which is a bad decision because it restricts occupancy. By default, only try to use enough space that could reduce occupancy to 7, an arbitrarily chosen limit. Based on the existing LDS usage, try to round up to the limit in the current tier instead of further hurting occupancy. This isn't ideal, because it doesn't accurately know how much space is going to be used for alignment padding. llvm-svn: 269708
*	AMDGPU: Change private_element_size to 4	Matt Arsenault	2016-05-11	1	-1/+1
	llvm-svn: 269145
*	[AMDGPU] Reserve VGPRs for trap handler usage if instructed	Konstantin Zhuravlyov	2016-04-26	1	-1/+1
	Differential Revision: http://reviews.llvm.org/D19235 llvm-svn: 267563
*	[AMDGPU] Add insert nops pass based on subtarget features instead of cl::opt	Konstantin Zhuravlyov	2016-04-18	1	-1/+3
	Also,
	  - Skip pass if machine module does not have debug info
	  - Minor comment changes
	  - Added test
	Differential Revision: http://reviews.llvm.org/D19079 llvm-svn: 266626
*	AMDGPU: Add skeleton GlobalIsel implementation	Tom Stellard	2016-04-14	1	-0/+27
	Summary: This adds the necessary target code to be able to run the ir translator. Lowering function arguments and returns is a nop and there is no support for RegBankSelect. Reviewers: arsenm, qcolombet Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits Differential Revision: http://reviews.llvm.org/D19077 llvm-svn: 266356
*	AMDGPU: Add a shader calling convention	Nicolai Haehnle	2016-04-06	1	-3/+2
	This makes it possible to distinguish between mesa shaders and other kernels even in the presence of compute shaders. Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Differential Revision: http://reviews.llvm.org/D18559 llvm-svn: 265589
*	AMDGPU/SI: Enable lanemask tracking in misched	Tom Stellard	2016-03-30	1	-0/+4
	Summary: This results in higher register usage, but should make it easier for the compiler to hide latency. This pass is a prerequisite for some more scheduler improvements, and I think the increased register usage with this patch is acceptable, because when combined with the scheduler improvements, the total register usage will decrease.
	shader-db stats: 2382 shaders in 478 tests
	Totals:
	  SGPRS: 48672 -> 49088 (0.85 %)
	  VGPRS: 34148 -> 34847 (2.05 %)
	  Code Size: 1285816 -> 1289128 (0.26 %) bytes
	  LDS: 28 -> 28 (0.00 %) blocks
	  Scratch: 492544 -> 573440 (16.42 %) bytes per wave
	  Max Waves: 6856 -> 6846 (-0.15 %)
	  Wait states: 0 -> 0 (0.00 %)
	Depends on D18451 Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18452 llvm-svn: 264876
*	AMDGPU: More bits of frame index are known to be zero	Matt Arsenault	2016-02-27	1	-1/+1
	The maximum private allocation for the whole GPU is 4G, so the maximum possible index for a single workitem is the maximum size divided by the smallest granularity for a dispatch. This increases the number of known zero high bits, which enables more offset folding. The maximum private size per workitem with this is 128M but may be smaller still. llvm-svn: 262153
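	The two limits in this message are enough to work out how many high bits of a frame index become known zero. Taking 4G as 2^32 bytes and 128M as 2^27 bytes (an assumption: the message does not say whether these are binary units), the implied dispatch granularity is 2^32 / 2^27 = 32, and any index below 2^27 has its top 32 - 27 = 5 bits clear. A compile-time C++ sketch of that arithmetic follows; the constant and function names are hypothetical, and the backend's own computation may be structured differently.

```cpp
#include <cstdint>

// Assumed values from the commit message, in binary units.
constexpr std::uint64_t kMaxPrivateWholeGpu = 1ull << 32;       // 4G
constexpr std::uint64_t kMaxPrivatePerWorkItem = 128ull << 20;  // 128M

// Exact log2 of a power of two.
constexpr unsigned log2Pow2(std::uint64_t v) {
  unsigned n = 0;
  while (v > 1) {
    v >>= 1;
    ++n;
  }
  return n;
}

// Divisor implied by the two limits above.
constexpr std::uint64_t kImpliedGranularity =
    kMaxPrivateWholeGpu / kMaxPrivatePerWorkItem;

// An index strictly below kMaxPrivatePerWorkItem needs at most
// log2(kMaxPrivatePerWorkItem) bits, so the remaining high bits are zero.
constexpr unsigned knownZeroHighBits(unsigned indexBits) {
  return indexBits - log2Pow2(kMaxPrivatePerWorkItem);
}

static_assert(kImpliedGranularity == 32, "4G / 128M");
static_assert(knownZeroHighBits(32) == 5, "bits 27..31 are known zero");
```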
*	AMDGPU: Split vi-insts subtarget feature	Matt Arsenault	2016-02-27	1	-1/+2
	This will be more useful for marking which subtargets a builtin is acceptable for. llvm-svn: 262121
*	AMDGPU: Implement readcyclecounter	Matt Arsenault	2016-02-27	1	-1/+2
	This matches the behavior of the HSAIL clock instruction. s_memrealtime is used if the subtarget supports it, and falls back to s_memtime if not. Also introduces new intrinsics for each of s_memtime / s_memrealtime. llvm-svn: 262119
*	AMDGPU: Set element_size in private resource descriptor	Matt Arsenault	2016-02-12	1	-1/+6
	Introduce a subtarget feature for this, and leave the default with the current behavior which assumes up to 16-byte loads/stores can be used. The field also seems to have the ability to be set to 2 bytes, but I'm not sure what that would be used for. llvm-svn: 260651
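	The element_size messages in this log mention sizes of 2, 4 and 16 bytes. Assuming the field covers the power-of-two sizes 2 through 16 and packs them as log2(bytes) - 1 (so 2, 4, 8, 16 map to 0, 1, 2, 3), the encoding can be sketched in C++ as below. The function names are hypothetical, and the exact encoding and the field's bit position in the descriptor (not shown) should be checked against the hardware documentation before relying on them.

```cpp
#include <cstdint>

// Hypothetical helpers, assuming ELEMENT_SIZE stores log2(bytes) - 1.
constexpr bool isValidElementSize(unsigned bytes) {
  return bytes == 2 || bytes == 4 || bytes == 8 || bytes == 16;
}

// Precondition: isValidElementSize(bytes).
constexpr std::uint32_t encodeElementSize(unsigned bytes) {
  std::uint32_t shift = 0;
  // Find the largest shift with 2^shift <= bytes.
  while ((2u << shift) <= bytes)
    ++shift;
  return shift - 1;
}

constexpr unsigned decodeElementSize(std::uint32_t field) {
  return 2u << field;  // 0 -> 2, 1 -> 4, 2 -> 8, 3 -> 16
}

static_assert(encodeElementSize(16) == 3, "default: up to 16-byte accesses");
static_assert(encodeElementSize(4) == 1, "private_element_size = 4");
static_assert(decodeElementSize(encodeElementSize(2)) == 2, "round trip");
```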
*	AMDGPU: Match some med3 patterns	Matt Arsenault	2016-01-28	1	-3/+3
	llvm-svn: 259089
*	AMDGPU: Fix default device handling	Matt Arsenault	2016-01-27	1	-4/+1
	When no device name is specified, default to kaveri for HSA since SI is not supported and it would fail. Default to "tahiti" instead of "SI" since these are effectively the same, and tahiti is an actual device. Move default device handling to the TargetMachine rather than the AMDGPUSubtarget. The module ISA version is computed from the device name provided with the target machine, so the attributes printed by the AsmPrinter were inconsistent with those computed in the subtarget. Also remove DevName field from subtarget since it's redundant with getCPU() in the superclass. llvm-svn: 258901
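	The default-device rule in the message above is small enough to state as code. The C++17 sketch below uses a hypothetical function name and is not the TargetMachine's actual implementation: an explicitly requested device name wins; otherwise HSA gets kaveri and everything else gets tahiti.

```cpp
#include <string_view>

// Hypothetical helper illustrating the rule described in the commit.
constexpr std::string_view defaultDeviceName(std::string_view cpu,
                                             bool targetsHSA) {
  if (!cpu.empty())
    return cpu;  // an explicitly requested device is used as-is
  // SI is not supported for HSA, so fall back to kaveri there; tahiti is
  // an actual device, unlike the generic "SI" name.
  return targetsHSA ? "kaveri" : "tahiti";
}

static_assert(defaultDeviceName("", true) == "kaveri", "HSA default");
static_assert(defaultDeviceName("", false) == "tahiti", "non-HSA default");
static_assert(defaultDeviceName("fiji", true) == "fiji", "explicit name");
```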
*	AMDGPU: Remove Feature64BitPtr	Matt Arsenault	2016-01-23	1	-1/+1
	This is a leftover from AMDIL that doesn't do anything and doesn't belong here. llvm-svn: 258606
*	AMDGPU/SI: Pass whether to use the SI scheduler via Target Attribute	Tom Stellard	2016-01-21	1	-1/+1
	Summary: Currently the SI scheduler can be selected via command line option, but it turned out it would be better if it was selectable via a Target Attribute. This patch adds "si-scheduler" attribute to the backend. Reviewers: tstellarAMD, echristo Subscribers: echristo, arsenm Differential Revision: http://reviews.llvm.org/D16192 llvm-svn: 258386
*	AMDGPU: Add subtarget feature for instruction rates	Matt Arsenault	2016-01-18	1	-4/+7
	llvm-svn: 258085
*	AMDGPU: add +xnack feature	Nicolai Haehnle	2016-01-04	1	-0/+1
	Summary: Enabling this feature will account for the two SGPRs used by the hardware to store the XNACK_MASK physically. The hardware only requires this reservation when the XNACK feature is explicitly enabled. At some point, HSA will probably want to do that, but it does increase SGPR register pressure, so leave it disabled by default for now (but do add a small test). Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15869 llvm-svn: 256794
*	AMDGPU/SI: Use flat for global load/store when targeting HSA	Changpeng Fang	2015-12-22	1	-3/+5
	Summary: For some reason, executing an MUBUF instruction with the addr64 bit set and a zero base pointer in the resource descriptor causes the memory operation to be dropped when the shader is executed using the HSA runtime. This kind of MUBUF instruction is commonly used when the pointer is stored in VGPRs. The base pointer field in the resource descriptor is set to zero and the pointer is stored in the vaddr field. This patch resolves the issue by only using flat instructions for global memory operations when targeting HSA. This is an overly conservative fix as all other configurations of MUBUF instructions appear to work. NOTE: re-committed after fixing a failure in CodeGen/AMDGPU/llvm.dbg.value.ll Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15543 llvm-svn: 256282
*	Revert "AMDGPU/SI: Use flat for global load/store when targeting HSA"	Rafael Espindola	2015-12-22	1	-5/+3
	This reverts commit r256273. It broke CodeGen/AMDGPU/llvm.dbg.value.ll llvm-svn: 256275
*	AMDGPU/SI: Use flat for global load/store when targeting HSA	Changpeng Fang	2015-12-22	1	-3/+5
	Summary: For some reason, executing an MUBUF instruction with the addr64 bit set and a zero base pointer in the resource descriptor causes the memory operation to be dropped when the shader is executed using the HSA runtime. This kind of MUBUF instruction is commonly used when the pointer is stored in VGPRs. The base pointer field in the resource descriptor is set to zero and the pointer is stored in the vaddr field. This patch resolves the issue by only using flat instructions for global memory operations when targeting HSA. This is an overly conservative fix as all other configurations of MUBUF instructions appear to work. Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15543 llvm-svn: 256273
*	AMDGPU: Cleanup includes	Matt Arsenault	2015-11-06	1	-0/+1
	llvm-svn: 252328
*	AMDGPU: Create emergency stack slots during frame lowering	Matt Arsenault	2015-11-06	1	-3/+13
	Test has a bogus verifier error which will be fixed by later commits. llvm-svn: 252327
*	Revert r247692: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.	Daniel Sanders	2015-09-15	1	-7/+7
	Eric has replied and has demanded the patch be reverted. llvm-svn: 247702
*	Re-commit r247683: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.	Daniel Sanders	2015-09-15	1	-7/+7
	Summary: This is the first patch in the series to migrate Triples (which are ambiguous) to TargetTuples (which aren't). For the moment, TargetTuple simply passes all requests to the Triple object it holds. Once it has replaced Triple, it will start to implement the interface in a more suitable way. This change makes some changes to the public C++ API. In particular, InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer() now take TargetTuples instead of Triples. The other public C++ APIs have been left as-is for the moment to reduce patch size. This commit also contains a trivial patch to clang to account for the C++ API change. Thanks go to Pavel Labath for fixing LLDB for me. Reviewers: rengolin Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10969 llvm-svn: 247692
*	Revert r247684 - Replace Triple with a new TargetTuple ...	Daniel Sanders	2015-09-15	1	-7/+7
	LLDB needs to be updated in the same commit. llvm-svn: 247686
*	Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.	Daniel Sanders	2015-09-15	1	-7/+7
	Summary: This is the first patch in the series to migrate Triples (which are ambiguous) to TargetTuples (which aren't). For the moment, TargetTuple simply passes all requests to the Triple object it holds. Once it has replaced Triple, it will start to implement the interface in a more suitable way. This change makes some changes to the public C++ API. In particular, InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer() now take TargetTuples instead of Triples. The other public C++ APIs have been left as-is for the moment to reduce patch size. This commit also contains a trivial patch to clang to account for the C++ API change. Reviewers: rengolin Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10969 llvm-svn: 247683