summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU/large-work-group-promote-alloca.ll
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU Reduce reported maximum group size to 1024Matt Arsenault2019-11-131-3/+4
| | | | | While some targets allow encoding 2048, this was never tested or supported.
* [AMDGPU] gfx1010 core wave32 changesStanislav Mekhanoshin2019-06-201-6/+14
| | | | | | Differential Revision: https://reviews.llvm.org/D63204 llvm-svn: 363934
* AMDGPU: Remove amdgpu-max-work-group-size attributeMatt Arsenault2019-06-051-1/+1
| | | | | | | This has been deprecated for a long time, and mesa recently switched to amdgpu-flat-work-group-size. llvm-svn: 362641
* AMDGPU/SI: Extend promoting alloca to vector to arrays of up to 16 elementsChangpeng Fang2018-02-161-2/+2
| | | | | | | | | | | | | | Summary: This patch extends the promotion of alloca to vector to the arrays of up to 16 elements. Also we introduce an option, -disable-promote-alloca-to-vector, to switch promotion to vector off, if needed. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D33559 llvm-svn: 325372
* AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernelMatt Arsenault2017-03-211-11/+11
| | | | | | | | | | | | Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444
* [AMDGPU] Fix MaxWorkGroupsPerCU for large workgroupsStanislav Mekhanoshin2017-02-151-2/+4
| | | | | | | | | | This patch corrects the maximum workgroups per CU if we have big workgroups (more than 128). This calculation contributes to the occupancy calculation in respect to LDS size. Differential Revision: https://reviews.llvm.org/D29974 llvm-svn: 295134
* [AMDGPU] Account workgroup size in LDS occupancy limitsStanislav Mekhanoshin2017-02-011-20/+28
| | | | | | | | | | | | | | | | | | Functions matching LDS use to occupancy return results for a workgroup of 64 workitems. The numbers has to be adjusted for bigger workgroups. For example a workgroup of size 256 already occupies 4 waves just by itself. Given that all numbers of LDS use in the compiler are per workgroup, occupancy shall be multiplied by 4 in this case. Each 64 workitems still limited by the same number, but 4 subrgoups 64 workitems each can afford 4 times more LDS to get the same occupancy. In addition change initializes LDS size in the subtarget to a real value for SI+ targets. This is required since LDS size is a variable in these calculations. Differential Revision: https://reviews.llvm.org/D29423 llvm-svn: 293837
* [AMDGPU] Wave and register controlsKonstantin Zhuravlyov2016-09-061-7/+7
| | | | | | | | | | | | | | - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747
* AMDGPU: Add more tests for LDS size with occupancyMatt Arsenault2016-07-261-2/+149
| | | | llvm-svn: 276821
* AMDGPU: Fix promote alloca pass creating huge arraysMatt Arsenault2016-05-161-3/+48
| | | | | | | | | | | | | | | This was assuming it could use all memory before, which is a bad decision because it restricts occupancy. By default, only try to use enough space that could reduce occupancy to 7, an arbitrarily chosen limit. Based on the exist LDS usage, try to round up to the limit in the current tier instead of further hurting occupancy. This isn't ideal, because it doesn't accurately know how much space is going to be used for alignment padding. llvm-svn: 269708
* AMDGPU: allow specifying a workgroup size that needs to fit in a compute unitTom Stellard2016-04-141-0/+72
Summary: For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD. This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions. Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug. Reviewers: mareko, arsenm, tstellarAMD, nhaehnle Subscribers: FireBurn, kerberizer, llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18340 Patch By: Bas Nieuwenhuizen llvm-svn: 266337
OpenPOWER on IntegriCloud