summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU/indirect-private-64.ll
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU/SI: Extend promoting alloca to vector to arrays of up to 16 elementsChangpeng Fang2018-02-161-2/+2
| | | | | | | | | | | | | | Summary: This patch extends the promotion of alloca to vector to the arrays of up to 16 elements. Also we introduce an option, -disable-promote-alloca-to-vector, to switch promotion to vector off, if needed. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D33559 llvm-svn: 325372
* [AMDGPU] Switch to the new addr space mapping by defaultYaxun Liu2018-02-021-16/+16
| | | | | | | | This requires corresponding clang change. Differential Revision: https://reviews.llvm.org/D40955 llvm-svn: 324101
* [AMDGPU] Eliminate barrier if workgroup size is not greater than wavefront sizeStanislav Mekhanoshin2017-04-061-1/+1
| | | | | | | | | | If a workgroup size is known to be not greater than wavefront size the s_barrier instruction is not needed since all threads are guarantied to come to the same point at the same time. Differential Revision: https://reviews.llvm.org/D31731 llvm-svn: 299659
* AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernelMatt Arsenault2017-03-211-4/+4
| | | | | | | | | | | | Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444
* [AMDGPU] Account workgroup size in LDS occupancy limitsStanislav Mekhanoshin2017-02-011-6/+6
| | | | | | | | | | | | | | | | | | Functions matching LDS use to occupancy return results for a workgroup of 64 workitems. The numbers has to be adjusted for bigger workgroups. For example a workgroup of size 256 already occupies 4 waves just by itself. Given that all numbers of LDS use in the compiler are per workgroup, occupancy shall be multiplied by 4 in this case. Each 64 workitems still limited by the same number, but 4 subrgoups 64 workitems each can afford 4 times more LDS to get the same occupancy. In addition change initializes LDS size in the subtarget to a real value for SI+ targets. This is required since LDS size is a variable in these calculations. Differential Revision: https://reviews.llvm.org/D29423 llvm-svn: 293837
* Enable FeatureFlatForGlobal on Volcanic IslandsMatt Arsenault2017-01-241-2/+2
| | | | | | | | | | | This switches to the workaround that HSA defaults to for the mesa path. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 292982
* [AMDGPU] Wave and register controlsKonstantin Zhuravlyov2016-09-061-1/+1
| | | | | | | | | | | | | | - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747
* AMDGPU/SI: Enable load-store-opt by default.Changpeng Fang2016-05-261-2/+10
| | | | | | | | | | Summary: Enable load-store-opt by default, and update LIT tests. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D20694 llvm-svn: 270894
* AMDGPU: Fix promote alloca pass creating huge arraysMatt Arsenault2016-05-161-3/+3
| | | | | | | | | | | | | | | This was assuming it could use all memory before, which is a bad decision because it restricts occupancy. By default, only try to use enough space that could reduce occupancy to 7, an arbitrarily chosen limit. Based on the exist LDS usage, try to round up to the limit in the current tier instead of further hurting occupancy. This isn't ideal, because it doesn't accurately know how much space is going to be used for alignment padding. llvm-svn: 269708
* AMDGPU: Change private_element_size to 4Matt Arsenault2016-05-111-10/+41
| | | | llvm-svn: 269145
* AMDGPU: Fix mishandling array allocations when promoting allocaMatt Arsenault2016-04-281-10/+10
| | | | | | | | The canonical form for allocas is a single allocation of the array type. In case we see a non-canonical array alloca, make sure we aren't replacing this with an array N times smaller. llvm-svn: 267916
* AMDGPU: Remove some old intrinsic uses from testsMatt Arsenault2016-02-111-10/+12
| | | | llvm-svn: 260493
* AMDGPU: Do not promote allocas with non-inbounds GEPsMatt Arsenault2016-02-021-4/+4
| | | | | | | | If we can't assume the pointer value isn't within the bounds of the object, it seems risky to try to replace the pointer calculations. llvm-svn: 259573
* AMDGPU: Switch barrier intrinsics to using convergentMatt Arsenault2015-12-191-5/+5
| | | | | | | | noduplicate prevents unrolling of small loops that happen to have barriers in them. If a loop has a barrier in it, it is OK to duplicate it for the unroll. llvm-svn: 256075
* AMDGPU: Split LDS vector loadsMatt Arsenault2015-11-241-16/+8
| | | | | | If properly aligned this could allow using ds_read_b64. llvm-svn: 253975
* R600 -> AMDGPU renameTom Stellard2015-06-131-0/+91
llvm-svn: 239657
OpenPOWER on IntegriCloud