summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU/array-ptr-calc-i32.ll
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Switch backend default max workgroup size to 1024Matt Arsenault2019-11-131-1/+1
| | | | | | | | | | | | | Previously this would default to 256, not the maximum supported size of 1024. Using a maximum lower than the hardware maximum requires language runtimes to enforce this limit for correctness, which no language has correctly done. Switch the default to the conservatively correct maximum, and force frontends to opt-in to the more optimal 256 default maximum. I don't really understand why the changes in occupancy-levels.ll increased the computed occupancy, which I expected to decrease. I'm not sure if these tests should be forcing the old maximum.
* [AMDGPU] Switch to the new addr space mapping by defaultYaxun Liu2018-02-021-5/+5
| | | | | | | | This requires corresponding clang change. Differential Revision: https://reviews.llvm.org/D40955 llvm-svn: 324101
* AMDGPU: Don't use MUBUF vaddr if address may overflowMatt Arsenault2017-11-151-3/+5
| | | | | | | Effectively revert r263964. Before we would not allow this if vaddr was not known to be positive. llvm-svn: 318240
* AMDGPU: Cleanup subtarget featuresMatt Arsenault2017-08-071-2/+2
| | | | | | | | | | | | Try to avoid mutually exclusive features. Don't use a real default GPU, and use a fake "generic". The goal is to make it easier to see which set of features are incompatible between feature strings. Most of the test changes are due to random scheduling changes from not having a default fullspeed model. llvm-svn: 310258
* AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernelMatt Arsenault2017-03-211-1/+1
| | | | | | | | | | | | Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444
* AMDGPU: Always allocate emergency stack slot at offset 0Matt Arsenault2017-02-221-5/+1
| | | | | | | | | This allows us to ensure that 0 is never a valid pointer to a user object, and ensures that the offset is always legal without needing a register to access it. This comes at the cost of usable offsets and wasted stack space. llvm-svn: 295877
* [AMDGPU] Wave and register controlsKonstantin Zhuravlyov2016-09-061-1/+1
| | | | | | | | | | | | | | - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747
* AMDGPU: Fix promote alloca pass creating huge arraysMatt Arsenault2016-05-161-1/+1
| | | | | | | | | | | | | | | This was assuming it could use all memory before, which is a bad decision because it restricts occupancy. By default, only try to use enough space that could reduce occupancy to 7, an arbitrarily chosen limit. Based on the exist LDS usage, try to round up to the limit in the current tier instead of further hurting occupancy. This isn't ideal, because it doesn't accurately know how much space is going to be used for alignment padding. llvm-svn: 269708
* AMDGPU: Fix mishandling array allocations when promoting allocaMatt Arsenault2016-04-281-6/+6
| | | | | | | | The canonical form for allocas is a single allocation of the array type. In case we see a non-canonical array alloca, make sure we aren't replacing this with an array N times smaller. llvm-svn: 267916
* AMDGPU: Enable LocalStackSlotAllocation passMatt Arsenault2016-04-161-1/+4
| | | | | | | | | | | This resolves more frame indexes early and folds the immediate offsets into the scratch mubuf instructions. This cleans up a lot of the mess that's currently emitted, such as emitting add 0s and repeatedly initializing the same register to 0 when spilling. llvm-svn: 266508
* AMDGPU: Remove some old intrinsic uses from testsMatt Arsenault2016-02-111-7/+12
| | | | llvm-svn: 260493
* AMDGPU: Do not promote allocas with non-inbounds GEPsMatt Arsenault2016-02-021-5/+5
| | | | | | | | If we can't assume the pointer value isn't within the bounds of the object, it seems risky to try to replace the pointer calculations. llvm-svn: 259573
* AMDGPU: Switch barrier intrinsics to using convergentMatt Arsenault2015-12-191-2/+2
| | | | | | | | noduplicate prevents unrolling of small loops that happen to have barriers in them. If a loop has a barrier in it, it is OK to duplicate it for the unroll. llvm-svn: 256075
* AMDGPU: Add sdst operand to VOP2b instructionsMatt Arsenault2015-08-291-2/+2
| | | | | | | | | | The VOP3 encoding of these allows any SGPR pair for the i1 output, but this was forced before to always use vcc. This doesn't yet try to use this, but does add the operand to the definitions so the main change is adding vcc to the output of the VOP2 encoding. llvm-svn: 246358
* R600 -> AMDGPU renameTom Stellard2015-06-131-0/+44
llvm-svn: 239657
OpenPOWER on IntegriCloud