summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU/local-atomics.ll
Commit message (Collapse)AuthorAgeFilesLines
* Allow FP types for atomicrmw xchgMatt Arsenault2019-01-171-0/+14
| | | | llvm-svn: 351427
* [AMDGPU] Add an AMDGPU specific atomic optimizer.Neil Henning2018-10-081-5/+5
| | | | | | | | | | | | | | | | | | | | | | | This commit adds a new IR level pass to the AMDGPU backend to perform atomic optimizations. It works by: - Running through a function and finding atomicrmw add/sub or uses of the atomic buffer intrinsics for add/sub. - If all arguments except the value to be added/subtracted are uniform, record the value to be optimized. - Run through the atomic operations we can optimize and, depending on whether the value is uniform/divergent use wavefront wide operations (DPP in the divergent case) to calculate the total amount to be atomically added/subtracted. - Then let only a single lane of each wavefront perform the atomic operation, reducing the total number of atomic operations in flight. - Lastly we recombine the result from the single lane to each lane of the wavefront, and calculate our individual lanes offset into the final result. Differential Revision: https://reviews.llvm.org/D51969 llvm-svn: 343973
* AMDGPU: Select DS insts without m0 initializationMatt Arsenault2017-11-291-24/+190
| | | | | | | | | GFX9 stopped using m0 for most DS instructions. Select a different instruction without the use. I think this will be less error prone than trying to manually maintain m0 uses as needed. llvm-svn: 319270
* AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernelMatt Arsenault2017-03-211-54/+54
| | | | | | | | | | | | Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444
* Enable FeatureFlatForGlobal on Volcanic IslandsMatt Arsenault2017-01-241-1/+1
| | | | | | | | | | | This switches to the workaround that HSA defaults to for the mesa path. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 292982
* AMDGPU: Add atomic_inc + atomic_dec intrinsicsMatt Arsenault2016-04-121-1/+0
| | | | | | | These are different than atomicrmw add 1 because they have an additional input value to clamp the result. llvm-svn: 266074
* AMDGPU/SI: Enable lanemask tracking in mischedTom Stellard2016-03-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This results in higher register usage, but should make it easier for the compiler to hide latency. This pass is a prerequisite for some more scheduler improvements, and I think the increase register usage with this patch is acceptable, because when combined with the scheduler improvements, the total register usage will decrease. shader-db stats: 2382 shaders in 478 tests Totals: SGPRS: 48672 -> 49088 (0.85 %) VGPRS: 34148 -> 34847 (2.05 %) Code Size: 1285816 -> 1289128 (0.26 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 492544 -> 573440 (16.42 %) bytes per wave Max Waves: 6856 -> 6846 (-0.15 %) Wait states: 0 -> 0 (0.00 %) Depends on D18451 Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18452 llvm-svn: 264876
* AMDGPU: Remove atomic inc/dec patternsMatt Arsenault2016-03-231-41/+41
| | | | | | | There is no benefit to these since materializing the constant 1 requires the same number of instructions as materializing uint_max llvm-svn: 264215
* R600 -> AMDGPU renameTom Stellard2015-06-131-0/+551
llvm-svn: 239657
OpenPOWER on IntegriCloud