path: root/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.buffer.atomic.ll
Commit message | Author | Age | Files | Lines
* [AMDGPU] Add support for 64 bit buffer atomic arithmetic instructions | Ryan Taylor | 2019-03-06 | 1 | -31/+110

  Summary:
  This adds support for 64 bit buffer atomic arithmetic instructions but
  does not include cmpswap, as that depends on a fix to the way the register
  pairs are handled.

  Change-Id: Ib207ea65fb69487ccad5066ea647ae8ddfe2ce61

  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D58918

  llvm-svn: 355520
* [AMDGPU] Add an AMDGPU specific atomic optimizer. | Neil Henning | 2018-10-08 | 1 | -2/+2

  This commit adds a new IR level pass to the AMDGPU backend to perform
  atomic optimizations. It works by:

  - Running through a function and finding atomicrmw add/sub or uses of
    the atomic buffer intrinsics for add/sub.
  - If all arguments except the value to be added/subtracted are uniform,
    record the value to be optimized.
  - Run through the atomic operations we can optimize and, depending on
    whether the value is uniform/divergent, use wavefront wide operations
    (DPP in the divergent case) to calculate the total amount to be
    atomically added/subtracted.
  - Then let only a single lane of each wavefront perform the atomic
    operation, reducing the total number of atomic operations in flight.
  - Lastly we recombine the result from the single lane to each lane of
    the wavefront, and calculate our individual lane's offset into the
    final result.

  Differential Revision: https://reviews.llvm.org/D51969

  llvm-svn: 343973
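The steps listed in the atomic optimizer commit above can be modeled in a few lines. This is only a sketch of the idea in plain Python, not the LLVM pass: the function names are hypothetical, a one-element list stands in for the memory location, and the prefix sum stands in for what the wavefront-wide (DPP) operations would compute in the divergent case. It also assumes the reference ordering is "lanes issue their atomics in lane order", which real hardware does not promise.

```python
from itertools import accumulate


def naive_atomic_adds(memory, lane_values):
    """Reference model: every lane issues its own atomic add, in lane order."""
    results = []
    for value in lane_values:
        results.append(memory[0])  # an atomic add returns the old value
        memory[0] += value
    return results


def optimized_atomic_adds(memory, lane_values):
    """Model of the optimization: one atomic add per wavefront."""
    # Wavefront-wide scan: each lane learns the sum of all lower lanes.
    inclusive = list(accumulate(lane_values))
    exclusive = [0] + inclusive[:-1]
    total = inclusive[-1] if inclusive else 0

    # Only a single lane performs the atomic operation, with the total.
    old = memory[0]
    memory[0] += total

    # Recombine: broadcast the old value and add each lane's own offset.
    return [old + offset for offset in exclusive[: len(lane_values)]]


if __name__ == "__main__":
    mem_a, mem_b = [100], [100]
    values = [3, 1, 4, 1]
    print(naive_atomic_adds(mem_a, values), mem_a)
    print(optimized_atomic_adds(mem_b, values), mem_b)
```

Both functions leave the same final value in memory and hand each lane the same return value, while the second one touches memory only once.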
* AMDGPU: Lower buffer store and atomic intrinsics manually | Marek Olsak | 2017-11-09 | 1 | -0/+3

  Summary:
  Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every
  buffer store and atomic instruction.

  Reviewers: arsenm, nhaehnle

  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

  Differential Revision: https://reviews.llvm.org/D39060

  llvm-svn: 317754
* AMDGPU: Split MUBUF offset into aligned components | Nicolai Haehnle | 2017-10-10 | 1 | -6/+6

  Summary:
  Atomic buffer operations do not work (and trap on gfx9) when the
  components are unaligned, even if their sum is aligned.

  Previously, we generated an offset of 4156 without an SGPR by
  splitting it as 4095 + 61 (immediate + inline constant). The highest
  offset for which we can do this correctly is 4156 = 4092 + 64.

  Fixes dEQP-GLES31.functional.ssbo.atomic.*

  Reviewers: arsenm

  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

  Differential Revision: https://reviews.llvm.org/D37850

  llvm-svn: 315302
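The arithmetic in the offset-splitting commit above can be checked with a small sketch. It is not the LLVM implementation: `split_offset` is a hypothetical helper, and the limits are assumptions read off the numbers in the commit message (an immediate of at most 4095, an inline constant of at most 64, and 4-byte alignment for both components).

```python
MAX_IMMEDIATE = 4095  # assumed limit of the immediate offset field
MAX_INLINE = 64       # assumed largest inline integer constant


def split_offset(offset, align=4):
    """Split offset into (immediate, inline_constant), both aligned.

    Returns None when no such split exists under the assumed limits,
    in which case a register would be needed instead.
    """
    if offset < 0 or offset % align != 0:
        return None
    # Largest aligned value that still fits in the immediate field.
    max_aligned_immediate = MAX_IMMEDIATE - MAX_IMMEDIATE % align
    immediate = min(offset, max_aligned_immediate)
    remainder = offset - immediate  # aligned, since both terms are aligned
    if remainder > MAX_INLINE:
        return None
    return immediate, remainder


if __name__ == "__main__":
    print(split_offset(4156))  # (4092, 64)
    print(split_offset(4160))  # None
```

Under these assumptions 4156 splits as 4092 + 64, matching the commit message, and the next aligned offset, 4160, no longer fits.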
* AMDGPU: Fix MUBUF offset bugs affecting llvm.amdgcn.buffer.* intrinsics | Nicolai Haehnle | 2016-06-15 | 1 | -6/+8

  Summary:
  This fixes two related bugs. First, the generic optimization passes
  unfortunately generate negative constant offsets but the hardware
  treats SOffset as an unsigned value.

  Second, there is a hardware bug on SI and CI, where address clamping
  in MUBUF instructions does not work correctly when SOffset is larger
  than the buffer size. This patch works around this bug by never using
  SOffset.

  An alternative workaround would be to do the clamping manually when
  SOffset is too large, but generating the required code sequence during
  instruction selection would be rather involved, and in any case the
  resulting code would probably be worse.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360

  Reviewers: arsenm, tstellarAMD

  Subscribers: arsenm, llvm-commits, kzhuravl

  Differential Revision: http://reviews.llvm.org/D21326

  llvm-svn: 272761
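The first bug in the commit above comes down to a signed value being read as unsigned. A minimal illustration, assuming SOffset is held in a 32-bit register (the helper name is hypothetical):

```python
def soffset_as_unsigned_32(offset):
    """Reinterpret a signed offset the way an unsigned 32-bit field would."""
    return offset & 0xFFFFFFFF


if __name__ == "__main__":
    # A small negative constant offset becomes a huge positive one.
    print(soffset_as_unsigned_32(-4))  # 4294967292
```

Under that assumption, a negative constant offset ends up far larger than any plausible buffer size, which is also the range where the second bug described in the commit applies.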
* AMDGPU/SI: Enable the post-ra scheduler | Tom Stellard | 2016-04-30 | 1 | -2/+2

  Summary:
  This includes a hazard recognizer implementation to replace some of
  the hazard handling we had during frame index elimination.

  Reviewers: arsenm

  Subscribers: qcolombet, arsenm, llvm-commits

  Differential Revision: http://reviews.llvm.org/D18602

  llvm-svn: 268143
* AMDGPU/SI: Assembler: Unify parsing/printing of operands. | Nikolay Haustov | 2016-04-29 | 1 | -5/+5

  Summary:
  The goal is for each operand type to have its own parse function and
  at the same time share common code for tracking state as different
  instruction types share operand types (e.g. glc/glc_flat, etc).

  Introduce parseAMDGPUOperand which can parse any optional operand.
  DPP and Clamp/OMod have custom handling for now. Sam also suggested
  to have class hierarchy for operand types instead of table. This can
  be done in separate change.

  Remove parseVOP3OptionalOps, parseDS*OptionalOps, parseFlatOptionalOps,
  parseMubufOptionalOps, parseDPPOptionalOps.
  Reduce number of definitions of AsmOperand's and MatchClasses' by using
  common base class.
  Rename AsmMatcher/InstPrinter methods accordingly.
  Print immediate type when printing parsed immediate operand.
  Use 'off' if offset/index register is unused instead of skipping it to
  make it more readable (also agreed with SP3).
  Update tests.

  Reviewers: tstellarAMD, SamWot, artem.tamazov

  Subscribers: qcolombet, arsenm, llvm-commits

  Differential Revision: http://reviews.llvm.org/D19584

  llvm-svn: 268015
* AMDGPU/SI: Fix regression with no-return atomics | Nicolai Haehnle | 2016-04-15 | 1 | -0/+9

  Summary:
  In the added test-case, the atomic instruction feeds into a non-machine
  CopyToReg node which hasn't been selected yet, so guard against
  non-machine opcodes here.

  Reviewers: arsenm, tstellarAMD

  Subscribers: arsenm, llvm-commits

  Differential Revision: http://reviews.llvm.org/D19043

  llvm-svn: 266433
* AMDGPU: Add a shader calling convention | Nicolai Haehnle | 2016-04-06 | 1 | -16/+15

  This makes it possible to distinguish between mesa shaders
  and other kernels even in the presence of compute shaders.

  Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

  Differential Revision: http://reviews.llvm.org/D18559

  llvm-svn: 265589
* AMDGPU/SI: Enable lanemask tracking in misched | Tom Stellard | 2016-03-30 | 1 | -1/+1

  Summary:
  This results in higher register usage, but should make it easier for
  the compiler to hide latency.

  This pass is a prerequisite for some more scheduler improvements, and I
  think the increased register usage with this patch is acceptable, because
  when combined with the scheduler improvements, the total register usage
  will decrease.

  shader-db stats:

  2382 shaders in 478 tests
  Totals:
  SGPRS: 48672 -> 49088 (0.85 %)
  VGPRS: 34148 -> 34847 (2.05 %)
  Code Size: 1285816 -> 1289128 (0.26 %) bytes
  LDS: 28 -> 28 (0.00 %) blocks
  Scratch: 492544 -> 573440 (16.42 %) bytes per wave
  Max Waves: 6856 -> 6846 (-0.15 %)
  Wait states: 0 -> 0 (0.00 %)

  Depends on D18451

  Reviewers: nhaehnle, arsenm

  Subscribers: arsenm, llvm-commits

  Differential Revision: http://reviews.llvm.org/D18452

  llvm-svn: 264876
* AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsics | Nicolai Haehnle | 2016-03-18 | 1 | -0/+116

  Summary:
  These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used
  by Mesa to implement atomics with buffer semantics. The intrinsic interface
  matches that of buffer.load.format and buffer.store.format, except that the
  GLC bit is not exposed (it is automatically deduced based on whether the
  return value is used).

  The change of hasSideEffects is required for TableGen to accept the pattern
  that matches the intrinsic.

  Reviewers: tstellarAMD, arsenm

  Subscribers: arsenm, rivanvx, llvm-commits

  Differential Revision: http://reviews.llvm.org/D18151

  llvm-svn: 263791