path: root/llvm/lib/Target/AMDGPU
Commit message (Author, Age, Files, Lines)
...
* Add IntrWrite[Arg]Mem intrinsic property (Nicolai Haehnle, 2016-04-19, 1 file, -23/+12)
| | | | | | | | | | | | | | | | | | | | | | Summary: This property is used to mark an intrinsic that only writes to memory, but neither reads from memory nor has other side effects. An example where this is useful is the llvm.amdgcn.buffer.store.format.* intrinsic, which corresponds to a store instruction that goes through a special buffer descriptor rather than through a plain pointer. With this property, the intrinsic should still be handled as having side effects at the LLVM IR level, but machine scheduling can make smarter decisions. Reviewers: tstellarAMD, arsenm, joker.eph, reames Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18291 llvm-svn: 266826
* AMDGPU: Guard VOPC instructions against incorrect commute (Nicolai Haehnle, 2016-04-19, 1 file, -3/+3)
| | | | | | | | | | | | | | Summary: The added testcase, which triggered this, was derived from a shader-db case via bugpoint. A separate question is why scalar branching wasn't used. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19208 llvm-svn: 266825
* AMDGPU/SI: SGPR accounting in getSIProgramInfo must ignore exec_lo/hi (Nicolai Haehnle, 2016-04-19, 1 file, -0/+2)
| Summary:
| A shader stored the live mask (initial exec mask) in an SGPR which was then
| spilled during register allocation. The allocator quite reasonably turned the
| spill into
|
|   v_writelane_b32 %vgpr, exec_lo, N
|   v_writelane_b32 %vgpr, exec_hi, N+1
|
| at the beginning of the shader, confusing the SGPR accounting.
|
| No test case, because si-sgpr-spill.ll together with an upcoming patch for
| WQM handling exhibits the problem.
|
| Reviewers: arsenm, tstellarAMD
| Subscribers: arsenm, llvm-commits
| Differential Revision: http://reviews.llvm.org/D19199
| llvm-svn: 266824
* [AMDGPU] Add insert nops pass based on subtarget features instead of cl::opt (Konstantin Zhuravlyov, 2016-04-18, 5 files, -14/+43)
| Also,
| - Skip pass if machine module does not have debug info
| - Minor comment changes
| - Added test
|
| Differential Revision: http://reviews.llvm.org/D19079
| llvm-svn: 266626
* [AMDGPU][llvm-mc] s_setreg* - Fix order of operands (Artem Tamazov, 2016-04-18, 1 file, -2/+2)
| | | | | | | | Order should match the sp3 syntax, where destination (simm16 denoting the hwreg) is coming first. Differential Revision: http://reviews.llvm.org/D19161 llvm-svn: 266617
* Silence some "initialized but unused" warnings from MSVC -- the function being called is a static function, so there's no need for an instance variable. NFC. (Aaron Ballman, 2016-04-18, 1 file, -13/+2)
| llvm-svn: 266616
* [NFC] Header cleanup (Mehdi Amini, 2016-04-18, 8 files, -9/+5)
| Removed some unused headers, replaced some headers with forward class
| declarations.
|
| Found using simple scripts like this one:
|
|   clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'
|
| Patch by Eugene Kosov <claprix@yandex.ru>
|
| Differential Revision: http://reviews.llvm.org/D19219
| From: Mehdi Amini <mehdi.amini@apple.com>
| llvm-svn: 266595
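The search quoted in the header-cleanup message above can be approximated in Python. This is only a sketch of the same idea (files that include a header but never use the name it provides), not the script the patch author ran. The function name `unused_include_candidates` and its parameters are hypothetical, and the include match is slightly looser than the original `ack` pattern because it accepts any whitespace after `#include`.

```python
import re
from pathlib import Path


def unused_include_candidates(root, header, usage_pattern, suffixes=(".h", ".cpp")):
    """Return files under `root` that include `header` but never match `usage_pattern`.

    Hypothetical helper: mirrors `ack -l '#include "..."' | xargs grep -L 'Name[<]'`.
    """
    include_re = re.compile(r'#include\s+"' + re.escape(header) + '"')
    usage_re = re.compile(usage_pattern)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(errors="ignore")
        # Keep the file only if the include is present and the usage never appears.
        if include_re.search(text) and not usage_re.search(text):
            hits.append(path)
    return hits
```

Each returned path is only a candidate: the header may still be needed indirectly, so every hit has to be checked by building the tree after removing the include.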
* AMDGPU: Enable LocalStackSlotAllocation pass (Matt Arsenault, 2016-04-16, 2 files, -0/+159)
| | | | | | | | | | | This resolves more frame indexes early and folds the immediate offsets into the scratch mubuf instructions. This cleans up a lot of the mess that's currently emitted, such as emitting add 0s and repeatedly initializing the same register to 0 when spilling. llvm-svn: 266508
* AMDGPU: Use s_addk_i32 / s_mulk_i32 (Matt Arsenault, 2016-04-16, 1 file, -12/+45)
| | | | llvm-svn: 266506
* [MachineScheduler] Add support for store clustering (Jun Bum Lim, 2016-04-15, 2 files, -6/+6)
| Perform store clustering just like load clustering. This change adds
| StoreClusterMutation in machine-scheduler. To control StoreClusterMutation,
| added enableClusterStores() in TargetInstrInfo.h. This is enabled only on
| AArch64 for now.
|
| This change also adds support for unscaled stores which were not handled in
| getMemOpBaseRegImmOfs().
|
| llvm-svn: 266437
* AMDGPU/SI: Fix regression with no-return atomics (Nicolai Haehnle, 2016-04-15, 1 file, -0/+1)
| | | | | | | | | | | | | | | Summary: In the added test-case, the atomic instruction feeds into a non-machine CopyToReg node which hasn't been selected yet, so guard against non-machine opcodes here. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19043 llvm-svn: 266433
* AMDGPU: Remove custom load/store scalarization (Matt Arsenault, 2016-04-14, 4 files, -87/+7)
| | | | llvm-svn: 266385
* AMDGPU: Include LDS size in printed comment (Matt Arsenault, 2016-04-14, 1 file, -0/+2)
| | | | llvm-svn: 266382
* AMDGPU: Run SIFoldOperands after PeepholeOptimizer (Matt Arsenault, 2016-04-14, 2 files, -1/+25)
| PeepholeOptimizer cleans up redundant copies, which makes the operand
| folding more effective.
|
| shader-db stats:
|
| Totals:
| SGPRS: 34200 -> 34336 (0.40 %)
| VGPRS: 22118 -> 21655 (-2.09 %)
| Code Size: 632144 -> 633460 (0.21 %) bytes
| LDS: 11 -> 11 (0.00 %) blocks
| Scratch: 10240 -> 11264 (10.00 %) bytes per wave
| Max Waves: 8822 -> 8918 (1.09 %)
| Wait states: 0 -> 0 (0.00 %)
|
| Totals from affected shaders:
| SGPRS: 7704 -> 7840 (1.77 %)
| VGPRS: 5169 -> 4706 (-8.96 %)
| Code Size: 234444 -> 235760 (0.56 %) bytes
| LDS: 2 -> 2 (0.00 %) blocks
| Scratch: 0 -> 1024 (0.00 %) bytes per wave
| Max Waves: 1188 -> 1284 (8.08 %)
| Wait states: 0 -> 0 (0.00 %)
|
| Increases:
| SGPRS: 35 (0.01 %)
| VGPRS: 1 (0.00 %)
| Code Size: 59 (0.02 %)
| LDS: 0 (0.00 %)
| Scratch: 1 (0.00 %)
| Max Waves: 48 (0.02 %)
| Wait states: 0 (0.00 %)
|
| Decreases:
| SGPRS: 26 (0.01 %)
| VGPRS: 54 (0.02 %)
| Code Size: 68 (0.03 %)
| LDS: 0 (0.00 %)
| Scratch: 0 (0.00 %)
| Max Waves: 4 (0.00 %)
| Wait states: 0 (0.00 %)
|
| llvm-svn: 266378
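The percentages in the "before -> after" rows of the shader-db report above are plain relative changes against the "before" value. The sketch below reproduces a few of them. The helper names are hypothetical, and printing 0.00 % for a zero baseline is an assumption inferred from the `Scratch: 0 -> 1024 (0.00 %)` row, not from the shader-db source.

```python
def percent_change(before, after):
    """Relative change in percent, measured against the 'before' value."""
    if before == 0:
        # Assumption: the quoted report prints 0.00 % when the baseline is zero.
        return 0.0
    return (after - before) / before * 100.0


def format_stat(name, before, after):
    """Format one row in the 'before -> after (change %)' style of the report."""
    return f"{name}: {before} -> {after} ({percent_change(before, after):.2f} %)"


print(format_stat("SGPRS", 34200, 34336))
print(format_stat("VGPRS", 22118, 21655))
print(format_stat("Scratch", 0, 1024))
```

Read this way, the commit trades a 0.40 % rise in total SGPR use for a 2.09 % drop in VGPR use.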
* AMDGPU: Directly emit m0 initialization with s_mov_b32 (Matt Arsenault, 2016-04-14, 2 files, -14/+37)
| | | | | | | | | | | | | Currently what comes out of instruction selection is a register initialized to -1, and then copied to m0. MachineCSE doesn't consider copies, but we want these to be CSEed. This isn't much of a problem currently, because SIFoldOperands is run immediately after. This avoids regressions when SIFoldOperands is run later from leaving all copies to m0. llvm-svn: 266377
* AMDGPU: Fold bitcasts of scalar constants to vectors (Matt Arsenault, 2016-04-14, 1 file, -0/+34)
| | | | | | | This cleans up some messes since the individual scalar components can be CSEed. llvm-svn: 266376
* AMDGPU: Add skeleton GlobalIsel implementation (Tom Stellard, 2016-04-14, 6 files, -0/+144)
| | | | | | | | | | | | | | | Summary: This adds the necessary target code to be able to run the ir translator. Lowering function arguments and returns is a nop and there is no support for RegBankSelect. Reviewers: arsenm, qcolombet Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits Differential Revision: http://reviews.llvm.org/D19077 llvm-svn: 266356
* [StructurizeCFG] Annotate branches that were treated as uniform (Nicolai Haehnle, 2016-04-14, 2 files, -4/+15)
| | | | | | | | | | | | | | | | | | | Summary: This fully solves the problem where the StructurizeCFG pass does not consider the same branches as uniform as the SIAnnotateControlFlow pass. The patch in D19013 helps with this problem, but is not sufficient (and, interestingly, causes a "regression" with one of the existing test cases). No tests included here, because tests in D19013 already cover this. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19018 llvm-svn: 266346
* AMDGPU: Remove SIFixSGPRLiveRanges pass (Nicolai Haehnle, 2016-04-14, 4 files, -242/+0)
| Summary:
| This pass is unnecessary and overly conservative. It was motivated by
| situations like
|
|   def %vreg0:SGPR_32
|   ...
| if-block:
|   ..
|   def %vreg1:SGPR_32
|   ...
| else-block:
|   ...
|   use %vreg0:SGPR_32
|   ...
|
| and similar situations with uses after the non-uniform control flow, where
| we are not allowed to assign %vreg0 and %vreg1 to the same physical register,
| even though in the original, thread/workitem-based CFG, it looks like the
| live ranges of these registers do not overlap.
|
| However, by the time register allocation runs, we have moved to a wave-based
| CFG that accurately represents the fact that the wave may run through both
| the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already
| overlap even without the SIFixSGPRLiveRanges pass.
|
| In addition to proving this change correct, I have tested it with Piglit
| and a small number of other tests.
|
| Reviewers: arsenm, tstellarAMD
| Subscribers: MatzeB, arsenm, llvm-commits
| Differential Revision: http://reviews.llvm.org/D19041
| llvm-svn: 266345
* AMDGPU: change a redundant if () to an assert(). NFC (Nicolai Haehnle, 2016-04-14, 1 file, -2/+1)
| | | | | | | | | | | | | | | Summary: I've been carrying this change around with me for a while, because the if () managed to confuse me while following the code. All callers ensure that the assertion holds. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19042 llvm-svn: 266344
* AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit (Tom Stellard, 2016-04-14, 6 files, -63/+94)
| Summary:
| For GL_ARB_compute_shader we need to support workgroup sizes of at least
| 1024. However, if we want to allow large workgroup sizes, we may need to use
| fewer registers, as we have to run more waves per SIMD.
|
| This patch adds an attribute to specify the maximum work group size the
| compiled program needs to support. It defaults to 256, as that has no wave
| restrictions.
|
| Reducing the number of registers available is done similarly to how the
| registers were reserved for chips with the sgpr init bug.
|
| Reviewers: mareko, arsenm, tstellarAMD, nhaehnle
| Subscribers: FireBurn, kerberizer, llvm-commits, arsenm
| Differential Revision: http://reviews.llvm.org/D18340
| Patch By: Bas Nieuwenhuizen
| llvm-svn: 266337
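The trade-off between workgroup size and register budget described above can be illustrated with some back-of-the-envelope arithmetic. The constants below (64 work-items per wave, 4 SIMDs per compute unit, 256 VGPRs shared by the waves resident on one SIMD) are assumptions about typical GCN hardware, and the function names are hypothetical. The calculation in the actual patch may differ, for example by rounding to a register allocation granularity.

```python
WAVE_SIZE = 64        # assumption: work-items per wavefront
SIMDS_PER_CU = 4      # assumption: SIMD units per compute unit
VGPRS_PER_SIMD = 256  # assumption: VGPR budget shared by the waves on one SIMD


def waves_per_simd(max_workgroup_size):
    """Waves each SIMD must hold so that one whole workgroup fits in a compute unit."""
    waves = -(-max_workgroup_size // WAVE_SIZE)  # ceiling division
    return -(-waves // SIMDS_PER_CU)


def max_vgprs_per_wave(max_workgroup_size):
    """Upper bound on VGPRs per wave if the workgroup must be resident at once."""
    return VGPRS_PER_SIMD // waves_per_simd(max_workgroup_size)


for size in (256, 1024):
    print(size, waves_per_simd(size), max_vgprs_per_wave(size))
```

Under these assumptions a 256-item workgroup needs one wave per SIMD and keeps the full register budget, which matches the message's remark that the default has no wave restrictions. A 1024-item workgroup needs four waves per SIMD and a quarter of the budget per wave.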
* AMDGPU/SI: Use the correct scratch wave offset register for shaders. (Tom Stellard, 2016-04-14, 3 files, -9/+38)
| Summary:
| The code previously always used s1, as it was using the user + system SGPR
| information for compute kernels. This is incorrect for Mesa shaders, though.
| The register should be the next SGPR after all user and system SGPRs. We rely
| on the fact that Mesa adds arguments for all input and system SGPRs and take
| the next available SGPR for the scratch wave offset register.
|
| Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
|
| Reviewers: mareko, arsenm, nhaehnle, tstellarAMD
| Subscribers: qcolombet, arsenm, llvm-commits
| Differential Revision: http://reviews.llvm.org/D18941
| Patch By: Bas Nieuwenhuizen
| llvm-svn: 266336
* AMDGPU: Implement canonicalize (Matt Arsenault, 2016-04-14, 4 files, -0/+55)
| | | | | | Also add generic DAG node for it. llvm-svn: 266272
* AMDGPU/SI: Add support for spilling VGPRs without having to scavenge registers (Tom Stellard, 2016-04-13, 2 files, -11/+29)
| | | | | | | | | | | | | | | Summary: When we are spilling SGPRs to scratch memory, we usually don't have free SGPRs to do the address calculation, so we need to re-use the ScratchOffset register for the calculation. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18917 llvm-svn: 266244
* [AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA) (Artem Tamazov, 2016-04-13, 5 files, -37/+165)
| Tests added along with implemented feature.
| Note that there is a small leftover of an unnecessary MI scheduling issue
| (more info in the review).
| CodeGen/AMDGPU/salu-to-valu.ll updated to fix the false regression.
| TODO: Support for TTMP quads, comma-separated syntax in "[]" and more.
|
| Differential Revision: http://reviews.llvm.org/D17825
| llvm-svn: 266205
* AMDGPU/SI: Fix spilling of 96-bit registers (Tom Stellard, 2016-04-12, 1 file, -0/+4)
| Summary:
| It seems like this was broken in r252327. I thought we had test cases for
| this, but it's really hard to trigger spills of this exact register size
| since they aren't used very much.
|
| Reviewers: arsenm, nhaehnle
| Subscribers: nhaehnle, arsenm, llvm-commits
| Differential Revision: http://reviews.llvm.org/D19021
| llvm-svn: 266152
* AMDGPU: add llvm.amdgcn.buffer.load/store intrinsics (Nicolai Haehnle, 2016-04-12, 1 file, -59/+69)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and atomic counters at least when robust buffer access behavior is desired. (These instructions perform no format conversion and do buffer range checking per component.) As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format, it has become trivial to add support for the f32 and v2f32 variants of that intrinsic, so the patch does so. Also DAG-ify (and fix) some tests that I noticed intermittent failures in while developing this patch. Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects changes to the BUFFER_STORE_DWORD* instructions. See also http://reviews.llvm.org/D18291. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18292 llvm-svn: 266126
* AMDGPU/SI: Insert wait states required after v_readfirstlane on SI (Tom Stellard, 2016-04-12, 1 file, -0/+6)
| | | | | | | | | | | | | | | | | Summary: We will be able to handle this case much better once the hazard recognizer is finished, but this conservative implementation fixes a hang with the piglit test: spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra Reviewers: arsenm, nhaehnle Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18988 llvm-svn: 266105
* AMDGPU: Eliminate half of i64 or if one operand is zero_extend from i32 (Matt Arsenault, 2016-04-12, 1 file, -0/+30)
| This helps clean up some of the mess when expanding unaligned 64-bit loads
| when changed to be promoted to v2i32, and fixes situations where or x, 0 was
| emitted after splitting 64-bit ors during moveToVALU.
|
| I think this could be a generic combine but I'm not sure.
|
| llvm-svn: 266104
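The reason half of the 64-bit or can be dropped is that a value zero-extended from i32 has all-zero upper bits, so the upper half of the result is just the upper half of the other operand. The sketch below checks that identity on plain integers. It models the arithmetic only, not the DAG combine itself, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def or64_full(x, y32):
    """Reference: 64-bit or of x with y32 zero-extended to 64 bits."""
    return (x | (y32 & MASK32)) & MASK64


def or64_reduced(x, y32):
    """Same result using a single 32-bit or on the low halves only."""
    lo = x & MASK32
    hi = (x >> 32) & MASK32
    # The high half passes through untouched because zext(y32) has zero upper bits.
    return (hi << 32) | (lo | (y32 & MASK32))


samples = [(0x0123456789ABCDEF, 0xFFFF0000), (0, 1), (MASK64, 0), (0x8000000000000000, MASK32)]
for x, y in samples:
    assert or64_full(x, y) == or64_reduced(x, y)
```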
* AMDGPU/SI: Fix a mis-compilation of multi-level breaks (Nicolai Haehnle, 2016-04-12, 1 file, -0/+16)
| | | | | | | | | | | | | | | | | | Summary: Under certain circumstances, multi-level breaks (or what is understood by the control flow passes as such) could be miscompiled in a way that causes infinite loops, by emitting incorrect control flow intrinsics. This fixes a hang in dEQP-GLES3.functional.shaders.loops.while_dynamic_iterations.conditional_continue_vertex Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18967 llvm-svn: 266088
* AMDGPU: Implement i64 global atomics (Matt Arsenault, 2016-04-12, 2 files, -14/+44)
| | | | llvm-svn: 266075
* AMDGPU: Add atomic_inc + atomic_dec intrinsics (Matt Arsenault, 2016-04-12, 8 files, -11/+101)
| | | | | | | These are different than atomicrmw add 1 because they have an additional input value to clamp the result. llvm-svn: 266074
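To make the difference from `atomicrmw add 1` concrete, here is a single-threaded Python model of an increment and a decrement that take an extra limit operand. It assumes the wrap-around behaviour commonly documented for GCN inc/dec operations: increment resets to 0 once the limit is reached, and decrement reloads the limit at 0 or when the value is out of range. The commit message itself only says that the extra input clamps the result, so treat the exact rule as an assumption. The model is not atomic and the function names are hypothetical.

```python
def atomic_inc(mem, addr, limit):
    """Model: return the old value; store 0 if old >= limit, else old + 1."""
    old = mem[addr]
    mem[addr] = 0 if old >= limit else old + 1
    return old


def atomic_dec(mem, addr, limit):
    """Model: return the old value; store limit if old is 0 or above limit, else old - 1."""
    old = mem[addr]
    mem[addr] = limit if (old == 0 or old > limit) else old - 1
    return old


mem = [2]
atomic_inc(mem, 0, 3)   # 2 -> 3
atomic_inc(mem, 0, 3)   # 3 -> 0, where a plain add would give 4
atomic_dec(mem, 0, 3)   # 0 -> 3, where a plain subtract would underflow
```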
* AMDGPU: Remove trailing whitespace (Matt Arsenault, 2016-04-12, 1 file, -7/+7)
| | | | llvm-svn: 266073
* Revert "AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute" (Tom Stellard, 2016-04-11, 1 file, -1/+1)
| | | | | | | | This reverts commit r263720. Just confirmed that s_waitcnt is required after ds_permute/ds_bpermute. llvm-svn: 265992
* AMDGPU/SI: Implement atomic load/store for i32 and i64 (Jan Vesely, 2016-04-07, 5 files, -24/+111)
| | | | | | | | | | Standard load/store instructions with GLC bit set. Reviewers: tstellardAMD, arsenm Differential Revision: http://reviews.llvm.org/D18760 llvm-svn: 265709
* AMDGPU/SI: Add latency for export instructions (Tom Stellard, 2016-04-07, 1 file, -0/+1)
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18599 llvm-svn: 265708
* AMDGPU/SI: Add MachineBasicBlock parameter to SIInstrInfo::insertWaitStates (Tom Stellard, 2016-04-07, 4 files, -6/+8)
| | | | | | | | | | | | Summary: This makes it possible to insert nops at the end of blocks. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18549 llvm-svn: 265678
* [AMDGPU] fix readlane/readfirstlane src vgpr operand type. (Valery Pykhtin, 2016-04-07, 1 file, -2/+2)
| For a VGPR_32 operand, the disassembler expects a VGPR register encoded as
| 0..255 (enum8 src operand). readfirstlane/readlane actually have an enum9
| operand, and this change switches VGPR_32 to VS_32 (enum9 encoding).
|
| Differential Revision: http://reviews.llvm.org/D18696
| llvm-svn: 265670
* Make helper functions static. NFC. (Benjamin Kramer, 2016-04-07, 1 file, -14/+10)
| | | | llvm-svn: 265653
* AMDGPU: Add a shader calling convention (Nicolai Haehnle, 2016-04-06, 19 files, -87/+75)
| | | | | | | | | | | This makes it possible to distinguish between mesa shaders and other kernels even in the presence of compute shaders. Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Differential Revision: http://reviews.llvm.org/D18559 llvm-svn: 265589
* [AMDGPU] AsmParser: disable DPP for unsupported instructions. New dpp tests. Fix v_nop_dpp. (Sam Kolton, 2016-04-06, 2 files, -11/+54)
| Summary:
| 1. Disable DPP encoding for instructions that do not support it:
|    - VOP1:
|      - v_readfirstlane_b32
|      - v_clrexcp
|      - v_movreld_b32
|      - v_movrels_b32
|      - v_movrelsd_b32
|    - VOP2:
|      - v_madmk_f16/32
|      - v_madak_f16/32
|    - VOPC, VINTRP, VOP3
| 2. Fix DPP for v_nop
| 3. New DPP tests for VOP1 and VOP2 instructions
|
| Reviewers: nhaustov, tstellarAMD, vpykhtin
| Subscribers: tstellarAMD, arsenm
| Differential Revision: http://reviews.llvm.org/D18552
| llvm-svn: 265538
* RegisterScavenger: Take a reference as enterBasicBlock() argument. (Matthias Braun, 2016-04-06, 1 file, -1/+1)
| | | | | | | Make it obvious that the argument cannot be nullptr. Remove an unnecessary nullptr check in initRegState. llvm-svn: 265511
* [AMDGPU] Emit linkonce and linkonce_odr symbols (Konstantin Zhuravlyov, 2016-04-05, 1 file, -0/+2)
| | | | | | Differential Revision: http://reviews.llvm.org/D18726 llvm-svn: 265408
* AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2} (Tom Stellard, 2016-04-01, 8 files, -3/+117)
| | | | | | | | | | | | | | | | | Summary: Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+. 32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý. Patch by: Vedran Miletić Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: jvesely, scchan, kanarayan, arsenm Differential Revision: http://reviews.llvm.org/D17280 llvm-svn: 265170
* [AMDGPU] fix MADAK/MADMK instructions operand namings to match encoding fields. (Valery Pykhtin, 2016-04-01, 2 files, -8/+8)
| | | | | | | | $vsrc1 -> $src1, $k -> $imm Differential Revision: http://reviews.llvm.org/D18659 llvm-svn: 265141
* [AMDGPU] Disassembler: support for DPP (Sam Kolton, 2016-03-31, 2 files, -7/+23)
| | | | | Review: http://reviews.llvm.org/D18642 llvm-svn: 265015
* AMDGPU: Add frexp_exp intrinsic (Matt Arsenault, 2016-03-30, 1 file, -2/+2)
| | | | llvm-svn: 264944
* Silencing warnings from MSVC 2015 Update 2. All of these changes silence "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC. (Aaron Ballman, 2016-03-30, 2 files, -3/+3)
| llvm-svn: 264929
* AMDGPU/SI: Improve MachineSchedModel definition (Tom Stellard, 2016-03-30, 1 file, -19/+27)
| This patch contains a few improvements to the model, including:
|
| - Using a single resource with a defined buffer size for each memory unit.
| - Setting the IssueWidth correctly.
| - Fixing latency values for memory instructions.
|
| shader-db stats:
|
| 16429 shaders in 3231 tests
| Totals:
| SGPRS: 318232 -> 312328 (-1.86 %)
| VGPRS: 208996 -> 209346 (0.17 %)
| Code Size: 7147044 -> 7166440 (0.27 %) bytes
| LDS: 83 -> 83 (0.00 %) blocks
| Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave
| Max Waves: 49182 -> 49243 (0.12 %)
| Wait states: 0 -> 0 (0.00 %)
|
| Differential Revision: http://reviews.llvm.org/D18453
| llvm-svn: 264877
* AMDGPU/SI: Enable lanemask tracking in misched (Tom Stellard, 2016-03-30, 1 file, -0/+4)
| Summary:
| This results in higher register usage, but should make it easier for the
| compiler to hide latency.
|
| This pass is a prerequisite for some more scheduler improvements, and I
| think the increased register usage with this patch is acceptable, because
| when combined with the scheduler improvements, the total register usage
| will decrease.
|
| shader-db stats:
|
| 2382 shaders in 478 tests
| Totals:
| SGPRS: 48672 -> 49088 (0.85 %)
| VGPRS: 34148 -> 34847 (2.05 %)
| Code Size: 1285816 -> 1289128 (0.26 %) bytes
| LDS: 28 -> 28 (0.00 %) blocks
| Scratch: 492544 -> 573440 (16.42 %) bytes per wave
| Max Waves: 6856 -> 6846 (-0.15 %)
| Wait states: 0 -> 0 (0.00 %)
|
| Depends on D18451
|
| Reviewers: nhaehnle, arsenm
| Subscribers: arsenm, llvm-commits
| Differential Revision: http://reviews.llvm.org/D18452
| llvm-svn: 264876