summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/Utils
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Divergence-driven selection of scalar buffer load intrinsicsNicolai Haehnle2018-10-172-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 344696
* AMDGPU: Clear the bits before they are being set in program resource registersKonstantin Zhuravlyov2018-09-141-0/+1
| | | | | | Change by Tony Tye llvm-svn: 342270
* AMDGPU: Re-apply r341982 after fixing the layering issueKonstantin Zhuravlyov2018-09-122-217/+158
| | | | | | | | | | | | Move isa version determination into TargetParser. Also switch away from target features to CPU string when determining isa version. This fixes an issue when we output wrong isa version in the object code when features of a particular CPU are altered (i.e. gfx902 w/o xnack used to result in gfx900). llvm-svn: 342069
* Revert "AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into ↵Ilya Biryukov2018-09-122-158/+217
| | | | | | | | | | | TargetParser." This reverts commit r341982. The change introduced a layering violation. Reverting to unbreak our integrate. llvm-svn: 342023
* AMDGPU: Move isa version and EF_AMDGPU_MACH_* determinationKonstantin Zhuravlyov2018-09-112-217/+158
| | | | | | | | | | | | | | into TargetParser. Also switch away from target features to CPU string when determining isa version. This fixes an issue when we output wrong isa version in the object code when features of a particular CPU are altered (i.e. gfx902 w/o xnack used to result in gfx900). Differential Revision: https://reviews.llvm.org/D51890 llvm-svn: 341982
* AMDGPU: Remove remnants of old address space mappingMatt Arsenault2018-08-311-23/+0
| | | | llvm-svn: 341165
* [AMDGPU] New buffer intrinsicsTim Renouf2018-08-212-0/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This commit adds new intrinsics llvm.amdgcn.raw.buffer.load llvm.amdgcn.raw.buffer.load.format llvm.amdgcn.raw.buffer.load.format.d16 llvm.amdgcn.struct.buffer.load llvm.amdgcn.struct.buffer.load.format llvm.amdgcn.struct.buffer.load.format.d16 llvm.amdgcn.raw.buffer.store llvm.amdgcn.raw.buffer.store.format llvm.amdgcn.raw.buffer.store.format.d16 llvm.amdgcn.struct.buffer.store llvm.amdgcn.struct.buffer.store.format llvm.amdgcn.struct.buffer.store.format.d16 llvm.amdgcn.raw.buffer.atomic.* llvm.amdgcn.struct.buffer.atomic.* with the following changes from the llvm.amdgcn.buffer.* intrinsics: * there are separate raw and struct versions: raw does not have an index arg and sets idxen=0 in the instruction, and struct always sets idxen=1 in the instruction even if the index is 0, to allow for the fact that gfx9 does bounds checking differently depending on whether idxen is set; * there is a combined cachepolicy arg (glc+slc) * there are now only two offset args: one for the offset that is included in bounds checking and swizzling, to be split between the instruction's voffset and immoffset fields, and one for the offset that is excluded from bounds checking and swizzling, to go into the instruction's soffset field. The AMDISD::BUFFER_* SD nodes always have an index operand, all three offset operands, combined cachepolicy operand, and an extra idxen operand. The obsolescent llvm.amdgcn.buffer.* intrinsics continue to work. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50306 Change-Id: If897ea7dc34fcbf4d5496e98cc99a934f62fc205 llvm-svn: 340269
* [AMDGPU] Optimize _L image intrinsic to _LZ when lod is zeroRyan Taylor2018-08-012-0/+10
| | | | | | | | | | | | | | | Summary: Add _L to _LZ image intrinsic table mapping to table gen. In ISelLowering check if image intrinsic has lod and if it's equal to zero, if so remove lod and change opcode to equivalent mapped _LZ. Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71 Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D49483 llvm-svn: 338523
* AMDGPU: Separate R600 and GCN TableGen filesTom Stellard2018-06-282-8/+8
| | | | | | | | | | | | | | | | | | | | | Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc. Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46365 llvm-svn: 335942
* [AMDGPU] Update assembler for HSA Code Object v3Scott Linder2018-06-212-1/+82
| | | | | | | | | | | | | | Update AMDGPU assembler syntax behind the code-object-v3 feature: * Replace/rename most AMDGPU assembler directives/symbols and document them. * Provide more diagnostics (e.g. values out of range, missing values, repeated values). * Provide path for backwards compatibility, even with underlying descriptor changes. Differential Revision: https://reviews.llvm.org/D47736 llvm-svn: 335281
* AMDGPU: Select MIMG instructions manually in SITargetLoweringNicolai Haehnle2018-06-212-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Having TableGen patterns for image intrinsics is hitting limitations: for D16 we already have to manually pre-lower the packing of data values, and we will have to do the same for A16 eventually. Since there is already some custom C++ code anyway, it is arguably easier to just do everything in C++, now that we can use the beefed-up generic tables backend of TableGen to provide all the required metadata and map intrinsics to corresponding opcodes. With this approach, all image intrinsic lowering happens in SITargetLowering::lowerImage. That code is dense due to all the cases that it handles, but it should still be easier to follow than what we had before, by virtue of it all being done in a single location, and by virtue of not relying on the TableGen pattern magic that very few people really understand. This means that we will have MachineSDNodes with MIMG instructions during DAG combining, but that seems alright: previously we had intrinsic nodes instead, but those are similarly opaque to the generic CodeGen infrastructure, and the final pattern matching just did a 1:1 translation to machine instructions anyway. If anything, the fact that we now merge the address words into a vector before DAG combine should be an advantage. Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6 Reviewers: arsenm, rampitec, rtaylor, tstellar Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48017 llvm-svn: 335228
* AMDGPU: Refactor MIMG instruction TableGen using generic tablesNicolai Haehnle2018-06-212-80/+21
| | | | | | | | | | | | | | | | | | | | Summary: This allows us to access rich information about MIMG opcodes from C++ code. Simplifying the mapping between equivalent opcodes of different data size becomes quite natural. This also flattens the MIMG-related class and multiclass hierarchy a little, and collapses together some of the scaffolding for sample and gather4 opcodes. Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48016 llvm-svn: 335227
* AMDGPU: Use generic tables instead of SearchableTableNicolai Haehnle2018-06-211-3/+3
| | | | | | | | | | | | | Summary: Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48014 Change-Id: Ibb43f90d955275571aff17d0c3ecfb5e5b299641 llvm-svn: 335226
* AMDGPU: Turn D16 for MIMG instructions into a regular operandNicolai Haehnle2018-06-211-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This allows us to reduce the number of different machine instruction opcodes, which reduces the table sizes and helps flatten the TableGen multiclass hierarchies. We can do this because for each hardware MIMG opcode, we have a full set of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata and vaddr registers. Instead of having separate D16 machine instructions, a packed D16 instructions loading e.g. 4 components can simply use the same V2 opcode variant that non-D16 instructions use. We still require a TSFlag for D16 buffer instructions, because the D16-ness of buffer instructions is part of the opcode. Renaming the flag should help avoid future confusion. The one non-obvious code change is that for gather4 instructions, the disassembler can no longer automatically decide whether to use a V2 or a V4 variant. The existing logic which choose the correct variant for other MIMG instruction is extended to cover gather4 as well. As a bonus, some of the assembler error messages are now more helpful (e.g., complaining about a wrong data size instead of a non-existing instruction). While we're at it, delete a whole bunch of dead legacy TableGen code. Change-Id: I89b02c2841c06f95e662541433e597f5d4553978 Reviewers: arsenm, rampitec, kzhuravl, artem.tamazov, dp, rtaylor Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47434 llvm-svn: 335222
* AMDHSA: Code object v3 updatesKonstantin Zhuravlyov2018-06-122-4/+4
| | | | | | | | | | | | | | | - Do not emit following assembler directives: - .hsa_code_object_version - .hsa_code_object_isa - .amd_amdgpu_isa - .amd_amdgpu_hsa_metadata - .amd_amdgpu_pal_metadata - Do not emit .note entries - Cleanup and bring in sync kernel descriptor header file - Emit kernel descriptor into .rodata with appropriate relocations and alignments llvm-svn: 334519
* AMDGPU : Recalculate SGPRs when trap handler is supportedKonstantin Zhuravlyov2018-05-162-6/+11
| | | | | | Differential Revision: https://reviews.llvm.org/D29911 llvm-svn: 332523
* Remove \brief commands from doxygen comments.Adrian Prantl2018-05-012-16/+16
| | | | | | | | | | | | | | | | We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272
* AMDGPU: Add Vega12 and Vega20Matt Arsenault2018-04-301-0/+4
| | | | | | | | Changes by Matt Arsenault Konstantin Zhuravlyov llvm-svn: 331215
* [AMDGPU] Enabled v2.16 literals for VOP3PStanislav Mekhanoshin2018-04-171-8/+0
| | | | | | | | Literal encoding needs op_sel_hi to select low 16 bit in this case. Differential Revision: https://reviews.llvm.org/D45745 llvm-svn: 330230
* AMDGPU: Remove max_scratch_backing_memory_byte_size from kernel headerKonstantin Zhuravlyov2018-04-092-2/+1
| | | | | | | | | | | 1. Remove max_scratch_backing_memory_byte_size from kernel header 2. Make it a reserved field 3. Ignore it while parsing assembly for backwards compatibility 4. Bump up minor version of kernel header Differential Revision: https://reviews.llvm.org/D45452 llvm-svn: 329620
* AMDGPU: Fix copying i1 value out of loop with non-uniform exitNicolai Haehnle2018-04-043-0/+100
| | | | | | | | | | | | | | | | | | | | | | | | Summary: When an i1-value is defined inside of a loop and used outside of it, we cannot simply use the SGPR bitmask from the loop's last iteration. There are also useful and correct cases of an i1-value being copied between basic blocks, e.g. when a condition is computed outside of a loop and used inside it. The concept of dominators is not sufficient to capture what is going on, so I propose the notion of "lane-dominators". Fixes a bug encountered in Nier: Automata. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743 Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D40547 llvm-svn: 329164
* AMDGPU: Make isIntrinsicSourceOfDivergence table-drivenNicolai Haehnle2018-04-011-47/+13
| | | | | | | | | | | | | | | | Summary: This is in preparation for the new dimension-aware image intrinsics, which I'd rather not have to list here by hand. Change-Id: Iaa16e3a635a11283918ce0d9e1e618591b0bf6fa Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44938 llvm-svn: 328939
* [AMDGPU] Add default ISA version targetsStanislav Mekhanoshin2018-03-061-0/+6
| | | | | | | | | | | In case if -mattr used to modify feature set bits in llvm-mc call getIsaVersion can fail to identify specific ISA due to test mismatch. Adding default fallback tests which will always correctly report at least major version. Differential Revision: https://reviews.llvm.org/D44163 llvm-svn: 326825
* Pass Divergence Analysis data to Selection DAG to drive divergenceAlexander Timofeev2018-03-052-0/+54
| | | | | | | | dependent instruction selection. Differential revision: https://reviews.llvm.org/D35267 llvm-svn: 326703
* AMDGPU: Bring processors and features in sync with the specKonstantin Zhuravlyov2018-02-161-2/+0
| | | | | | | | | | - Remove gfx800 - Make iceland gfx802 - Add xnack to gfx902 Differential Revision: https://reviews.llvm.org/D43355 llvm-svn: 325393
* [AMDGPU] Change constant addr space to 4Yaxun Liu2018-02-131-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D43170 llvm-svn: 325030
* Reapply "AMDGPU: Add 32-bit constant address space"Matt Arsenault2018-02-091-1/+2
| | | | | | This reverts r324494 and reapplies r324487. llvm-svn: 324747
* AMDGPU: Fix layering issueMatt Arsenault2018-02-092-19/+0
| | | | | | | Move utility function that depends on codegen. Fixes build with r324487 reapplied. llvm-svn: 324746
* Revert "AMDGPU: Add 32-bit constant address space"Rafael Espindola2018-02-071-5/+1
| | | | | | | | This reverts commit r324487. It broke clang tests. llvm-svn: 324494
* AMDGPU: Add 32-bit constant address spaceMarek Olsak2018-02-071-1/+5
| | | | | | | | | | | | | | | | | | | | | | | Note: This is a candidate for LLVM 6.0, because it was planned to be in that release but was delayed due to a long review period. Merge conflict in release_60 - resolution: Add "-p6:32:32" into the second (non-amdgiz) string. Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile. That's OK because the results of loads will only be used in places where VGPRs are forbidden. Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC. The tests cover all uses cases we need for Mesa. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D41651 llvm-svn: 324487
* [AMDGPU][MC] Corrected dst/data size for MIMG opcodes with d16 modifierDmitry Preobrazhensky2018-02-052-1/+5
| | | | | | | | | See bug 36154: https://bugs.llvm.org/show_bug.cgi?id=36154 Differential Revision: https://reviews.llvm.org/D42847 Reviewers: cfang, artem.tamazov, arsenm llvm-svn: 324237
* [AMDGPU][MC] Added validation of d16 and r128 modifiers of MIMG opcodesDmitry Preobrazhensky2018-02-052-0/+6
| | | | | | | | | | | See bugs 36094, 36095: https://bugs.llvm.org/show_bug.cgi?id=36094 https://bugs.llvm.org/show_bug.cgi?id=36095 Differential Revision: https://reviews.llvm.org/D42692 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 324231
* [AMDGPU] Switch to the new addr space mapping by defaultYaxun Liu2018-02-021-11/+3
| | | | | | | | This requires corresponding clang change. Differential Revision: https://reviews.llvm.org/D40955 llvm-svn: 324101
* [AMDGPU][MC] Added support of 64-bit image atomicsDmitry Preobrazhensky2018-01-262-0/+27
| | | | | | | | | See bug 35998: https://bugs.llvm.org/show_bug.cgi?id=35998 Differential Revision: https://reviews.llvm.org/D42469 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 323534
* [AMDGPU] Add HW_REG_SH_MEM_BASES symbolic name for s_getreg_b32Stanislav Mekhanoshin2018-01-151-1/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D41617 llvm-svn: 322500
* [AMDGPU][MC][GFX8][GFX9] Added XNACK_MASK supportDmitry Preobrazhensky2018-01-102-0/+6
| | | | | | | | | See bug 35764: https://bugs.llvm.org/show_bug.cgi?id=35764 Differential Revision: https://reviews.llvm.org/D41614 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 322189
* [AMDGPU][MC] Added support of 256- and 512-bit tuples of ttmp registersDmitry Preobrazhensky2017-12-221-0/+4
| | | | | | | | | | | | See bug 35561: https://bugs.llvm.org/show_bug.cgi?id=35561 This patch also affects implementation of SGPR and VGPR registers though changes are cosmetic. Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D41437 llvm-svn: 321359
* AMDGPU: Partially fix disassembly of MIMG instructionsMatt Arsenault2017-12-132-0/+70
| | | | | | | | | | | | | | | | | | | | | Stores failed to decode at all since they didn't have a DecoderNamespace set. Loads worked, but did not change the register width displayed to match the numbmer of enabled channels. The number of printed registers for vaddr is still wrong, but I don't think that's encoded in the instruction so there's not much we can do about that. Image atomics are still broken. MIMG is the same encoding for SI/VI, but the image atomic classes are split up into encoding specific versions unlike every other MIMG instruction. They have isAsmParserOnly set on them for some reason. dmask is also special for these, so we probably should not have it as an explicit operand as it is now. llvm-svn: 320614
* [AMDGPU][MC][GFX9] Corrected encoding of ttmp registers, disabled tba/tmaDmitry Preobrazhensky2017-12-111-29/+53
| | | | | | | | | | | | See bugs 35494 and 35559: https://bugs.llvm.org/show_bug.cgi?id=35494 https://bugs.llvm.org/show_bug.cgi?id=35559 Reviewers: vpykhtin, artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D41007 llvm-svn: 320375
* AMDGPU/GCN: Bring processors in sync with AMDGPUUsageKonstantin Zhuravlyov2017-12-081-10/+7
| | | | | | | | | | | | - Add gfx704 - Change bonaire to gfx704 - Remove gfx804 - Remove gfx901 - Remove gfx903 Differential Revision: https://reviews.llvm.org/D40046 llvm-svn: 320194
* AMDGPU: Fix set but not used warnings related to AMDGPUASKonstantin Zhuravlyov2017-11-012-9/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D39499 llvm-svn: 317114
* [AMDGPU] Clean up symbols in the global namespace.Benjamin Kramer2017-10-311-23/+0
| | | | llvm-svn: 317051
* AMDGPU: Do not emit deprecated notes for code object v3Konstantin Zhuravlyov2017-10-142-0/+8
| | | | | | Differential Revision: https://reviews.llvm.org/D38749 llvm-svn: 315810
* AMDGPU: Add support for isa version noteKonstantin Zhuravlyov2017-10-142-0/+19
| | | | | | | | | | - Emit NT_AMD_AMDGPU_ISA - Add assembler parsing for isa version directive - If isa version directive does not match command line arguments, then return error Differential Revision: https://reviews.llvm.org/D38748 llvm-svn: 315808
* [AMDGPU] calling conventions for AMDPAL OS typeTim Renouf2017-09-291-0/+6
| | | | | | | | | | | | | | | Summary: This commit adds comments on how the AMDPAL OS type overloads the existing AMDGPU_ calling conventions used by Mesa, and adds a couple of new ones. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37752 llvm-svn: 314502
* [AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use ↵Eugene Zelenko2017-08-102-30/+26
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 310541
* AMDGPU: Cleanup subtarget featuresMatt Arsenault2017-08-071-6/+9
| | | | | | | | | | | | Try to avoid mutually exclusive features. Don't use a real default GPU, and use a fake "generic". The goal is to make it easier to see which set of features are incompatible between feature strings. Most of the test changes are due to random scheduling changes from not having a default fullspeed model. llvm-svn: 310258
* AMDGPU: Fix using SMRD instructions for argument loads in functionsMatt Arsenault2017-07-262-1/+31
| | | | | | These are not actually uniform values except in kernels. llvm-svn: 309172
* [AMDGPU][MC] Optimized IsRegIntersect functionDmitry Preobrazhensky2017-07-181-16/+2
| | | | | | | | | | | | Optimized IsRegIntersect by using MCRegAliasIterator See Bug 33800: https://bugs.llvm.org//show_bug.cgi?id=33800 Reviewers: arsenm, artem.tamazov Differential Revision: https://reviews.llvm.org/D35452 llvm-svn: 308294
* Revert "AMDGPU: Do not test for SI in getIsaVersion"Konstantin Zhuravlyov2017-07-111-1/+1
| | | | | | | | This reverts commit r307573. This breaks downstream test. llvm-svn: 307678
OpenPOWER on IntegriCloud