summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/AMDGPU.td
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU] w/a for gfx908 mfma SrcC literal HW bugStanislav Mekhanoshin2019-08-231-0/+7
| | | | | | | | gfx908 ignores an mfma if SrcC is a literal. Differential Revision: https://reviews.llvm.org/D66670 llvm-svn: 369816
* AMDGPU/GlobalISel: Fix flat load/store of pointer typesMatt Arsenault2019-08-011-0/+9
| | | | llvm-svn: 367513
* [AMDGPU] gfx908 v_pk_fmac_f16 supportStanislav Mekhanoshin2019-07-091-2/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D64433 llvm-svn: 365573
* [AMDGPU] gfx908 targetStanislav Mekhanoshin2019-07-091-0/+64
| | | | | | Differential Revision: https://reviews.llvm.org/D64429 llvm-svn: 365525
* [AMDGPU] Fix for branch offset hardware workaroundRyan Taylor2019-06-261-0/+10
| | | | | | | | | | | | | | | | | | | Summary: This fixes a hardware bug that makes a branch offset of 0x3f unsafe. This replaces the 32 bit branch with offset 0x3f to a 64 bit instruction that includes the same 32 bit branch and the encoding for a s_nop 0 to follow. The relaxer than modifies the offsets accordingly. Change-Id: I10b7aed99d651f8159401b01bb421f105fa6288e Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63494 llvm-svn: 364451
* [AMDGPU] gfx1010 core wave32 changesStanislav Mekhanoshin2019-06-201-3/+3
| | | | | | Differential Revision: https://reviews.llvm.org/D63204 llvm-svn: 363934
* [AMDGPU] gfx1011/gfx1012 targetsStanislav Mekhanoshin2019-06-141-0/+52
| | | | | | Differential Revision: https://reviews.llvm.org/D63307 llvm-svn: 363344
* [AMDGPU] gfx1010 dpp16 and dpp8Stanislav Mekhanoshin2019-06-121-1/+14
| | | | | | Differential Revision: https://reviews.llvm.org/D63203 llvm-svn: 363186
* AMDGPU: Correct maximum possible private allocation sizeMatt Arsenault2019-05-231-7/+0
| | | | | | | | | | | | | | | | We were assuming a much larger possible per-wave visible stack allocation than is possible: https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/faa3ae51388517353afcdaf9c16621f879ef0a59/src/core/runtime/amd_gpu_agent.cpp#L70 Based on this, we can assume the high 15 bits of a frame index or sret are 0. The frame index value is the per-lane offset, so the maximum frame index value is MAX_WAVE_SCRATCH / wavesize. Remove the corresponding subtarget feature and option that made this configurable. llvm-svn: 361541
* AMDGPU: Assume xnack is enabled by defaultMatt Arsenault2019-05-161-1/+20
| | | | | | | | | | | This is the conservatively correct default. It is always safe to assume xnack is enabled, but not the converse. Introduce a feature to blacklist targets where xnack can never be meaningfully enabled. I'm not sure the targets this is applied to is 100% correct. llvm-svn: 360903
* [AMDGPU] Add gfx1010 target definitionsStanislav Mekhanoshin2019-04-241-1/+191
| | | | | | Differential Revision: https://reviews.llvm.org/D61041 llvm-svn: 359113
* [AMDGPU] Sort out and rename multiple CI/VI predicatesStanislav Mekhanoshin2019-04-061-6/+11
| | | | | | Differential Revision: https://reviews.llvm.org/D60346 llvm-svn: 357835
* [AMDGPU] predicate and feature refactoringStanislav Mekhanoshin2019-04-051-22/+52
| | | | | | | | | We have done some predicate and feature refactoring lately but did not upstream it. This is to sync. Differential revision: https://reviews.llvm.org/D60292 llvm-svn: 357791
* AMDGPU: Assume ECC is enabled by default if supportedMatt Arsenault2019-04-031-4/+15
| | | | | | | | | | The test should really be checking for the property directly in the code object headers, but there are problems with this. I don't see this directly represented in the text form, and for the binary emission this is depending on a function level subtarget feature to emit a global flag. llvm-svn: 357558
* AMDGPU: Fix names for generation featuresMatt Arsenault2019-04-031-2/+7
| | | | | | | | We should overall stop using these, but the uppercase name didn't work. Any feature string is converted to lowercase, so these could never be found in the table. llvm-svn: 357541
* AMDGPU: Make sram-ecc off by default for Vega20Konstantin Zhuravlyov2019-03-291-1/+0
| | | | | | Differential Revision: https://reviews.llvm.org/D59718 llvm-svn: 357247
* AMDGPU: Move d16 load matching to preprocess stepMatt Arsenault2019-03-081-1/+1
| | | | | | | | | | | | | | | | | When matching half of the build_vector to a load, there could still be a hidden dependency on the other half of the build_vector the pattern wouldn't detect. If there was an additional chain dependency on the other value, a cycle could be introduced. I don't think a tablegen pattern is capable of matching the necessary conditions, so move this into PreprocessISelDAG. Check isPredecessorOf for the other value to avoid a cycle. This has a warning that it's expensive, so this should probably be moved into an MI pass eventually that will have more freedom to reorder instructions to help match this. That is currently complicated by the lack of a computeKnownBits type mechanism for the selected function. llvm-svn: 355731
* Revert "AMDGPU/NFC: Cleanup subtarget predicates"Konstantin Zhuravlyov2019-02-221-20/+13
| | | | | | | It breaks one of our downstream merges, so revert it temporarily while investigating failures downstream llvm-svn: 354700
* AMDGPU: Remove debugger related subtarget featuresMatt Arsenault2019-02-211-18/+0
| | | | | | As far as I know these aren't needed anymore. llvm-svn: 354634
* AMDGPU/NFC: Cleanup subtarget predicatesKonstantin Zhuravlyov2019-02-211-13/+20
| | | | | | Differential Revision: https://reviews.llvm.org/D58522 llvm-svn: 354620
* [AMDGPU] remove unused AssemblerPredicatesMark Searles2019-02-211-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | An internal build is hitting asserts complaining about too many subtarget features: llvm/utils/TableGen/Types.cpp:42: const char* llvm::getMinimalTypeForEnumBitfield(uint64_t): Assertion `MaxIndex <= 64 && "Too many bits"' failed. llvm/utils/TableGen/AsmMatcherEmitter.cpp:1476: void {anonymous}::AsmMatcherInfo::buildInfo(): Assertion `SubtargetFeatures.size() <= 64 && "Too many subtarget features!"' failed. The short-term solution is to remove a few unused AssemblerPredicates to get under the limit. The long-term solution seems to be to revisit these asserts. E.g., rather than hardcoded '64', use the standard sized std::bitset like the other places that track subtarget features. Differential Revision: https://reviews.llvm.org/D58516 llvm-svn: 354604
* AMDGPU/NFC: Remove SubtargetFeatureISAVersion since it is not used anywhereKonstantin Zhuravlyov2019-02-121-10/+0
| | | | llvm-svn: 353892
* [AMDGPU] Split dot-insts featureStanislav Mekhanoshin2019-02-091-6/+16
| | | | | | Differential Revision: https://reviews.llvm.org/D57971 llvm-svn: 353587
* AMDGPU: Eliminate GPU specific SubtargetFeaturesMatt Arsenault2019-02-081-17/+20
| | | | | | | | | | | Inline compatability is determined from the individual feature bits. These are just sets of the separate features, but will always be treated as incompatible unless they are specifically ignored. Defining the ISA version number here in tablegen would be nice, but it turns out this wasn't actually used. llvm-svn: 353558
* AMDGPU: Remove GCN features and predicatesMatt Arsenault2019-02-081-10/+4
| | | | | | | These are no longer necessary since the R600 tablegen files are split out now. llvm-svn: 353548
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* AMDGPU: Remove llvm.SI.load.constMatt Arsenault2019-01-181-1/+0
| | | | | | | It's taken 3 years, but now all of the old AMDGPU and SI intrinsics are finally gone llvm-svn: 351586
* [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd tryDavid Stuttard2019-01-141-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 This re-submit of the change also includes a slight modification in SIISelLowering.cpp to work-around a compiler bug for the powerpc_le platform that caused a buildbot failure on a previous submission. Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda Work around for ppcle compiler bug Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b llvm-svn: 351054
* [AMDGPU] Separate feature dot-instsStanislav Mekhanoshin2019-01-101-1/+11
| | | | | | Differential Revision: https://reviews.llvm.org/D56524 llvm-svn: 350793
* AMDGPU: Remove llvm.AMDGPU.killMatt Arsenault2018-12-071-1/+0
| | | | | | This is the last of the old AMDGPU intrinsics. llvm-svn: 348615
* [AMDGPU] Combine DPP mov with use instructions (VOP1/2/3)Valery Pykhtin2018-11-301-0/+4
| | | | | | | | Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses. Differential revision: https://reviews.llvm.org/D53762 llvm-svn: 347993
* Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic"David Stuttard2018-11-291-10/+0
| | | | | | | | | Also revert fix r347876 One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate. llvm-svn: 347911
* Add support for TFE/LWE in image intrinsicsDavid Stuttard2018-11-291-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda llvm-svn: 347871
* AMDGPU: Enable code object v3 for AMDHSA onlyKonstantin Zhuravlyov2018-11-151-16/+32
| | | | | | Differential Revision: https://reviews.llvm.org/D54186 llvm-svn: 346923
* AMDGPU: Add sram-ecc featureKonstantin Zhuravlyov2018-11-051-17/+12
| | | | | | Differential Revision: https://reviews.llvm.org/D53222 llvm-svn: 346177
* [AMDGPU] Remove FeatureVGPRSpillingScott Linder2018-10-311-6/+0
| | | | | | | | | | | This feature is only relevant to shaders, and is no longer used. When disabled, lowering of reserved registers for shaders causes a compiler crash. Remove the feature and add a test for compilation of shaders at OptNone. Differential Revision: https://reviews.llvm.org/D53829 llvm-svn: 345763
* Revert r345542: AMDGPU: Enable code object v3 by defaultKonstantin Zhuravlyov2018-10-301-30/+15
| | | | | | It breaks mesa. llvm-svn: 345662
* AMDGPU: Enable code object v3 by defaultKonstantin Zhuravlyov2018-10-291-15/+30
| | | | | | Differential Revision: https://reviews.llvm.org/D53525 llvm-svn: 345542
* [AMDGPU] Defined gfx909 Raven Ridge 2Tim Renouf2018-10-241-0/+7
| | | | | | | Differential Revision: https://reviews.llvm.org/D53418 Change-Id: Ie3d054f2e956c2768988c0f4c0ffd29a47294eef llvm-svn: 345120
* [AMDGPU] Divergence driven instruction selection. Part 1.Alexander Timofeev2018-09-211-2/+2
| | | | | | | | | | | | | Summary: This change is the first part of the AMDGPU target description change. The aim of it is the effective splitting the vector and scalar flows at the selection stage. Selection uses predicate functions based on the framework implemented earlier - https://reviews.llvm.org/D35267 Differential revision: https://reviews.llvm.org/D52019 Reviewers: rampitec llvm-svn: 342719
* [AMDGPU] Ensure trig range reduction only used for subtargets that require itDavid Stuttard2018-09-141-3/+9
| | | | | | | | | | | | | | | | | | Summary: GFX9 and above support sin/cos instructions with a greater range and thus don't require a fract instruction prior to invocation. Added a subtarget feature to reflect this and added code to take advantage of expanded range on GFX9+ Also updated the tests to check correct behaviour Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51933 Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0 llvm-svn: 342222
* [AMDGPU] Add support for a16 modifiear for gfx9Ryan Taylor2018-08-281-1/+10
| | | | | | | | | | | | | Summary: Adding support for a16 for gfx9. A16 bit replaces r128 bit for gfx9. Change-Id: Ie8b881e4e6d2f023fb5e0150420893513e5f4841 Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50575 llvm-svn: 340831
* AMDGPU: Add feature vi-instsMatt Arsenault2018-08-071-2/+8
| | | | | | | | | | | | | This is necessary to add a VI specific builtin, __builtin_amdgcn_s_dcache_wb. We already have an overly specific feature for one of these builtins, for s_memrealtime. I'm not sure whether it's better to add more of those, or to get rid of that and merge it with vi-insts. Alternatively, maybe this logically goes with scalar-stores? llvm-svn: 339104
* AMDGPU: Allow fp32-denormals feature for r600 targetsJan Vesely2018-08-011-9/+0
| | | | | | | | This was accidentally removed in r335942. Differential Revision: https://reviews.llvm.org/D49934 llvm-svn: 338569
* AMDGPU: Refactor Subtarget classesTom Stellard2018-07-111-6/+6
| | | | | | | | | | | | | | | | | Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation. Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 llvm-svn: 336851
* AMDGPU: Separate R600 and GCN TableGen filesTom Stellard2018-06-281-134/+14
| | | | | | | | | | | | | | | | | | | | | Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc. Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46365 llvm-svn: 335942
* AMDGPU: Remove ability to reserve VGPRs for debuggerKonstantin Zhuravlyov2018-06-211-7/+0
| | | | | | Differential Revision: https://reviews.llvm.org/D48234 llvm-svn: 335288
* AMDGPU: Refactor MIMG instruction TableGen using generic tablesNicolai Haehnle2018-06-211-0/+1
| | | | | | | | | | | | | | | | | | | | Summary: This allows us to access rich information about MIMG opcodes from C++ code. Simplifying the mapping between equivalent opcodes of different data size becomes quite natural. This also flattens the MIMG-related class and multiclass hierarchy a little, and collapses together some of the scaffolding for sample and gather4 opcodes. Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48016 llvm-svn: 335227
* AMDGPU: Make v2i16/v2f16 legal on VIMatt Arsenault2018-05-221-0/+3
| | | | | | | | | | | | This usually results in better code. Fixes using inline asm with short2, and also fixes having a different ABI for function parameters between VI and gfx9. Partially cleans up the mess used for lowering of the d16 operations. Making v4f16 legal will help clean this up more, but this requires additional work. llvm-svn: 332953
* AMDGPU/NFC: Update D16PreservesUnusedBits description based Tony Tye's commentsKonstantin Zhuravlyov2018-05-041-1/+3
| | | | llvm-svn: 331564
OpenPOWER on IntegriCloud