summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/VOP1Instructions.td
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU/GlobalISel: Fix readfirstlane pattern importMatt Arsenault2020-01-071-1/+1
| | | | | The imm folding optimization pattern failed to import. The instruction pattern was already working, but failing to fail on SGPR inputs.
* [AMDGPU][MC][GFX10] Enabled v_movrel*[sdwa|dpp|dpp8] opcodesDmitry Preobrazhensky2019-11-181-41/+22
| | | | | | | | See https://bugs.llvm.org/show_bug.cgi?id=43712 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D70170
* [AMDGPU][MC] Corrected src0 for v_movrelsd_b32 and v_movrelsd_2_b32Dmitry Preobrazhensky2019-11-081-6/+8
| | | | | | | | See https://bugs.llvm.org/show_bug.cgi?id=40903 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D69888
* [AMDGPU] deduplicate tablegen predicatesStanislav Mekhanoshin2019-11-041-3/+3
| | | | | | | | | | | | | | | We are duplicating predicates if several parts of the combined predicate list contain the same condition. Added code to deduplicate the list. We have AssemblerPredicates and AssemblerPredicate in the PredicateControl, but we never use AssemblerPredicates with an actual list, so this one is dropped. This addresses the first part of the llvm bug 43886: https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69815
* [AMDGPU] Supress unused sdwa insts generationStanislav Mekhanoshin2019-10-161-2/+11
| | | | | | | | | Do not generate non-existing sdwa instructions. It reduces the number of generated instructions by 185. Differential Revision: https://reviews.llvm.org/D69010 llvm-svn: 375016
* [AMDGPU] link dpp pseudos and real instructions on gfx10Stanislav Mekhanoshin2019-10-111-22/+7
| | | | | | | | | | | | This defaults to zero fi operand, but we do not expose it anyway. Should we expose it later it needs to be added to the pseudo. This enables dpp combining on gfx10. Differential Revision: https://reviews.llvm.org/D68888 llvm-svn: 374604
* [AMDGPU] Disable unused gfx10 dpp instructionsStanislav Mekhanoshin2019-10-081-0/+2
| | | | | | | | | | | Inhibit generation of unused real dpp instructions on gfx10 just like it is done on other subtargets. This does not change anything because these are illegal anyway and not accepted, but it does reduce the number of instruction definitions generated. Differential Revision: https://reviews.llvm.org/D68607 llvm-svn: 374083
* AMDGPU/GlobalISel: Select VALU G_AMDGPU_FFBH_U32Matt Arsenault2019-10-071-1/+1
| | | | llvm-svn: 373944
* Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"Matt Arsenault2019-09-191-9/+9
| | | | | | | | | This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338
* Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"Hans Wennborg2019-09-191-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC*. > > Since now intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. > > Most of the work here is to cleanup target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, simlar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_* instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372314
* GlobalISel: Don't materialize immarg arguments to intrinsicsMatt Arsenault2019-09-191-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Encode them directly as an imm argument to G_INTRINSIC*. Since now intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to cleanup target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. The GlobalISelEmitter needs to treat timm as a special case of a leaf node, simlar to MachineBasicBlock operands. This should also enable handling of patterns for some G_* instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372285
* AMDGPU/GlobalISel: Select llvm.amdgcn.sffbhMatt Arsenault2019-09-101-1/+1
| | | | llvm-svn: 371538
* AMDGPU: Move MnemonicAlias out of instruction def hierarchyMatt Arsenault2019-09-091-0/+7
| | | | | | | | | | | | | | | | | | Unfortunately MnemonicAlias defines a "Predicates" field just like an instruction or pattern, with a somewhat different interpretation. This ends up overriding the intended Predicates set by PredicateControl on the pseudoinstruction defintions with an empty list. This allowed incorrectly selecting instructions that should have been rejected due to the SubtargetPredicate from patterns on the instruction definition. This does remove the divergent predicate from the 64-bit shift patterns, which were already not used for the 32-bit shift, so I'm not sure what the point was. This also removes a second, redundant copy of the 64-bit divergent patterns. llvm-svn: 371427
* AMDGPU/GlobalISel: Select G_BITREVERSEMatt Arsenault2019-09-041-1/+1
| | | | llvm-svn: 370980
* [AMDGPU] Allow any value in unused src0 field in v_nopTim Renouf2019-06-241-1/+1
| | | | | | | | | | | | | Summary: The LLVM disassembler assumes that the unused src0 operand of v_nop is zero. Other tools can put another value in that field, which is still valid. This commit fixes the LLVM disassembler to recognize such an encoding as v_nop, in the same way as we already do for s_getpc. Differential Revision: https://reviews.llvm.org/D63724 Change-Id: Iaf0363eae26ff92fc4ebc716216476adbff37a6f llvm-svn: 364208
* [AMDGPU] gfx1010 dpp16 and dpp8Stanislav Mekhanoshin2019-06-121-6/+74
| | | | | | Differential Revision: https://reviews.llvm.org/D63203 llvm-svn: 363186
* [AMDGPU] gfx1010 VOP2 changesStanislav Mekhanoshin2019-04-261-0/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D61156 llvm-svn: 359316
* [AMDGPU] gfx1010 - fix ubsan failureStanislav Mekhanoshin2019-04-251-1/+0
| | | | | | | Revert DecoderNamespace in one place for now. It will need more changes to properly work. llvm-svn: 359239
* [AMDGPU] gfx1010 VOP1 instructionsStanislav Mekhanoshin2019-04-251-82/+200
| | | | | | Differential Revision: https://reviews.llvm.org/D61099 llvm-svn: 359225
* [AMDGPU] Sort out and rename multiple CI/VI predicatesStanislav Mekhanoshin2019-04-061-9/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D60346 llvm-svn: 357835
* [AMDGPU] predicate and feature refactoringStanislav Mekhanoshin2019-04-051-56/+61
| | | | | | | | | We have done some predicate and feature refactoring lately but did not upstream it. This is to sync. Differential revision: https://reviews.llvm.org/D60292 llvm-svn: 357791
* [AMDGPU] V_CVT_F32_UBYTE{0,1,2,3} are full rate instructionsCarl Ritson2019-03-081-3/+4
| | | | | | | | | | | | | | | | Summary: Fix a bug in the scheduling model where V_CVT_F32_UBYTE{0,1,2,3} are incorrectly marked as quarter rate instructions. Reviewers: arsenm, rampitec Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59091 llvm-svn: 355671
* [AMDGPU][MC] Enable lds_direct operand for v_readfirstlane_b32, ↵Dmitry Preobrazhensky2019-03-041-1/+1
| | | | | | | | | | | | v_readlane_b32 and v_writelane_b32 See bug 40662: https://bugs.llvm.org/show_bug.cgi?id=40662 Reviewers: artem.tamazov, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D58713 llvm-svn: 355312
* Revert "AMDGPU/NFC: Cleanup subtarget predicates"Konstantin Zhuravlyov2019-02-221-14/+14
| | | | | | | It breaks one of our downstream merges, so revert it temporarily while investigating failures downstream llvm-svn: 354700
* AMDGPU/NFC: Cleanup subtarget predicatesKonstantin Zhuravlyov2019-02-211-14/+14
| | | | | | Differential Revision: https://reviews.llvm.org/D58522 llvm-svn: 354620
* AMDGPU: Remove GCN features and predicatesMatt Arsenault2019-02-081-2/+0
| | | | | | | These are no longer necessary since the R600 tablegen files are split out now. llvm-svn: 353548
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* [AMDGPU] Add new Mode Register passTim Corringham2018-12-101-0/+8
| | | | | | | | | | | | | | | A new pass to manage the Mode register. Currently this just manages the floating point double precision rounding requirements, but is intended to be easily extended to encompass all Mode register settings. The immediate motivation comes from the requirement to use the round-to-zero rounding mode for the 16 bit interpolation instructions, where the rounding mode setting is shared between 16 and 64 bit operations. llvm-svn: 348754
* [AMDGPU] Combine DPP mov with use instructions (VOP1/2/3)Valery Pykhtin2018-11-301-13/+17
| | | | | | | | Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses. Differential revision: https://reviews.llvm.org/D53762 llvm-svn: 347993
* AMDGPU: Split HasExt into HasExtDPP/SDWA/SDWA9Konstantin Zhuravlyov2018-09-271-2/+7
| | | | llvm-svn: 343264
* [AMDGPU] Convert rcp to rcp_iflagStanislav Mekhanoshin2018-06-271-1/+1
| | | | | | | | | | | If a source of rcp instruction is a result of any conversion from an integer convert it into rcp_iflag instruction. No FP exception can ever happen except division by zero if a single precision rcp argument is a representation of an integral number. Differential Revision: https://reviews.llvm.org/D48569 llvm-svn: 335742
* [AMDGPU][MC][GFX9] Added v_screen_partition_4se_b32Dmitry Preobrazhensky2018-04-111-4/+30
| | | | | | | | | See bug 36845: https://bugs.llvm.org/show_bug.cgi?id=36845 Differential Revision: https://reviews.llvm.org/D45443 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329801
* [AMDGPU][MC][GFX9] Added instructions v_cvt_norm_*16_f16, v_sat_pk_u8_i16Dmitry Preobrazhensky2018-04-021-0/+8
| | | | | | | | | See bug 36847: https://bugs.llvm.org/show_bug.cgi?id=36847 Differential Revision: https://reviews.llvm.org/D45097 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328988
* [AMDGPU] Fixed some instructions latenciesStanislav Mekhanoshin2018-03-301-8/+7
| | | | | | Differential Revision: https://reviews.llvm.org/D45073 llvm-svn: 328874
* AMDGPU: Introduce common SOP_Pseudo and VOP_Pseudo TableGen base classesNicolai Haehnle2018-03-261-12/+2
| | | | | | | Differential revision: https://reviews.llvm.org/D44820 Change-Id: I732979e2964006aa15d78a333d8886e6855f319a llvm-svn: 328496
* [AMDGPU] Copy impdefs from pseudo to real instructionsStanislav Mekhanoshin2018-01-151-0/+1
| | | | | | | | | | In some cases we do not copy implicit defs from pseudo to real VOP instructions. It has no visible impact at the moment thus no tests are affected or added. Differential Revision: https://reviews.llvm.org/D41783 llvm-svn: 322496
* AMDGPU: Remove global isGCN predicatesMatt Arsenault2017-10-031-11/+11
| | | | | | | | | | | | | | These are problematic because they apply to everything, and can easily clobber whatever more specific predicate you are trying to add to a function. Currently instructions use SubtargetPredicate/PredicateControl to apply this to patterns applied to an instruction definition, but not to free standing Pats. Add a wrapper around Pat so the special PredicateControls requirements can be appended to the final predicate list like how Mips does it. llvm-svn: 314742
* [AMDGPU][MC][GFX9] Added integer clamping support for VOP3 opcodesDmitry Preobrazhensky2017-08-161-1/+1
| | | | | | | | | | See Bug 34152: https://bugs.llvm.org//show_bug.cgi?id=34152 Reviewers: SamWot, artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D36674 llvm-svn: 311006
* [AMDGPU] Add llvm.amdgpu.update.dpp intrinsicConnor Abbott2017-08-081-0/+8
| | | | | | | | | | | | | | | | | | Summary: Now that we've made all the necessary backend changes, we can add a new intrinsic which exposes the new capabilities to IR producers. Since llvm.amdgpu.update.dpp is a strict superset of llvm.amdgpu.mov.dpp, we should deprecate the former. We also add tests for all the functionality that was added in previous changes, now that we can access it via an IR construct. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34718 llvm-svn: 310399
* [AMDGPU] Add pseudo "old" source to all DPP instructionsConnor Abbott2017-08-071-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: All instructions with the DPP modifier may not write to certain lanes of the output if bound_ctrl=1 is set or any bits in bank_mask or row_mask aren't set, so the destination register may be both defined and modified. The right way to handle this is to add a constraint that the destination register is the same as one of the inputs. We could tie the destination to the first source, but that would be too restrictive for some use-cases where we want the destination to be some other value before the instruction executes. Instead, add a fake "old" source and tie it to the destination. Effectively, the "old" source defines what value unwritten lanes will get. We'll expose this functionality to users with a new intrinsic later. Also, we want to use DPP instructions for computing derivatives, which means we need to set WQM for them. We also need to enable the entire wavefront when using DPP intrinsics to implement nonuniform subgroup reductions, since otherwise we'll get incorrect results in some cases. To accomodate this, add a new operand to all DPP instructions which will be interpreted by the SI WQM pass. This will be exposed with a new intrinsic later. We'll also add support for Whole Wavefront Mode later. I also fixed llvm.amdgcn.mov.dpp to overwrite the source and fixed up the test. However, I could also keep the old behavior (where lanes that aren't written are undefined) if people want it. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34716 llvm-svn: 310283
* [AMDGPU] SDWA: merge VI and GFX9 pseudo instructionsSam Kolton2017-06-211-12/+4
| | | | | | | | | | | | Summary: Previously there were two separate pseudo instruction for SDWA on VI and on GFX9. Created one pseudo instruction that is union of both of them. Added verifier to check that operands conform either VI or GFX9. Reviewers: dp, arsenm, vpykhtin Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, artem.tamazov Differential Revision: https://reviews.llvm.org/D34026 llvm-svn: 305886
* [AMDGPU] SDWA: Add assembler support for GFX9Sam Kolton2017-05-231-4/+29
| | | | | | | | | | | | | | | Summary: Added separate pseudo and real instruction for GFX9 SDWA instructions. Currently supports only in assembler. Depends D32493 Reviewers: vpykhtin, artem.tamazov Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D33132 llvm-svn: 303620
* [AMDGPU][MC] Added support for several VI-specific opcodes (s_wakeup, etc)Dmitry Preobrazhensky2017-04-121-1/+5
| | | | | | | | | | | | | | | | | | | | | | Added support for VI: - s_endpgm_saved - s_wakeup - s_rfe_restore_b64 - v_perm_b32 Enabled for VI: - v_mov_fed_b32 - v_mov_fed_b32_e64 See bug 32593: https://bugs.llvm.org//show_bug.cgi?id=32593 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D31931 llvm-svn: 300076
* [AMDGPU][MC] Fix for Bug 28207 + LIT testsDmitry Preobrazhensky2017-03-271-15/+39
| | | | | | | | | | Enabled clamp and omod for v_cvt_* opcodes which have src0 of an integer type Reviewers: vpykhtin, arsenm Differential Revision: https://reviews.llvm.org/D31327 llvm-svn: 298852
* AMDGPU: Fix unnecessary ands when packing f16 vectorsMatt Arsenault2017-03-151-1/+1
| | | | | | | | | computeKnownBits didn't handle fp_to_fp16 to report the high bits as 0. ARM maps the generic node to an instruction that does not modify the high bits of the register, so introduce a target node where the high bits are known 0. llvm-svn: 297873
* [AMDGPU][MC] Fix for Bug 30829 + LIT testsDmitry Preobrazhensky2017-03-031-0/+2
| | | | | | | | Added code to check constant bus restrictions for VOP formats (only one SGPR value or literal-constant may be used by the instruction). Note that the same checks are performed by SIInstrInfo::verifyInstruction (used by lowering code). Added LIT tests. llvm-svn: 296873
* AMDGPU: Add definition for v_swap_b32Matt Arsenault2017-02-281-4/+31
| | | | | | | | This is somewhat tricky because there are two pairs of tied operands, and it isn't allowed to be VOP3 encoded. llvm-svn: 296519
* AMDGPU: Add VOP3P instruction formatMatt Arsenault2017-02-271-1/+1
| | | | | | | | Add a few non-VOP3P but instructions related to packed. Includes hack with dummy operands for the benefit of the assembler llvm-svn: 296368
* AMDGPU/SI: Fix trunc i16 patternJan Vesely2017-02-231-6/+0
| | | | | | | | Hit on ASICs that support 16bit instructions. Differential Revision: https://reviews.llvm.org/D30281 llvm-svn: 295990
* AMDGPU: Fix trailing whitespaceMatt Arsenault2017-02-101-1/+1
| | | | llvm-svn: 294694
OpenPOWER on IntegriCloud