path: root/llvm/lib/Target/AMDGPU/VOP3Instructions.td
Commit message | Author | Age | Files | Lines
* AMDGPU: Select llvm.amdgcn.interp.p2.f16 directly | Matt Arsenault | 2020-01-06 | 1 | -10/+12
    This will enable automatic GlobalISel support in a future commit.
* [AMDGPU] deduplicate tablegen predicates | Stanislav Mekhanoshin | 2019-11-04 | 1 | -6/+6
    We are duplicating predicates if several parts of the combined predicate list contain the same condition. Added code to deduplicate the list.
    We have AssemblerPredicates and AssemblerPredicate in the PredicateControl, but we never use AssemblerPredicates with an actual list, so this one is dropped.
    This addresses the first part of the llvm bug 43886: https://bugs.llvm.org/show_bug.cgi?id=43886
    Differential Revision: https://reviews.llvm.org/D69815
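The deduplication described in this commit can be sketched outside TableGen. The C++ below is only an illustration and is not the code from D69815: the function name mergePredicates is hypothetical, and it assumes predicates can be compared as plain strings.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not from the LLVM sources): concatenates several
// predicate lists and drops repeated conditions, keeping the first
// occurrence of each so the original ordering is preserved.
std::vector<std::string>
mergePredicates(const std::vector<std::vector<std::string>> &Lists) {
  std::vector<std::string> Result;
  std::unordered_set<std::string> Seen;
  for (const auto &List : Lists)
    for (const auto &Pred : List)
      if (Seen.insert(Pred).second) // true only the first time Pred is seen
        Result.push_back(Pred);
  return Result;
}
```

With placeholder inputs {"A", "B"}, {"B"} and {"A", "C"}, the merged list is A, B, C: each condition appears once, in first-seen order.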
* [AMDGPU][MC][GFX10] Added v_interp_[p1/p2/mov]_f32_e64 | Dmitry Preobrazhensky | 2019-10-28 | 1 | -2/+6
    See https://bugs.llvm.org/show_bug.cgi?id=43747
    Reviewers: arsenm, rampitec
    Differential Revision: https://reviews.llvm.org/D69348
* Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" | Matt Arsenault | 2019-09-19 | 1 | -22/+22
    This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297).
    This was missing one switch to getTargetConstant in an untested case.
    llvm-svn: 372338
* Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" | Hans Wennborg | 2019-09-19 | 1 | -22/+22
    This broke the Chromium build, causing it to fail with e.g.
        fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>
    See llvm-commits thread of r372285 for details.
    This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit.
    > Encode them directly as an imm argument to G_INTRINSIC*.
    >
    > Since intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU.
    >
    > This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time.
    >
    > SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them.
    >
    > Most of the work here is to clean up target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory.
    >
    > The GlobalISelEmitter needs to treat timm as a special case of a leaf node, similar to MachineBasicBlock operands. This should also enable handling of patterns for some G_* instructions with immediates, like G_FENCE or G_EXTRACT.
    >
    > This does include a workaround for a crash in GlobalISelEmitter when ARM tries to use "imm" in an output with a "timm" pattern source.
    llvm-svn: 372314
* GlobalISel: Don't materialize immarg arguments to intrinsics | Matt Arsenault | 2019-09-19 | 1 | -22/+22
    Encode them directly as an imm argument to G_INTRINSIC*.
    Since intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU.
    This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time.
    SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them.
    Most of the work here is to clean up target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory.
    The GlobalISelEmitter needs to treat timm as a special case of a leaf node, similar to MachineBasicBlock operands. This should also enable handling of patterns for some G_* instructions with immediates, like G_FENCE or G_EXTRACT.
    This does include a workaround for a crash in GlobalISelEmitter when ARM tries to use "imm" in an output with a "timm" pattern source.
    llvm-svn: 372285
* AMDGPU: Move MnemonicAlias out of instruction def hierarchy | Matt Arsenault | 2019-09-09 | 1 | -20/+5
    Unfortunately MnemonicAlias defines a "Predicates" field just like an instruction or pattern, with a somewhat different interpretation.
    This ends up overriding the intended Predicates set by PredicateControl on the pseudoinstruction definitions with an empty list. This allowed incorrectly selecting instructions that should have been rejected due to the SubtargetPredicate from patterns on the instruction definition.
    This does remove the divergent predicate from the 64-bit shift patterns, which were already not used for the 32-bit shift, so I'm not sure what the point was. This also removes a second, redundant copy of the 64-bit divergent patterns.
    llvm-svn: 371427
* AMDGPU/GlobalISel: Select G_ASHR | Matt Arsenault | 2019-07-16 | 1 | -1/+1
    llvm-svn: 366257
* AMDGPU/GlobalISel: Select G_LSHR | Matt Arsenault | 2019-07-16 | 1 | -1/+1
    llvm-svn: 366256
* AMDGPU/GlobalISel: Select G_SHL | Matt Arsenault | 2019-07-16 | 1 | -1/+1
    I think this manages to not break the DAG handling with the divergent predicates because the standalone divergent patterns end up with a higher priority than the pattern on the instruction definition.
    The 16-bit versions don't work yet.
    llvm-svn: 366254
* [AMDGPU] gfx908 mAI instructions, MC part | Stanislav Mekhanoshin | 2019-07-09 | 1 | -7/+17
    Differential Revision: https://reviews.llvm.org/D64446
    llvm-svn: 365563
* AMDGPU/GlobalISel: Select mul | Matt Arsenault | 2019-07-02 | 1 | -1/+1
    llvm-svn: 364932
* [AMDGPU] gfx1010 core wave32 changes | Stanislav Mekhanoshin | 2019-06-20 | 1 | -2/+2
    Differential Revision: https://reviews.llvm.org/D63204
    llvm-svn: 363934
* [AMDGPU] gfx1010 permlane instructions | Stanislav Mekhanoshin | 2019-06-12 | 1 | -0/+28
    Differential Revision: https://reviews.llvm.org/D63202
    llvm-svn: 363185
* [AMDGPU] Pattern for v_xor3_b32 | Stanislav Mekhanoshin | 2019-05-10 | 1 | -1/+4
    This also allows three op patterns to use increased constant bus limit of GFX10.
    Differential Revision: https://reviews.llvm.org/D61763
    llvm-svn: 360395
* [AMDGPU] gfx1010 VOP3 and VOP3P implementation | Stanislav Mekhanoshin | 2019-04-26 | 1 | -102/+236
    Differential Revision: https://reviews.llvm.org/D61202
    llvm-svn: 359328
* [AMDGPU] gfx1010 VOP1 instructions | Stanislav Mekhanoshin | 2019-04-25 | 1 | -4/+4
    Differential Revision: https://reviews.llvm.org/D61099
    llvm-svn: 359225
* [AMDGPU] Sort out and rename multiple CI/VI predicates | Stanislav Mekhanoshin | 2019-04-06 | 1 | -15/+15
    Differential Revision: https://reviews.llvm.org/D60346
    llvm-svn: 357835
* [AMDGPU] predicate and feature refactoring | Stanislav Mekhanoshin | 2019-04-05 | 1 | -22/+29
    We have done some predicate and feature refactoring lately but did not upstream it. This is to sync.
    Differential revision: https://reviews.llvm.org/D60292
    llvm-svn: 357791
* [AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiers | Tim Renouf | 2019-03-18 | 1 | -0/+1
    This commit allows v_cndmask_b32_e64 with abs, neg source modifiers on src0, src1 to be assembled and disassembled. This does appear to be allowed, even though they are floating point modifiers and the operand type is b32.
    To do this, I added src0_modifiers and src1_modifiers to the MachineInstr, which involved fixing up several places in codegen and mir tests.
    Differential Revision: https://reviews.llvm.org/D59191
    Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea
    llvm-svn: 356398
* Revert "AMDGPU/NFC: Cleanup subtarget predicates" | Konstantin Zhuravlyov | 2019-02-22 | 1 | -27/+27
    It breaks one of our downstream merges, so revert it temporarily while investigating failures downstream
    llvm-svn: 354700
* AMDGPU/NFC: Cleanup subtarget predicates | Konstantin Zhuravlyov | 2019-02-21 | 1 | -27/+27
    Differential Revision: https://reviews.llvm.org/D58522
    llvm-svn: 354620
* [AMDGPU] Add intrinsics for 16 bit interpolation | Tim Corringham | 2019-01-28 | 1 | -3/+24
    Summary: Added the intrinsics llvm.amdgcn.interp.p1.f16() and llvm.amdgcn.interp.p2.f16() and related LIT test.
    The p1 intrinsic generates code appropriate for both 16 and 32 bank LDS.
    Reviewers: #amdgpu, dstuttard, arsenm, tpr
    Reviewed By: #amdgpu, arsenm
    Subscribers: jvesely, mgorny, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
    Differential Revision: https://reviews.llvm.org/D46754
    llvm-svn: 352357
* Update the file headers across all of the LLVM projects in the monorepo | Chandler Carruth | 2019-01-19 | 1 | -4/+3
    to reflect the new license.
    We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
    Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
    llvm-svn: 351636
* [AMDGPU] Add new Mode Register pass | Tim Corringham | 2018-12-10 | 1 | -8/+28
    A new pass to manage the Mode register.
    Currently this just manages the floating point double precision rounding requirements, but is intended to be easily extended to encompass all Mode register settings.
    The immediate motivation comes from the requirement to use the round-to-zero rounding mode for the 16 bit interpolation instructions, where the rounding mode setting is shared between 16 and 64 bit operations.
    llvm-svn: 348754
* AMDGPU: Generate VALU ThreeOp Integer instructions | Nicolai Haehnle | 2018-12-06 | 1 | -0/+47
    Summary: Original patch by: Fabian Wahlster <razor@singul4rity.com>
    Change-Id: I148f692a88432541fad468963f58da9ddf79fac5
    Reviewers: arsenm, rampitec
    Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, b-sumner, llvm-commits
    Differential Revision: https://reviews.llvm.org/D51995
    llvm-svn: 348488
* AMDGPU: Fix V_FMA_F16 selection on GFX9 | Konstantin Zhuravlyov | 2018-11-19 | 1 | -2/+8
    GFX9 should select opsel version.
    Differential Revision: https://reviews.llvm.org/D54545
    llvm-svn: 347265
* DAG: Change behavior of fminnum/fmaxnum nodes | Matt Arsenault | 2018-10-22 | 1 | -2/+2
    Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs.
    There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit.
    llvm-svn: 344914
* [AMDGPU] Divergence driven instruction selection. Shift operations. | Alexander Timofeev | 2018-10-01 | 1 | -20/+37
    Summary: This change enables VOP3 shifts to be explicitly selected dependent on the divergence.
    Differential Revision: https://reviews.llvm.org/D52559
    Reviewers: rampitec
    llvm-svn: 343455
* AMDGPU: Fix getInstSizeInBytes | Nicolai Haehnle | 2018-08-29 | 1 | -8/+11
    Summary: Add some optional code to validate getInstSizeInBytes for emitted instructions.
    This flushed out some issues which are fixed by this patch:
    - Streamline getInstSizeInBytes
    - Properly define the VI readlane/writelane instruction as VOP3
    - Fix the inline constant determination. Specifically, this change fixes an issue where a 32-bit value of 0xffffffff was recorded as unsigned. This is equal to -1 when restricting to a 32-bit comparison, and an inline constant can be used.
    Reviewers: arsenm, rampitec
    Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
    Differential Revision: https://reviews.llvm.org/D50629
    Change-Id: Id87c3b7975839da0de8156a124b0ce98c5fb47f2
    llvm-svn: 340903
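The signed-versus-unsigned point in the last bullet can be illustrated with a small sketch. It assumes that integer inline constants cover the range -16 through 64; the function names are hypothetical and are not the helpers changed by D50629.

```cpp
#include <cstdint>

// Assumption for this sketch: integer inline constants span [-16, 64].
constexpr bool isInlinableInt(int64_t V) { return V >= -16 && V <= 64; }

// Treats the 32-bit pattern as unsigned before the range check, so
// 0xffffffff is seen as 4294967295 and rejected.
constexpr bool isInlinable32AsUnsigned(uint32_t Bits) {
  return isInlinableInt(static_cast<int64_t>(Bits));
}

// Reinterprets the same 32 bits as a signed value first, so 0xffffffff
// is seen as -1 and accepted. (The uint32_t to int32_t conversion is
// modular on mainstream compilers and guaranteed so since C++20.)
constexpr bool isInlinable32AsSigned(uint32_t Bits) {
  return isInlinableInt(static_cast<int64_t>(static_cast<int32_t>(Bits)));
}
```

Under that assumption, only the signed interpretation recognises 0xffffffff as an inline constant, which is the behaviour the commit message describes as the fix.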
* AMDGPU: Remove broken i16 ternary patterns | Jan Vesely | 2018-08-07 | 1 | -11/+0
    Fixup test to check for GCN prefix
    These patterns always zero extend the result even though it might need sign extension. This has been broken since the addition of i16 support. It has popped up in mad_sat(char) test since min(max()) combination is turned into v_med3, resulting in the following (incorrect) sequence:
        v_mad_i16 v2, v10, v9, v11
        v_med3_i32 v2, v2, v8, v7
    Fixes mad_sat(char) piglit on VI.
    Differential Revision: https://reviews.llvm.org/D49836
    llvm-svn: 339190
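Why zero extension breaks the clamp can be shown with a small model. This is a sketch, not the removed patterns: med3 models a signed 32-bit median of three, and the function names and the [-128, 127] bounds (the range of a signed char) are chosen here for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Median of three signed 32-bit values.
int32_t med3(int32_t A, int32_t B, int32_t C) {
  return std::max(std::min(A, B), std::min(std::max(A, B), C));
}

// Models the broken sequence: the 16-bit result is zero extended, so a
// negative value turns into a large positive 32-bit value before clamping.
int32_t clampZeroExtended(int16_t Mad) {
  int32_t Wide = static_cast<uint16_t>(Mad); // upper 16 bits become zero
  return med3(Wide, -128, 127);
}

// Models the intended behaviour: the sign bit is replicated on widening.
int32_t clampSignExtended(int16_t Mad) {
  int32_t Wide = Mad;
  return med3(Wide, -128, 127);
}
```

For a 16-bit result of -200, the zero-extended path clamps to 127 while the sign-extended path clamps to -128; non-negative inputs such as 50 give the same answer on both paths, which is why the bug only shows up for negative intermediate results.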
* [AMDGPU] DAG combine to produce V_PERM_B32 | Stanislav Mekhanoshin | 2018-06-12 | 1 | -1/+1
    Differential Revision: https://reviews.llvm.org/D48099
    llvm-svn: 334559
* AMDGPU: Fix v_dot{4, 8}* instruction encoding | Konstantin Zhuravlyov | 2018-05-15 | 1 | -4/+9
    Differential Revision: https://reviews.llvm.org/D46848
    llvm-svn: 332387
* [AMDGPU] Fixed some instructions latencies | Stanislav Mekhanoshin | 2018-03-30 | 1 | -5/+6
    Differential Revision: https://reviews.llvm.org/D45073
    llvm-svn: 328874
* AMDGPU: Introduce common SOP_Pseudo and VOP_Pseudo TableGen base classes | Nicolai Haehnle | 2018-03-26 | 1 | -3/+3
    Differential revision: https://reviews.llvm.org/D44820
    Change-Id: I732979e2964006aa15d78a333d8886e6855f319a
    llvm-svn: 328496
* [AMDGPU] Fixed V_DIV_FIXUP_F16 selection on GFX9 | Stanislav Mekhanoshin | 2018-03-09 | 1 | -13/+12
    GFX9 should select opsel version.
    Differential Revision: https://reviews.llvm.org/D44279
    llvm-svn: 327106
* [AMDGPU] isRenamable fixes to support copy forwarding | Geoff Berry | 2018-01-30 | 1 | -2/+0
    Mark more opcodes as hasExtraSrcRegAllocReq so that their operands will be marked as not renamable, to avoid copy forwarding violating the constraint that only one operand may use the constant bus.
    These changes fix a few mis-compiles when copy forwarding is enabled in MachineCopyPropagation by D41835 (and were reviewed as part of that change).
    llvm-svn: 323794
* [AMDGPU][MC][GFX9] Added v_interp_p2_f16 and v_interp_p2_legacy_f16 | Dmitry Preobrazhensky | 2017-11-24 | 1 | -2/+18
    See bug 33629: https://bugs.llvm.org//show_bug.cgi?id=33629
    Reviewers: artem.tamazov, SamWot, arsenm
    Differential Revision: https://reviews.llvm.org/D39488
    llvm-svn: 318955
* [AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev} | Dmitry Preobrazhensky | 2017-11-20 | 1 | -2/+16
    See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765
    Reviewers: tamazov, SamWot, arsenm, vpykhtin
    Differential Revision: https://reviews.llvm.org/D40088
    llvm-svn: 318675
* AMDGPU: Set correct sched model on v_mad_u64_u32 | Matt Arsenault | 2017-11-08 | 1 | -0/+2
    llvm-svn: 317645
* AMDGPU: Remove global isGCN predicates | Matt Arsenault | 2017-10-03 | 1 | -4/+4
    These are problematic because they apply to everything, and can easily clobber whatever more specific predicate you are trying to add to a function.
    Currently instructions use SubtargetPredicate/PredicateControl to apply this to patterns applied to an instruction definition, but not to free standing Pats. Add a wrapper around Pat so the special PredicateControls requirements can be appended to the final predicate list like how Mips does it.
    llvm-svn: 314742
* [AMDGPU][MC][GFX9] Added op_sel support for v_mad_*16, v_fma_f16, v_div_fixup_f16 | Dmitry Preobrazhensky | 2017-08-16 | 1 | -66/+85
    This change implements features postponed in https://reviews.llvm.org/D35424 because of a dependency on https://reviews.llvm.org/D36322
    Reviewers: SamWot, artem.tamazov, arsenm
    Differential Revision: https://reviews.llvm.org/D36694
    llvm-svn: 311011
* [AMDGPU][MC][GFX9] Added integer clamping support for VOP3 opcodes | Dmitry Preobrazhensky | 2017-08-16 | 1 | -20/+84
    See Bug 34152: https://bugs.llvm.org//show_bug.cgi?id=34152
    Reviewers: SamWot, artem.tamazov, arsenm
    Differential Revision: https://reviews.llvm.org/D36674
    llvm-svn: 311006
* [AMDGPU][MC][GFX9] Added 16-bit renamed and "_legacy" VALU opcodes | Dmitry Preobrazhensky | 2017-08-09 | 1 | -13/+57
    See Bug 33629: https://bugs.llvm.org//show_bug.cgi?id=33629
    Reviewers: vpykhtin, SamWot, arsenm
    Differential Revision: https://reviews.llvm.org/D36322
    llvm-svn: 310497
* [AMDGPU][MC] Corrected VOP3 version of v_interp_* instructions for VI | Dmitry Preobrazhensky | 2017-08-07 | 1 | -6/+83
    See bug 32621: https://bugs.llvm.org//show_bug.cgi?id=32621
    Reviewers: vpykhtin, SamWot, arsenm
    Differential Revision: https://reviews.llvm.org/D35902
    llvm-svn: 310251
* [AMDGPU][MC][GFX9] Added support of VOP3 'op_sel' modifier | Dmitry Preobrazhensky | 2017-07-21 | 1 | -20/+98
    See bug 33591: https://bugs.llvm.org//show_bug.cgi?id=33591
    Reviewers: vpykhtin, artem.tamazov, SamWot, arsenm
    Differential Revision: https://reviews.llvm.org/D35424
    llvm-svn: 308740
* [AMDGPU] resubmit r308179: CodeGen: check dst operand type to determine if omod is supported for VOP3 instructions | Sam Kolton | 2017-07-18 | 1 | -3/+8
    llvm-svn: 308310
* Revert r308179 which causes tablegen to spam stderr on every build. | Chandler Carruth | 2017-07-18 | 1 | -7/+3
    Original commit log: [AMDGPU] CodeGen: check dst operand type to determine if omod is supported for VOP3 instructions
    llvm-svn: 308270
* [AMDGPU] CodeGen: check dst operand type to determine if omod is supported for VOP3 instructions | Sam Kolton | 2017-07-17 | 1 | -3/+7
    Summary: Previously, CodeGen checked the first src operand type to determine whether omod is supported by an instruction. This isn't correct for some instructions: e.g. V_CMP_EQ_F32 has floating-point src operands but doesn't support omod.
    Changed .td files to check the dst operand instead of the src operand.
    Reviewers: arsenm, vpykhtin
    Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
    Differential Revision: https://reviews.llvm.org/D35350
    llvm-svn: 308179
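The difference between the two rules can be sketched with a toy operand profile. Everything here is hypothetical (the ValueKind enum, the OpProfile struct and both function names); in particular, modelling the compare result as a non-float "LaneMask" kind is an assumption made for illustration and is not taken from the .td files.

```cpp
// Hypothetical value kinds for this sketch only.
enum class ValueKind { Int, Float, LaneMask };

// Hypothetical, heavily simplified stand-in for an instruction profile.
struct OpProfile {
  ValueKind Dst;
  ValueKind Src0;
};

// Old rule as described in the summary: look at the first source operand.
constexpr bool hasOmodBySrc(const OpProfile &P) {
  return P.Src0 == ValueKind::Float;
}

// New rule: look at the destination operand instead.
constexpr bool hasOmodByDst(const OpProfile &P) {
  return P.Dst == ValueKind::Float;
}
```

A floating-point compare, modelled as OpProfile{ValueKind::LaneMask, ValueKind::Float}, is wrongly accepted by the source-based rule and correctly rejected by the destination-based one, while an ordinary float-to-float operation is accepted by both.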
* [AMDGPU] Add intrinsics for alignbit and alignbyte instructions | Stanislav Mekhanoshin | 2017-06-09 | 1 | -2/+2
    Differential Revision: https://reviews.llvm.org/D34046
    llvm-svn: 305098