summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/VOP2Instructions.td
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU/GlobalISel: Fix import of zext of s16 op patternsMatt Arsenault2020-01-091-3/+3
|
* AMDGPU: Apply i16 add->sub pattern with zext to i32Matt Arsenault2020-01-071-8/+15
| | | | | This was only applying the deeper nested zext pattern, and missing the special case code size fold.
* AMDGPU: Fix misleading, misplaced end block commentsMatt Arsenault2020-01-071-2/+2
|
* AMDGPU/GlobalISel: Select mul24 intrinsicsMatt Arsenault2019-12-301-2/+2
|
* [AMDGPU] deduplicate tablegen predicatesStanislav Mekhanoshin2019-11-041-5/+5
| | | | | | | | | | | | | | | We are duplicating predicates if several parts of the combined predicate list contain the same condition. Added code to deduplicate the list. We have AssemblerPredicates and AssemblerPredicate in the PredicateControl, but we never use AssemblerPredicates with an actual list, so this one is dropped. This addresses the first part of the llvm bug 43886: https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69815
* [AMDGPU][MC][GFX10] Added sdwa/dpp versions of v_cndmask_b32Dmitry Preobrazhensky2019-10-181-52/+77
| | | | | | | | | | See https://bugs.llvm.org/show_bug.cgi?id=43608 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D69096 llvm-svn: 375241
* [AMDGPU][MC][GFX9] Corrected parsing of v_cndmask_b32_sdwaDmitry Preobrazhensky2019-10-181-1/+1
| | | | | | | | | | See https://bugs.llvm.org/show_bug.cgi?id=43607 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D69095 llvm-svn: 375231
* [AMDGPU] Supress unused sdwa insts generationStanislav Mekhanoshin2019-10-161-23/+36
| | | | | | | | | Do not generate non-existing sdwa instructions. It reduces the number of generated instructions by 185. Differential Revision: https://reviews.llvm.org/D69010 llvm-svn: 375016
* [AMDGPU] link dpp pseudos and real instructions on gfx10Stanislav Mekhanoshin2019-10-111-12/+17
| | | | | | | | | | | | This defaults to zero fi operand, but we do not expose it anyway. Should we expose it later it needs to be added to the pseudo. This enables dpp combining on gfx10. Differential Revision: https://reviews.llvm.org/D68888 llvm-svn: 374604
* AMDGPU: Fix i16 arithmetic pattern redundancyMatt Arsenault2019-10-081-78/+23
| | | | | | | | | | | | | | There were 2 problems here. First, these patterns were duplicated to handle the inverted shift operands instead of using the commuted PatFrags. Second, the point of the zext folding patterns don't apply to the non-0ing high subtargets. They should be skipped instead of inserting the extension. The zeroing high code would be emitted when necessary anyway. This was also emitting unnecessary zexts in cases where the high bits were undefined. llvm-svn: 374092
* [AMDGPU] Disable unused gfx10 dpp instructionsStanislav Mekhanoshin2019-10-081-0/+6
| | | | | | | | | | | Inhibit generation of unused real dpp instructions on gfx10 just like it is done on other subtargets. This does not change anything because these are illegal anyway and not accepted, but it does reduce the number of instruction definitions generated. Differential Revision: https://reviews.llvm.org/D68607 llvm-svn: 374083
* AMDGPU/GlobalISel: Fix selection of 16-bit shiftsMatt Arsenault2019-10-071-3/+6
| | | | llvm-svn: 373945
* AMDGPU/GlobalISel: Fix select for v2s16 and/or/xorMatt Arsenault2019-09-301-15/+17
| | | | llvm-svn: 373180
* [AMDGPU] Allow FP inline constant in v_madak_f16 and v_fmaak_f16Tim Renouf2019-09-181-1/+3
| | | | | | | Differential Revision: https://reviews.llvm.org/D67680 Change-Id: Ic38f47cb2079c2c1070a441b5943854844d80a7c llvm-svn: 372208
* [AMDGPU] Added MI bit IsDOTStanislav Mekhanoshin2019-09-171-1/+2
| | | | | | | | NFC, needed for future commit. Differential Revision: https://reviews.llvm.org/D67669 llvm-svn: 372151
* AMDGPU/GlobalISel: Select 16-bit VALU bit opsMatt Arsenault2019-09-131-3/+3
| | | | llvm-svn: 371807
* AMDGPU/GlobalISel: Select G_CTPOPMatt Arsenault2019-09-131-1/+1
| | | | llvm-svn: 371798
* AMDGPU: Move MnemonicAlias out of instruction def hierarchyMatt Arsenault2019-09-091-3/+3
| | | | | | | | | | | | | | | | | | Unfortunately MnemonicAlias defines a "Predicates" field just like an instruction or pattern, with a somewhat different interpretation. This ends up overriding the intended Predicates set by PredicateControl on the pseudoinstruction defintions with an empty list. This allowed incorrectly selecting instructions that should have been rejected due to the SubtargetPredicate from patterns on the instruction definition. This does remove the divergent predicate from the 64-bit shift patterns, which were already not used for the 32-bit shift, so I'm not sure what the point was. This also removes a second, redundant copy of the 64-bit divergent patterns. llvm-svn: 371427
* AMDGPU/GlobalISel: Select G_ASHRMatt Arsenault2019-07-161-1/+1
| | | | llvm-svn: 366257
* AMDGPU/GlobalISel: Select G_LSHRMatt Arsenault2019-07-161-1/+1
| | | | llvm-svn: 366256
* AMDGPU/GlobalISel: Select G_SHLMatt Arsenault2019-07-161-1/+1
| | | | | | | | | | I think this manages to not break the DAG handling with the divergent predicates because the stadalone divergent patterns end up with a higher priority than the pattern on the instruction definition. The 16-bit versions don't work yet. llvm-svn: 366254
* [AMDGPU] gfx908 dot instruction supportStanislav Mekhanoshin2019-07-111-0/+30
| | | | | | Differential Revision: https://reviews.llvm.org/D64431 llvm-svn: 365715
* [AMDGPU] gfx908 v_pk_fmac_f16 supportStanislav Mekhanoshin2019-07-091-2/+8
| | | | | | Differential Revision: https://reviews.llvm.org/D64433 llvm-svn: 365573
* [AMDGPU] gfx1010 core wave32 changesStanislav Mekhanoshin2019-06-201-2/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D63204 llvm-svn: 363934
* [AMDGPU] Use custom inserter for gfx10 VOP2bStanislav Mekhanoshin2019-06-171-1/+3
| | | | | | | | | This is part of the approved D63204 pending parent revision. This small change is in fact a part of the VOP2b legalization which does not technically belong to wave32 support, so extracted separately. llvm-svn: 363625
* [AMDGPU] gfx1011/gfx1012 targetsStanislav Mekhanoshin2019-06-141-0/+54
| | | | | | Differential Revision: https://reviews.llvm.org/D63307 llvm-svn: 363344
* [AMDGPU] gfx1010 base changes for wave32Stanislav Mekhanoshin2019-06-131-0/+35
| | | | | | Differential Revision: https://reviews.llvm.org/D63293 llvm-svn: 363299
* [AMDGPU] gfx1010 dpp16 and dpp8Stanislav Mekhanoshin2019-06-121-4/+102
| | | | | | Differential Revision: https://reviews.llvm.org/D63203 llvm-svn: 363186
* [AMDGPU][GFX8][GFX9] Corrected predicate of v_*_co_u32 aliasesDmitry Preobrazhensky2019-05-141-1/+6
| | | | | | | | Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D61905 llvm-svn: 360702
* AMDGPU: Select VOP3 form of addMatt Arsenault2019-05-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | The VOP3 form should always be the preferred selection, to be shrunk later. This should only be an optimization issue, but this partially works around a problem from clobbering VCC when SIFixSGPRCopies rewrites an SCC defining operation directly to VCC. 3 of the testcases are regressions from failing to fold the immediate in cases it should. These can be avoided by improving the VCC liveness handling in SIFoldOperands. Simply increasing the threshold to computeRegisterLiveness works, although this is common enough that VCC liveness should probably be tracked throughout the pass. The hack of leaving behind an implicit_def instruction to avoid breaking iterator wastes instruction count, which inhibits finding the VCC def in long chains of adds. Doing this however exposes different, worse looking regressions from poor scheduling behavior. This could probably be avoided around by forcing the shrink of the addc here, but the scheduler should probably be fixed. The r600 add test needs to be split out because it asserts on the arguments in the new test during the calling convention lowering. llvm-svn: 360293
* AMDGPU: Fix a mis-placed bracketChangpeng Fang2019-05-081-1/+1
| | | | | | | Differential Revision: https://reviews.llvm.org/D61430 llvm-svn: 360283
* AMDGPU: Select VOP3 form of subMatt Arsenault2019-05-031-2/+2
| | | | | | | | | | The VOP3 form should always be the preferred selection form to be shrunk later. The r600 sub test needs to be split out because it asserts on the arguments in the new test during the calling convention lowering. llvm-svn: 359899
* AMDGPU: Remove redundant patterns for shiftsMatt Arsenault2019-05-031-9/+4
| | | | llvm-svn: 359895
* AMDGPU: Remove redundant patterns for subMatt Arsenault2019-05-031-4/+0
| | | | | | | There were 2 patterns for sub, one selecting to sub and one to subrev. Only one of these will succeed, so remove the reversed one. llvm-svn: 359894
* [AMDGPU] gfx1010 VOPC implementationStanislav Mekhanoshin2019-04-261-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D61208 llvm-svn: 359358
* [AMDGPU] gfx1010 VOP2 changesStanislav Mekhanoshin2019-04-261-125/+419
| | | | | | Differential Revision: https://reviews.llvm.org/D61156 llvm-svn: 359316
* [AMDGPU] gfx1010 VOP1 instructionsStanislav Mekhanoshin2019-04-251-2/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D61099 llvm-svn: 359225
* [AMDGPU] Sort out and rename multiple CI/VI predicatesStanislav Mekhanoshin2019-04-061-9/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D60346 llvm-svn: 357835
* [AMDGPU] predicate and feature refactoringStanislav Mekhanoshin2019-04-051-9/+6
| | | | | | | | | We have done some predicate and feature refactoring lately but did not upstream it. This is to sync. Differential revision: https://reviews.llvm.org/D60292 llvm-svn: 357791
* [AMDGPU] Asm/disasm clamp modifier on vop3 int arithmeticTim Renouf2019-03-181-10/+19
| | | | | | | | | | | | | | Allow the clamp modifier on vop3 int arithmetic instructions in assembly and disassembly. This involved adding a clamp operand to the affected instructions in MIR and MC, and thus having to fix up several places in codegen and MIR tests. Differential Revision: https://reviews.llvm.org/D59267 Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e llvm-svn: 356399
* [AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiersTim Renouf2019-03-181-8/+13
| | | | | | | | | | | | | | | | | This commit allows v_cndmask_b32_e64 with abs, neg source modifiers on src0, src1 to be assembled and disassembled. This does appear to be allowed, even though they are floating point modifiers and the operand type is b32. To do this, I added src0_modifiers and src1_modifiers to the MachineInstr, which involved fixing up several places in codegen and mir tests. Differential Revision: https://reviews.llvm.org/D59191 Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea llvm-svn: 356398
* [AMDGPU][MC] Enable lds_direct operand for v_readfirstlane_b32, ↵Dmitry Preobrazhensky2019-03-041-2/+2
| | | | | | | | | | | | v_readlane_b32 and v_writelane_b32 See bug 40662: https://bugs.llvm.org/show_bug.cgi?id=40662 Reviewers: artem.tamazov, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D58713 llvm-svn: 355312
* Revert "AMDGPU/NFC: Cleanup subtarget predicates"Konstantin Zhuravlyov2019-02-221-11/+11
| | | | | | | It breaks one of our downstream merges, so revert it temporarily while investigating failures downstream llvm-svn: 354700
* AMDGPU/NFC: Cleanup subtarget predicatesKonstantin Zhuravlyov2019-02-211-11/+11
| | | | | | Differential Revision: https://reviews.llvm.org/D58522 llvm-svn: 354620
* AMDGPU: Remove GCN features and predicatesMatt Arsenault2019-02-081-4/+0
| | | | | | | These are no longer necessary since the R600 tablegen files are split out now. llvm-svn: 353548
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* [AMDGPU] Add new Mode Register passTim Corringham2018-12-101-1/+6
| | | | | | | | | | | | | | | A new pass to manage the Mode register. Currently this just manages the floating point double precision rounding requirements, but is intended to be easily extended to encompass all Mode register settings. The immediate motivation comes from the requirement to use the round-to-zero rounding mode for the 16 bit interpolation instructions, where the rounding mode setting is shared between 16 and 64 bit operations. llvm-svn: 348754
* [AMDGPU] Combine DPP mov with use instructions (VOP1/2/3)Valery Pykhtin2018-11-301-23/+46
| | | | | | | | Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses. Differential revision: https://reviews.llvm.org/D53762 llvm-svn: 347993
* DAG: Change behavior of fminnum/fmaxnum nodesMatt Arsenault2018-10-221-4/+4
| | | | | | | | | | | Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit. llvm-svn: 344914
* AMDGPU: Split HasExt into HasExtDPP/SDWA/SDWA9Konstantin Zhuravlyov2018-09-271-6/+19
| | | | llvm-svn: 343264
OpenPOWER on IntegriCloud