summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIInstructions.td
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Set hasSideEffects 0 on _term instructionsMatt Arsenault2019-03-251-0/+3
| | | | | | | | These were defaulting to true, but they are just wrappers around bit operations. This avoids regressions in the exec mask optimization passes in a future commit. llvm-svn: 356952
* [AMDGPU] Added v5i32 and v5f32 register classesTim Renouf2019-03-221-0/+22
| | | | | | | | | | They are not used by anything yet, but a subsequent commit will start using them for image ops that return 5 dwords. Differential Revision: https://reviews.llvm.org/D58903 Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271 llvm-svn: 356735
* [AMDGPU] Support for v3i32/v3f32Tim Renouf2019-03-211-0/+21
| | | | | | | | | | | | | | | Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet. SI (gfx6) does not have dwordx3 instructions, so they are not enabled there. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58902 Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9 llvm-svn: 356659
* [AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiersTim Renouf2019-03-181-13/+32
| | | | | | | | | | | | | | | | | This commit allows v_cndmask_b32_e64 with abs, neg source modifiers on src0, src1 to be assembled and disassembled. This does appear to be allowed, even though they are floating point modifiers and the operand type is b32. To do this, I added src0_modifiers and src1_modifiers to the MachineInstr, which involved fixing up several places in codegen and mir tests. Differential Revision: https://reviews.llvm.org/D59191 Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea llvm-svn: 356398
* Revert "AMDGPU/NFC: Cleanup subtarget predicates"Konstantin Zhuravlyov2019-02-221-2/+2
| | | | | | | It breaks one of our downstream merges, so revert it temporarily while investigating failures downstream llvm-svn: 354700
* AMDGPU/NFC: Cleanup subtarget predicatesKonstantin Zhuravlyov2019-02-211-2/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D58522 llvm-svn: 354620
* AMDGPU: Remove GCN features and predicatesMatt Arsenault2019-02-081-17/+3
| | | | | | | These are no longer necessary since the R600 tablegen files are split out now. llvm-svn: 353548
* [AMDGPU] Support emitting GOT relocations for function callsScott Linder2019-02-041-15/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D57416 llvm-svn: 353083
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* AMDGPU: Add a fast path for icmp.i1(src, false, NE)Marek Olsak2019-01-151-0/+5
| | | | | | | | | | | | | | | | | Summary: This allows moving the condition from the intrinsic to the standard ICmp opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic is an identity for retrieving the SGPR mask. And we can also get the mask from and i1, or i1, xor i1. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52060 llvm-svn: 351150
* AMDGPU: Add patterns for v4i16/v4f16 -> v4i16/v4f16 bitcastsRhys Perry2018-12-191-0/+2
| | | | | | | | | | | | Reviewers: arsenm, tstellar Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55058 llvm-svn: 349694
* [AMDGPU] Restored selection of scalar_to_vector (v2x16)Stanislav Mekhanoshin2018-11-191-9/+9
| | | | | | | | | This works if DAG combiner is enabled, but without combining we cannot select scalar_to_vector of <2 x half> and <2 x i16>. Differential Revision: https://reviews.llvm.org/D54718 llvm-svn: 347259
* AMDGPU: Fix analyzeBranch failing with pseudoterminatorsMatt Arsenault2018-11-161-1/+2
| | | | | | | | | If a block had one of the _term instructions used for gluing exec modifying instructions to the end of the block, analyzeBranch would fail, preventing the verifier from catching a broken successor list. llvm-svn: 347027
* AMDGPU: Additional pattern for i16 median3 matchingAakanksha Patil2018-11-141-4/+17
| | | | | | | | min(max(a, b), max(min(a, b), c)) Differential Revision: https://reviews.llvm.org/D54494 llvm-svn: 346886
* AMDGPU: Adding more median3 patternsAakanksha Patil2018-11-121-3/+4
| | | | | | | | min(max(a, b), max(min(a, b), c)) -> med3 a, b, c Differential Revision: https://reviews.llvm.org/D54331 llvm-svn: 346704
* AMDGPU: Remove PHI loop condition optimizationNicolai Haehnle2018-10-311-16/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The optimization to early break out of loops if all threads are dead was never fully implemented. But the PHI node analyzing is actually causing a number of problems, so remove all the extra code for it. (This does actually regress code quality in a few places because it ends up relying more heavily on phi's of i1, which we don't do a great job with. However, since it fixes real bugs in the wild, we should take this change. I have some prototype changes to improve i1 lowering in general -- not just for control flow -- which should help recover the code quality, I just need to make those changes fit for general consumption. -- Nicolai) Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1 Patch-by: Christian König <christian.koenig@amd.com> Reviewers: arsenm, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53359 llvm-svn: 345718
* DAG: Change behavior of fminnum/fmaxnum nodesMatt Arsenault2018-10-221-6/+7
| | | | | | | | | | | Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit. llvm-svn: 344914
* AMDGPU: Add support pattern for SUB of one bitChangpeng Fang2018-10-191-0/+10
| | | | | | | | | | | | | Summary: Add selection patterns to support one bit Sub. Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D52946 llvm-svn: 344815
* AMDGPU: Add Selection patterns to support add of one bit.Changpeng Fang2018-09-251-0/+12
| | | | | | | | | | | | | | Summary: We generate s_xor to lower add of i1s in general cases, and s_not to lower add with a one-bit imm of -1 (true). Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D52518 llvm-svn: 343030
* [AMDGPU] Divergence driven instruction selection. Part 1.Alexander Timofeev2018-09-211-3/+5
| | | | | | | | | | | | | Summary: This change is the first part of the AMDGPU target description change. The aim of it is the effective splitting the vector and scalar flows at the selection stage. Selection uses predicate functions based on the framework implemented earlier - https://reviews.llvm.org/D35267 Differential revision: https://reviews.llvm.org/D52019 Reviewers: rampitec llvm-svn: 342719
* [AMDGPU] Add instruction selection for i1 to f16 conversionCarl Ritson2018-09-191-0/+10
| | | | | | | | | | | | | | | | | | Summary: This is required for GPUs with 16 bit instructions where f16 is a legal register type and hence int_to_fp i1 to f16 is not lowered by legalizing. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52018 Change-Id: Ie4c0fd6ced7cf10ad612023c6879724d9ded5851 llvm-svn: 342558
* AMDGPU: Fix packing undef parts of build_vectorMatt Arsenault2018-08-121-2/+21
| | | | llvm-svn: 339511
* AMDGPU: Use SPseudoInst helperMatt Arsenault2018-08-011-8/+5
| | | | llvm-svn: 338631
* AMDGPU: Reduce code size with fcanonicalize (fneg x)Matt Arsenault2018-07-301-0/+10
| | | | | | | | When fcanonicalize is lowered to a mul, we can use -1.0 for free and avoid the cost of the bigger encoding for source modifers. llvm-svn: 338244
* AMDGPU: Fix code size for return_to_epilog pseudoMatt Arsenault2018-07-271-0/+1
| | | | llvm-svn: 338113
* AMDGPU: Separate R600 and GCN TableGen filesTom Stellard2018-06-281-2/+1
| | | | | | | | | | | | | | | | | | | | | Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc. Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46365 llvm-svn: 335942
* [AMDGPU] Overload llvm.amdgcn.fmad.ftz to support f16Stanislav Mekhanoshin2018-06-281-5/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D48677 llvm-svn: 335866
* AMDGPU: Add implicit def of SCC to kill and indirect pseudosNicolai Haehnle2018-06-211-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Kill instructions sometimes do use SCC in unusual circumstances, when v_cmpx cannot be used due to the operands that are involved. Additionally, even if SCC was never defined by the expansion, kill pseudos could previously occur between an s_cmp and an s_cbranch_scc, which breaks the SCC liveness tracking when the pseudo is expanded to split the basic block. While it would be possible to explicitly mark the SCC as live-in for the successor basic block, it's simpler to just mark the pseudo as using SCC, so that such a sequence is never emitted by instruction selection in the first place. A similar issue affects indirect source/dest pseudos in principle, although I haven't been able to come up with a test case where it actually matters (this affects instruction selection, so a MIR test can't be used). Fixes: dEQP-GLES3.functional.shaders.discard.dynamic_loop_always Change-Id: Ica8d82ecff1a763b892a1112cf1b06c948863a4f Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47761 llvm-svn: 335223
* AMDGPU: Fix scalar_to_vector for v4i16/v4f16Matt Arsenault2018-06-201-0/+10
| | | | llvm-svn: 335161
* AMDGPU: Make v4i16/v4f16 legalMatt Arsenault2018-06-151-0/+41
| | | | | | | Some image loads return these, and it's awkward working around them not being legal. llvm-svn: 334835
* AMDGPU: Use scalar operations for f16 fabs/fneg patternsMatt Arsenault2018-06-071-7/+7
| | | | | | Fixes unnecessary differences between subtargets. llvm-svn: 334184
* AMDGPU: Custom lower v2f16 fneg/fabs with illegal f16Matt Arsenault2018-06-061-0/+5
| | | | | | | | | | | | Fixes terrible code on targets without f16 support. The legalization creates a mess that is difficult to recover from. Also should avoid randomly breaking these tests multiple times in sequence in future commits. Some regressions in cases where it happens to be better to pull the source modifier after the conversion. llvm-svn: 334132
* AMDGPU: Fix v2f16 fneg/fabs patternMatt Arsenault2018-05-221-0/+5
| | | | | | | | | | The integer operation convertion for some reason only happens if the source is a bitcast from an integer, which happens to always be the situation when the result is loaded. Add an additional pattern for when the source operation is really an FP operation. llvm-svn: 333019
* AMDGPU: Make v2i16/v2f16 legal on VIMatt Arsenault2018-05-221-5/+15
| | | | | | | | | | | | This usually results in better code. Fixes using inline asm with short2, and also fixes having a different ABI for function parameters between VI and gfx9. Partially cleans up the mess used for lowering of the d16 operations. Making v4f16 legal will help clean this up more, but this requires additional work. llvm-svn: 332953
* AMDGPU: Add Vega12 and Vega20Matt Arsenault2018-04-301-0/+10
| | | | | | | | Changes by Matt Arsenault Konstantin Zhuravlyov llvm-svn: 331215
* AMDGPU: Consolidate SubtargetPredicate definitionsMatt Arsenault2018-04-261-7/+0
| | | | llvm-svn: 330979
* AMDGPU: Remove deprecated llvm.AMDGPU.kilp intrinsicTom Stellard2018-04-241-5/+0
| | | | | | | | | | | | | | Summary: This is no longer used by mesa since its 18.0.0 release. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D45988 llvm-svn: 330775
* [AMDGPU][MC][GFX8][GFX9][DISASSEMBLER] Added "_e32" suffix to 32-bit VINTRP ↵Dmitry Preobrazhensky2018-03-161-6/+9
| | | | | | | | | | | opcodes See bug 36751: https://bugs.llvm.org/show_bug.cgi?id=36751 Differential Revision: https://reviews.llvm.org/D44529 Reviewers: artem.tamazov, arsenm llvm-svn: 327723
* AMDGPU/GCN: Promote i16 ctpopJan Vesely2018-03-021-0/+4
| | | | | | | | | i16 capable ASICs do not support i16 operands for this instruction. Add tablegen pattern to merge chained i16 additions. Differential Revision: https://reviews.llvm.org/D43985 llvm-svn: 326535
* AMDGPU: Remove tied operand from si_elseMatt Arsenault2018-02-091-1/+0
| | | | llvm-svn: 324751
* AMDGPU: Select BFI patterns with 64-bit intsMatt Arsenault2018-02-071-1/+2
| | | | llvm-svn: 324431
* AMDGPU: Fix missing SCC def from s_xor_b64_termMatt Arsenault2018-01-311-0/+1
| | | | llvm-svn: 323927
* AMDGPU: Use gfx9 carry-less add/sub instructionsMatt Arsenault2017-11-301-4/+8
| | | | llvm-svn: 319491
* [AMDGPU][MC][GFX8][GFX9] Corrected names of integer ↵Dmitry Preobrazhensky2017-11-201-44/+0
| | | | | | | | | | | | v_{add/addc/sub/subrev/subb/subbrev} See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765 Reviewers: tamazov, SamWot, arsenm, vpykhtin Differential Revision: https://reviews.llvm.org/D40088 llvm-svn: 318675
* AMDGPU: Replace i64 add/sub loweringMatt Arsenault2017-11-151-2/+20
| | | | | | | | | | | | | | | Use VOP3 add/addc like usual. This has some tradeoffs. Inline immediates fold a little better, but other constants are worse off. SIShrinkInstructions could be made smarter to handle these cases. This allows us to avoid selecting scalar adds where we need to track the carry in scc and replace its users. This makes it easier to use the carryless VALU adds. llvm-svn: 318340
* AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)Marek Olsak2017-10-241-10/+40
| | | | | | | | | | | | | | | | Summary: Kill the thread if operand 0 == false. llvm.amdgcn.wqm.vote can be applied to the operand. Also allow kill in all shader stages. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38544 llvm-svn: 316427
* AMDGPU: Fix incorrect selection of pseudo-branchesMatt Arsenault2017-10-101-0/+2
| | | | | | These should only be used if the machine structurizer is enabled. llvm-svn: 315357
* AMDGPU: Remove global isGCN predicatesMatt Arsenault2017-10-031-147/+154
| | | | | | | | | | | | | | These are problematic because they apply to everything, and can easily clobber whatever more specific predicate you are trying to add to a function. Currently instructions use SubtargetPredicate/PredicateControl to apply this to patterns applied to an instruction definition, but not to free standing Pats. Add a wrapper around Pat so the special PredicateControls requirements can be appended to the final predicate list like how Mips does it. llvm-svn: 314742
* [AMDGPU] Use v_pk_max_f16 for fcanonicalizeStanislav Mekhanoshin2017-09-061-5/+10
| | | | | | Differential Revision: https://reviews.llvm.org/D37325 llvm-svn: 312676
* [AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalizeStanislav Mekhanoshin2017-09-061-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D37522 llvm-svn: 312660
OpenPOWER on IntegriCloud