summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU/GlobalISel: Fix import of integer med3Matt Arsenault2020-01-091-24/+0
| | | | | This isn't too useful now, since nothing is currently trying to form min/max from cmp+select.
* AMDGPU: Eliminate more legacy codepred address space PatFragsMatt Arsenault2020-01-091-84/+0
| | | | These should now be limited to R600 code.
* AMDGPU: Use new PatFrag system for d16 storesMatt Arsenault2020-01-091-13/+7
|
* AMDGPU: Annotate EXTRACT_SUBREGs with source register classesMatt Arsenault2020-01-071-24/+24
| | | | | This partially fixes GlobalISel import of the patterns, but removes a lot of entriess from the end of the skipped pattern log.
* AMDGPU: Use ImmLeafMatt Arsenault2020-01-071-2/+2
|
* AMDGPU/GlobalISel: Select scalar v2s16 G_BUILD_VECTORMatt Arsenault2020-01-061-6/+16
|
* AMDGPU: Refactor treatment of denormal modeMatt Arsenault2019-11-191-6/+9
| | | | | | | | | | | Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute. This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.
* [AMDGPU] deduplicate tablegen predicatesStanislav Mekhanoshin2019-11-041-6/+14
| | | | | | | | | | | | | | | We are duplicating predicates if several parts of the combined predicate list contain the same condition. Added code to deduplicate the list. We have AssemblerPredicates and AssemblerPredicate in the PredicateControl, but we never use AssemblerPredicates with an actual list, so this one is dropped. This addresses the first part of the llvm bug 43886: https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69815
* AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHGMatt Arsenault2019-10-251-14/+1
| | | | | | | | Custom lower this to a target instruction with the merge operands. I think it might be better to directly select this and emit a REG_SEQUENCE, but this would be more work since it would require splitting the tablegen patterns for these cases from the other atomics.
* AMDGPU/GlobalISel: Select local atomic cmpxchgMatt Arsenault2019-08-011-16/+12
| | | | llvm-svn: 367511
* AMDGPU: Start redefining atomic PatFragsMatt Arsenault2019-08-011-38/+32
| | | | | | | | Start migrating to a form that will be compatible with the global isel emitter. Also should fix some overly lax checks on the memory type, which allowed mis-selecting some illegal atomics. llvm-svn: 367506
* AMDGPU/GlobalISel: Select simple local storesMatt Arsenault2019-08-011-1/+3
| | | | llvm-svn: 367504
* AMDGPU/GlobalISel: Select local loadsMatt Arsenault2019-08-011-0/+2
| | | | llvm-svn: 367498
* TableGen: Add MinAlignment predicateMatt Arsenault2019-07-311-19/+21
| | | | | | | | | | AMDGPU uses some custom code predicates for testing alignments. I'm still having trouble comprehending the behavior of predicate bits in the PatFrag hierarchy. Any attempt to abstract these properties unexpectdly fails to apply them. llvm-svn: 367373
* AMDGPU: Avoid emitting "true" predicatesMatt Arsenault2019-07-301-1/+1
| | | | | | | | | | Empty condition strings are considerde always true. This removes a lot of clutter from the generated matcher tables. This shrinks the source size of AMDGPUGenDAGISel.inc from 7.3M to 6.1M. llvm-svn: 367326
* AMDGPU: Redefine setcc condition PatLeafsMatt Arsenault2019-07-191-55/+23
| | | | | | Avoid using custom code predicates. llvm-svn: 366609
* AMDGPU: Replace store PatFragsMatt Arsenault2019-07-161-12/+32
| | | | | | Convert the easy cases to formats understood for GlobalISel. llvm-svn: 366240
* AMDGPU: Redefine load PatFragsMatt Arsenault2019-07-161-70/+97
| | | | | | | Rewrite PatFrags using the new PatFrag address space matching in tablegen. These will now work with both SelectionDAG and GlobalISel. llvm-svn: 366234
* AMDGPU: Avoid code predicates for extload PatFragsMatt Arsenault2019-07-161-29/+16
| | | | | | | | | | Use the MemoryVT field. This will be necessary for tablegen to automatically handle patterns for GlobalISel. Doesn't handle the d16 lo/hi patterns. Those are a special case since it involvess the custom node type. llvm-svn: 366168
* [AMDGPU] gfx908 atomic fadd and atomic pk_faddStanislav Mekhanoshin2019-07-111-4/+6
| | | | | | Differential Revision: https://reviews.llvm.org/D64435 llvm-svn: 365717
* AMDGPU: Split extload/zextload local load patternsMatt Arsenault2019-07-081-2/+4
| | | | | | | This will help removing the custom load predicates, allowing the global isel emitter to handle them. llvm-svn: 365398
* AMDGPU: Support GDS atomicsNicolai Haehnle2019-07-011-0/+21
| | | | | | | | | | | | | | | | | Summary: Original patch by Marek Olšák Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab Reviewers: mareko, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63452 llvm-svn: 364814
* [AMDGPU] gfx1010 base changes for wave32Stanislav Mekhanoshin2019-06-131-1/+3
| | | | | | Differential Revision: https://reviews.llvm.org/D63293 llvm-svn: 363299
* [AMDGPU] Remove now unused V2FP16_ONE constant def. NFC.Stanislav Mekhanoshin2019-05-131-1/+0
| | | | llvm-svn: 360608
* AMDGPU: Move d16 load matching to preprocess stepMatt Arsenault2019-03-081-3/+3
| | | | | | | | | | | | | | | | | When matching half of the build_vector to a load, there could still be a hidden dependency on the other half of the build_vector the pattern wouldn't detect. If there was an additional chain dependency on the other value, a cycle could be introduced. I don't think a tablegen pattern is capable of matching the necessary conditions, so move this into PreprocessISelDAG. Check isPredecessorOf for the other value to avoid a cycle. This has a warning that it's expensive, so this should probably be moved into an MI pass eventually that will have more freedom to reorder instructions to help match this. That is currently complicated by the lack of a computeKnownBits type mechanism for the selected function. llvm-svn: 355731
* AMDGPU: Remove GCN features and predicatesMatt Arsenault2019-02-081-11/+3
| | | | | | | These are no longer necessary since the R600 tablegen files are split out now. llvm-svn: 353548
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* AMDGPU: Raise the priority of MAD24 in instruction selection.Changpeng Fang2019-01-151-0/+2
| | | | | | | | | | | | | | | | | Summary: We have seen performance regression when v_add3 is generated. The major reason is that the v_mad pattern is broken when v_add3 is generated. We also see the register pressure increased. While we could not properly estimate register pressure during instruction selection, we can give mad a higher priority. In this work, we raise the priority for mad24 in selection and resolve the performance regression. Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D56745 llvm-svn: 351273
* [AMDGPU] Add and update scalar instructionsGraham Sellers2018-11-291-0/+8
| | | | | | | | | This patch adds support for S_ANDN2, S_ORN2 32-bit and 64-bit instructions and adds splits to move them to the vector unit (for which there is no equivalent instruction). It modifies the way that the more complex scalar instructions are lowered to vector instructions by first breaking them down to sequences of simpler scalar instructions which are then lowered through the existing code paths. The pattern for S_XNOR has also been updated to apply inversion to one input rather than the output of the XOR as the result is equivalent and may allow leaving the NOT instruction on the scalar unit. A new tests for NAND, NOR, ANDN2 and ORN2 have been added, and existing tests now hit the new instructions (and have been modified accordingly). Differential: https://reviews.llvm.org/D54714 llvm-svn: 347877
* AMDGPU: Adding more median3 patternsAakanksha Patil2018-11-121-6/+18
| | | | | | | | min(max(a, b), max(min(a, b), c)) -> med3 a, b, c Differential Revision: https://reviews.llvm.org/D54331 llvm-svn: 346704
* DAG: Change behavior of fminnum/fmaxnum nodesMatt Arsenault2018-10-221-0/+28
| | | | | | | | | | | Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit. llvm-svn: 344914
* AMDGPU: Remove remnants of old address space mappingMatt Arsenault2018-08-311-18/+18
| | | | llvm-svn: 341165
* [AMDGPU] Support idot2 pattern.Farhana Aleen2018-08-211-0/+3
| | | | | | | | | | | | | | | | Summary: Transform add (mul ((i32)S0.x, (i32)S1.x), add( mul ((i32)S0.y, (i32)S1.y), (i32)S3) => i/udot2((v2i16)S0, (v2i16)S1, (i32)S3) Author: FarhanaAleen Reviewed By: arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D50024 llvm-svn: 340295
* AMDGPU: Reduce code size with fcanonicalize (fneg x)Matt Arsenault2018-07-301-0/+1
| | | | | | | | When fcanonicalize is lowered to a mul, we can use -1.0 for free and avoid the cost of the bigger encoding for source modifers. llvm-svn: 338244
* AMDGPU/R600: Add MOV instructions to BFE patternsJan Vesely2018-07-271-5/+5
| | | | | | | | | R600 can't handle immediates for BFE, these will be eliminated later. Fixes powr/pow regressions n r600 since r334817 Differential Revision: https://reviews.llvm.org/D49641 llvm-svn: 338127
* AMDGPU: Separate R600 and GCN TableGen filesTom Stellard2018-06-281-16/+72
| | | | | | | | | | | | | | | | | | | | | Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc. Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46365 llvm-svn: 335942
* AMDGPU: Add patterns for i32/i64 local atomic load/storeMatt Arsenault2018-06-221-0/+3
| | | | | | | | Not sure why the 32/64 split is needed in the atomic_load store hierarchies. The regular PatFrags do this, but we don't do it for the existing handling for global. llvm-svn: 335325
* [AMDGPU] Recognize x & ~(-1 << y) pattern.Roman Lebedev2018-06-151-0/+6
| | | | | | | | | | | | | | | | Summary: The same pattern as D48010, but this one is IR-canonical as of D47428. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48012 llvm-svn: 334817
* [AMDGPU] Recognize x & ((1 << y) - 1) pattern.Roman Lebedev2018-06-151-0/+7
| | | | | | | | | | | | | | | | | | | | | Summary: As a followup for D48007. Since we already handle `x << (bitwidth - y) >> (bitwidth - y)` pattern, which does not have ub for both the edge cases (`y == 0`, `y == bitwidth`), i think also handling a pattern that is ub for `y == bitwidth` should be fine. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48010 llvm-svn: 334816
* [AMDGPU] Recognize x & (-1 >> (32 - y)) pattern.Roman Lebedev2018-06-151-0/+7
| | | | | | | | | | | | | | | | | | | | | Summary: D47980 will canonicalize the `x << (32 - y) >> (32 - y)`, which is the pattern the AMDGPU expects to `x & (-1 >> (32 - y))`, which is not recognized by AMDGPU. Thus, it needs to be recognized, too. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48007 llvm-svn: 334815
* [AMDGPU] Supported ds_write_b128 generation.Farhana Aleen2018-03-161-0/+3
| | | | | | | | | | | | | | Summary: This is a follow-on patch of https://reviews.llvm.org/D44210 Author: FarhanaAleen Reviewed By: msearles Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44319 llvm-svn: 327726
* [AMDGPU] Supported ds_read_b128 generation; Widened vector length for local ↵Farhana Aleen2018-03-091-0/+8
| | | | | | | | | | | | | | | | | | | | address-space. Summary: Starting from GCN 2nd generation, ISA supports ds_read_b128 on top of ds_read_b64. This patch supports ds_read_b128 instruction pattern and generation of this instruction. In the vectorizer, this patch also widen the vector length so that vectorizer generates 128 bit loads for local address-space which gets translated to ds_read_b128. Since the performance benefit is not clear; compiler generates ds_read_b128 under -amdgpu-ds128. Author: FarhanaAleen Reviewed By: rampitec, arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44210 llvm-svn: 327153
* AMDGPU: Select BFI patterns with 64-bit intsMatt Arsenault2018-02-071-4/+43
| | | | llvm-svn: 324431
* AMDGPU: Move ADDRIndirect complex pattern into R600Instructions.tdTom Stellard2018-01-291-1/+0
| | | | | | | | | | | | | | Summary: This is only used by R600. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37114 llvm-svn: 323709
* AMDGPU/EG: Add a new FeatureFMA and use it to selectively enable FMA instructionJan Vesely2017-12-041-0/+1
| | | | | | | | | Only used by pre-GCN targets v2: fix predicate setting for FMA_Common Differential Revision: https://reviews.llvm.org/D40692 llvm-svn: 319712
* AMDGPU: Select d16 loads into low component of registerMatt Arsenault2017-11-131-0/+23
| | | | llvm-svn: 318005
* AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)Marek Olsak2017-10-241-1/+0
| | | | | | | | | | | | | | | | Summary: Kill the thread if operand 0 == false. llvm.amdgcn.wqm.vote can be applied to the operand. Also allow kill in all shader stages. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38544 llvm-svn: 316427
* AMDGPU: Cleanup local atomic node namesMatt Arsenault2017-10-231-16/+4
| | | | llvm-svn: 316349
* AMDGPU: Remove global isGCN predicatesMatt Arsenault2017-10-031-23/+27
| | | | | | | | | | | | | | These are problematic because they apply to everything, and can easily clobber whatever more specific predicate you are trying to add to a function. Currently instructions use SubtargetPredicate/PredicateControl to apply this to patterns applied to an instruction definition, but not to free standing Pats. Add a wrapper around Pat so the special PredicateControls requirements can be appended to the final predicate list like how Mips does it. llvm-svn: 314742
* AMDGPU: Move r600 only code into r600 only td fileMatt Arsenault2017-09-201-53/+0
| | | | llvm-svn: 313719
OpenPOWER on IntegriCloud