summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/DSInstructions.td
Commit message (Collapse)AuthorAgeFilesLines
* TableGen/GlobalISel: Add way for SDNodeXForm to work on timmMatt Arsenault2020-01-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implementation assumes there is an instruction associated with the transform, but this is not the case for timm/TargetConstant/immarg values. These transforms should directly operate on a specific MachineOperand in the source instruction. TableGen would assert if you attempted to define an equivalent GISDNodeXFormEquiv using timm when it failed to find the instruction matcher. Specially recognize SDNodeXForms on timm, and pass the operand index to the render function. Ideally this would be a separate render function type that looks like void renderFoo(MachineInstrBuilder, const MachineOperand&), but this proved to be somewhat mechanically painful. Add an optional operand index which will only be passed if the transform should only look at the one source operand. Theoretically it would also be possible to only ever pass the MachineOperand, and the existing renderers would check the parent. I think that would be somewhat ugly for the standard usage which may want to inspect other operands, and I also think MachineOperand should eventually not carry a pointer to the parent instruction. Use it in one sample pattern. This isn't a great example, since the transform exists to satisfy DAG type constraints. This could also be avoided by just changing the MachineInstr's arbitrary choice of operand type from i16 to i32. Other patterns have nontrivial uses, but this serves as the simplest example. One flaw this still has is if you try to use an SDNodeXForm defined for imm, but the source pattern uses timm, you still see the "Failed to lookup instruction" assert. However, there is now a way to avoid it.
* AMDGPU: Use new PatFrag system for d16 storesMatt Arsenault2020-01-091-2/+2
|
* AMDGPU: Add register class to DS_SWIZZLE_B32 patternMatt Arsenault2020-01-091-1/+1
| | | | Reduces diff for a future patch.
* [AMDGPU] Add missing flags to DS_RealStanislav Mekhanoshin2019-11-051-0/+2
| | | | Differential Revision: https://reviews.llvm.org/D69867
* [AMDGPU] deduplicate tablegen predicatesStanislav Mekhanoshin2019-11-041-1/+1
| | | | | | | | | | | | | | | We are duplicating predicates if several parts of the combined predicate list contain the same condition. Added code to deduplicate the list. We have AssemblerPredicates and AssemblerPredicate in the PredicateControl, but we never use AssemblerPredicates with an actual list, so this one is dropped. This addresses the first part of the llvm bug 43886: https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69815
* [AMDGPU][MC][GFX9][GFX10] Corrected number of src operands for ↵Dmitry Preobrazhensky2019-10-111-5/+18
| | | | | | | | | | | | ds_[read/write]_addtid_b32 See https://bugs.llvm.org/show_bug.cgi?id=37941 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68787 llvm-svn: 374561
* Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"Matt Arsenault2019-09-191-1/+1
| | | | | | | | | This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338
* Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"Hans Wennborg2019-09-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC*. > > Since now intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. > > Most of the work here is to cleanup target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, simlar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_* instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372314
* GlobalISel: Don't materialize immarg arguments to intrinsicsMatt Arsenault2019-09-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Encode them directly as an imm argument to G_INTRINSIC*. Since now intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to cleanup target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. The GlobalISelEmitter needs to treat timm as a special case of a leaf node, simlar to MachineBasicBlock operands. This should also enable handling of patterns for some G_* instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372285
* AMDGPU: Remove code address space predicatesMatt Arsenault2019-09-091-2/+2
| | | | | | | Fixes 8-byte, 8-byte aligned LDS loads. 16-byte case still broken due to not be reported as legal. llvm-svn: 371413
* AMDGPU/GlobalISel: Avoid repeating 32-bit type listsMatt Arsenault2019-09-061-1/+1
| | | | llvm-svn: 371156
* AMDGPU/GlobalISel: Fix load/store of types in other address spacesMatt Arsenault2019-09-061-4/+18
| | | | | | There should probably be a size only matcher. llvm-svn: 371155
* AMDGPU/GlobalISel: Select local atomic cmpxchgMatt Arsenault2019-08-011-1/+1
| | | | llvm-svn: 367511
* AMDGPU/GlobalISel: Allow selection of DS atomicrmwMatt Arsenault2019-08-011-1/+1
| | | | llvm-svn: 367507
* AMDGPU: Start redefining atomic PatFragsMatt Arsenault2019-08-011-6/+6
| | | | | | | | Start migrating to a form that will be compatible with the global isel emitter. Also should fix some overly lax checks on the memory type, which allowed mis-selecting some illegal atomics. llvm-svn: 367506
* AMDGPU/GlobalISel: Select simple local storesMatt Arsenault2019-08-011-1/+1
| | | | llvm-svn: 367504
* AMDGPU: Don't use SDNodeXForm for DS offset outputMatt Arsenault2019-07-221-12/+12
| | | | | | | | | | | The xform has no real valuewhen it's using out of a complex pattern output. The complex pattern was already creating TargetConstants with i16, so this was just unnecessary machinery. This allows global isel to import the simple cases once the complex pattern is implemented. llvm-svn: 366743
* AMDGPU: Force s_waitcnt after GWS instructionsMatt Arsenault2019-07-191-1/+4
| | | | | | | This is apparently required to be the immediately following instruction, so force it into a bundle with a waitcnt. llvm-svn: 366607
* [AMDGPU] Copy missing predicate from pseudo to realStanislav Mekhanoshin2019-07-151-0/+1
| | | | | | | | NFC at the momemnt, needed for future commit. Differential Revision: https://reviews.llvm.org/D64761 llvm-svn: 366092
* [AMDGPU][MC] Corrected encoding of src0 for DS_GWS_* instructionsDmitry Preobrazhensky2019-07-151-3/+5
| | | | | | | | | | See bug 42599: https://bugs.llvm.org/show_bug.cgi?id=42599 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D64716 llvm-svn: 366067
* AMDGPU: Split extload/zextload local load patternsMatt Arsenault2019-07-081-3/+6
| | | | | | | This will help removing the custom load predicates, allowing the global isel emitter to handle them. llvm-svn: 365398
* AMDGPU: Support GDS atomicsNicolai Haehnle2019-07-011-42/+46
| | | | | | | | | | | | | | | | | Summary: Original patch by Marek Olšák Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab Reviewers: mareko, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63452 llvm-svn: 364814
* AMDGPU: Add intrinsics for DS GWS semaphore instructionsMatt Arsenault2019-06-201-0/+2
| | | | llvm-svn: 363983
* AMDGPU: Insert mem_viol check loop around GWS pre-GFX9Matt Arsenault2019-06-201-1/+1
| | | | | | | It is necessary to emit this loop around GWS operations in case the wave is preempted pre-GFX9. llvm-svn: 363979
* Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics"Matt Arsenault2019-06-191-1/+5
| | | | | | | | This reapplies r363678, using the correct chain for the CopyToReg for v0. glueCopyToM0 counterintuitively changes the operands of the original node. llvm-svn: 363870
* Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsicsSimon Pilgrim2019-06-191-5/+1
| | | | | | | | | There may or may not be additional work to handle this correctly on SI/CI. ........ Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/ llvm-svn: 363797
* AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsicsMatt Arsenault2019-06-181-1/+5
| | | | | | | There may or may not be additional work to handle this correctly on SI/CI. llvm-svn: 363678
* [AMDGPU] gfx1010 DS implementationStanislav Mekhanoshin2019-05-011-163/+207
| | | | | | Differential Revision: https://reviews.llvm.org/D61332 llvm-svn: 359696
* [AMDGPU] Sort out and rename multiple CI/VI predicatesStanislav Mekhanoshin2019-04-061-2/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D60346 llvm-svn: 357835
* [AMDGPU] predicate and feature refactoringStanislav Mekhanoshin2019-04-051-10/+10
| | | | | | | | | We have done some predicate and feature refactoring lately but did not upstream it. This is to sync. Differential revision: https://reviews.llvm.org/D60292 llvm-svn: 357791
* AMDGPU: Move d16 load matching to preprocess stepMatt Arsenault2019-03-081-34/+17
| | | | | | | | | | | | | | | | | When matching half of the build_vector to a load, there could still be a hidden dependency on the other half of the build_vector the pattern wouldn't detect. If there was an additional chain dependency on the other value, a cycle could be introduced. I don't think a tablegen pattern is capable of matching the necessary conditions, so move this into PreprocessISelDAG. Check isPredecessorOf for the other value to avoid a cycle. This has a warning that it's expensive, so this should probably be moved into an MI pass eventually that will have more freedom to reorder instructions to help match this. That is currently complicated by the lack of a computeKnownBits type mechanism for the selected function. llvm-svn: 355731
* [AMDGPU] Mark ds instructions as meybeAtomicStanislav Mekhanoshin2019-03-011-0/+1
| | | | | | | | | | | | These were not recognized as potential atomics by memory legalizer. The test was working not because legalizer did a right thing, but because it has skipped all these instructions. When I have fixed DS desciption test started to fail because region address has changed from 4 to 2 a while ago. Differential Revision: https://reviews.llvm.org/D58802 llvm-svn: 355179
* Revert "AMDGPU/NFC: Cleanup subtarget predicates"Konstantin Zhuravlyov2019-02-221-7/+7
| | | | | | | It breaks one of our downstream merges, so revert it temporarily while investigating failures downstream llvm-svn: 354700
* AMDGPU/NFC: Cleanup subtarget predicatesKonstantin Zhuravlyov2019-02-211-7/+7
| | | | | | Differential Revision: https://reviews.llvm.org/D58522 llvm-svn: 354620
* AMDGPU: Remove GCN features and predicatesMatt Arsenault2019-02-081-2/+0
| | | | | | | These are no longer necessary since the R600 tablegen files are split out now. llvm-svn: 353548
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* AMDGPU: Add llvm.amdgcn.ds.ordered.add & swapMarek Olsak2019-01-161-0/+5
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52944 llvm-svn: 351351
* AMDGPU: Avoid selecting ds_{read,write}2_b32 on SINicolai Haehnle2018-10-171-1/+3
| | | | | | | | | | | | | | | | | | | | | | Summary: To workaround a hardware issue in the (base + offset) calculation when base is negative. The impact on code quality should be limited since SILoadStoreOptimizer still runs afterwards and is able to combine loads/stores based on known sign information. This fixes visible corruption in Hitman on SI (easily reproducible by running benchmark mode). Change-Id: Ia178d207a5e2ac38ae7cd98b532ea2ae74704e5f Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99923 Reviewers: arsenm, mareko Subscribers: jholewinski, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53160 llvm-svn: 344698
* AMDGPU: Add patterns for i32/i64 local atomic load/storeMatt Arsenault2018-06-221-0/+21
| | | | | | | | Not sure why the 32/64 split is needed in the atomic_load store hierarchies. The regular PatFrags do this, but we don't do it for the existing handling for global. llvm-svn: 335325
* AMDGPU: Add D16 instructions preserve unused bits featureKonstantin Zhuravlyov2018-05-041-2/+2
| | | | | | | | | - Predicate D16 patterns on this new feature - Added this new feature to gfx900/2/4 Differential Revision: https://reviews.llvm.org/D46366 llvm-svn: 331551
* [AMDGPU][MC] Added ds_add_src2_f32Dmitry Preobrazhensky2018-03-281-0/+3
| | | | | | | | | See bug 36833: https://bugs.llvm.org/show_bug.cgi?id=36833 Differential Revision: https://reviews.llvm.org/D44779 Reviewers: arsenm, artem.tamazov, timcorringham llvm-svn: 328713
* [AMDGPU] Supported ds_write_b128 generation.Farhana Aleen2018-03-161-0/+2
| | | | | | | | | | | | | | Summary: This is a follow-on patch of https://reviews.llvm.org/D44210 Author: FarhanaAleen Reviewed By: msearles Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44319 llvm-svn: 327726
* [AMDGPU] Supported ds_read_b128 generation; Widened vector length for local ↵Farhana Aleen2018-03-091-0/+1
| | | | | | | | | | | | | | | | | | | | address-space. Summary: Starting from GCN 2nd generation, ISA supports ds_read_b128 on top of ds_read_b64. This patch supports ds_read_b128 instruction pattern and generation of this instruction. In the vectorizer, this patch also widen the vector length so that vectorizer generates 128 bit loads for local address-space which gets translated to ds_read_b128. Since the performance benefit is not clear; compiler generates ds_read_b128 under -amdgpu-ds128. Author: FarhanaAleen Reviewed By: rampitec, arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44210 llvm-svn: 327153
* AMDGPU: Stop using .NAME in .td filesNicolai Haehnle2018-02-221-6/+6
| | | | | | | | | | | | | | | | | | | Summary: .NAME is a bit of an odd duck, in that we should really treat it like a template argument, but we currently don't, and so when and where NAME is initialized and how is pretty inconsistent. Best to just avoid using it as a field of already instantiated records, and use cast to string instead. Change-Id: I5a0c202401cede3d5c3827ab9c7858ea48b29108 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D43551 llvm-svn: 325794
* [AMDGPU] add LDS f32 intrinsicsDaniil Fukalov2018-01-171-1/+4
| | | | | | | | | | | | added llvm.amdgcn.atomic.{add|min|max}.f32 intrinsics to allow generate ds_{add|min|max}[_rtn]_f32 instructions needed for OpenCL float atomics in LDS Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D37985 llvm-svn: 322656
* AMDGPU: Select DS insts without m0 initializationMatt Arsenault2017-11-291-59/+113
| | | | | | | | | GFX9 stopped using m0 for most DS instructions. Select a different instruction without the use. I think this will be less error prone than trying to manually maintain m0 uses as needed. llvm-svn: 319270
* AMDGPU: Add separate definitions for DS insts without m0 useMatt Arsenault2017-11-151-154/+207
| | | | llvm-svn: 318246
* AMDGPU: Select d16 loads into low component of registerMatt Arsenault2017-11-131-0/+18
| | | | llvm-svn: 318005
* AMDGPU: Cleanup local atomic node namesMatt Arsenault2017-10-231-27/+27
| | | | llvm-svn: 316349
* AMDGPU: Remove global isGCN predicatesMatt Arsenault2017-10-031-15/+11
| | | | | | | | | | | | | | These are problematic because they apply to everything, and can easily clobber whatever more specific predicate you are trying to add to a function. Currently instructions use SubtargetPredicate/PredicateControl to apply this to patterns applied to an instruction definition, but not to free standing Pats. Add a wrapper around Pat so the special PredicateControls requirements can be appended to the final predicate list like how Mips does it. llvm-svn: 314742
OpenPOWER on IntegriCloud