summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU] Model distance to instruction in bundleStanislav Mekhanoshin2020-01-141-5/+17
| | | | | | | This change allows to model the height of the instruction within a bundle for latency adjustment purposes. Differential Revision: https://reviews.llvm.org/D72669
* Let targets adjust operand latency of bundlesStanislav Mekhanoshin2020-01-101-40/+25
| | | | | | | | | | | This reverts the AMDGPU DAG mutation implemented in D72487 and gives a more general way of adjusting BUNDLE operand latency. It also replaces FixBundleLatencyMutation with adjustSchedDependency callback in the AMDGPU, fixing not only successor latencies but predecessors' as well. Differential Revision: https://reviews.llvm.org/D72535
* AMDGPU/GlobalISel: Select G_EXTRACT_VECTOR_ELTMatt Arsenault2020-01-091-0/+9
| | | | | Doesn't try to do the fold into the base register of an add of a constant in the index like the DAG path does.
* [AMDGPU] Fix bundle schedulingStanislav Mekhanoshin2020-01-091-0/+40
| | | | | | | Bundles coming to scheduler considered free, i.e. zero latency. Fixed. Differential Revision: https://reviews.llvm.org/D72487
* AMDGPU: Switch backend default max workgroup size to 1024Matt Arsenault2019-11-131-7/+1
| | | | | | | | | | | | | Previously this would default to 256, not the maximum supported size of 1024. Using a maximum lower than the hardware maximum requires language runtimes to enforce this limit for correctness, which no language has correctly done. Switch the default to the conservatively correct maximum, and force frontends to opt-in to the more optimal 256 default maximum. I don't really understand why the changes in occupancy-levels.ll increased the computed occupancy, which I expected to decrease. I'm not sure if these tests should be forcing the old maximum.
* [AMDGPU] Fix mfma scheduling crashStanislav Mekhanoshin2019-10-241-1/+6
| | | | | | | An SUnit can be neither intruction not SDNode. It is all null if represents a nop. Fixed a crash on using SU->getInstr(). Differential Revision: https://reviews.llvm.org/D69395
* [Alignment] Migrate Attribute::getWith(Stack)AlignmentGuillaume Chatelet2019-10-151-7/+7
| | | | | | | | | | | | | | | | | | | Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jdoerfert Reviewed By: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68792 llvm-svn: 374884
* [AMDGPU] fixed underflow in getOccupancyWithNumVGPRsStanislav Mekhanoshin2019-09-191-1/+1
| | | | | | | | | The function could return zero if an extreme number or registers were used. Minimal possible occupancy is 1. Differential Revision: https://reviews.llvm.org/D67771 llvm-svn: 372350
* Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"Matt Arsenault2019-09-191-1/+1
| | | | | | | | | This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338
* Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"Hans Wennborg2019-09-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC*. > > Since now intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. > > Most of the work here is to cleanup target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, simlar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_* instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372314
* AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.storeMatt Arsenault2019-09-191-1/+1
| | | | llvm-svn: 372292
* [AMDGPU] w/a for gfx908 mfma SrcC literal HW bugStanislav Mekhanoshin2019-08-231-0/+1
| | | | | | | | gfx908 ignores an mfma if SrcC is a literal. Differential Revision: https://reviews.llvm.org/D66670 llvm-svn: 369816
* [llvm] Migrate llvm::make_unique to std::make_uniqueJonas Devlieghere2019-08-151-2/+2
| | | | | | | | Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. llvm-svn: 369013
* [AMDGPU] Fix high occupancy calculation and print itStanislav Mekhanoshin2019-07-311-1/+17
| | | | | | | | | | | We had couple places which still return 10 as a maximum occupancy. Fixed. Also print comment about occupancy as compiler see it. Differential Revision: https://reviews.llvm.org/D65423 llvm-svn: 367381
* [AMDGPU] Fixed occupancy calculation for gfx10Stanislav Mekhanoshin2019-07-191-19/+6
| | | | | | Differential Revision: https://reviews.llvm.org/D65010 llvm-svn: 366616
* [AMDGPU] fixed scheduler crash in gfx908Stanislav Mekhanoshin2019-07-151-2/+2
| | | | | | | | | For some reason scheduler can send down an SUnit without an instruction. Differential Revision: https://reviews.llvm.org/D64709 llvm-svn: 366074
* [AMDGPU] gfx908 schedulingStanislav Mekhanoshin2019-07-111-0/+124
| | | | | | Differential Revision: https://reviews.llvm.org/D64590 llvm-svn: 365826
* [AMDGPU] gfx908 targetStanislav Mekhanoshin2019-07-091-0/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D64429 llvm-svn: 365525
* [AMDGPU] Fix for branch offset hardware workaroundRyan Taylor2019-06-261-0/+1
| | | | | | | | | | | | | | | | | | | Summary: This fixes a hardware bug that makes a branch offset of 0x3f unsafe. This replaces the 32 bit branch with offset 0x3f to a 64 bit instruction that includes the same 32 bit branch and the encoding for a s_nop 0 to follow. The relaxer than modifies the offsets accordingly. Change-Id: I10b7aed99d651f8159401b01bb421f105fa6288e Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63494 llvm-svn: 364451
* [AMDGPU] gfx1011/gfx1012 targetsStanislav Mekhanoshin2019-06-141-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D63307 llvm-svn: 363344
* [AMDGPU] gfx1010 base changes for wave32Stanislav Mekhanoshin2019-06-131-0/+10
| | | | | | Differential Revision: https://reviews.llvm.org/D63293 llvm-svn: 363299
* [AMDGPU] gfx1010 dpp16 and dpp8Stanislav Mekhanoshin2019-06-121-0/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D63203 llvm-svn: 363186
* AMDGPU: Remove amdgpu-max-work-group-size attributeMatt Arsenault2019-06-051-10/+1
| | | | | | | This has been deprecated for a long time, and mesa recently switched to amdgpu-flat-work-group-size. llvm-svn: 362641
* AMDGPU: Correct maximum possible private allocation sizeMatt Arsenault2019-05-231-1/+0
| | | | | | | | | | | | | | | | We were assuming a much larger possible per-wave visible stack allocation than is possible: https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/faa3ae51388517353afcdaf9c16621f879ef0a59/src/core/runtime/amd_gpu_agent.cpp#L70 Based on this, we can assume the high 15 bits of a frame index or sret are 0. The frame index value is the per-lane offset, so the maximum frame index value is MAX_WAVE_SCRATCH / wavesize. Remove the corresponding subtarget feature and option that made this configurable. llvm-svn: 361541
* AMDGPU: Assume xnack is enabled by defaultMatt Arsenault2019-05-161-1/+7
| | | | | | | | | | | This is the conservatively correct default. It is always safe to assume xnack is enabled, but not the converse. Introduce a feature to blacklist targets where xnack can never be meaningfully enabled. I'm not sure the targets this is applied to is 100% correct. llvm-svn: 360903
* [AMDGPU] gfx1010 constant bus limitStanislav Mekhanoshin2019-05-021-0/+20
| | | | | | | | Constant bus limit has increased to 2 with GFX10. Differential Revision: https://reviews.llvm.org/D61404 llvm-svn: 359754
* [AMDGPU] Add gfx1010 target definitionsStanislav Mekhanoshin2019-04-241-1/+35
| | | | | | Differential Revision: https://reviews.llvm.org/D61041 llvm-svn: 359113
* [AMDGPU] predicate and feature refactoringStanislav Mekhanoshin2019-04-051-1/+2
| | | | | | | | | We have done some predicate and feature refactoring lately but did not upstream it. This is to sync. Differential revision: https://reviews.llvm.org/D60292 llvm-svn: 357791
* AMDGPU: Assume ECC is enabled by default if supportedMatt Arsenault2019-04-031-2/+12
| | | | | | | | | | The test should really be checking for the property directly in the code object headers, but there are problems with this. I don't see this directly represented in the text form, and for the binary emission this is depending on a function level subtarget feature to emit a global flag. llvm-svn: 357558
* AMDGPU: Remove dx10-clamp from subtarget featuresMatt Arsenault2019-03-291-4/+2
| | | | | | | | | | | | | | | | | | Since this can be set with s_setreg*, it should not be a subtarget property. Set a default based on the calling convention, and Introduce a new amdgpu-dx10-clamp attribute to override this if desired. Also introduce a new amdgpu-ieee attribute to match. The values need to match to allow inlining. I think it is OK for the caller's dx10-clamp attribute to override the callee, but there doesn't appear to be the infrastructure to do this currently without definining the attribute in the generic Attributes.td. Eventually the calling convention lowering will need to insert a mode switch somewhere for these. llvm-svn: 357302
* [ScheduleDAG] Move `Topo` and `addEdge` to base class.Clement Courbet2019-03-291-3/+1
| | | | | | | | | Some DAG mutations can only be applied to `ScheduleDAGMI`, and have to internally cast a `ScheduleDAGInstrs` to `ScheduleDAGMI`. There is nothing actually specific to `ScheduleDAGMI` in `Topo`. llvm-svn: 357239
* AMDGPU: Partially fix default device for HSAMatt Arsenault2019-03-171-2/+2
| | | | | | | | | | | | | | | | | | There are a few different issues, mostly stemming from using generation based checks for anything instead of subtarget features. Stop adding flat-address-space as a feature for HSA, as it should only be a device property. This was incorrectly allowing flat instructions to select for SI. Increase the default generation for HSA to avoid the encoding error when emitting objects. This has some other side effects from various checks which probably should be separate subtarget features (in the cost model and for dealing with the DS offset folding issue). Partial fix for bug 41070. It should probably be an error to try using amdhsa without flat support. llvm-svn: 356347
* AMDGPU: Remove debugger related subtarget featuresMatt Arsenault2019-02-211-2/+0
| | | | | | As far as I know these aren't needed anymore. llvm-svn: 354634
* [AMDGPU] Split dot-insts featureStanislav Mekhanoshin2019-02-091-1/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D57971 llvm-svn: 353587
* AMDGPU: Eliminate GPU specific SubtargetFeaturesMatt Arsenault2019-02-081-1/+0
| | | | | | | | | | | Inline compatability is determined from the individual feature bits. These are just sets of the separate features, but will always be treated as incompatible unless they are specifically ignored. Defining the ISA version number here in tablegen would be nice, but it turns out this wasn't actually used. llvm-svn: 353558
* AMDGPU: Remove GCN features and predicatesMatt Arsenault2019-02-081-0/+4
| | | | | | | These are no longer necessary since the R600 tablegen files are split out now. llvm-svn: 353548
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd tryDavid Stuttard2019-01-141-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 This re-submit of the change also includes a slight modification in SIISelLowering.cpp to work-around a compiler bug for the powerpc_le platform that caused a buildbot failure on a previous submission. Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda Work around for ppcle compiler bug Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b llvm-svn: 351054
* [AMDGPU] Separate feature dot-instsStanislav Mekhanoshin2019-01-101-0/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D56524 llvm-svn: 350793
* Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic"David Stuttard2018-11-291-6/+0
| | | | | | | | | Also revert fix r347876 One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate. llvm-svn: 347911
* Add support for TFE/LWE in image intrinsicsDavid Stuttard2018-11-291-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda llvm-svn: 347871
* AMDGPU: Add sram-ecc featureKonstantin Zhuravlyov2018-11-051-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D53222 llvm-svn: 346177
* [AMDGPU] Remove FeatureVGPRSpillingScott Linder2018-10-311-5/+0
| | | | | | | | | | | This feature is only relevant to shaders, and is no longer used. When disabled, lowering of reserved registers for shaders causes a compiler crash. Remove the feature and add a test for compilation of shaders at OptNone. Differential Revision: https://reviews.llvm.org/D53829 llvm-svn: 345763
* [AMDGPU] Initialize instruction itinerary from GCNSubtargetStanislav Mekhanoshin2018-09-171-0/+1
| | | | | | | | I need to use it in the GCN codegen. Differential Revision: https://reviews.llvm.org/D52123 llvm-svn: 342400
* [AMDGPU] Ensure trig range reduction only used for subtargets that require itDavid Stuttard2018-09-141-0/+1
| | | | | | | | | | | | | | | | | | Summary: GFX9 and above support sin/cos instructions with a greater range and thus don't require a fract instruction prior to invocation. Added a subtarget feature to reflect this and added code to take advantage of expanded range on GFX9+ Also updated the tests to check correct behaviour Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51933 Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0 llvm-svn: 342222
* AMDGPU: Re-apply r341982 after fixing the layering issueKonstantin Zhuravlyov2018-09-121-6/+4
| | | | | | | | | | | | Move isa version determination into TargetParser. Also switch away from target features to CPU string when determining isa version. This fixes an issue when we output wrong isa version in the object code when features of a particular CPU are altered (i.e. gfx902 w/o xnack used to result in gfx900). llvm-svn: 342069
* Revert "AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into ↵Ilya Biryukov2018-09-121-4/+6
| | | | | | | | | | | TargetParser." This reverts commit r341982. The change introduced a layering violation. Reverting to unbreak our integrate. llvm-svn: 342023
* AMDGPU: Move isa version and EF_AMDGPU_MACH_* determinationKonstantin Zhuravlyov2018-09-111-6/+4
| | | | | | | | | | | | | | into TargetParser. Also switch away from target features to CPU string when determining isa version. This fixes an issue when we output wrong isa version in the object code when features of a particular CPU are altered (i.e. gfx902 w/o xnack used to result in gfx900). Differential Revision: https://reviews.llvm.org/D51890 llvm-svn: 341982
* AMDGPU: Remove remnants of old address space mappingMatt Arsenault2018-08-311-3/+1
| | | | llvm-svn: 341165
* [AMDGPU] Add support for a16 modifiear for gfx9Ryan Taylor2018-08-281-0/+1
| | | | | | | | | | | | | Summary: Adding support for a16 for gfx9. A16 bit replaces r128 bit for gfx9. Change-Id: Ie8b881e4e6d2f023fb5e0150420893513e5f4841 Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50575 llvm-svn: 340831
OpenPOWER on IntegriCloud