summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Refactor treatment of denormal modeMatt Arsenault2019-11-191-7/+0
| | | | | | | | | | | Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute. This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.
* AMDGPU: Add default denormal mode to MachineFunctionInfoMatt Arsenault2019-11-011-2/+10
| | | | | | The default FP mode should really be a property of a specific function, and not a subtarget. Introduce the necessary fields to the SIMachineFunctionInfo to help move towards this goal.
* AMDGPU: Add amdgpu-32bit-address-high-bits to MIR serializationMatt Arsenault2019-08-271-1/+4
| | | | llvm-svn: 370089
* [llvm] Migrate llvm::make_unique to std::make_uniqueJonas Devlieghere2019-08-151-3/+3
| | | | | | | | Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. llvm-svn: 369013
* [AMDGPU] gfx908 agpr spillingStanislav Mekhanoshin2019-07-111-1/+29
| | | | | | Differential Revision: https://reviews.llvm.org/D64594 llvm-svn: 365833
* AMDGPU: Serialize mode from MachineFunctionInfoMatt Arsenault2019-07-101-0/+27
| | | | llvm-svn: 365653
* AMDGPU: Make s34 the FP registerMatt Arsenault2019-07-081-0/+8
| | | | | | | | | | | | | | | | | | | | | | | Make the FP register callee saved. This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame register used throughout the rest of the function. I don't like how this bypassess the standard mechanism for CSR spills just to get the correct insert point. I may look for a better solution, since all CSR VGPRs may also need to have all lanes activated. Another option might be to make getFrameIndexReference change the base register if the frame index is a CSR, and then try to figure out the right insertion point in emitProlog. If there is a free VGPR lane available for SGPR spilling, try to use it for the FP. If that would require intrtoducing a new VGPR spill, try to use a free call clobbered SGPR. Only fallback to introducing a new VGPR spill as a last resort. This also doesn't attempt to handle SGPR spilling with scalar stores. llvm-svn: 365372
* [AMDGPU] Enable serializing of argument info.Michael Liao2019-07-031-0/+121
| | | | | | | | | | | | | | | | Summary: - Support serialization of all arguments in machine function info. This enables fabricating MIR tests depending on argument info. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64096 llvm-svn: 364995
* AMDGPU: Support GDS atomicsNicolai Haehnle2019-07-011-0/+5
| | | | | | | | | | | | | | | | | Summary: Original patch by Marek Olšák Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab Reviewers: mareko, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63452 llvm-svn: 364814
* AMDGPU: Convert some places to RegisterMatt Arsenault2019-07-011-2/+2
| | | | llvm-svn: 364769
* [AMDGPU] Removed dead SIMachineFunctionInfo::getWorkItemIDVGPR()Stanislav Mekhanoshin2019-06-251-3/+0
| | | | | | Differential Revision: https://reviews.llvm.org/D63780 llvm-svn: 364339
* Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics"Matt Arsenault2019-06-191-1/+36
| | | | | | | | This reapplies r363678, using the correct chain for the CopyToReg for v0. glueCopyToM0 counterintuitively changes the operands of the original node. llvm-svn: 363870
* Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsicsSimon Pilgrim2019-06-191-36/+1
| | | | | | | | | There may or may not be additional work to handle this correctly on SI/CI. ........ Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/ llvm-svn: 363797
* AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsicsMatt Arsenault2019-06-181-1/+36
| | | | | | | There may or may not be additional work to handle this correctly on SI/CI. llvm-svn: 363678
* AMDGPU: Cleanup custom PseudoSourceValue definitionsMatt Arsenault2019-06-171-16/+23
| | | | | | | Use separate enums for each kind, avoid repeating overloads, and add missing classof implementation. llvm-svn: 363558
* AMDGPU: Invert frame index offset interpretationMatt Arsenault2019-06-051-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the beginning, the offset of a frame index has been consistently interpreted backwards. It was treating it as an offset from the scratch wave offset register as a frame register. The correct interpretation is the offset from the SP on entry to the function, before the prolog. Frame index elimination then should select either SP or another register as an FP. Treat the scratch wave offset on kernel entry as the pre-incremented SP. Rely more heavily on the standard hasFP and frame pointer elimination logic, and clean up the private reservation code. This saves a copy in most callee functions. The kernel prolog emission code is still kind of a mess relying on checking the uses of physical registers, which I would prefer to eliminate. Currently selection directly emits MUBUF instructions, which require using a reference to some register. Use the register chosen for SP, and then ignore this later. This should probably be cleaned up to use pseudos that don't refer to any specific base register until frame index elimination. Add a workaround for shaders using large numbers of SGPRs. I'm not sure these cases were ever working correctly, since as far as I can tell the logic for figuring out which SGPR is the scratch wave offset doesn't match up with the shader input initialization in the shader programming guide. llvm-svn: 362661
* [AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure.Neil Henning2019-04-011-0/+5
| | | | | | | | | | | | | | | | | | | | | | | This change incorporates an effort by Connor Abbot to change how we deal with WWM operations potentially trashing valid values in inactive lanes. Previously, the SIFixWWMLiveness pass would work out which registers were being trashed within WWM regions, and ensure that the register allocator did not have any values it was depending on resident in those registers if the WWM section would trash them. This worked perfectly well, but would cause sometimes severe register pressure when the WWM section resided before divergent control flow (or at least that is where I mostly observed it). This fix instead runs through the WWM sections and pre allocates some registers for WWM. It then reserves these registers so that the register allocator cannot use them. This results in a significant register saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just this change!). Differential Revision: https://reviews.llvm.org/D59295 llvm-svn: 357400
* AMDGPU: Remove dx10-clamp from subtarget featuresMatt Arsenault2019-03-291-0/+7
| | | | | | | | | | | | | | | | | | Since this can be set with s_setreg*, it should not be a subtarget property. Set a default based on the calling convention, and Introduce a new amdgpu-dx10-clamp attribute to override this if desired. Also introduce a new amdgpu-ieee attribute to match. The values need to match to allow inlining. I think it is OK for the caller's dx10-clamp attribute to override the callee, but there doesn't appear to be the infrastructure to do this currently without definining the attribute in the generic Attributes.td. Eventually the calling convention lowering will need to insert a mode switch somewhere for these. llvm-svn: 357302
* MIR: Allow targets to serialize MachineFunctionInfoMatt Arsenault2019-03-141-1/+53
| | | | | | | | | | | | | | | | | | This has been a very painful missing feature that has made producing reduced testcases difficult. In particular the various registers determined for stack access during function lowering were necessary to avoid undefined register errors in a large percentage of cases. Implement a subset of the important fields that need to be preserved for AMDGPU. Most of the changes are to support targets parsing register fields and properly reporting errors. The biggest sort-of bug remaining is for fields that can be initialized from the IR section will be overwritten by a default initialized machineFunctionInfo section. Another remaining bug is the machineFunctionInfo section is still printed even if empty. llvm-svn: 356215
* AMDGPU: Remove debugger related subtarget featuresMatt Arsenault2019-02-211-30/+0
| | | | | | As far as I know these aren't needed anymore. llvm-svn: 354634
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* AMDGPU/AMDHSA: Remove GridWorkGroupCountX/Y/ZKonstantin Zhuravlyov2018-06-211-15/+0
| | | | | | | | | | | | and everything that comes with it from implementation and v3 header files. Leave definition in v2 header files for backwards compatibility. Differential Revision: https://reviews.llvm.org/D48191 llvm-svn: 335267
* [AMDGPU] Track occupancy in MFIStanislav Mekhanoshin2018-05-311-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep track of achieved occupancy in SIMachineFunctionInfo. At the moment we have a lot of duplicated or even missed code to query and maintain occupancy info. Record it in the MFI and query in a single call. Interfaces: - getOccupancy() - returns current recorded achieved occupancy. - getMinAllowedOccupancy() - returns lesser of the achieved occupancy and the lowest occupancy we are ready to tolerate. For example if a kernel is memory bound we are ready to tolerate 4 waves. - limitOccupancy() - record occupancy level if we have to lower it. - increaseOccupancy() - record occupancy if scheduler managed to increase the occupancy. MFI takes care of integrating different checks affecting occupancy, including LDS use and waves-per-eu attribute. Note that scheduler starts with not yet known register pressure, so has to record either limit or increase in occupancy after it is done. Later passes can just query a resulting value. New interface is used in the active scheduler and NFC wrt its work. Changes are also made to experimental schedulers to use it and record an occupancy after they are done. Before the change waves-per-eu was ignored by experimental schedulers and tolerance window for memory bound kernels was not used. Differential Revision: https://reviews.llvm.org/D47509 llvm-svn: 333629
* AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headersTom Stellard2018-05-221-28/+12
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction and register defintions, which are huge so we only want to include them where needed. This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files. I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46272 llvm-svn: 332930
* Remove \brief commands from doxygen comments.Adrian Prantl2018-05-011-3/+3
| | | | | | | | | | | | | | | | We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272
* AMDGPU: Support realigning stackMatt Arsenault2018-03-291-0/+9
| | | | | | | | | | | | | | | | | | While the stack access instructions don't care about alignment > 4, some transformations on the pointer calculation do make assumptions based on knowing the low bits of a pointer are 0. If a stack object ends up being accessed through its absolute address (relative to the kernel scratch wave offset), the addressing expression may depend on the stack frame being properly aligned. This was breaking in a testcase due to the add->or combine. I think some of the SP/FP handling logic is still backwards, and overly simplistic to support all of the stack features. Code which tries to modify the SP with inline asm for example or variable sized objects will probably require redoing this. llvm-svn: 328831
* [AMDGPU] stop buffer_store being moved illegallyTim Renouf2018-02-201-6/+2
| | | | | | | | | | | | | | | Summary: The machine instruction scheduler was illegally moving a buffer store past a buffer load with the same descriptor and offset. Fixed by marking buffer ops as mayAlias and isAliased. This may be overly conservative, and we may need to revisit. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D43332 Change-Id: Iff3173d9e0653e830474546276ab9d30318b8ef7 llvm-svn: 325567
* Reapply "AMDGPU: Add 32-bit constant address space"Matt Arsenault2018-02-091-0/+6
| | | | | | This reverts r324494 and reapplies r324487. llvm-svn: 324747
* Revert "AMDGPU: Add 32-bit constant address space"Rafael Espindola2018-02-071-6/+0
| | | | | | | | This reverts commit r324487. It broke clang tests. llvm-svn: 324494
* AMDGPU: Add 32-bit constant address spaceMarek Olsak2018-02-071-0/+6
| | | | | | | | | | | | | | | | | | | | | | | Note: This is a candidate for LLVM 6.0, because it was planned to be in that release but was delayed due to a long review period. Merge conflict in release_60 - resolution: Add "-p6:32:32" into the second (non-amdgiz) string. Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile. That's OK because the results of loads will only be used in places where VGPRs are forbidden. Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC. The tests cover all uses cases we need for Mesa. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D41651 llvm-svn: 324487
* [AMDGPU] stop image_store being moved illegallyTim Renouf2018-01-121-6/+2
| | | | | | | | | | | | | | | | | | | | Summary: A recent change 321556: AMDGPU: Remove mayLoad/hasSideEffects from MIMG stores can allow the machine instruction scheduler to move an image store past an image load using the same descriptor. V2: Fixed by marking image ops as mayAlias and isAliased. This may be overly conservative, and we may need to revisit. V3: Reverted test change done on 321556. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: llvm-commits, t-tye, yaxunl, wdng, kzhuravl Differential Revision: https://reviews.llvm.org/D41969 llvm-svn: 322419
* AMDGPU: Use unique PSVs for buffer resourcesMatt Arsenault2017-12-291-6/+9
| | | | | | | Also fixes using the wrong memory type for some intrinsics when custom lowering them. llvm-svn: 321557
* AMDGPU: Implement getTgtMemIntrinsic for imagesMatt Arsenault2017-12-291-4/+15
| | | | | | | | | | | | | Currently all images are lowered to have a single image PseudoSourceValue. Image stores happen to have overly strict mayLoad/mayStore/hasSideEffects flags set on them, so this happens to work. When these are fixed to be correct, the scheduler breaks this because the identical PSVs are assumed to be the same address. These need to be unique to the image resource value. llvm-svn: 321555
* Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layeringDavid Blaikie2017-11-081-2/+2
| | | | | | | | This header includes CodeGen headers, and is not, itself, included by any Target headers, so move it into CodeGen to match the layering of its implementation. llvm-svn: 317647
* AMDGPU: Fix warning discovered by r317266 [-Wunused-private-field]Konstantin Zhuravlyov2017-11-021-1/+0
| | | | llvm-svn: 317280
* AMDGPU: Remove outdated fixme (it was already fixed)Konstantin Zhuravlyov2017-11-021-3/+0
| | | | llvm-svn: 317266
* [AMDGPU] AMDPAL scratch buffer supportTim Renouf2017-09-291-0/+9
| | | | | | | | | | | | | | | | | | | | | | | Summary: Added support for scratch (including spilling) for OS type amdpal: generates code to set up the scratch descriptor if it is needed. With amdpal, the scratch resource descriptor is loaded from offset 0 of the global information table. The low 32 bits of the address of the global information table is passed in s0. Added amdgpu-git-ptr-high function attribute to hard-wire the high 32 bits of the address of the global information table. If the function attribute is not specified, or is 0xffffffff, then the backend generates code to use the high 32 bits of pc. The documentation for the AMDPAL ABI will be added in a later commit. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye Differential Revision: https://reviews.llvm.org/D37483 llvm-svn: 314501
* Add AddresSpace to PseudoSourceValue.Jan Sjodin2017-09-141-4/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D35089 llvm-svn: 313297
* AMDGPU: Start adding tail call supportMatt Arsenault2017-08-111-0/+19
| | | | | | Handle the sibling call cases. llvm-svn: 310753
* [AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use ↵Eugene Zelenko2017-08-081-35/+40
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 310328
* AMDGPU: Pass special input registers to functionsMatt Arsenault2017-08-031-45/+51
| | | | llvm-svn: 309998
* AMDGPU: Fix clobbering CSR VGPRs when spilling SGPR to itMatt Arsenault2017-08-021-1/+18
| | | | llvm-svn: 309783
* AMDGPU: Annotate implicitarg.ptr usageMatt Arsenault2017-07-281-0/+8
| | | | | | | | | | | We need to pass something to functions for this to work. It isn't derivable just from the kernarg segment pointer because the implicit arguments are placed after the kernel arguments. Also fixes missing test for the intrinsic. llvm-svn: 309398
* AMDGPU: Figure out private memory regs after loweringMatt Arsenault2017-07-181-1/+4
| | | | | | | | | | | | | | | | | | Introduce pseudo-registers for registers needed for stack access, which are replaced during finalizeLowering. Note these pseudo-registers are currently only used for the used register location, and not for determining their input argument register. This is better because it avoids the need to try to predict whether a call will be emitted from the IR, and also detects stack objects introduced by legalization. Test changes are from the HasStackObjects check being more accurate since stack objects introduced during legalization are now known. llvm-svn: 308325
* AMDGPU: Annotate features from x work item/group IDs.Matt Arsenault2017-07-171-0/+5
| | | | | | | | This wasn't necessary before since they are always enabled for kernels, but this is necessary if they need to be forwarded to a callable function. llvm-svn: 308226
* AMDGPU: Partially fix implicit.buffer.ptr intrinsic handlingMatt Arsenault2017-06-261-7/+7
| | | | | | | | | | | | | | This should not be treated as a different version of private_segment_buffer. These are distinct things with different uses and register classes, and requires the function argument info to have more context about the function's type and environment. Also add missing test coverage for the intrinsic, and emit an error for HSA. This also encovers that the intrinsic is broken unless there happen to be stack objects. llvm-svn: 306264
* Sort the remaining #include lines in include/... and lib/....Chandler Carruth2017-06-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787
* AMDGPU: Start defining a calling conventionMatt Arsenault2017-05-171-3/+2
| | | | | | | | Partially implement callee-side for arguments and return values. byval doesn't work properly, and most likely sret or other on-stack return values most as well. llvm-svn: 303308
* AMDGPU: Add StackPtr and FramePtr registers to MFIMatt Arsenault2017-04-241-0/+24
| | | | | | These will be necessary for setting up call sequences. llvm-svn: 301208
* AMDGPU: Make MFI fields privateMatt Arsenault2017-04-181-3/+5
| | | | llvm-svn: 300596
OpenPOWER on IntegriCloud