summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Refactor Subtarget classesTom Stellard2018-07-111-15/+15
| | | | | | | | | | | | | | | | | Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation. Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 llvm-svn: 336851
* AMDGPU: Pass function directly instead of MachineFunctionMatt Arsenault2018-05-291-7/+8
| | | | | | | These functions just query the underlying IR function, so pass it directly. llvm-svn: 333442
* AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headersTom Stellard2018-05-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction and register defintions, which are huge so we only want to include them where needed. This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files. I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46272 llvm-svn: 332930
* AMDGPU: Assign enum name to stack IDMatt Arsenault2018-04-231-0/+1
| | | | | | | | | Also assert that it is correct for SGPRs. There is currently a bug where stack slot coloring replaces SGPR spill FIs with one with the default ID, which results in a more confusing assert later about a dead object. llvm-svn: 330607
* [AMDGPU] For OS type AMDPAL, fixed scratch on compute shaderTim Renouf2018-04-101-2/+6
| | | | | | | | | | | | | | | | | | | | | Summary: For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders). This commit fixes that to use offset 0x10 instead of offset 0 for a compute shader, per the PAL ABI spec. V2: Ensure s0 (s8 for gfx9 merged shader) is marked live-in when loading scratch descriptor from GIT. Reviewers: kzhuravl, nhaehnle, timcorringham Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm Differential Revision: https://reviews.llvm.org/D44468 Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f llvm-svn: 329690
* AMDGPU: Fix build warning in releaseMatt Arsenault2018-03-291-2/+0
| | | | llvm-svn: 328832
* AMDGPU: Support realigning stackMatt Arsenault2018-03-291-8/+74
| | | | | | | | | | | | | | | | | | While the stack access instructions don't care about alignment > 4, some transformations on the pointer calculation do make assumptions based on knowing the low bits of a pointer are 0. If a stack object ends up being accessed through its absolute address (relative to the kernel scratch wave offset), the addressing expression may depend on the stack frame being properly aligned. This was breaking in a testcase due to the add->or combine. I think some of the SP/FP handling logic is still backwards, and overly simplistic to support all of the stack features. Code which tries to modify the SP with inline asm for example or variable sized objects will probably require redoing this. llvm-svn: 328831
* Revert "[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader"Tim Renouf2018-03-281-4/+2
| | | | | | | | | | | This reverts commit 0daf86291d3aa04d3cc280cd0ef24abdb0174981. It was causing an assert in test/CodeGen/AMDGPU/amdpal.ll only on a release-with-asserts build. I will resubmit the change when I have fixed that. Change-Id: If270594eba27a7dc4076bdeab3fa8e6bfda3288a llvm-svn: 328695
* [AMDGPU] For OS type AMDPAL, fixed scratch on compute shaderTim Renouf2018-03-271-2/+4
| | | | | | | | | | | | | | | | | | Summary: For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders). This commit fixes that to use offset 0x10 instead of offset 0 for a compute shader, per the PAL ABI spec. Reviewers: kzhuravl, nhaehnle, timcorringham Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm Differential Revision: https://reviews.llvm.org/D44468 Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f llvm-svn: 328673
* [AMDGPU] Scratch setup fix on AMDPAL gfx9+ merge shaderTim Renouf2018-02-261-1/+14
| | | | | | | | | | | | | | | | Summary: With OS type AMDPAL, the scratch descriptor is hardwired to be loaded from offset 0 of the global information table, whose low pointer is passed in s0. For a merge shader on gfx9+, it needs to be s8 instead, as the hardware reserves s0-s7. Reviewers: kzhuravl Subscribers: arsenm, nhaehnle, dstuttard, llvm-commits, t-tye, yaxunl, wdng, kzhuravl Differential Revision: https://reviews.llvm.org/D42203 llvm-svn: 326088
* MachineFunction: Return reference from getFunction(); NFCMatthias Braun2017-12-151-3/+3
| | | | | | The Function can never be nullptr so we can return a reference. llvm-svn: 320884
* AMDGPU: Fix set but not used warnings related to AMDGPUASKonstantin Zhuravlyov2017-11-011-2/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D39499 llvm-svn: 317114
* [AMDGPU] AMDPAL scratch buffer supportTim Renouf2017-09-291-2/+59
| | | | | | | | | | | | | | | | | | | | | | | Summary: Added support for scratch (including spilling) for OS type amdpal: generates code to set up the scratch descriptor if it is needed. With amdpal, the scratch resource descriptor is loaded from offset 0 of the global information table. The low 32 bits of the address of the global information table is passed in s0. Added amdgpu-git-ptr-high function attribute to hard-wire the high 32 bits of the address of the global information table. If the function attribute is not specified, or is 0xffffffff, then the backend generates code to use the high 32 bits of pc. The documentation for the AMDPAL ABI will be added in a later commit. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye Differential Revision: https://reviews.llvm.org/D37483 llvm-svn: 314501
* AMDGPU: Don't spill SP reg like a normal CSRMatt Arsenault2017-09-131-0/+9
| | | | llvm-svn: 313217
* AMDGPU: Pass special input registers to functionsMatt Arsenault2017-08-031-6/+6
| | | | llvm-svn: 309998
* AMDGPU: Fix clobbering CSR VGPRs when spilling SGPR to itMatt Arsenault2017-08-021-3/+22
| | | | llvm-svn: 309783
* AMDGPU: Initial implementation of callsMatt Arsenault2017-08-011-0/+35
| | | | | | | | | Includes a hack to fix the type selected for the GlobalAddress of the function, which will be fixed by changing the default datalayout to use generic pointers for 0. llvm-svn: 309732
* AMDGPU: Annotate necessity of flat-scratch-initMatt Arsenault2017-07-181-2/+3
| | | | | | | | As an approximation of the existing handling to avoid regressions. Fixes using too many registers with calls on subtargets with the SGPR allocation bug. llvm-svn: 308326
* AMDGPU: Figure out private memory regs after loweringMatt Arsenault2017-07-181-2/+4
| | | | | | | | | | | | | | | | | | Introduce pseudo-registers for registers needed for stack access, which are replaced during finalizeLowering. Note these pseudo-registers are currently only used for the used register location, and not for determining their input argument register. This is better because it avoids the need to try to predict whether a call will be emitted from the IR, and also detects stack objects introduced by legalization. Test changes are from the HasStackObjects check being more accurate since stack objects introduced during legalization are now known. llvm-svn: 308325
* AMDGPU: Setup SP/FP in callee function prolog/epilogMatt Arsenault2017-06-261-2/+73
| | | | llvm-svn: 306312
* AMDGPU: Partially fix implicit.buffer.ptr intrinsic handlingMatt Arsenault2017-06-261-4/+4
| | | | | | | | | | | | | | This should not be treated as a different version of private_segment_buffer. These are distinct things with different uses and register classes, and requires the function argument info to have more context about the function's type and environment. Also add missing test coverage for the intrinsic, and emit an error for HSA. This also encovers that the intrinsic is broken unless there happen to be stack objects. llvm-svn: 306264
* Sort the remaining #include lines in include/... and lib/....Chandler Carruth2017-06-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787
* AMDGPU: Start defining a calling conventionMatt Arsenault2017-05-171-11/+9
| | | | | | | | Partially implement callee-side for arguments and return values. byval doesn't work properly, and most likely sret or other on-stack return values most as well. llvm-svn: 303308
* AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5Marek Olsak2017-05-041-1/+2
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D32645 llvm-svn: 302200
* AMDGPU: Shift down reserved SP register like scratch wave offsetMatt Arsenault2017-04-251-16/+58
| | | | llvm-svn: 301367
* AMDGPU: Slightly simplify prolog reserved register handlingMatt Arsenault2017-04-241-25/+27
| | | | | | | | | | | | | | Rely on MachineRegisterInfo's knowledge of used physical registers. Move flat_scratch initialization earlier, so the uses are visible when making these decisions. This will make it easier to add another reserved register at the end for the stack pointer rather than handling another special case. llvm-svn: 301254
* Move size and alignment information of regclass to TargetRegisterInfoKrzysztof Parzyszek2017-04-241-1/+1
| | | | | | | | | | | | | | | 1. RegisterClass::getSize() is split into two functions: - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const; - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const; 2. RegisterClass::getAlignment() is replaced by: - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const; This will allow making those values depend on subtarget features in the future. Differential Revision: https://reviews.llvm.org/D31783 llvm-svn: 301221
* [AMDGPU] Get address space mapping by target triple environmentYaxun Liu2017-03-271-1/+2
| | | | | | | | | | | | | | | | | | As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which are not depend on target triple, use static const members, so that they don't occupy extra memory space and is equivalent to a compile time constant. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 llvm-svn: 298846
* [AMDGPU] Split R600/SI getFrameIndexReference and emit stack object offsets ↵Konstantin Zhuravlyov2017-03-101-0/+8
| | | | | | | | for SI Differential Revision: https://reviews.llvm.org/D29674 llvm-svn: 297499
* AMDGPU: Don't add emergency stack slot if all spills are SGPR->VGPRMatt Arsenault2017-02-221-36/+55
| | | | | | | | | This should avoid reporting any stack needs to be allocated in the case where no stack is truly used. An unused stack slot is still left around in other cases where there are real stack objects but no spilling occurs. llvm-svn: 295891
* AMDGPU: Always allocate emergency stack slot at offset 0Matt Arsenault2017-02-221-5/+19
| | | | | | | | | This allows us to ensure that 0 is never a valid pointer to a user object, and ensures that the offset is always legal without needing a register to access it. This comes at the cost of usable offsets and wasted stack space. llvm-svn: 295877
* AMDGPU: Don't use stack space for SGPR->VGPR spillsMatt Arsenault2017-02-211-12/+38
| | | | | | | | | | | | | | | | Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them after. I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks since everything asserts if it tries to use a dead frame index later. The unused emergency stack slot seems to still be left behind, so an additional 4 bytes is still wasted. llvm-svn: 295753
* AMDGPU: Merge initial gfx9 supportMatt Arsenault2017-02-181-8/+22
| | | | llvm-svn: 295554
* [AMDGPU] Move register related queries to subtarget classKonstantin Zhuravlyov2017-02-081-8/+8
| | | | | | Differential Revision: https://reviews.llvm.org/D29318 llvm-svn: 294440
* AMDGPU add support for spilling to a user sgpr pointed buffersTom Stellard2017-01-251-12/+46
| | | | | | | | | | | | | | | | | Summary: This lets you select which sort of spilling you want, either s[0:1] or 64-bit loads from s[0:1]. Patch By: Dave Airlie Reviewers: nhaehnle, arsenm, tstellarAMD Reviewed By: arsenm Subscribers: mareko, llvm-commits, kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D25428 llvm-svn: 293000
* AMDGPU: Fix using incorrect private resource with no allocationMatt Arsenault2016-10-281-45/+76
| | | | | | | | | | | It's possible to have a use of the private resource descriptor or scratch wave offset registers even though there are no allocated stack objects. This would result in continuing to use the maximum number reserved registers. This could go over the number of SGPRs available on VI, or violate the SGPR limit requested by the function attributes. llvm-svn: 285435
* AMDGPU/SI: Add support for triples with the mesa3d operating systemTom Stellard2016-09-161-3/+3
| | | | | | | | | | | | | | Summary: mesa3d will use the same kernel calling convention as amdhsa, but it will handle everything else like the default 'unknown' OS type. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22783 llvm-svn: 281779
* [AMDGPU] Wave and register controlsKonstantin Zhuravlyov2016-09-061-6/+8
| | | | | | | | | | | | | | - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747
* AMDGPU: Use copy instead of mov during frame loweringMatt Arsenault2016-08-311-15/+6
| | | | | | | This occurs before RA pseudos are expanded. It's less code to emit the copy. llvm-svn: 280297
* AMDGPU: Refactor frame loweringMatt Arsenault2016-08-311-121/+158
| | | | | | This will make future changes easier. llvm-svn: 280296
* AMDGPU: Use CreateStackObject instead of CreateSpillStackObjectMatt Arsenault2016-08-101-2/+2
| | | | | | | I'm not sure what the difference is, but no other target uses this for emergency spill slots. llvm-svn: 278272
* MachineFunction: Return reference for getFrameInfo(); NFCMatthias Braun2016-07-281-6/+6
| | | | | | | getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017
* [AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in ↵Konstantin Zhuravlyov2016-06-251-1/+49
| | | | | | | | | | | | | | | | | | | | | | | the kernel code header Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue. Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format: - offset 0: work group ID x - offset 4: work group ID y - offset 8: work group ID z - offset 16: work item ID x - offset 20: work item ID y - offset 24: work item ID z Set - amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg - amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg - amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled Differential Revision: http://reviews.llvm.org/D20335 llvm-svn: 273769
* AMDGPU: Cleanup subtarget handling.Matt Arsenault2016-06-241-3/+9
| | | | | | | | | Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target. llvm-svn: 273652
* Soften assertion in AMDGPU emitPrologue.Nirav Dave2016-05-251-2/+3
| | | | | | | | | | | | | | [AMDGPU] emitPrologue looks for an unused unallocated SGPR that is not the scratch descriptor. Continue search if unused register found fails other requirements. Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D20526 llvm-svn: 270646
* AMDGPU: Fix assert on ttmp registersMatt Arsenault2016-05-181-2/+2
| | | | | | | | | | | Use register class that does not include them when looking for unallocated registers. This is hit by the udiv v8i64 test in the opencl integer conformance test, and takes a few seconds to compile in a debug build so no test included. llvm-svn: 269938
* AMDGPU/SI: Don't try to move scratch wave offset when there are no free SGPRsTom Stellard2016-03-031-3/+15
| | | | | | | | | | | | | | Summary: When there were no free SGPRs, we were trying to move this value into some of the reserved registers which was causing a segmentation fault. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17590 llvm-svn: 262577
* AMDGPU: Set flat_scratch from flat_scratch_init regMatt Arsenault2016-02-121-15/+42
| | | | | | | | | | | | | | This was hardcoded to the static private size, but this would be missing the offset and additional size for someday when we have dynamic sizing. Also stops always initializing flat_scratch even when unused. In the future we should stop emitting this unless flat instructions are used to access private memory. For example this will initialize it almost always on VI because flat is used for global access. llvm-svn: 260658
* AMDGPU/SI: Do not move scratch resource register on Tonga & IcelandNicolai Haehnle2016-01-051-42/+44
| | | | | | | | | | | | | Due to the SGPR init bug, every program claims to use the same number of SGPRs anyway, so there's no point in trying to shift those registers down from their initial spot of reservation. Add a test that uses VGPR spilling and blocks most SGPRs from being used for the scratch resource register. Previously, this would run into an assertion. Differential Revision: http://reviews.llvm.org/D15724 llvm-svn: 256870
* AMDGPU: Rework how private buffer passed for HSAMatt Arsenault2015-11-301-15/+153
| | | | | | | | | | | | | | | | If we know we have stack objects, we reserve the registers that the private buffer resource and wave offset are passed and use them directly. If not, reserve the last 5 SGPRs just in case we need to spill. After register allocation, try to pick the next available registers instead of the last SGPRs, and then insert copies from the inputs to the reserved registers in the progloue. This also only selectively enables all of the input registers which are really required instead of always enabling them. llvm-svn: 254331
OpenPOWER on IntegriCloud