summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Setup SP/FP in callee function prolog/epilogMatt Arsenault2017-06-261-2/+73
| | | | llvm-svn: 306312
* AMDGPU: Partially fix implicit.buffer.ptr intrinsic handlingMatt Arsenault2017-06-261-4/+4
| | | | | | | | | | | | | | This should not be treated as a different version of private_segment_buffer. These are distinct things with different uses and register classes, and requires the function argument info to have more context about the function's type and environment. Also add missing test coverage for the intrinsic, and emit an error for HSA. This also encovers that the intrinsic is broken unless there happen to be stack objects. llvm-svn: 306264
* Sort the remaining #include lines in include/... and lib/....Chandler Carruth2017-06-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787
* AMDGPU: Start defining a calling conventionMatt Arsenault2017-05-171-11/+9
| | | | | | | | Partially implement callee-side for arguments and return values. byval doesn't work properly, and most likely sret or other on-stack return values most as well. llvm-svn: 303308
* AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5Marek Olsak2017-05-041-1/+2
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D32645 llvm-svn: 302200
* AMDGPU: Shift down reserved SP register like scratch wave offsetMatt Arsenault2017-04-251-16/+58
| | | | llvm-svn: 301367
* AMDGPU: Slightly simplify prolog reserved register handlingMatt Arsenault2017-04-241-25/+27
| | | | | | | | | | | | | | Rely on MachineRegisterInfo's knowledge of used physical registers. Move flat_scratch initialization earlier, so the uses are visible when making these decisions. This will make it easier to add another reserved register at the end for the stack pointer rather than handling another special case. llvm-svn: 301254
* Move size and alignment information of regclass to TargetRegisterInfoKrzysztof Parzyszek2017-04-241-1/+1
| | | | | | | | | | | | | | | 1. RegisterClass::getSize() is split into two functions: - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const; - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const; 2. RegisterClass::getAlignment() is replaced by: - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const; This will allow making those values depend on subtarget features in the future. Differential Revision: https://reviews.llvm.org/D31783 llvm-svn: 301221
* [AMDGPU] Get address space mapping by target triple environmentYaxun Liu2017-03-271-1/+2
| | | | | | | | | | | | | | | | | | As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which are not depend on target triple, use static const members, so that they don't occupy extra memory space and is equivalent to a compile time constant. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 llvm-svn: 298846
* [AMDGPU] Split R600/SI getFrameIndexReference and emit stack object offsets ↵Konstantin Zhuravlyov2017-03-101-0/+8
| | | | | | | | for SI Differential Revision: https://reviews.llvm.org/D29674 llvm-svn: 297499
* AMDGPU: Don't add emergency stack slot if all spills are SGPR->VGPRMatt Arsenault2017-02-221-36/+55
| | | | | | | | | This should avoid reporting any stack needs to be allocated in the case where no stack is truly used. An unused stack slot is still left around in other cases where there are real stack objects but no spilling occurs. llvm-svn: 295891
* AMDGPU: Always allocate emergency stack slot at offset 0Matt Arsenault2017-02-221-5/+19
| | | | | | | | | This allows us to ensure that 0 is never a valid pointer to a user object, and ensures that the offset is always legal without needing a register to access it. This comes at the cost of usable offsets and wasted stack space. llvm-svn: 295877
* AMDGPU: Don't use stack space for SGPR->VGPR spillsMatt Arsenault2017-02-211-12/+38
| | | | | | | | | | | | | | | | Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them after. I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks since everything asserts if it tries to use a dead frame index later. The unused emergency stack slot seems to still be left behind, so an additional 4 bytes is still wasted. llvm-svn: 295753
* AMDGPU: Merge initial gfx9 supportMatt Arsenault2017-02-181-8/+22
| | | | llvm-svn: 295554
* [AMDGPU] Move register related queries to subtarget classKonstantin Zhuravlyov2017-02-081-8/+8
| | | | | | Differential Revision: https://reviews.llvm.org/D29318 llvm-svn: 294440
* AMDGPU add support for spilling to a user sgpr pointed buffersTom Stellard2017-01-251-12/+46
| | | | | | | | | | | | | | | | | Summary: This lets you select which sort of spilling you want, either s[0:1] or 64-bit loads from s[0:1]. Patch By: Dave Airlie Reviewers: nhaehnle, arsenm, tstellarAMD Reviewed By: arsenm Subscribers: mareko, llvm-commits, kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D25428 llvm-svn: 293000
* AMDGPU: Fix using incorrect private resource with no allocationMatt Arsenault2016-10-281-45/+76
| | | | | | | | | | | It's possible to have a use of the private resource descriptor or scratch wave offset registers even though there are no allocated stack objects. This would result in continuing to use the maximum number reserved registers. This could go over the number of SGPRs available on VI, or violate the SGPR limit requested by the function attributes. llvm-svn: 285435
* AMDGPU/SI: Add support for triples with the mesa3d operating systemTom Stellard2016-09-161-3/+3
| | | | | | | | | | | | | | Summary: mesa3d will use the same kernel calling convention as amdhsa, but it will handle everything else like the default 'unknown' OS type. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22783 llvm-svn: 281779
* [AMDGPU] Wave and register controlsKonstantin Zhuravlyov2016-09-061-6/+8
| | | | | | | | | | | | | | - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747
* AMDGPU: Use copy instead of mov during frame loweringMatt Arsenault2016-08-311-15/+6
| | | | | | | This occurs before RA pseudos are expanded. It's less code to emit the copy. llvm-svn: 280297
* AMDGPU: Refactor frame loweringMatt Arsenault2016-08-311-121/+158
| | | | | | This will make future changes easier. llvm-svn: 280296
* AMDGPU: Use CreateStackObject instead of CreateSpillStackObjectMatt Arsenault2016-08-101-2/+2
| | | | | | | I'm not sure what the difference is, but no other target uses this for emergency spill slots. llvm-svn: 278272
* MachineFunction: Return reference for getFrameInfo(); NFCMatthias Braun2016-07-281-6/+6
| | | | | | | getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017
* [AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in ↵Konstantin Zhuravlyov2016-06-251-1/+49
| | | | | | | | | | | | | | | | | | | | | | | the kernel code header Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue. Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format: - offset 0: work group ID x - offset 4: work group ID y - offset 8: work group ID z - offset 16: work item ID x - offset 20: work item ID y - offset 24: work item ID z Set - amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg - amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg - amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled Differential Revision: http://reviews.llvm.org/D20335 llvm-svn: 273769
* AMDGPU: Cleanup subtarget handling.Matt Arsenault2016-06-241-3/+9
| | | | | | | | | Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target. llvm-svn: 273652
* Soften assertion in AMDGPU emitPrologue.Nirav Dave2016-05-251-2/+3
| | | | | | | | | | | | | | [AMDGPU] emitPrologue looks for an unused unallocated SGPR that is not the scratch descriptor. Continue search if unused register found fails other requirements. Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D20526 llvm-svn: 270646
* AMDGPU: Fix assert on ttmp registersMatt Arsenault2016-05-181-2/+2
| | | | | | | | | | | Use register class that does not include them when looking for unallocated registers. This is hit by the udiv v8i64 test in the opencl integer conformance test, and takes a few seconds to compile in a debug build so no test included. llvm-svn: 269938
* AMDGPU/SI: Don't try to move scratch wave offset when there are no free SGPRsTom Stellard2016-03-031-3/+15
| | | | | | | | | | | | | | Summary: When there were no free SGPRs, we were trying to move this value into some of the reserved registers which was causing a segmentation fault. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17590 llvm-svn: 262577
* AMDGPU: Set flat_scratch from flat_scratch_init regMatt Arsenault2016-02-121-15/+42
| | | | | | | | | | | | | | This was hardcoded to the static private size, but this would be missing the offset and additional size for someday when we have dynamic sizing. Also stops always initializing flat_scratch even when unused. In the future we should stop emitting this unless flat instructions are used to access private memory. For example this will initialize it almost always on VI because flat is used for global access. llvm-svn: 260658
* AMDGPU/SI: Do not move scratch resource register on Tonga & IcelandNicolai Haehnle2016-01-051-42/+44
| | | | | | | | | | | | | Due to the SGPR init bug, every program claims to use the same number of SGPRs anyway, so there's no point in trying to shift those registers down from their initial spot of reservation. Add a test that uses VGPR spilling and blocks most SGPRs from being used for the scratch resource register. Previously, this would run into an assertion. Differential Revision: http://reviews.llvm.org/D15724 llvm-svn: 256870
* AMDGPU: Rework how private buffer passed for HSAMatt Arsenault2015-11-301-15/+153
| | | | | | | | | | | | | | | | If we know we have stack objects, we reserve the registers that the private buffer resource and wave offset are passed and use them directly. If not, reserve the last 5 SGPRs just in case we need to spill. After register allocation, try to pick the next available registers instead of the last SGPRs, and then insert copies from the inputs to the reserved registers in the progloue. This also only selectively enables all of the input registers which are really required instead of always enabling them. llvm-svn: 254331
* AMDGPU: Remove SIPrepareScratchRegsMatt Arsenault2015-11-301-0/+72
| | | | | | | | | | | | | | | | | | | | | | It does not work because of emergency stack slots. This pass was supposed to eliminate dummy registers for the spill instructions, but the register scavenger can introduce more during PrologEpilogInserter, so some would end up left behind if they were needed. The potential for spilling the scratch resource descriptor and offset register makes doing something like this overly complicated. Reserve registers to use for the resource descriptor and use them directly in eliminateFrameIndex. Also removes creating another scratch resource descriptor when directly selecting scratch MUBUF instructions. The choice of which registers are reserved is temporary. For now it attempts to pick the next available registers after the user and system SGPRs. llvm-svn: 254329
* AMDGPU: Create emergency stack slots during frame loweringMatt Arsenault2015-11-061-0/+33
Test has a bogus verifier error which will be fixed by later commits. llvm-svn: 252327
OpenPOWER on IntegriCloud