path: root/llvm/test/CodeGen/AMDGPU/sibling-call.ll
Commit message | Author | Age | Files | Lines
* AMDGPU: Make s34 the FP register | Matt Arsenault | 2019-07-08 | 1 | -22/+27
    Make the FP register callee saved. This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame register used throughout the rest of the function. I don't like how this bypasses the standard mechanism for CSR spills just to get the correct insert point. I may look for a better solution, since all CSR VGPRs may also need to have all lanes activated. Another option might be to make getFrameIndexReference change the base register if the frame index is a CSR, and then try to figure out the right insertion point in emitProlog.

    If there is a free VGPR lane available for SGPR spilling, try to use it for the FP. If that would require introducing a new VGPR spill, try to use a free call clobbered SGPR. Only fall back to introducing a new VGPR spill as a last resort. This also doesn't attempt to handle SGPR spilling with scalar stores.

    llvm-svn: 365372
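The fallback order described in this commit message can be sketched as a small decision function. This is only an illustrative model in Python, not the LLVM implementation: the names `FPSaveLocation`, `choose_fp_save_location`, and both parameters are hypothetical.

```python
from enum import Enum


class FPSaveLocation(Enum):
    """Hypothetical labels for where the FP could be saved in the prolog."""
    FREE_VGPR_LANE = "free lane in a VGPR already used for SGPR spilling"
    FREE_SGPR = "free call-clobbered SGPR"
    NEW_VGPR_SPILL = "newly introduced VGPR spill"


def choose_fp_save_location(free_vgpr_lanes: int,
                            free_call_clobbered_sgprs: int) -> FPSaveLocation:
    """Pick a save location in the priority order the commit message gives."""
    # First choice: reuse a lane that is already available for SGPR spilling.
    if free_vgpr_lanes > 0:
        return FPSaveLocation.FREE_VGPR_LANE
    # Second choice: avoid a new VGPR spill by using a free scratch SGPR.
    if free_call_clobbered_sgprs > 0:
        return FPSaveLocation.FREE_SGPR
    # Last resort: introduce a new VGPR spill.
    return FPSaveLocation.NEW_VGPR_SPILL
```

For example, `choose_fp_save_location(0, 2)` selects the free SGPR, since no spare lane exists but a call-clobbered SGPR does.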
* CodeGen: Set hasSideEffects = 0 on BUNDLE | Matt Arsenault | 2019-07-03 | 1 | -5/+4
    The BUNDLE itself should not have side effects, and this is a property of instructions inside the bundle. The hasProperty check already searches for any member instructions, which was pointless since it was overridden by this bit. Allows me to distinguish bundles that have side effects vs. do not in a future patch. Also fixes an unnecessary scheduling barrier in the bundle AMDGPU uses to get PC relative addresses.

    llvm-svn: 364984
* AMDGPU: Always use s33 for global scratch wave offset | Matt Arsenault | 2019-06-20 | 1 | -5/+5
    Every called function could possibly need this to calculate the absolute address of stack objects, and this avoids inserting a copy around every call site in the kernel. It's also somewhat cleaner to keep this in a callee saved SGPR.

    llvm-svn: 363990
* AMDGPU: Don't fix emergency stack slot at offset 0 | Matt Arsenault | 2019-06-05 | 1 | -26/+29
    This forced the caller to be aware of this, which is an ugly ABI feature. Partially reverts r295877.

    The original reasons for doing this are mostly fixed. Alloca is now in a non-0 address space, so it should be OK to have 0 as a valid pointer. Since we treat the absolute address as the pointer value, this part only really needed to apply to kernels. Since r357093, we avoid the need to increment/decrement the offset register in more cases, and since r354816 the scavenger can fail without spilling, so it's less critical that we try to avoid an offset that fits in the MUBUF offset.

    Restrict to callable functions for now to split this into 2 steps to limit the number of test updates and in case anything breaks.

    llvm-svn: 362665
* AMDGPU: Invert frame index offset interpretation | Matt Arsenault | 2019-06-05 | 1 | -30/+27
    Since the beginning, the offset of a frame index has been consistently interpreted backwards. It was treating it as an offset from the scratch wave offset register as a frame register. The correct interpretation is the offset from the SP on entry to the function, before the prolog. Frame index elimination then should select either SP or another register as an FP.

    Treat the scratch wave offset on kernel entry as the pre-incremented SP. Rely more heavily on the standard hasFP and frame pointer elimination logic, and clean up the private reservation code. This saves a copy in most callee functions.

    The kernel prolog emission code is still kind of a mess relying on checking the uses of physical registers, which I would prefer to eliminate.

    Currently selection directly emits MUBUF instructions, which require using a reference to some register. Use the register chosen for SP, and then ignore this later. This should probably be cleaned up to use pseudos that don't refer to any specific base register until frame index elimination.

    Add a workaround for shaders using large numbers of SGPRs. I'm not sure these cases were ever working correctly, since as far as I can tell the logic for figuring out which SGPR is the scratch wave offset doesn't match up with the shader input initialization in the shader programming guide.

    llvm-svn: 362661
* AMDGPU: Activate all lanes when spilling CSR VGPR for SGPR spills | Matt Arsenault | 2019-05-24 | 1 | -3/+10
    If some lanes weren't active on entry to the function, this could clobber their VGPR values.

    llvm-svn: 361655
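The hazard this commit message describes can be illustrated with a toy simulation. The sketch below is a simplified Python model, not LLVM or hardware behavior: the four-lane wave, the helper names, and the assumption that the SGPR-spill lane write is not masked by EXEC are all illustrative assumptions.

```python
WAVE_SIZE = 4  # toy wave size for illustration only


def save_vgpr(vgpr, mask):
    """Save only lanes enabled in mask; disabled lanes are not stored (None)."""
    return [v if on else None for v, on in zip(vgpr, mask)]


def restore_vgpr(vgpr, saved, mask):
    """Reload only lanes enabled in mask; disabled lanes keep their current value."""
    return [s if on else v for v, s, on in zip(vgpr, saved, mask)]


def run_callee(vgpr, entry_exec, spill_lane, sgpr_value, activate_all_lanes):
    """Model a callee that saves a CSR VGPR, uses one lane for an SGPR spill,
    then restores the VGPR before returning."""
    mask = [True] * WAVE_SIZE if activate_all_lanes else entry_exec
    saved = save_vgpr(vgpr, mask)
    work = list(vgpr)
    # Assumption of this model: the lane write ignores the EXEC mask.
    work[spill_lane] = sgpr_value
    return restore_vgpr(work, saved, mask)


caller_vgpr = [10, 11, 12, 13]
entry_exec = [True, True, False, False]  # lanes 2 and 3 inactive on entry

clobbered = run_callee(caller_vgpr, entry_exec, 2, 99, activate_all_lanes=False)
preserved = run_callee(caller_vgpr, entry_exec, 2, 99, activate_all_lanes=True)
```

In this model, saving with the entry mask leaves lane 2 holding the spilled SGPR value after the restore, while saving with all lanes enabled returns the caller's original VGPR contents.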
* [AMDGPU] Mark test functions with hidden visibility | Scott Linder | 2019-02-01 | 1 | -2/+2
    Prepare for future patch which affects codegen for calls to preemptible functions.

    Differential Revision: https://reviews.llvm.org/D57605
    llvm-svn: 352920
* AMDGPU: Generate VALU ThreeOp Integer instructions | Nicolai Haehnle | 2018-12-06 | 1 | -2/+1
    Summary:
    Original patch by: Fabian Wahlster <razor@singul4rity.com>

    Change-Id: I148f692a88432541fad468963f58da9ddf79fac5
    Reviewers: arsenm, rampitec
    Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, b-sumner, llvm-commits
    Differential Revision: https://reviews.llvm.org/D51995
    llvm-svn: 348488
* [AMDGPU] Divergence driven instruction selection. Part 1. | Alexander Timofeev | 2018-09-21 | 1 | -6/+6
    Summary: This change is the first part of the AMDGPU target description change. Its aim is to effectively split the vector and scalar flows at the selection stage. Selection uses predicate functions based on the framework implemented earlier - https://reviews.llvm.org/D35267

    Differential revision: https://reviews.llvm.org/D52019
    Reviewers: rampitec
    llvm-svn: 342719
* AMDGPU: Remove remnants of old address space mapping | Matt Arsenault | 2018-08-31 | 1 | -3/+3
    llvm-svn: 341165
* [AMDGPU] added writelane intrinsic | Tim Renouf | 2018-02-28 | 1 | -1/+1
    Summary: For use by LLPC SPV_AMD_shader_ballot extension.

    The v_writelane instruction was already implemented for use by SGPR spilling, but I had to add an extra dummy operand tied to the destination, to represent that all lanes except the selected one keep the old value of the destination register.

    .ll test changes were due to schedule changes caused by that new operand.

    Differential Revision: https://reviews.llvm.org/D42838
    llvm-svn: 326353
* AMDGPU: Use gfx9 carry-less add/sub instructions | Matt Arsenault | 2017-11-30 | 1 | -9/+22
    llvm-svn: 319491
* AMDGPU: Enable IPRA | Matt Arsenault | 2017-11-28 | 1 | -3/+3
    llvm-svn: 319256
* [AMDGPU] Fix SITargetLowering::LowerCall for pointer info of byval argument | Yaxun Liu | 2017-11-22 | 1 | -26/+27
    SITargetLowering::LowerCall uses dummy pointer info for the byval argument, which causes a flat load to be emitted instead of a buffer load. This patch fixes that.

    Differential Revision: https://reviews.llvm.org/D40040
    llvm-svn: 318844
* [AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev} | Dmitry Preobrazhensky | 2017-11-20 | 1 | -6/+6
    See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765

    Reviewers: tamazov, SamWot, arsenm, vpykhtin
    Differential Revision: https://reviews.llvm.org/D40088
    llvm-svn: 318675
* AMDGPU: Make frame register caller preserved | Matt Arsenault | 2017-09-14 | 1 | -1/+1
    Using SplitCSR for the frame register was very broken. Often the copies in the prolog and epilog were optimized out, in addition to them being inserted after the true prolog where the FP was clobbered.

    I have a hacky solution which works that continues to use split CSR, but for now this is simpler and will get to working programs.

    llvm-svn: 313274
* AMDGPU: Don't spill SP reg like a normal CSR | Matt Arsenault | 2017-09-13 | 1 | -0/+2
    llvm-svn: 313217
* AMDGPU: Fix not accounting for tail call resource usage | Matt Arsenault | 2017-09-05 | 1 | -0/+31
    If the only call in a function is a tail call, the function isn't considered to have a call since it's a type of return.

    llvm-svn: 312561
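The oversight described here can be shown with a toy model in which a tail call is flagged as both a call and a return. This is a hedged sketch, not LLVM's resource-usage code: the `Inst` class, the opcode strings, and both helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Inst:
    opcode: str              # hypothetical opcode name, for illustration only
    is_call: bool = False
    is_return: bool = False


def has_calls_ignoring_tail_calls(insts):
    """Treats a tail call purely as a return, so it is not counted as a call."""
    return any(i.is_call and not i.is_return for i in insts)


def has_calls_including_tail_calls(insts):
    """Counts any call-like instruction, including a tail call."""
    return any(i.is_call for i in insts)


# A function whose only call is a tail call.
body = [
    Inst("add"),
    Inst("tail_call", is_call=True, is_return=True),
]
```

With this body, the first check reports no calls, so the callee's resource usage would be left out, while the second check reports that a call is present.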
* AMDGPU: Start adding tail call support | Matt Arsenault | 2017-08-11 | 1 | -0/+225
    Handle the sibling call cases.

    llvm-svn: 310753