path: root/llvm/test/CodeGen/AMDGPU/byval-frame-setup.ll
Commit message (author, date; files changed, lines -removed/+added)
* [MachineScheduler] Reduce reordering due to mem op clustering (Jay Foad, 2020-01-14; 1 file, -2/+2)
  Summary: Mem op clustering adds a weak edge in the DAG between two loads or stores that should be clustered, but the direction of this edge is pretty arbitrary (it depends on the sort order of MemOpInfo, which represents the operands of a load or store). This often means that two loads or stores will get reordered even if they would naturally have been scheduled together anyway, which leads to test case churn and goes against the scheduler's "do no harm" philosophy.
  The fix makes sure that the direction of the edge always matches the original code order of the instructions.
  Reviewers: atrick, MatzeB, arsenm, rampitec, t.p.northover
  Subscribers: jvesely, wdng, nhaehnle, kristof.beyls, hiraditya, javed.absar, arphaman, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D72706
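  The fix can be illustrated with a toy model. It is a sketch under simplifying assumptions: MemOp, OrigIndex, Offset and clusterEdges are hypothetical names, not MachineScheduler's data structures (the real pass works on SUnits and MemOpInfo). Candidates are sorted by an operand-derived key to decide which ops are neighbours, but each weak edge is then oriented by original code order rather than by sort order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for one load or store considered for clustering.
struct MemOp {
  unsigned OrigIndex; // position of the instruction in the original code
  int64_t Offset;     // operand-derived sort key (e.g. offset from a base)
};

// A weak cluster edge (Pred, Succ): Succ should be scheduled after Pred.
using Edge = std::pair<unsigned, unsigned>;

std::vector<Edge> clusterEdges(std::vector<MemOp> Ops) {
  // Sorting by the operand key decides which ops become neighbours...
  std::sort(Ops.begin(), Ops.end(), [](const MemOp &A, const MemOp &B) {
    return A.Offset < B.Offset;
  });
  std::vector<Edge> Edges;
  for (std::size_t I = 0; I + 1 < Ops.size(); ++I) {
    unsigned A = Ops[I].OrigIndex;
    unsigned B = Ops[I + 1].OrigIndex;
    assert(A != B && "each op must appear once");
    // ...but the edge direction follows original code order, so clustering
    // never reverses two ops that were already in a valid order.
    Edges.emplace_back(std::min(A, B), std::max(A, B));
  }
  return Edges;
}
```

  For ops {OrigIndex 0, Offset 8} and {OrigIndex 1, Offset 4}, an edge taken from sort order would be (1, 0) and would pull the second instruction ahead of the first; the sketch yields (0, 1) instead.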
* AMDGPU: Allow getMemOperandWithOffset to analyze stack accesses (Matt Arsenault, 2019-09-05; 1 file, -3/+4)
  Report soffset as a base register if the scratch resource can be ignored.
  llvm-svn: 371149
* AMDGPU: Make s34 the FP register (Matt Arsenault, 2019-07-08; 1 file, -37/+41)
  Make the FP register callee saved.
  This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame register used throughout the rest of the function. I don't like how this bypasses the standard mechanism for CSR spills just to get the correct insert point. I may look for a better solution, since all CSR VGPRs may also need to have all lanes activated. Another option might be to make getFrameIndexReference change the base register if the frame index is a CSR, and then try to figure out the right insertion point in emitProlog.
  If there is a free VGPR lane available for SGPR spilling, try to use it for the FP. If that would require introducing a new VGPR spill, try to use a free call-clobbered SGPR. Only fall back to introducing a new VGPR spill as a last resort.
  This also doesn't attempt to handle SGPR spilling with scalar stores.
  llvm-svn: 365372
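  The fallback order for saving the FP can be written as a small decision function. This is a sketch only: FrameSaveState, FPSaveKind and chooseFPSave are hypothetical names that encode the preference order stated above, not the backend's actual frame-lowering code.

```cpp
// Hypothetical summary of what is available when deciding where the
// callee-saved FP (s34 in this commit) is saved in the prolog.
struct FrameSaveState {
  bool HasFreeVGPRSpillLane;     // unused lane in an existing SGPR-spill VGPR
  bool HasFreeCallClobberedSGPR; // unused caller-saved SGPR
};

enum class FPSaveKind { VGPRLane, CallClobberedSGPR, NewVGPRSpill };

// Preference order described in the commit message.
constexpr FPSaveKind chooseFPSave(FrameSaveState S) {
  if (S.HasFreeVGPRSpillLane)
    return FPSaveKind::VGPRLane;          // cheapest: reuse an existing lane
  if (S.HasFreeCallClobberedSGPR)
    return FPSaveKind::CallClobberedSGPR; // avoids introducing a VGPR spill
  return FPSaveKind::NewVGPRSpill;        // last resort
}
```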
* CodeGen: Set hasSideEffects = 0 on BUNDLE (Matt Arsenault, 2019-07-03; 1 file, -17/+22)
  The BUNDLE itself should not have side effects; this is a property of the instructions inside the bundle. The hasProperty check already searches the member instructions, but that search was pointless because this bit overrode it.
  Allows me to distinguish bundles that have side effects vs. do not in a future patch. Also fixes an unnecessary scheduling barrier in the bundle AMDGPU uses to get PC relative addresses.
  llvm-svn: 364984
* AMDGPU: Fold frame index into MUBUF (Matt Arsenault, 2019-06-24; 1 file, -0/+41)
  This matters for byval uses outside of the entry block, which appear as copies.
  Previously, the only folding done was during selection, which could not see the underlying frame index. For any uses outside the entry block, the frame index was materialized in the entry block relative to the global scratch wave offset.
  This may produce worse code in cases where the offset ends up not fitting in the MUBUF offset field. A better heuristic would be helpful for extreme frames.
  llvm-svn: 364185
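  The "not fitting in the MUBUF offset field" case can be made concrete. This sketch assumes a 12-bit unsigned immediate offset (byte offsets 0 to 4095), which is our assumption for the GCN targets of this era rather than something stated in the commit; fitsMUBUFImmOffset and canFoldAllChunks are hypothetical helpers, and the model ignores which base register the offset is relative to.

```cpp
#include <cstdint>

// Assumption: the MUBUF immediate offset is a 12-bit unsigned field,
// so only byte offsets in [0, 4095] can be encoded directly.
constexpr unsigned kMUBUFImmBits = 12;

constexpr bool fitsMUBUFImmOffset(int64_t Offset) {
  return Offset >= 0 && Offset < (int64_t{1} << kMUBUFImmBits);
}

// Hypothetical check: a byval object at FrameOffset accessed in 4-byte
// chunks folds cleanly only if every chunk offset still fits the field.
constexpr bool canFoldAllChunks(int64_t FrameOffset, int64_t ObjectSize) {
  for (int64_t Off = 0; Off < ObjectSize; Off += 4)
    if (!fitsMUBUFImmOffset(FrameOffset + Off))
      return false;
  return true;
}
```

  Under this assumption a 16-byte object at offset 4080 folds (its last chunk is at 4092), while the same object at 4088 does not, because its last chunk lands at 4096 and needs extra instructions to materialize.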
* AMDGPU: Eliminate test usage of legacy FP elim attributes (Matt Arsenault, 2019-06-20; 1 file, -1/+1)
  llvm-svn: 363950
* AMDGPU: Don't fix emergency stack slot at offset 0 (Matt Arsenault, 2019-06-05; 1 file, -66/+66)
  This forced the caller to be aware of this, which is an ugly ABI feature.
  Partially reverts r295877.
  The original reasons for doing this are mostly fixed. Alloca is now in a non-0 address space, so it should be OK to have 0 as a valid pointer. Since we treat the absolute address as the pointer value, this part only really needed to apply to kernels.
  Since r357093, we avoid the need to increment/decrement the offset register in more cases, and since r354816 the scavenger can fail without spilling, so it's less critical that we try to avoid an offset that fits in the MUBUF offset.
  Restrict this to callable functions for now, splitting the change into 2 steps to limit the number of test updates and in case anything breaks.
  llvm-svn: 362665
* AMDGPU: Invert frame index offset interpretation (Matt Arsenault, 2019-06-05; 1 file, -10/+8)
  Since the beginning, the offset of a frame index has been consistently interpreted backwards. It was treated as an offset from the scratch wave offset register, used as a frame register. The correct interpretation is the offset from the SP on entry to the function, before the prolog. Frame index elimination then should select either SP or another register as an FP.
  Treat the scratch wave offset on kernel entry as the pre-incremented SP. Rely more heavily on the standard hasFP and frame pointer elimination logic, and clean up the private reservation code. This saves a copy in most callee functions.
  The kernel prolog emission code is still kind of a mess, relying on checking the uses of physical registers, which I would prefer to eliminate. Currently selection directly emits MUBUF instructions, which require using a reference to some register. Use the register chosen for SP, and then ignore this later. This should probably be cleaned up to use pseudos that don't refer to any specific base register until frame index elimination.
  Add a workaround for shaders using large numbers of SGPRs. I'm not sure these cases were ever working correctly, since as far as I can tell the logic for figuring out which SGPR is the scratch wave offset doesn't match up with the shader input initialization in the shader programming guide.
  llvm-svn: 362661
* AMDGPU: Activate all lanes when spilling CSR VGPR for SGPR spills (Matt Arsenault, 2019-05-24; 1 file, -1/+1)
  If some lanes weren't active on entry to the function, this could clobber their VGPR values.
  llvm-svn: 361655
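  Why partially active lanes get clobbered can be shown with a toy lane model. Assumptions, all ours: a 64-lane wave, vector memory ops that only touch lanes enabled in the exec mask, and writelane (used for SGPR spills) writing lanes regardless of exec; for simplicity every lane is overwritten here. VGPR, maskedCopy and runCallee are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned kWaveSize = 64; // assumption: wave64
using VGPR = std::array<uint32_t, kWaveSize>;

// Vector memory ops only touch lanes enabled in the exec mask.
void maskedCopy(VGPR &Dst, const VGPR &Src, uint64_t Exec) {
  for (unsigned L = 0; L < kWaveSize; ++L)
    if ((Exec >> L) & 1)
      Dst[L] = Src[L];
}

// Toy callee: spill a callee-saved VGPR under SaveExec, reuse the register
// for SGPR spills (writelane ignores exec), then reload it under the same
// mask. Returns the register contents the caller sees after the call.
VGPR runCallee(const VGPR &CallerValue, uint64_t SaveExec) {
  VGPR Slot{};
  maskedCopy(Slot, CallerValue, SaveExec); // prolog spill
  VGPR Reg = CallerValue;
  Reg.fill(0xDEAD);                         // lanes overwritten by writelane
  maskedCopy(Reg, Slot, SaveExec);          // epilog reload
  return Reg;
}
```

  With SaveExec equal to all ones (the fix) the caller's value survives in every lane; with only lane 0 active, lanes 1 to 63 come back as 0xDEAD.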
* AMDGPU: Introduce TokenFactor for ABI register copies in call sequence (Matt Arsenault, 2019-05-16; 1 file, -20/+18)
  The call was missing chain dependencies on the pre-call copies. I don't think this was causing any real issues, however.
  llvm-svn: 360906
* [AMDGPU] Mark test functions with hidden visibility (Scott Linder, 2019-02-01; 1 file, -2/+2)
  Prepare for a future patch which affects codegen for calls to preemptible functions.
  Differential Revision: https://reviews.llvm.org/D57605
  llvm-svn: 352920
* AMDGPU: Cleanup / relax tests for future changes (Matt Arsenault, 2018-11-26; 1 file, -2/+2)
  llvm-svn: 347576
* AMDGPU: Fix private handling for allowsMisalignedMemoryAccesses (Matt Arsenault, 2018-09-24; 1 file, -17/+20)
  If the alignment is at least 4, this should report true. Something still seems off with how < 4-byte types are handled here though.
  Fixing this seems to change how some combines get to where they get, but somehow isn't changing the net result.
  llvm-svn: 342879
* AMDGPU: Fix not respecting byval alignment in call frame setup (Matt Arsenault, 2018-08-22; 1 file, -2/+142)
  This was hackily adding in the 4 bytes reserved for the callee's emergency stack slot. Treat it like a normal stack allocation so we get the correct alignment padding behavior. This fixes an inconsistency between the caller and callee.
  llvm-svn: 340396
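  The difference between adding 4 bytes and treating the slot as a normal allocation shows up in the padding arithmetic. In this sketch the emergency slot is assumed to occupy offsets [0, 4), as it did at the time; StackArg, alignTo and assignArgOffsets are hypothetical names, not the backend's calling-convention code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

struct StackArg {
  uint64_t Size;
  uint64_t Align; // power of two, e.g. the byval alignment
};

// Assign caller-side offsets to stack-passed arguments. The callee's 4-byte
// emergency slot is modelled as an ordinary allocation at offset 0, so the
// first argument receives normal alignment padding after it.
std::vector<uint64_t> assignArgOffsets(const std::vector<StackArg> &Args) {
  uint64_t Offset = 4; // [0, 4) is the emergency stack slot
  std::vector<uint64_t> Offsets;
  for (const StackArg &A : Args) {
    assert(A.Align != 0 && (A.Align & (A.Align - 1)) == 0);
    Offset = alignTo(Offset, A.Align);
    Offsets.push_back(Offset);
    Offset += A.Size;
  }
  return Offsets;
}
```

  An 8-byte argument with 8-byte alignment lands at offset 8 rather than 4, and a following 4-byte argument lands at 16. Simply adding 4 would have placed the first argument at a misaligned offset 4.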
* AMDGPU: Increase default stack alignment (Matt Arsenault, 2018-03-29; 1 file, -3/+3)
  8 and 16-byte values are common, so increase the default alignment to avoid realigning the stack in most functions.
  llvm-svn: 328821
* [AMDGPU] added writelane intrinsic (Tim Renouf, 2018-02-28; 1 file, -11/+9)
  Summary: For use by LLPC SPV_AMD_shader_ballot extension.
  The v_writelane instruction was already implemented for use by SGPR spilling, but I had to add an extra dummy operand tied to the destination, to represent that all lanes except the selected one keep the old value of the destination register.
  .ll test changes were due to schedule changes caused by that new operand.
  Differential Revision: https://reviews.llvm.org/D42838
  llvm-svn: 326353
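  The reason for the tied operand is that the result depends on the destination's old value. A lane-level model makes this explicit; it assumes a 64-lane wave, and LaneVec and writelane are hypothetical names for illustration, not the intrinsic's actual signature.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned kWaveSize = 64; // assumption: wave64
using LaneVec = std::array<uint32_t, kWaveSize>;

// Model of v_writelane semantics: Value is written into one lane and every
// other lane keeps the previous contents of the destination. Because the
// old destination is an input, the instruction needs it as an operand
// (the tied dummy operand described above).
LaneVec writelane(const LaneVec &OldDst, uint32_t Value, unsigned Lane) {
  assert(Lane < kWaveSize && "lane index out of range");
  LaneVec Result = OldDst;
  Result[Lane] = Value;
  return Result;
}
```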
* [AMDGPU] Switch to the new addr space mapping by default (Yaxun Liu, 2018-02-02; 1 file, -55/+55)
  This requires corresponding clang change.
  Differential Revision: https://reviews.llvm.org/D40955
  llvm-svn: 324101
* AMDGPU: Enable IPRA (Matt Arsenault, 2017-11-28; 1 file, -2/+2)
  llvm-svn: 319256
* [AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev} (Dmitry Preobrazhensky, 2017-11-20; 1 file, -2/+2)
  See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765
  Reviewers: tamazov, SamWot, arsenm, vpykhtin
  Differential Revision: https://reviews.llvm.org/D40088
  llvm-svn: 318675
* AMDGPU: Stop modifying SP in call sequences (Matt Arsenault, 2017-09-14; 1 file, -7/+7)
  Because the stack growth direction and addressing is done in the same direction, modifying SP at the beginning of the call sequence was incorrect. If we had a stack passed argument, we would end up skipping that number of bytes before pushing arguments, leaving unused/inconsistent space.
  The callee creates fixed stack objects in its frame, so the space necessary for these is already logically allocated in the callee, so we just let the callee increment SP if it really requires it.
  llvm-svn: 313279
* AMDGPU: Don't spill SP reg like a normal CSR (Matt Arsenault, 2017-09-13; 1 file, -5/+8)
  llvm-svn: 313217
* Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding" (Geoff Berry, 2017-08-17; 1 file, -9/+8)
  This reverts commit r311038.
  Several buildbots are breaking, and at least one appears to be due to the forwarding of physical regs enabled by this change. Reverting while I investigate further.
  llvm-svn: 311062
* [MachineCopyPropagation] Extend pass to do COPY source forwarding (Geoff Berry, 2017-08-16; 1 file, -8/+9)
  This change extends MachineCopyPropagation to do COPY source forwarding.
  This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses.
  Reviewers: qcolombet, javed.absar, MatzeB, jonpa
  Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny
  Differential Revision: https://reviews.llvm.org/D30751
  llvm-svn: 311038
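  COPY source forwarding, in its simplest single-block form, can be sketched on a toy instruction list. This illustrates the general technique only, not MachineCopyPropagation's implementation: Instr and forwardCopies are hypothetical, registers are plain strings, and register aliasing (sub-registers, overlapping physical registers) is ignored entirely.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy instruction: one optional def (empty string = none) and a list of uses.
struct Instr {
  std::string Op;
  std::string Def;
  std::vector<std::string> Uses;
};

// Forward COPY sources within one basic block: after "Def = COPY Src", later
// uses of Def read Src directly until either register is redefined.
void forwardCopies(std::vector<Instr> &Block) {
  std::unordered_map<std::string, std::string> Avail; // Def -> Src
  for (Instr &I : Block) {
    for (std::string &U : I.Uses) {
      auto It = Avail.find(U);
      if (It != Avail.end())
        U = It->second;
    }
    if (!I.Def.empty()) {
      // A redefinition kills every copy that writes or reads the register.
      for (auto It = Avail.begin(); It != Avail.end();) {
        if (It->first == I.Def || It->second == I.Def)
          It = Avail.erase(It);
        else
          ++It;
      }
    }
    if (I.Op == "COPY" && I.Uses.size() == 1 && I.Uses[0] != I.Def)
      Avail[I.Def] = I.Uses[0];
  }
}
```

  Once every use of a COPY's destination has been forwarded, the COPY itself is dead, which is what allows the pass to remove it.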
* AMDGPU: Remove error on calls for amdgcn (Matt Arsenault, 2017-08-03; 1 file, -2/+2)
  Repurpose the -amdgpu-function-calls flag. Rather than require it to emit a call, only use it to run the always inline path or not.
  llvm-svn: 310003
* AMDGPU: Fix clobbering CSR VGPRs when spilling SGPR to it (Matt Arsenault, 2017-08-02; 1 file, -5/+7)
  llvm-svn: 309783
* AMDGPU: Don't place arguments in emergency stack slot (Matt Arsenault, 2017-08-02; 1 file, -26/+26)
  When finding the fixed offsets for function arguments, this needs to skip over the 4 bytes reserved for the emergency stack slot.
  llvm-svn: 309776
* DAG: Undo and->or combine with FrameIndexes (Matt Arsenault, 2017-08-02; 1 file, -63/+35)
  This pattern shows up when lowering byval copies on AMDGPU. The byval object access is split into 4-byte chunks, adding a constant offset to the FixedStack base. When some of the offsets turn into ors, this prevents combining the constant offsets. This makes it not apparent that the object is there when matching addressing modes, so it ends up using a scratch wave offset relative access and the lengthy frame index expansion for that.
  llvm-svn: 309775
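  The offsets can turn into ors because, when the base and the constant share no set bits, the addition produces no carries and "Base | C" equals "Base + C". In the DAG a combine would establish this from known bits (for example the frame object's alignment) rather than from a concrete address; the sketch uses concrete integers for illustration, and orActsAsAdd and allChunksBecomeOrs are hypothetical helpers.

```cpp
#include <cstdint>

// (Base | C) equals (Base + C) exactly when Base and C share no set bits,
// because the addition then produces no carries.
constexpr bool orActsAsAdd(uint64_t Base, uint64_t C) {
  return (Base & C) == 0;
}

// Hypothetical check: can every non-zero 4-byte chunk offset of an object
// of Size bytes at Base be expressed as an or instead of an add?
constexpr bool allChunksBecomeOrs(uint64_t Base, uint64_t Size) {
  for (uint64_t Off = 4; Off < Size; Off += 4)
    if (!orActsAsAdd(Base, Off))
      return false;
  return true;
}
```

  For a 16-byte object at a 16-byte aligned base, offsets 4, 8 and 12 can all be written as ors, which is exactly when the constant offset disappears from view of the addressing-mode matcher unless the combine is undone.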
* AMDGPU: Initial implementation of calls (Matt Arsenault, 2017-08-01; 1 file, -0/+235)
  Includes a hack to fix the type selected for the GlobalAddress of the function, which will be fixed by changing the default datalayout to use generic pointers for 0.
  llvm-svn: 309732
OpenPOWER on IntegriCloud