path: root/llvm/test/CodeGen/AMDGPU/call-waitcnt.ll
Commit message (Author, Age, Files, Lines)
* AMDGPU: Make s34 the FP register (Matt Arsenault, 2019-07-08, 1 file, -9/+9)
Make the FP register callee saved. This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame register used throughout the rest of the function.
I don't like how this bypasses the standard mechanism for CSR spills just to get the correct insert point. I may look for a better solution, since all CSR VGPRs may also need to have all lanes activated. Another option might be to make getFrameIndexReference change the base register if the frame index is a CSR, and then try to figure out the right insertion point in emitProlog.
If there is a free VGPR lane available for SGPR spilling, try to use it for the FP. If that would require introducing a new VGPR spill, try to use a free call clobbered SGPR. Only fall back to introducing a new VGPR spill as a last resort. This also doesn't attempt to handle SGPR spilling with scalar stores.
llvm-svn: 365372
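The fallback order described in this commit message can be sketched as below. This is only an illustration of the stated preference order, not the code from the patch: the names FPSaveLocation, FrameState and chooseFPSaveLocation are hypothetical and do not exist in LLVM, and the two boolean inputs stand in for whatever queries the real prolog emission performs.

```cpp
// Hypothetical sketch of the FP save preference order; none of these
// names come from LLVM.
enum class FPSaveLocation {
  ExistingVGPRLane, // free lane in a VGPR already used for SGPR spilling
  ScratchSGPR,      // free call clobbered SGPR
  NewVGPRSpill      // last resort: introduce a new VGPR spill
};

// Assumed inputs: whether each cheaper option is available.
struct FrameState {
  bool HasFreeVGPRLane;
  bool HasFreeCallClobberedSGPR;
};

// Pick where to save the FP, cheapest option first.
FPSaveLocation chooseFPSaveLocation(const FrameState &S) {
  if (S.HasFreeVGPRLane)
    return FPSaveLocation::ExistingVGPRLane;
  if (S.HasFreeCallClobberedSGPR)
    return FPSaveLocation::ScratchSGPR;
  return FPSaveLocation::NewVGPRSpill;
}
```

The scratch SGPR is only considered when no free lane exists, and a new VGPR spill is chosen only when both cheaper options are unavailable.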
* CodeGen: Set hasSideEffects = 0 on BUNDLE (Matt Arsenault, 2019-07-03, 1 file, -4/+4)
The BUNDLE itself should not have side effects, and this is a property of instructions inside the bundle. The hasProperty check already searches for any member instructions, which was pointless since it was overridden by this bit.
Allows me to distinguish bundles that have side effects vs. do not in a future patch. Also fixes an unnecessary scheduling barrier in the bundle AMDGPU uses to get PC relative addresses.
llvm-svn: 364984
* AMDGPU: Always use s33 for global scratch wave offset (Matt Arsenault, 2019-06-20, 1 file, -19/+14)
Every called function could possibly need this to calculate the absolute address of stack objects, and this avoids inserting a copy around every call site in the kernel. It's also somewhat cleaner to keep this in a callee saved SGPR.
llvm-svn: 363990
* AMDGPU: Avoid most waitcnts before calls (Matt Arsenault, 2019-06-14, 1 file, -5/+1)
Currently you get extra waits, because waits are inserted for the register dependencies of the call, and the function prolog waits on everything.
Currently waits are still inserted on returns. It may make sense to not do this, and wait in the caller instead.
llvm-svn: 363465
* AMDGPU: Add baseline test for call waitcnt insertion (Matt Arsenault, 2019-06-14, 1 file, -0/+161)
llvm-svn: 363453