path: root/llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll
Commit message (Author, Age, Files, Lines)
* RegAllocFast: Leave unassigned virtreg entries in map (Matthias Braun, 2018-11-07, 1 file, -186/+156)
Set `LiveReg::PhysReg` to zero when freeing a register instead of removing the entry from `LiveRegMap`. This way no iterators get invalidated and we can avoid passing around and updating iterators all over the place.

This does not change any allocator decisions. It is not completely NFC because the arbitrary iteration order through `LiveRegMap` in `spillAll()` changes, so we may get a different order in those spill sequences (the number of spills does not change).

This is in preparation for https://reviews.llvm.org/D52010.

llvm-svn: 346298
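A minimal sketch of the pattern described above, using a plain std::unordered_map as a stand-in; the actual RegAllocFast container and LiveReg fields are LLVM-internal and differ in detail:

  #include <unordered_map>

  // Hypothetical stand-in for the allocator's per-virtual-register state.
  struct LiveReg {
    unsigned VirtReg = 0;
    unsigned PhysReg = 0; // 0 means "no physical register currently assigned"
  };

  using LiveRegMap = std::unordered_map<unsigned, LiveReg>;

  // Free a register without erasing the map entry.  Because the entry stays
  // in place, iterators and references held elsewhere in the allocator remain
  // valid, which is the point of the change described above.
  inline void freeVirtReg(LiveRegMap &Map, unsigned VirtReg) {
    auto It = Map.find(VirtReg);
    if (It != Map.end())
      It->second.PhysReg = 0; // mark unassigned instead of Map.erase(It)
  }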
* AMDGPU: Add pass to lower kernel arguments to loads (Matt Arsenault, 2018-06-26, 1 file, -130/+131)
This replaces most argument uses with loads, but for now not all.

The code in SelectionDAG for calling convention lowering is actively harmful for amdgpu_kernel. It attempts to split the argument types into register legal types, which results in low quality code for arbitrary types. Since all kernel arguments are passed in memory, we just want the raw types. I've tried a couple of methods of mitigating this in SelectionDAG, but it's easier to just bypass this problem altogether. It's possible to hack around the problem in the initial lowering, but the real problem is that the DAG then expects to be able to use CopyToReg/CopyFromReg for uses of the arguments outside the block.

Exposing the argument loads in the IR also has the advantage that the LoadStoreVectorizer can merge them.

I'm not sure what the best approach to dealing with the IR argument list is. The patch as-is just leaves the IR arguments in place, so all the existing code will still compute the same kernarg size and pointlessly lower the arguments. Arguably the frontend should emit kernels with an empty argument list in the first place. Alternatively, a dummy array could be inserted as a single argument just to reserve space.

This does have some disadvantages. Local pointer kernel arguments can no longer have AssertZext placed on them, as the equivalent !range metadata is not valid on pointer typed loads. This is mostly bad for SI, which needs to know about the known bits in order to use the DS instruction offset, so in this case this is not done.

More importantly, this skips noalias arguments, since this pass does not yet convert them to the equivalent !alias.scope and !noalias metadata. Producing this metadata correctly seems to be tricky, although logically this is the same as inlining into a function which doesn't exist. Additionally, exposing these loads to the vectorizer may result in degraded aliasing information if a pointer load is merged with another argument load.

I'm also not entirely sure this is preserving the current clover ABI, although I would greatly prefer if it would stop widening arguments and match the HSA ABI. As-is I think it is extending < 4-byte arguments to 4 bytes but doesn't align them to 4 bytes.

llvm-svn: 335650
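A heavily simplified sketch of the transformation this pass performs, assuming a recent LLVM C++ API; the offset/alignment handling and the lowerKernelArgsToLoads helper name are illustrative, not the actual AMDGPULowerKernelArguments implementation:

  #include "llvm/IR/DataLayout.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/IntrinsicsAMDGPU.h"
  #include "llvm/IR/Module.h"

  using namespace llvm;

  // Replace each kernel argument use with an explicit load from the in-memory
  // kernarg segment; the IR argument list itself is left in place, as the
  // commit message notes.
  static void lowerKernelArgsToLoads(Function &F) {
    const DataLayout &DL = F.getParent()->getDataLayout();
    IRBuilder<> B(&*F.getEntryBlock().getFirstInsertionPt());

    // Base pointer to the kernel argument segment.
    Value *KernArgBase =
        B.CreateIntrinsic(Intrinsic::amdgcn_kernarg_segment_ptr, {}, {});

    uint64_t Offset = 0;
    for (Argument &Arg : F.args()) {
      Type *Ty = Arg.getType();
      Align ABIAlign = DL.getABITypeAlign(Ty);
      Offset = alignTo(Offset, ABIAlign);

      Value *Ptr =
          B.CreateConstInBoundsGEP1_64(B.getInt8Ty(), KernArgBase, Offset);
      LoadInst *Load = B.CreateAlignedLoad(Ty, Ptr, ABIAlign, Arg.getName());

      Arg.replaceAllUsesWith(Load); // uses now go through explicit loads
      Offset += DL.getTypeStoreSize(Ty).getFixedValue();
    }
  }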
* AMDGPU: Use correct register names in inline assembly (Matt Arsenault, 2017-06-08, 1 file, -5/+5)
Fixes using physical registers in inline asm from clang.

llvm-svn: 305004
* [AMDGPU] Move SIShrinkInstructions and SDWAPeephole to SSAOptimization passes (Sam Kolton, 2017-04-07, 1 file, -1/+1)
Summary: The difference between PreRegAlloc() and MachineSSAOptimization() is that the former is run even at the -O0 optimization level. In my understanding, SIShrinkInstructions and SDWAPeephole shouldn't run when optimizations are disabled. With this change the order of passes does not change.

Reviewers: arsenm, vpykhtin, rampitec

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31705

llvm-svn: 299757
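A hedged sketch of what the pipeline change amounts to: the AMDGPU-specific peepholes are registered from addMachineSSAOptimization(), which TargetPassConfig only invokes when optimizing, instead of from the pre-regalloc hook, which also runs at -O0. The pass identifiers and surrounding passes below are assumed for illustration, not copied from the actual patch:

  // In the AMDGPU target's pass configuration (sketch).
  void GCNPassConfig::addMachineSSAOptimization() {
    // Run the target-independent SSA optimizations first.
    TargetPassConfig::addMachineSSAOptimization();

    // These now run only at -O1 and above, since addMachineSSAOptimization()
    // is skipped entirely at -O0.
    addPass(&SIShrinkInstructionsID);
    if (EnableSDWAPeephole) {
      addPass(&SIPeepholeSDWAID);
      addPass(&DeadMachineInstructionElimID); // clean up after the peephole
    }
  }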
* [AMDGPU] Resubmit SDWA peephole: enable by default (Sam Kolton, 2017-04-06, 1 file, -1/+4)
Reviewers: vpykhtin, rampitec, arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31671

llvm-svn: 299654
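"Enable by default" here plausibly comes down to flipping the default of the command-line option that guards the pass; a hedged sketch, with the option name and description text assumed rather than quoted from the patch:

  #include "llvm/Support/CommandLine.h"

  using namespace llvm;

  // The peephole stays behind a flag, but the flag now defaults to on, so it
  // can still be turned off (e.g. -amdgpu-sdwa-peephole=0) if it misbehaves,
  // as happened with the revert noted below.
  static cl::opt<bool> EnableSDWAPeephole(
      "amdgpu-sdwa-peephole",             // flag name assumed
      cl::desc("Enable SDWA peepholer"),  // description assumed
      cl::init(true));                    // previously false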
* Revert r299536. [AMDGPU] SDWA peephole: enable by default (Ivan Krasin, 2017-04-05, 1 file, -4/+1)
Reason: breaks multiple bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/3988
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1173

Original Review URL: https://reviews.llvm.org/D31671

llvm-svn: 299583
* [AMDGPU] SDWA peephole: enable by default (Sam Kolton, 2017-04-05, 1 file, -1/+4)
Reviewers: vpykhtin, rampitec, arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31671

llvm-svn: 299536
* AMDGPU: Don't use stack space for SGPR->VGPR spills (Matt Arsenault, 2017-02-21, 1 file, -0/+635)
Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them after. I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks, since everything asserts if it tries to use a dead frame index later.

The unused emergency stack slot seems to still be left behind, so an additional 4 bytes is still wasted.

llvm-svn: 295753
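A hedged sketch of the idea: before stack offsets are assigned, rewrite SGPR spill pseudos to use VGPR lanes and mark their stack slots dead so they no longer consume frame space. isSGPRSpillPseudo(), spillFrameIndexOf(), and lowerSGPRSpillToVGPRLane() are hypothetical helpers standing in for the real SIRegisterInfo/SIMachineFunctionInfo machinery:

  #include "llvm/ADT/STLExtras.h"
  #include "llvm/CodeGen/MachineFrameInfo.h"
  #include "llvm/CodeGen/MachineFunction.h"

  using namespace llvm;

  // Hypothetical helpers, declared only to keep the sketch self-contained.
  bool isSGPRSpillPseudo(const MachineInstr &MI);
  int spillFrameIndexOf(const MachineInstr &MI);
  void lowerSGPRSpillToVGPRLane(MachineInstr &MI, int FI);

  // Run before frame offsets are calculated, so the eliminated slots never
  // receive a stack location.
  static void eliminateSGPRSpillFrameIndices(MachineFunction &MF) {
    MachineFrameInfo &MFI = MF.getFrameInfo();

    for (MachineBasicBlock &MBB : MF)
      for (MachineInstr &MI : make_early_inc_range(MBB)) {
        if (!isSGPRSpillPseudo(MI))
          continue;
        int FI = spillFrameIndexOf(MI);    // frame index the spill was using
        lowerSGPRSpillToVGPRLane(MI, FI);  // rewrite to a VGPR-lane spill
        MFI.RemoveStackObject(FI);         // any later use of FI would assert
      }
  }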