path: root/llvm/test/CodeGen/AMDGPU/load-hi16.ll
Commit message | Author | Age | Files | Lines
* AMDGPU: Always use s33 for global scratch wave offset | Matt Arsenault | 2019-06-20 | 1 | -10/+10
  Every called function could possibly need this to calculate the absolute address of stack objects, and this avoids inserting a copy around every call site in the kernel. It's also somewhat cleaner to keep this in a callee-saved SGPR.
  llvm-svn: 363990
* AMDGPU: Don't fix emergency stack slot at offset 0 | Matt Arsenault | 2019-06-05 | 1 | -11/+11
  This forced the caller to be aware of this, which is an ugly ABI feature. Partially reverts r295877. The original reasons for doing this are mostly fixed.
  Alloca is now in a non-0 address space, so it should be OK to have 0 as a valid pointer. Since we treat the absolute address as the pointer value, this part only really needed to apply to kernels.
  Since r357093, we avoid the need to increment/decrement the offset register in more cases, and since r354816 the scavenger can fail without spilling, so it's less critical that we try to avoid an offset that fits in the MUBUF offset.
  Restrict to callable functions for now to split this into 2 steps, to limit the number of test updates and in case anything breaks.
  llvm-svn: 362665
* AMDGPU: Invert frame index offset interpretation | Matt Arsenault | 2019-06-05 | 1 | -17/+17
  Since the beginning, the offset of a frame index has been consistently interpreted backwards: it was treated as an offset from the scratch wave offset register used as a frame register. The correct interpretation is the offset from the SP on entry to the function, before the prolog. Frame index elimination then should select either SP or another register as an FP.
  Treat the scratch wave offset on kernel entry as the pre-incremented SP. Rely more heavily on the standard hasFP and frame pointer elimination logic, and clean up the private reservation code. This saves a copy in most callee functions.
  The kernel prolog emission code is still kind of a mess relying on checking the uses of physical registers, which I would prefer to eliminate. Currently selection directly emits MUBUF instructions, which require using a reference to some register. Use the register chosen for SP, and then ignore this later. This should probably be cleaned up to use pseudos that don't refer to any specific base register until frame index elimination.
  Add a workaround for shaders using large numbers of SGPRs. I'm not sure these cases were ever working correctly, since as far as I can tell the logic for figuring out which SGPR is the scratch wave offset doesn't match up with the shader input initialization in the shader programming guide.
  llvm-svn: 362661
* AMDGPU: Assume ECC is enabled by default if supported | Matt Arsenault | 2019-04-03 | 1 | -1/+1
  The test should really be checking for the property directly in the code object headers, but there are problems with this. I don't see this directly represented in the text form, and for the binary emission this is depending on a function level subtarget feature to emit a global flag.
  llvm-svn: 357558
* AMDGPU: Make sram-ecc off by default for Vega20 | Konstantin Zhuravlyov | 2019-03-29 | 1 | -1/+1
  Differential Revision: https://reviews.llvm.org/D59718
  llvm-svn: 357247
* AMDGPU: Move d16 load matching to preprocess step | Matt Arsenault | 2019-03-08 | 1 | -0/+15
  When matching half of the build_vector to a load, there could still be a hidden dependency on the other half of the build_vector that the pattern wouldn't detect. If there was an additional chain dependency on the other value, a cycle could be introduced. I don't think a tablegen pattern is capable of matching the necessary conditions, so move this into PreprocessISelDAG and check isPredecessorOf for the other value to avoid a cycle. This has a warning that it's expensive, so this should probably be moved into an MI pass eventually that will have more freedom to reorder instructions to help match this. That is currently complicated by the lack of a computeKnownBits-type mechanism for the selected function.
  llvm-svn: 355731
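  As an illustration only (function and value names are hypothetical, not taken from the commit or from load-hi16.ll), the problematic shape is roughly a build_vector whose two halves both come from loads:

    ; If the d16-hi load is folded while the other load still reaches it
    ; through the chain, selection could create a cycle, hence the move to
    ; PreprocessISelDAG and the isPredecessorOf check.
    define <2 x i16> @build_v2i16_from_two_loads(i16 addrspace(1)* %gptr, i16 addrspace(3)* %lptr) {
      %lo = load i16, i16 addrspace(1)* %gptr
      %hi = load i16, i16 addrspace(3)* %lptr
      %v0 = insertelement <2 x i16> undef, i16 %lo, i32 0
      %v1 = insertelement <2 x i16> %v0, i16 %hi, i32 1
      ret <2 x i16> %v1
    }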
* AMDGPU: Add more tests for d16 loads | Matt Arsenault | 2019-03-08 | 1 | -2/+316
  Also fix a few cases that weren't testing what they were supposed to.
  llvm-svn: 355724
* AMDGPU: Add D16 instructions preserve unused bits feature | Konstantin Zhuravlyov | 2018-05-04 | 1 | -200/+207
  - Predicate D16 patterns on this new feature
  - Added this new feature to gfx900/2/4
  Differential Revision: https://reviews.llvm.org/D46366
  llvm-svn: 331551
* [AMDGPU] Change constant addr space to 4 | Yaxun Liu | 2018-02-13 | 1 | -10/+10
  Differential Revision: https://reviews.llvm.org/D43170
  llvm-svn: 325030
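  For context, constant-address-space pointers in the updated test IR are addrspace(4). A minimal hypothetical example of such a load (not copied from the test file):

    ; Constant address space is 4 after this change; uniform constant loads
    ; can be selected as scalar loads, divergent ones as vector loads.
    define i16 @load_constant_i16(i16 addrspace(4)* %ptr) {
      %v = load i16, i16 addrspace(4)* %ptr
      ret i16 %v
    }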
* [AMDGPU] Switch to the new addr space mapping by default | Yaxun Liu | 2018-02-02 | 1 | -56/+56
  This requires a corresponding clang change.
  Differential Revision: https://reviews.llvm.org/D40955
  llvm-svn: 324101
* AMDGPU: Select DS insts without m0 initialization | Matt Arsenault | 2017-11-29 | 1 | -2/+0
  GFX9 stopped using m0 for most DS instructions. Select a different instruction without the use. I think this will be less error prone than trying to manually maintain m0 uses as needed.
  llvm-svn: 319270
* AMDGPU: Don't use MUBUF vaddr if address may overflow | Matt Arsenault | 2017-11-15 | 1 | -28/+27
  Effectively revert r263964. Before, we would not allow this if vaddr was not known to be positive.
  llvm-svn: 318240
* AMDGPU: Fix not converting d16 load/stores to offset | Matt Arsenault | 2017-11-13 | 1 | -2/+58
  Fixes a missed optimization with new MUBUF instructions.
  llvm-svn: 318106
* AMDGPU: Select d16 loads into low component of register | Matt Arsenault | 2017-11-13 | 1 | -0/+98
  llvm-svn: 318005
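  As a rough sketch (hypothetical names, not copied from the test file), the low-component case loads into element 0 of an existing vector so that the high 16 bits of the destination register are preserved:

    ; Loading into the low half while keeping the existing high half live
    ; corresponds to the d16 "lo" load forms (e.g. ds_read_u16_d16 on GFX9).
    define <2 x i16> @load_local_lo_v2i16(i16 addrspace(3)* %in, <2 x i16> %reg) {
      %lo = load i16, i16 addrspace(3)* %in
      %v = insertelement <2 x i16> %reg, i16 %lo, i32 0
      ret <2 x i16> %v
    }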
* AMDGPU: Fix potentially incorrectly matching check lines | Matt Arsenault | 2017-10-02 | 1 | -8/+8
  These check lines are supposed to make sure the new d16 load instructions aren't used, but the expected instruction name is a prefix of the incorrect instruction name.
  llvm-svn: 314714
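  To make the pitfall concrete (an illustrative sketch, not the actual check lines from the test):

    ; A check such as
    ;   GCN: ds_read_u16
    ; also matches an undesired "ds_read_u16_d16_hi ..." line, because the
    ; expected mnemonic is a prefix of the longer one. Anchoring the match,
    ; for example
    ;   GCN: ds_read_u16 v
    ; is one way to distinguish the two mnemonics.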
* AMDGPU: Match load d16 hi instructions | Matt Arsenault | 2017-09-20 | 1 | -0/+506
  Also starts selecting global loads for constant address in some cases. Some end up selecting to mubuf still, which requires investigation. We still get sub-optimal regalloc and extra waitcnts inserted due to not really tracking the liveness of the separate register halves.
  llvm-svn: 313716
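  The pattern the new matching targets looks roughly like the following sketch (hypothetical names; the real test file covers many address-space and type variants):

    ; Loading a 16-bit value into the high half of a 32-bit register while
    ; the low half is filled from another value is what the d16-hi loads
    ; (e.g. ds_read_u16_d16_hi on GFX9) implement.
    define <2 x i16> @load_local_hi_v2i16(i16 addrspace(3)* %in, i16 %lo) {
      %hi = load i16, i16 addrspace(3)* %in
      %v0 = insertelement <2 x i16> undef, i16 %lo, i32 0
      %v1 = insertelement <2 x i16> %v0, i16 %hi, i32 1
      ret <2 x i16> %v1
    }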