summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU/function-args.ll
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Don't fix emergency stack slot at offset 0Matt Arsenault2019-06-051-87/+87
| | | | | | | | | | | | | | | | | | | | | This forced the caller to be aware of this, which is an ugly ABI feature. Partially reverts r295877. The original reasons for doing this are mostly fixed. Alloca is now in a non-0 address space, so it should be OK to have 0 as a valid pointer. Since we treat the absolute address as the pointer value, this part only really needed to apply to kernels. Since r357093, we avoid the need to increment/decrement the offset register in more cases, and since r354816 the scavenger can fail without spilling, so it's less critical that we try to avoid an offset that fits in the MUBUF offset. Restrict to callable functions for now to split this into 2 steps to limit thte number of test updates and in case anything breaks. llvm-svn: 362665
* AMDGPU: Invert frame index offset interpretationMatt Arsenault2019-06-051-92/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the beginning, the offset of a frame index has been consistently interpreted backwards. It was treating it as an offset from the scratch wave offset register as a frame register. The correct interpretation is the offset from the SP on entry to the function, before the prolog. Frame index elimination then should select either SP or another register as an FP. Treat the scratch wave offset on kernel entry as the pre-incremented SP. Rely more heavily on the standard hasFP and frame pointer elimination logic, and clean up the private reservation code. This saves a copy in most callee functions. The kernel prolog emission code is still kind of a mess relying on checking the uses of physical registers, which I would prefer to eliminate. Currently selection directly emits MUBUF instructions, which require using a reference to some register. Use the register chosen for SP, and then ignore this later. This should probably be cleaned up to use pseudos that don't refer to any specific base register until frame index elimination. Add a workaround for shaders using large numbers of SGPRs. I'm not sure these cases were ever working correctly, since as far as I can tell the logic for figuring out which SGPR is the scratch wave offset doesn't match up with the shader input initialization in the shader programming guide. llvm-svn: 362661
* [AMDGPU] Support for v3i32/v3f32Tim Renouf2019-03-211-4/+2
| | | | | | | | | | | | | | | Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet. SI (gfx6) does not have dwordx3 instructions, so they are not enabled there. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58902 Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9 llvm-svn: 356659
* DAG: Handle odd vector sizes in calling conv splittingMatt Arsenault2018-09-101-2/+11
| | | | | | | | | | | | | | This already worked if only one register piece was used, but didn't if a type was split into multiple, unequal sized pieces. Fixes not splitting 3i16/v3f16 into two registers for AMDGPU. This will also allow fixing the ABI for 16-bit vectors in a future commit so that it's the same for all subtargets. llvm-svn: 341801
* AMDGPU: Stop wasting argument registers with v3i32/v3f32Matt Arsenault2018-07-281-0/+39
| | | | | | | | | | SelectionDAGBuilder widens v3i32/v3f32 arguments to to v4i32/v4f32 which consume an additional register. In addition to wasting argument space, this produces extra instructions since now it appears the 4th vector component has a meaningful value to most combines. llvm-svn: 338197
* AMDGPU: Make v4i16/v4f16 legalMatt Arsenault2018-06-151-5/+12
| | | | | | | Some image loads return these, and it's awkward working around them not being legal. llvm-svn: 334835
* [AMDGPU] Switch to the new addr space mapping by defaultYaxun Liu2018-02-021-8/+8
| | | | | | | | This requires corresponding clang change. Differential Revision: https://reviews.llvm.org/D40955 llvm-svn: 324101
* AMDGPU: Use gfx9 carry-less add/sub instructionsMatt Arsenault2017-11-301-9/+9
| | | | llvm-svn: 319491
* AMDGPU: Use stricter regexes for add instructionsMatt Arsenault2017-11-291-5/+5
| | | | | | | Match the entire _co as one optional piece rather than a set of characters to match multiple times. llvm-svn: 319275
* [AMDGPU][MC][GFX8][GFX9] Corrected names of integer ↵Dmitry Preobrazhensky2017-11-201-5/+5
| | | | | | | | | | | | v_{add/addc/sub/subrev/subb/subbrev} See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765 Reviewers: tamazov, SamWot, arsenm, vpykhtin Differential Revision: https://reviews.llvm.org/D40088 llvm-svn: 318675
* AMDGPU: Don't place arguments in emergency stack slotMatt Arsenault2017-08-021-89/+89
| | | | | | | | When finding the fixed offsets for function arguments, this needs to skip over the 4 bytes reserved for the emergency stack slot. llvm-svn: 309776
* AMDGPU: Return correct type during argument loweringMatt Arsenault2017-07-151-0/+16
| | | | | | | | | | | | | | | | | | | The type needs to be casted back to the original argument type. Fixes an assert that for some reason is only run when using -debug. Includes an additional combine to avoid test regressions from having conversions mixed with multiple Assert[SZ]ext nodes. On subtargets where i16 is legal, this was producing an i32 register with an i16 AssertZExt, truncated to i16 with another i8 AssertZExt. t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0 t3: i16 = truncate t2 t5: i16 = AssertZext t3, ValueType:ch:i8 t6: i8 = truncate t5 t7: i32 = zero_extend t6 llvm-svn: 308082
* AMDGPU: Start defining a calling conventionMatt Arsenault2017-05-171-0/+734
Partially implement callee-side for arguments and return values. byval doesn't work properly, and most likely sret or other on-stack return values most as well. llvm-svn: 303308
OpenPOWER on IntegriCloud