path: root/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
Commit message | Author | Age | Files | Lines
* AMDGPU/GlobalISel: Don't use XEXEC class for SGPRs | Matt Arsenault | 2020-01-12 | 1 | -1/+1
| We don't use the xexec register classes for arbitrary values anymore. Avoids a test variance between GlobalISel and SelectionDAG.
* AMDGPU/GlobalISel: Replace handling of boolean values | Matt Arsenault | 2020-01-06 | 1 | -8/+0
| This solves selection failures with generated selection patterns, which would fail due to inferring the SGPR reg bank for virtual registers with a set register class instead of VCC bank. Selecting a use instruction would constrain the virtual register to a specific class, so when the def was selected later the bank was no longer set to VCC.
| Remove the SCC reg bank. SCC isn't directly addressable, so it requires copying from SCC to an allocatable 32-bit register during selection, so these might as well be treated as 32-bit SGPR values.
| Now any scalar boolean value that will produce an output in SCC should be widened during RegBankSelect to s32. Any s1 value should be a vector boolean during selection. This makes the vcc register bank unambiguous with a normal SGPR during selection.
| Summary of how this should now work:
| - G_TRUNC is always a no-op, and never should use a vcc bank result.
| - SALU boolean operations should be promoted to s32 in RegBankSelect apply mapping.
| - An s1 value means vcc bank at selection. The exception is for legalization artifacts that use s1, which are never VCC. All other contexts should infer the VCC register classes for s1 typed registers. The LLT for the register is now needed to infer the correct register class. Extensions with vcc sources should be legalized to a select of constants during RegBankSelect.
| - Copy from non-vcc to vcc ensures high bits of the input value are cleared during selection.
| - SALU boolean inputs should ensure the inputs are 0/1. This includes select, conditional branches, and carry-ins.
| There are a few somewhat dirty details. One is that G_TRUNC/G_*EXT selection ignores the usual register-bank from register class functions, and can't handle truncates with VCC result banks. I think this is OK, since the artifacts are specially treated anyway. This does require some care to avoid producing cases with vcc. There will also be no 100% reliable way to verify this rule is followed in selection in case of register classes, and violations manifest themselves as invalid copy instructions much later.
| Standard phi handling also only considers the bank of the result register, and doesn't insert copies to make the source banks match. This doesn't work for vcc, so we have to manually correct phi inputs in this case. We should add a verifier check to make sure there are no phis with mixed vcc and non-vcc register bank inputs.
| There's also some duplication with the LegalizerHelper, and some code which should live in the helper. I don't see a good way to share special knowledge about what types to use for intermediate operations depending on the bank for example. Using the helper to replace extensions with selects also seems somewhat awkward to me.
| Another issue is there are some contexts calling getRegBankFromRegClass that apparently don't have the LLT type for the register, but I haven't yet run into a real issue from this.
| This also introduces new unnecessary instructions in most cases, since we don't yet try to optimize out the zext when the source is known to come from a compare.
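The bank rules listed in the commit message above can be modelled with a small stand-alone sketch. This is not the code from SIRegisterInfo.cpp: `Bank`, `ClassKind`, `inferBank`, and `clearHighBits` are hypothetical names, and the model assumes nothing beyond the rules stated in the message (an s1 value is VCC unless it belongs to a legalization artifact; copies into vcc clear the high bits).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the register banks discussed in the commit
// message; these are not the generated AMDGPU bank IDs.
enum class Bank { SGPR, VGPR, VCC };

// Hypothetical coarse view of a register class: scalar or vector.
enum class ClassKind { Scalar, Vector };

// Infer a bank from the class kind plus the type size in bits (the LLT).
// An s1 value is treated as a vector boolean (VCC bank) unless it belongs
// to a legalization artifact, which is never VCC.
Bank inferBank(ClassKind kind, unsigned typeBits, bool isArtifact) {
  if (typeBits == 1 && !isArtifact)
    return Bank::VCC;
  return kind == ClassKind::Scalar ? Bank::SGPR : Bank::VGPR;
}

// Models the "high bits of the input value are cleared" rule for a copy
// from a non-vcc value into vcc: only bit 0 survives.
std::uint32_t clearHighBits(std::uint32_t value) { return value & 1u; }
```

The point of the sketch is that the class alone is no longer enough: the same scalar class maps to `Bank::VCC` or `Bank::SGPR` depending on the type size, which is why the message says the LLT is now needed.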
* Fix broken comment phrasing and indentation | Matt Arsenault | 2019-12-02 | 1 | -7/+6
|
* AMDGPU: Reuse carry out register during FI elimination | Austin Kerbow | 2019-11-28 | 1 | -5/+9
| Summary: Pre gfx9 we need to scavenge a 64-bit SGPR to use as the carry out for an Add. If only one SGPR was available this crashed when trying to scavenge another 32-bit SGPR to materialize the offset. Instead, reuse a 32-bit SGPR from the carry out as the offset register. Also prefer to use vcc for the unused carry out when it is available.
| Reviewers: arsenm, rampitec
| Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D70614
* [AMDGPU] Removed dead code handling M0CopyReg | Stanislav Mekhanoshin | 2019-11-05 | 1 | -14/+0
| Static analyzer complains about always false condition. See https://bugs.llvm.org/show_bug.cgi?id=43886
| Differential Revision: https://reviews.llvm.org/D69860
* Fix buildbot error in SIRegisterInfo.cpp. | Zinovy Nis | 2019-10-20 | 1 | -3/+4
| llvm-svn: 375373
* AMDGPU: Don't re-get the subtarget | Matt Arsenault | 2019-10-20 | 1 | -21/+9
| It's already available in the class.
| llvm-svn: 375363
* [AMDGPU] Remove -amdgpu-spill-sgpr-to-smem. | Jay Foad | 2019-10-18 | 1 | -151/+1
| Summary: The implementation was never completed and never used except in tests.
| Reviewers: arsenm, mareko
| Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D69163
| llvm-svn: 375293
* AMDGPU: Relax 32-bit SGPR register class | Matt Arsenault | 2019-10-18 | 1 | -4/+8
| Mostly use SReg_32 instead of SReg_32_XM0 for arbitrary values. This will allow the register coalescer to do a better job eliminating copies to m0.
| For GlobalISel, as a terrible hack, use SGPR_32 for things that should use SCC until booleans are solved.
| llvm-svn: 375267
* AMDGPU: Use SGPR_128 instead of SReg_128 for vregs | Matt Arsenault | 2019-10-10 | 1 | -6/+9
| SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good.
| llvm-svn: 374284
* [AMDGPU] Extend buffer intrinsics with swizzling | Piotr Sobczak | 2019-10-02 | 1 | -0/+2
| Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR.
| Intrinsics updated:
|   int_amdgcn_raw_buffer_load
|   int_amdgcn_raw_buffer_load_format
|   int_amdgcn_raw_buffer_store
|   int_amdgcn_raw_buffer_store_format
|   int_amdgcn_raw_tbuffer_load
|   int_amdgcn_raw_tbuffer_store
|   int_amdgcn_struct_buffer_load
|   int_amdgcn_struct_buffer_load_format
|   int_amdgcn_struct_buffer_store
|   int_amdgcn_struct_buffer_store_format
|   int_amdgcn_struct_tbuffer_load
|   int_amdgcn_struct_tbuffer_store
| Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on.
| The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged.
| There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set.
| Reviewers: arsenm, nhaehnle, tpr
| Reviewed By: nhaehnle
| Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D68200
| llvm-svn: 373491
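The merge rule described in the commit message above (a set "swizzled" bit blocks merging, the default of 0 allows it) can be sketched as follows. The commit message does not say where the bit sits in the cachepolicy operand, so the position used by `kSwizzledBit` is an assumption made for illustration, and `isSwizzled` and `mayMerge` are hypothetical helpers, not functions from the SI Load/Store optimizer.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: the "swizzled" flag is modelled as bit 3 of the cachepolicy
// immediate. The commit message does not state the bit position.
constexpr std::uint32_t kSwizzledBit = 1u << 3;

// True when the cachepolicy value marks the buffer data as swizzled.
bool isSwizzled(std::uint32_t cachePolicy) {
  return (cachePolicy & kSwizzledBit) != 0;
}

// Two buffer accesses are merge candidates only if neither is swizzled.
// A cleared bit (the default) means linear data, which may be merged.
bool mayMerge(std::uint32_t cachePolicyA, std::uint32_t cachePolicyB) {
  return !isSwizzled(cachePolicyA) && !isSwizzled(cachePolicyB);
}
```

A real merge decision depends on many other conditions (addresses, offsets, instruction kinds); this sketch isolates only the new veto introduced by the swizzled bit.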
* AMDGPU: Fix an out of date assert in addressing FrameIndex | Changpeng Fang | 2019-10-01 | 1 | -3/+2
| Reviewers: arsenm
| Differential Revision: https://reviews.llvm.org/D67574
| llvm-svn: 373404
* AMDGPU/GlobalISel: Increase max legal size to 1024 | Matt Arsenault | 2019-10-01 | 1 | -0/+3
| There are 1024 bit register classes defined for AGPRs. Additionally OpenCL defines vectors up to 16 x i64, and this helps those tests legalize.
| llvm-svn: 373350
* AMDGPU: Inline constant when materializing FI with add on gfx9 | Matt Arsenault | 2019-09-12 | 1 | -2/+5
| This was relying on the SGPR usable for the carry out clobber to also be used for the input. There was no carry out on gfx9. With no carry out clobber to worry about, the literal can just be used directly with a VOP2 add.
| llvm-svn: 371791
* AMDGPU: Make VReg_1 size be 1 | Matt Arsenault | 2019-09-09 | 1 | -3/+6
| This was getting chosen as the preferred 32-bit register class based on how TableGen selects subregister classes.
| llvm-svn: 371438
* AMDGPU: Handle frame index expansion with no free SGPRs pre gfx9 | Matt Arsenault | 2019-09-04 | 1 | -25/+57
| Since an add instruction must produce an unused carry out, this requires additional SGPRs. This can be avoided by keeping the entire offset computation in SGPRs. If one SGPR is still available, this only costs one extra mov. If none are available, the entire computation can be done in place and reversed.
| This does assume the use is a VGPR operand. This was already assumed, and we currently only select frame indexes to VALU instructions. This should probably be fixed at some point to handle more possible MIR.
| llvm-svn: 370929
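The "done in place and reversed" idea from the commit message above can be illustrated with a minimal model: fold the offset into the one register that is available, use the result, then undo the change. `withOffsetInPlace` is a hypothetical helper written for this sketch; the real lowering emits machine instructions and involves details the commit message does not spell out.

```cpp
#include <cassert>
#include <cstdint>

// Temporarily fold `offset` into the frame register, hand the adjusted
// value to `use`, then subtract the offset again so the register holds its
// original value afterwards. Unsigned wraparound makes the add/sub pair
// exactly reversible, so no second register is needed.
// `use` must return a value (this sketch does not handle void callables).
template <typename UseFn>
auto withOffsetInPlace(std::uint32_t &frameReg, std::uint32_t offset,
                       UseFn use) {
  frameReg += offset;           // compute the address in place
  auto result = use(frameReg);  // consume it while it is live
  frameReg -= offset;           // reverse the computation
  return result;
}
```

The trade-off this models is the one the message names: with a spare register the offset costs one extra move, and with none the cost is an extra instruction to restore the original value.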
* AMDGPU: Don't use frame virtual registers | Matt Arsenault | 2019-08-29 | 1 | -41/+43
| SGPR spills aren't really handled after SILowerSGPRSpills. In order to directly control what happens if the scavenger needs to spill, the scavenger needs to be used directly. There is an alternative to spilling in these contexts anyway since the frame register can be incremented and restored.
| This does present another possible issue if spilling is needed for the unused carry out if an add is needed. I think this can be avoided by using a scalar add (although that clobbers SCC, which happens anyway).
| llvm-svn: 370281
* [AMDGPU] w/a for gfx908 mfma SrcC literal HW bug | Stanislav Mekhanoshin | 2019-08-23 | 1 | -0/+10
| gfx908 ignores an mfma if SrcC is a literal.
| Differential Revision: https://reviews.llvm.org/D66670
| llvm-svn: 369816
* Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM | Daniel Sanders | 2019-08-15 | 1 | -17/+18
| Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible).
| Partial reverts in:
|   X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
|   X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
|   HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
|   MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
|   PPCFastISel.cpp - No Register::operator-=()
|   PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
|   MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
| Manual fixups in:
|   ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
|   HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
|   HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
|   PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
| Depends on D65919
| Reviewers: arsenm, bogner, craig.topper, RKSimon
| Reviewed By: arsenm
| Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D65962
| llvm-svn: 369041
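The rewrite performed by the check described above can be shown as a before/after pair. The `Register` class here is a deliberately tiny stand-in written for this sketch, not the real llvm::Register, and `getReg`, `before`, and `after` are hypothetical names; only the shape of the change (the variable's declared type moves from `unsigned` to `Register`) is taken from the commit message.

```cpp
#include <cassert>

// Minimal stand-in for llvm::Register, for illustration only: it wraps an
// unsigned id and converts implicitly in both directions.
class Register {
  unsigned Id = 0;

public:
  constexpr Register(unsigned Val = 0) : Id(Val) {}
  constexpr operator unsigned() const { return Id; }
};

// Hypothetical accessor that returns the wrapper type.
Register getReg() { return Register(42); }

// Before: the initializer starts with an implicit cast from Register to
// unsigned, which is the pattern the check looks for.
unsigned before() {
  unsigned Reg = getReg();
  return Reg;
}

// After: the variable keeps the wrapper type; behaviour is unchanged.
unsigned after() {
  Register Reg = getReg();
  return Reg;
}
```

Both functions return the same value; the change only keeps the stronger type on the local variable.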
* Use MCRegister in MCRegisterInfo's interfaces | Daniel Sanders | 2019-08-02 | 1 | -3/+3
| Summary: As part of this, define DenseMapInfo for MCRegister (and Register while I'm at it)
| Depends on D65599
| Reviewers: arsenm
| Subscribers: MatzeB, qcolombet, jvesely, wdng, nhaehnle, hiraditya, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D65605
| llvm-svn: 367719
* Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC | Daniel Sanders | 2019-08-01 | 1 | -3/+3
| llvm-svn: 367633
* [AMDGPU] Reserve all AGPRs on targets which do not have them | Stanislav Mekhanoshin | 2019-07-30 | 1 | -0/+8
| Differential Revision: https://reviews.llvm.org/D65471
| llvm-svn: 367347
* [AMDGPU] Allow register tuples to set asm names | Stanislav Mekhanoshin | 2019-07-19 | 1 | -14/+1
| This change reverts most of the previous register name generation. The real problem is that RegisterTuple does not generate asm names. Added optional operand to RegisterTuple. This way we can simplify register name access and dramatically reduce the size of static tables for the backend.
| Differential Revision: https://reviews.llvm.org/D64967
| llvm-svn: 366598
* [AMDGPU] Drop Reg32 and use regular AsmName | Stanislav Mekhanoshin | 2019-07-18 | 1 | -1/+0
| This reduces the generated AMDGPUGenAsmWriter.inc by ~100Kb.
| Differential Revision: https://reviews.llvm.org/D64952
| llvm-svn: 366505
* [AMDGPU] Stop special casing flat_scratch for register name | Stanislav Mekhanoshin | 2019-07-17 | 1 | -12/+0
| Differential Revision: https://reviews.llvm.org/D64885
| llvm-svn: 366376
* [AMDGPU] Autogenerate register asm names | Stanislav Mekhanoshin | 2019-07-16 | 1 | -61/+18
| Differential Revision: https://reviews.llvm.org/D64839
| llvm-svn: 366283
* [AMDGPU] gfx908 agpr spilling | Stanislav Mekhanoshin | 2019-07-11 | 1 | -17/+108
| Differential Revision: https://reviews.llvm.org/D64594
| llvm-svn: 365833
* [AMDGPU] gfx908 mfma support | Stanislav Mekhanoshin | 2019-07-11 | 1 | -6/+145
| Differential Revision: https://reviews.llvm.org/D64584
| llvm-svn: 365824
* [AMDGPU] gfx908 mAI instructions, MC part | Stanislav Mekhanoshin | 2019-07-09 | 1 | -0/+16
| Differential Revision: https://reviews.llvm.org/D64446
| llvm-svn: 365563
* AMDGPU/GlobalISel: Select G_MERGE_VALUES | Matt Arsenault | 2019-07-09 | 1 | -16/+31
| llvm-svn: 365482
* AMDGPU/GlobalISel: Fix scc->vcc copy handling | Matt Arsenault | 2019-07-01 | 1 | -2/+2
| This was checking the size of the register with the value of the size, which happens to be exec. Also fix assuming VCC is 64-bit to fix wave32.
| Also remove some untested handling for physical registers which is skipped. This doesn't insert the V_CNDMASK_B32 if SCC is the physical copy source. I'm not sure if this should be trying to handle this special case instead of dealing with this in copyPhysReg.
| llvm-svn: 364761
* AMDGPU: Assert SPAdj is 0 | Matt Arsenault | 2019-06-26 | 1 | -0/+2
| llvm-svn: 364473
* AMDGPU/GlobalISel: Select G_TRUNC | Matt Arsenault | 2019-06-24 | 1 | -24/+30
| llvm-svn: 364215
* AMDGPU/GlobalISel: Fix selecting G_IMPLICIT_DEF for s1 | Matt Arsenault | 2019-06-24 | 1 | -3/+16
| Try to fail for scc, since I don't think that should ever be produced.
| llvm-svn: 364199
* CodeGen: Introduce a class for registers | Matt Arsenault | 2019-06-24 | 1 | -4/+4
| Avoids using a plain unsigned for registers throughout codegen. Doesn't attempt to change every register use, just something a little more than the set needed to build after changing the return type of MachineOperand::getReg().
| llvm-svn: 364191
* AMDGPU/GlobalISel: Implement select for G_ICMP and G_SELECT | Tom Stellard | 2019-06-17 | 1 | -1/+6
| Reviewers: arsenm
| Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D60640
| llvm-svn: 363576
* [AMDGPU] gfx10 conditional registers handling | Stanislav Mekhanoshin | 2019-06-16 | 1 | -1/+28
| This is the cpp source part of wave32 support, excluding overridden getRegClass().
| Differential Revision: https://reviews.llvm.org/D63351
| llvm-svn: 363513
* AMDGPU: Invert frame index offset interpretation | Matt Arsenault | 2019-06-05 | 1 | -27/+34
| Since the beginning, the offset of a frame index has been consistently interpreted backwards. It was treating it as an offset from the scratch wave offset register as a frame register. The correct interpretation is the offset from the SP on entry to the function, before the prolog. Frame index elimination then should select either SP or another register as an FP.
| Treat the scratch wave offset on kernel entry as the pre-incremented SP. Rely more heavily on the standard hasFP and frame pointer elimination logic, and clean up the private reservation code. This saves a copy in most callee functions.
| The kernel prolog emission code is still kind of a mess relying on checking the uses of physical registers, which I would prefer to eliminate.
| Currently selection directly emits MUBUF instructions, which require using a reference to some register. Use the register chosen for SP, and then ignore this later. This should probably be cleaned up to use pseudos that don't refer to any specific base register until frame index elimination.
| Add a workaround for shaders using large numbers of SGPRs. I'm not sure these cases were ever working correctly, since as far as I can tell the logic for figuring out which SGPR is the scratch wave offset doesn't match up with the shader input initialization in the shader programming guide.
| llvm-svn: 362661
* AMDGPU: Disable stack realignment for kernels | Matt Arsenault | 2019-06-03 | 1 | -0/+13
| This is something of a workaround, and the state of stack realignment controls is kind of a mess. Ideally, we would be able to specify the stack is infinitely aligned on entry to a kernel.
| TargetFrameLowering provides multiple controls which apply at different points. The StackRealignable field is used during SelectionDAG, and for some reason distinct from this hook. StackAlignment is a single field not dependent on the function. It would probably be better to make that dependent on the calling convention, and the maximum value for kernels.
| Currently this doesn't really change anything, since the frame lowering mostly does its own thing. This helps avoid regressions in a future change which will rely more heavily on hasFP.
| llvm-svn: 362447
* [AMDGPU][MC] Added support of SCC, VCCZ and EXECZ operands | Dmitry Preobrazhensky | 2019-06-03 | 1 | -0/+5
| See bug 39292: https://bugs.llvm.org/show_bug.cgi?id=39292
| Reviewers: rampitec, arsenm
| Differential Revision: https://reviews.llvm.org/D62660
| llvm-svn: 362400
* [AMDGPU] gfx1010 VMEM and SMEM implementation | Stanislav Mekhanoshin | 2019-04-30 | 1 | -3/+7
| Differential Revision: https://reviews.llvm.org/D61330
| llvm-svn: 359621
* [AMDGPU] gfx1010 sgpr register changes | Stanislav Mekhanoshin | 2019-04-24 | 1 | -2/+5
| Differential Revision: https://reviews.llvm.org/D61045
| llvm-svn: 359117
* [AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure. | Neil Henning | 2019-04-01 | 1 | -0/+4
| This change incorporates an effort by Connor Abbot to change how we deal with WWM operations potentially trashing valid values in inactive lanes.
| Previously, the SIFixWWMLiveness pass would work out which registers were being trashed within WWM regions, and ensure that the register allocator did not have any values it was depending on resident in those registers if the WWM section would trash them. This worked perfectly well, but would cause sometimes severe register pressure when the WWM section resided before divergent control flow (or at least that is where I mostly observed it).
| This fix instead runs through the WWM sections and pre-allocates some registers for WWM. It then reserves these registers so that the register allocator cannot use them. This results in a significant register saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just this change!).
| Differential Revision: https://reviews.llvm.org/D59295
| llvm-svn: 357400
* Reapply "AMDGPU: Scavenge register instead of findUnusedReg" | Matt Arsenault | 2019-03-27 | 1 | -1/+1
| This reapplies r356149, using the correct overload of findUnusedReg which passes the current iterator.
| This worked most of the time, because the scavenger iterator was moved at the end of the frame index loop in PEI. This would fail if the spill was the first instruction. This was further hidden by the fact that the scavenger wasn't passed in for normal frame index elimination.
| llvm-svn: 357098
* AMDGPU: Enable the scavenger for large frames | Matt Arsenault | 2019-03-27 | 1 | -5/+14
| Another test is needed for the case where the scavenge fails, but there's another issue with that which needs an additional fix.
| llvm-svn: 357093
* Revert "AMDGPU: Scavenge register instead of findUnusedReg" | Matt Arsenault | 2019-03-25 | 1 | -1/+1
| This reverts r356149.
| This is crashing on rocBLAS.
| llvm-svn: 356958
* [AMDGPU] Added v5i32 and v5f32 register classes | Tim Renouf | 2019-03-22 | 1 | -0/+32
| They are not used by anything yet, but a subsequent commit will start using them for image ops that return 5 dwords.
| Differential Revision: https://reviews.llvm.org/D58903
| Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271
| llvm-svn: 356735
* [AMDGPU] Support for v3i32/v3f32 | Tim Renouf | 2019-03-21 | 1 | -1/+12
| Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet.
| SI (gfx6) does not have dwordx3 instructions, so they are not enabled there.
| Some of this patch is from Matt Arsenault, also of AMD.
| Differential Revision: https://reviews.llvm.org/D58902
| Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9
| llvm-svn: 356659
* [AMDGPU][MC][GFX9] Added support of operands shared_base, shared_limit, private_base, private_limit, pops_exiting_wave_id | Dmitry Preobrazhensky | 2019-03-20 | 1 | -0/+3
| See bug 39297: https://bugs.llvm.org/show_bug.cgi?id=39297
| Reviewers: artem.tamazov, arsenm, rampitec
| Differential Revision: https://reviews.llvm.org/D59290
| llvm-svn: 356561
* [AMDGPU] Asm/disasm clamp modifier on vop3 int arithmetic | Tim Renouf | 2019-03-18 | 1 | -3/+6
| Allow the clamp modifier on vop3 int arithmetic instructions in assembly and disassembly.
| This involved adding a clamp operand to the affected instructions in MIR and MC, and thus having to fix up several places in codegen and MIR tests.
| Differential Revision: https://reviews.llvm.org/D59267
| Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e
| llvm-svn: 356399