summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU/GlobalISel: Set insert point after waterfall loopMatt Arsenault2020-01-131-2/+3
| | | | | | | | | The current users of the waterfall loop utility functions do not make use of the restored original insert point. The insertion is either done, or they set the insert point somewhere else. A future change will want to insert instructions after the waterfall loop, but figuring out the point after the loop is more difficult than ensuring the insert point is there after the loop.
* AMDGPU/GlobalISel: Simplify assertMatt Arsenault2020-01-131-11/+3
|
* AMDGPU/GlobalISel: Copy type when inserting readfirstlaneMatt Arsenault2020-01-121-0/+2
| | | | | getDefIgnoringCopies will fail to find any def if no type is set if we try to use it on the use's operand, so propagate the type.
* AMDGPU/GlobalISel: Fix G_EXTRACT_VECTOR_ELT mapping for s-v caseMatt Arsenault2020-01-091-15/+85
| | | | | | If an SGPR vector is indexed with a VGPR, the actual indexing will be done on the SGPR and produce an SGPR. A copy needs to be inserted inside the waterwall loop to the VGPR result.
* AMDGPU/GlobalISel: Fix unused variable warning in releaseMatt Arsenault2020-01-061-2/+1
|
* AMDGPU/GlobalISel: Legalize G_READCYCLECOUNTERMatt Arsenault2020-01-061-1/+2
|
* AMDGPU/GlobalISel: Replace handling of boolean valuesMatt Arsenault2020-01-061-93/+293
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This solves selection failures with generated selection patterns, which would fail due to inferring the SGPR reg bank for virtual registers with a set register class instead of VCC bank. Use instruction selection would constrain the virtual register to a specific class, so when the def was selected later the bank no longer was set to VCC. Remove the SCC reg bank. SCC isn't directly addressable, so it requires copying from SCC to an allocatable 32-bit register during selection, so these might as well be treated as 32-bit SGPR values. Now any scalar boolean value that will produce an outupt in SCC should be widened during RegBankSelect to s32. Any s1 value should be a vector boolean during selection. This makes the vcc register bank unambiguous with a normal SGPR during selection. Summary of how this should now work: - G_TRUNC is always a no-op, and never should use a vcc bank result. - SALU boolean operations should be promoted to s32 in RegBankSelect apply mapping - An s1 value means vcc bank at selection. The exception is for legalization artifacts that use s1, which are never VCC. All other contexts should infer the VCC register classes for s1 typed registers. The LLT for the register is now needed to infer the correct register class. Extensions with vcc sources should be legalized to a select of constants during RegBankSelect. - Copy from non-vcc to vcc ensures high bits of the input value are cleared during selection. - SALU boolean inputs should ensure the inputs are 0/1. This includes select, conditional branches, and carry-ins. There are a few somewhat dirty details. One is that G_TRUNC/G_*EXT selection ignores the usual register-bank from register class functions, and can't handle truncates with VCC result banks. I think this is OK, since the artifacts are specially treated anyway. This does require some care to avoid producing cases with vcc. There will also be no 100% reliable way to verify this rule is followed in selection in case of register classes, and violations manifests themselves as invalid copy instructions much later. Standard phi handling also only considers the bank of the result register, and doesn't insert copies to make the source banks match. This doesn't work for vcc, so we have to manually correct phi inputs in this case. We should add a verifier check to make sure there are no phis with mixed vcc and non-vcc register bank inputs. There's also some duplication with the LegalizerHelper, and some code which should live in the helper. I don't see a good way to share special knowledge about what types to use for intermediate operations depending on the bank for example. Using the helper to replace extensions with selects also seems somewhat awkward to me. Another issue is there are some contexts calling getRegBankFromRegClass that apparently don't have the LLT type for the register, but I haven't yet run into a real issue from this. This also introduces new unnecessary instructions in most cases, since we don't yet try to optimize out the zext when the source is known to come from a compare.
* GlobalISel: Implement lower for G_INTRINSIC_ROUNDMatt Arsenault2020-01-061-1/+0
| | | | | Mostly copied from AMDGPU lowering implementation, except used G_SITOFP instead of directly creating a select on -1.0, 0.0.
* AMDGPU/GlobalISel: Refine SMRD selection rulesMatt Arsenault2020-01-041-4/+22
| | | | | Fix selecting these for volatile global loads, and ensure the loads are constant enough.
* AMDGPU/GlobalISel: Assume vcc phis for any vcc inputMatt Arsenault2020-01-041-2/+3
| | | | | | | This produces more intelligible looking results, more comparabble to the DAG output in the simplest cases. This is probably wrong in complex control flow, but RegBankSelect doesn't attempt analyzing if this is on a masked path for selecting the bank yet.
* AMDGPU/GlobalISel: Implement applyMappingImpl less incorrectlyMatt Arsenault2020-01-041-13/+23
| | | | | | | | | | | We're checking the current register bank of the registers in the instruction, but the mapping may have inserted cross bank copies and is expecting to replace the registers. We mostly get away with this currently, because VGPR->SGPR copies are illegal, and we assume this won't happen. In a future change, we'll start relying on more cross register bank copies being inserted, and this starts to break down.
* GlobalISel: Add type argument to getRegBankFromRegClassMatt Arsenault2020-01-031-2/+3
| | | | | | AMDGPU can't unambiguously go back from the selected instruction register class to the register bank without knowing if this was used in a boolean context.
* AMDGPU/GlobalISel: Account for G_PHI result bankMatt Arsenault2019-12-301-13/+23
| | | | | | | | | Sometimes the result bank of the phi is already assigned to something, and should not be ignored. This is in preparation for additional boolean phi handling changes. Also refine the logic to fix some cases that were incorrectly deciding to use SGPRs.
* AMDGPU/GlobalISel: Use SReg_32 for readfirstlane constrainingMatt Arsenault2019-12-271-1/+1
| | | | | This matches the DAG behavior where we don't use SReg_32_XM0 everywhere anymore, and fixes not coalescing the copies into m0.
* AMDGPU/GlobalISel: Fix mapping and selection of llvm.amdgcn.div.fixupMatt Arsenault2019-12-241-0/+1
|
* AMDGPU/GlobalISel: Simplify codeMatt Arsenault2019-12-211-5/+5
| | | | | This can directly access the register bank, and doesn't need to get it through the ID.
* AMDGPU/GlobalISel: Add AGPR bank and RegBankSelect mfma intrinsicsAustin Kerbow2019-12-011-6/+63
| | | | Differential Revision: https://reviews.llvm.org/D70871
* [globalisel] Rename G_GEP to G_PTR_ADDDaniel Sanders2019-11-051-1/+1
| | | | | | | | | | | | | | | | Summary: G_GEP is rather poorly named. It's a simple pointer+scalar addition and doesn't support any of the complexities of getelementptr. I therefore propose that we rename it. There's a G_PTR_MASK so let's follow that convention and go with G_PTR_ADD Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69734
* AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHGMatt Arsenault2019-10-251-1/+2
| | | | | | | | Custom lower this to a target instruction with the merge operands. I think it might be better to directly select this and emit a REG_SEQUENCE, but this would be more work since it would require splitting the tablegen patterns for these cases from the other atomics.
* GlobalISel: Implement lower for G_SADDO/G_SSUBOMatt Arsenault2019-10-161-2/+0
| | | | | | | Port directly from SelectionDAG, minus the path using ISD::SADDSAT/ISD::SSUBSAT. llvm-svn: 375042
* AMDGPU/GlobalISel: Fix crash on wide constant load with VGPR pointerMatt Arsenault2019-10-091-4/+14
| | | | | | | | | | This was ignoring the register bank of the input pointer, and isUniformMMO seems overly aggressive. This will now conservatively assume a VGPR in cases where the incoming bank hasn't been determined yet (i.e. is from a loop phi). llvm-svn: 374255
* GlobalISel: Add target pre-isel instructionsMatt Arsenault2019-10-071-0/+1
| | | | | | | | | | | | | | Allows targets to introduce regbankselectable pseudo-instructions. Currently the closet feature to this is an intrinsic. However this requires creating a public intrinsic declaration. This litters the public intrinsic namespace with operations we don't necessarily want to expose to IR producers, and would rather leave as private to the backend. Use a new instruction bit. A previous attempt tried to keep using enum value ranges, but it turned into a mess. llvm-svn: 373937
* Second attempt to add iterator_range::empty()Jordan Rose2019-10-071-11/+11
| | | | | | | | | | | | Doing this makes MSVC complain that `empty(someRange)` could refer to either C++17's std::empty or LLVM's llvm::empty, which previously we avoided via SFINAE because std::empty is defined in terms of an empty member rather than begin and end. So, switch callers over to the new method as it is added. https://reviews.llvm.org/D68439 llvm-svn: 373935
* AMDGPU/GlobalISel: RegBankSelect mul24 intrinsicsMatt Arsenault2019-10-061-0/+2
| | | | llvm-svn: 373841
* AMDGPU/GlobalISel: RegBankSelect DS GWS intrinsicsMatt Arsenault2019-10-061-0/+35
| | | | llvm-svn: 373840
* AMDGPU/GlobalISel: Fix RegBankSelect for sendmsg intrinsicsMatt Arsenault2019-10-061-6/+5
| | | | | | This wasn't updated for the immarg handling change. llvm-svn: 373837
* [NFC] Add { } to silence compiler warning [-Wmissing-braces].Huihui Zhang2019-10-041-1/+1
| | | | | | | | | ../llvm-project/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:355:48: warning: suggest braces around initialization of subobject [-Wmissing-braces] return addMappingFromTable<1>(MI, MRI, { 0 }, Table); ^ {} llvm-svn: 373784
* AMDGPU/GlobalISel: Support wave32 waterfall loopsMatt Arsenault2019-10-041-22/+30
| | | | llvm-svn: 373714
* [NFC] Fix unused variable in release buildsJordan Rupprecht2019-10-031-1/+2
| | | | llvm-svn: 373646
* AMDGPU/GlobalISel: Handle RegBankSelect of G_INSERT_VECTOR_ELTMatt Arsenault2019-10-031-5/+77
| | | | llvm-svn: 373639
* AMDGPU/GlobalISel: Split 64-bit vector extracts during RegBankSelectMatt Arsenault2019-10-031-167/+257
| | | | | | | | Register indexing 64-bit elements is possible on the SALU, but not the VALU. Handle splitting this into two 32-bit indexes. Extend waterfall loop handling to allow moving a range of instructions. llvm-svn: 373638
* AMDGPU/GlobalISel: Allow VGPR to index SGPR registerMatt Arsenault2019-10-031-4/+6
| | | | | | | | We can still do a waterfall loop over the index if using a VGPR to index an SGPR. The result will still be a VGPR, but we can avoid the wide copy of the source register to a VGPR. llvm-svn: 373637
* AMDGPU/GlobalISel: Don't re-get subtargetMatt Arsenault2019-10-031-6/+3
| | | | | | It's already available in the class. llvm-svn: 373568
* AMDGPU/GlobalISel: Use getIntrinsicID helperMatt Arsenault2019-10-021-5/+5
| | | | llvm-svn: 373417
* AMDGPU/GlobalISel: Assume VGPR for G_FRAME_INDEXMatt Arsenault2019-10-021-1/+7
| | | | | | | | | In principle this should behave as any other constant. However eliminateFrameIndex currently assumes a VALU use and uses a vector shift. Work around this by selecting to VGPR for now until eliminateFrameIndex is fixed. llvm-svn: 373415
* AMDGPU/GlobalISel: Private loads always use VGPRsMatt Arsenault2019-10-021-4/+6
| | | | llvm-svn: 373414
* AMDGPU/GlobalISel: Add support for init.exec intrinsicsMatt Arsenault2019-10-011-1/+8
| | | | | | | TThe existing wave32 behavior seems broken and incomplete, but this reproduces it. llvm-svn: 373296
* AMDGPU/GlobalISel: Allow scc/vcc alternative mappings for s1 constantsMatt Arsenault2019-10-011-1/+15
| | | | llvm-svn: 373295
* [NFC] Add { } to silence compiler warning [-Wmissing-braces].Huihui Zhang2019-09-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | /local/mnt/workspace/huihuiz/llvm-comm-git-2/llvm-project/llvm/lib/Object/MachOObjectFile.cpp:2731:7: warning: suggest braces around initialization of subobject [-Wmissing-braces] "i386", "x86_64", "x86_64h", "armv4t", "arm", "armv5e", ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ { 1 warning generated. /local/mnt/workspace/huihuiz/llvm-comm-git-2/llvm-project/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:355:46: warning: suggest braces around initialization of subobject [-Wmissing-braces] return addMappingFromTable<1>(MI, MRI, { 0 }, Table); ^ {} 1 warning generated. /local/mnt/workspace/huihuiz/llvm-comm-git-2/llvm-project/llvm/tools/llvm-objcopy/ELF/Object.cpp:400:57: warning: suggest braces around initialization of subobject [-Wmissing-braces] static constexpr std::array<uint8_t, 4> ZlibGnuMagic = {'Z', 'L', 'I', 'B'}; ^~~~~~~~~~~~~~~~~~ { } 1 warning generated. llvm-svn: 372811
* Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"Matt Arsenault2019-09-191-21/+378
| | | | | | | | | This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338
* Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"Hans Wennborg2019-09-191-378/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC*. > > Since now intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. > > Most of the work here is to cleanup target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, simlar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_* instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372314
* AMDGPU/GlobalISel: RegBankSelect llvm.amdgcn.ds.swizzleMatt Arsenault2019-09-191-0/+1
| | | | llvm-svn: 372297
* AMDGPU/GlobalISel: RegBankSelect tbuffer load/storeMatt Arsenault2019-09-191-6/+14
| | | | | | These have the same operand structure as the non-t buffer operations. llvm-svn: 372296
* AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.storeMatt Arsenault2019-09-191-8/+206
| | | | llvm-svn: 372292
* AMDGPU/GlobalISel: RegBankSelect struct buffer load/storeMatt Arsenault2019-09-191-0/+22
| | | | llvm-svn: 372291
* AMDGPU/GlobalISel: RegBankSelect llvm.amdgcn.raw.buffer.{load|store}Matt Arsenault2019-09-191-1/+42
| | | | llvm-svn: 372290
* AMDGPU/GlobalISel: Attempt to RegBankSelect image intrinsicsMatt Arsenault2019-09-191-5/+95
| | | | | | Images should always have 2 consecutive, mandatory SGPR arguments. llvm-svn: 372289
* AMDGPU/GlobalISel: Fix RegBankSelect G_SMULH/G_UMULH pre-gfx9Matt Arsenault2019-09-191-3/+7
| | | | | | The scalar versions were only introduced in gfx9. llvm-svn: 372286
* GlobalISel: Don't materialize immarg arguments to intrinsicsMatt Arsenault2019-09-191-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Encode them directly as an imm argument to G_INTRINSIC*. Since now intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to cleanup target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. The GlobalISelEmitter needs to treat timm as a special case of a leaf node, simlar to MachineBasicBlock operands. This should also enable handling of patterns for some G_* instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372285
* AMDGPU/GlobalISel: Fix RegBankSelect for G_FRINT and G_FCEILMatt Arsenault2019-09-161-0/+2
| | | | llvm-svn: 371991
OpenPOWER on IntegriCloud