path: root/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
Commit message | Author | Age | Files | Lines
* AMDGPU/GlobalISel: Fix branch targets when emitting SI_IF | Matt Arsenault | 2020-01-13 | 1 | -7/+30
  The branch target needs to be changed depending on whether there is an unconditional branch or not. Loops also need to be similarly fixed, but compiling a simple testcase end to end requires another set of patches that aren't upstream yet.
* AMDGPU/GlobalISel: Clamp G_ZEXT source sizes | Matt Arsenault | 2020-01-10 | 1 | -2/+3
  Also clamps G_SEXT/G_ANYEXT, but the implementation is more limited so fewer cases actually work.
* GlobalISel: Handle llvm.read_register | Matt Arsenault | 2020-01-09 | 1 | -0/+2
  Compared to the attempt in bdcc6d3d2638b3a2c99ab3b9bfaa9c02e584993a, this uses intermediate generic instructions.
* AMDGPU/GlobalISel: Widen 16-bit shift amount sources | Matt Arsenault | 2020-01-09 | 1 | -1/+2
  This should be legal, but will require future selection work. 16-bit shift amounts were already removed from being legal, but this didn't adjust the transformation rules.
* AMDGPU/GlobalISel: Fix scalar G_SELECT for arbitrary pointers | Matt Arsenault | 2020-01-07 | 1 | -1/+1
  4e85ca9562a588eba491e44bcbf73cb2f419780f missed updating the legal condition type set for pointers with any unrecognized address space.
* AMDGPU/GlobalISel: Legalize G_READCYCLECOUNTER | Matt Arsenault | 2020-01-06 | 1 | -0/+3
* AMDGPU/GlobalISel: Replace handling of boolean values | Matt Arsenault | 2020-01-06 | 1 | -5/+19
  This solves selection failures with generated selection patterns, which would fail due to inferring the SGPR reg bank for virtual registers with a set register class instead of the VCC bank. Selecting the use instruction would constrain the virtual register to a specific class, so when the def was selected later the bank was no longer set to VCC.

  Remove the SCC reg bank. SCC isn't directly addressable, so it requires copying from SCC to an allocatable 32-bit register during selection, so these might as well be treated as 32-bit SGPR values. Now any scalar boolean value that will produce an output in SCC should be widened during RegBankSelect to s32. Any s1 value should be a vector boolean during selection. This makes the vcc register bank unambiguous with a normal SGPR during selection.

  Summary of how this should now work:
  - G_TRUNC is always a no-op, and never should use a vcc bank result.
  - SALU boolean operations should be promoted to s32 in RegBankSelect apply mapping.
  - An s1 value means vcc bank at selection. The exception is for legalization artifacts that use s1, which are never VCC. All other contexts should infer the VCC register classes for s1 typed registers. The LLT for the register is now needed to infer the correct register class. Extensions with vcc sources should be legalized to a select of constants during RegBankSelect.
  - Copy from non-vcc to vcc ensures high bits of the input value are cleared during selection.
  - SALU boolean inputs should ensure the inputs are 0/1. This includes select, conditional branches, and carry-ins.

  There are a few somewhat dirty details. One is that G_TRUNC/G_*EXT selection ignores the usual register-bank from register class functions, and can't handle truncates with VCC result banks. I think this is OK, since the artifacts are specially treated anyway. This does require some care to avoid producing cases with vcc. There will also be no 100% reliable way to verify this rule is followed in selection in case of register classes, and violations manifest themselves as invalid copy instructions much later.

  Standard phi handling also only considers the bank of the result register, and doesn't insert copies to make the source banks match. This doesn't work for vcc, so we have to manually correct phi inputs in this case. We should add a verifier check to make sure there are no phis with mixed vcc and non-vcc register bank inputs.

  There's also some duplication with the LegalizerHelper, and some code which should live in the helper. I don't see a good way to share special knowledge about what types to use for intermediate operations depending on the bank, for example. Using the helper to replace extensions with selects also seems somewhat awkward to me.

  Another issue is that some contexts calling getRegBankFromRegClass apparently don't have the LLT type for the register, but I haven't yet run into a real issue from this.

  This also introduces new unnecessary instructions in most cases, since we don't yet try to optimize out the zext when the source is known to come from a compare.
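  Two of the rules in the summary above can be modelled on plain scalars. The sketch below is only a one-lane illustration and not code from the patch: the function names are hypothetical, the constants (1/0 for zero-extension, -1/0 for sign-extension) follow from the definition of extending a 1-bit value, and treating "high bits cleared" as keeping only bit 0 is an assumption.

```cpp
#include <cstdint>

// One-lane scalar model (hypothetical helper names, not LLVM APIs).

// Zero-extending a boolean is a select between the constants 1 and 0.
int32_t zextFromBool(bool cond) { return cond ? 1 : 0; }

// Sign-extending a boolean is a select between the constants -1 and 0.
int32_t sextFromBool(bool cond) { return cond ? -1 : 0; }

// Copying a 32-bit scalar into a vector boolean: only bit 0 is assumed to
// carry the value, so the high bits are masked off before the compare.
bool copyToVectorBool(uint32_t x) { return (x & 1u) != 0; }
```

  In this model a value such as 0xFFFFFFFE copies to false, because only the low bit is considered.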
* GlobalISel: Implement lower for G_INTRINSIC_ROUND | Matt Arsenault | 2020-01-06 | 1 | -2/+2
  Mostly copied from the AMDGPU lowering implementation, except that it uses G_SITOFP instead of directly creating a select on -1.0, 0.0.
* GlobalISel: Fix unsupported legalize action | Matt Arsenault | 2020-01-06 | 1 | -0/+5
  This would complain about invalid legalizer rules otherwise. Mark some operations as unsupported for AMDGPU. This currently seems to produce the same legalize error as when no rules are defined, but eventually this should produce a proper user facing error.
* GlobalISel: Scalarize all division operations | Matt Arsenault | 2020-01-04 | 1 | -0/+7
  This only handled G_SDIV, but they all are trivially scalarizable. Also define placeholder AMDGPU division legalizer rules.
* AMDGPU/GlobalISel: Legalize more odd sized loads | Matt Arsenault | 2020-01-04 | 1 | -5/+9
  The attempt to widen sufficiently aligned, odd sized loads wasn't consistently applied.
* GlobalISel: moreElementsVector for FP min/max | Matt Arsenault | 2019-12-30 | 1 | -0/+1
* AMDGPU/GlobalISel: Fix extra result register in fdiv64 lowering | Matt Arsenault | 2019-12-27 | 1 | -2/+1
  There ended up being two result registers, which would fail on select. It was really defining a new temp register in the correct def position, instead of the correct result register.
* AMDGPU/GlobalISel: Legalize some 16-bit round instructions | Matt Arsenault | 2019-12-24 | 1 | -1/+6
* AMDGPU/GlobalISel: Lower llvm.amdgcn.else | Matt Arsenault | 2019-12-24 | 1 | -6/+17
* [AMDGPU] Don't create MachinePointerInfos with an UndefValue pointer | Jay Foad | 2019-12-23 | 1 | -6/+2
  Summary: The only useful information the UndefValue conveys is the address space, which MachinePointerInfo can represent directly without referring to an IR value.
  Reviewers: arsenm, rampitec
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71838
* AMDGPU/GlobalISel: Fix misuse of div_scale intrinsics | Matt Arsenault | 2019-12-21 | 1 | -5/+5
  Confusingly, the intrinsic operands do not match the instruction/custom node. The order is shuffled, and the 3rd operand is an immediate to select operands. I'm not 100% sure I did this right, but fdiv still doesn't select end to end and it will be easier to tell when it does. This at least avoids an assertion in RegBankSelect and allows hitting the fallback on selection.
* AMDGPU/GlobalISel: Legalize FDIV64 | Austin Kerbow | 2019-11-19 | 1 | -0/+85
  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70403
* AMDGPU: Be explicit about denormal mode in MIR tests | Matt Arsenault | 2019-11-19 | 1 | -10/+16
  Start checking the machine function in GlobalISel instead of the target directly. This temporarily breaks fcanonicalize selection in GlobalISel.
* GlobalISel: Lower s1 source G_SITOFP/G_UITOFP | Matt Arsenault | 2019-11-15 | 1 | -1/+2
* [globalisel] Rename G_GEP to G_PTR_ADD | Daniel Sanders | 2019-11-05 | 1 | -3/+3
  Summary: G_GEP is rather poorly named. It's a simple pointer+scalar addition and doesn't support any of the complexities of getelementptr. I therefore propose that we rename it. There's a G_PTR_MASK so let's follow that convention and go with G_PTR_ADD.
  Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm
  Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D69734
* AMDGPU/GlobalISel: Legalize FDIV32 | Austin Kerbow | 2019-10-29 | 1 | -0/+99
  Reviewers: arsenm
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D69581
* AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHG | Matt Arsenault | 2019-10-25 | 1 | -1/+38
  Custom lower this to a target instruction with the merge operands. I think it might be better to directly select this and emit a REG_SEQUENCE, but this would be more work since it would require splitting the tablegen patterns for these cases from the other atomics.
* AMDGPU/GlobalISel: Legalize FDIV16 | Austin Kerbow | 2019-10-25 | 1 | -0/+39
  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, volkan, Petar.Avramovic, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D69347
* AMDGPU/GlobalISel: Legalize fast unsafe FDIV | Austin Kerbow | 2019-10-21 | 1 | -4/+84
  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D69231
  llvm-svn: 375460
* GlobalISel: Implement lower for G_SADDO/G_SSUBO | Matt Arsenault | 2019-10-16 | 1 | -1/+4
  Port directly from SelectionDAG, minus the path using ISD::SADDSAT/ISD::SSUBSAT.
  llvm-svn: 375042
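  For readers unfamiliar with the non-saturating expansion, the sketch below models one common comparison-based overflow check on 32-bit scalars. It is an illustration under the assumption that the generic lowering uses this style of check; the struct and function names are hypothetical and it is not the LegalizerHelper code.

```cpp
#include <cstdint>

// Hypothetical result pair: wrapped value plus overflow flag.
struct OverflowResult {
  int32_t value;
  bool overflow;
};

// Signed add with overflow flag. The sum is computed with unsigned
// wrap-around (the conversion back to int32_t is modular on mainstream
// compilers and guaranteed so since C++20). Without overflow, the sum is
// less than lhs exactly when rhs is negative; a mismatch means overflow.
OverflowResult saddo(int32_t lhs, int32_t rhs) {
  int32_t sum = static_cast<int32_t>(static_cast<uint32_t>(lhs) +
                                     static_cast<uint32_t>(rhs));
  bool resultLtLhs = sum < lhs;
  bool rhsNegative = rhs < 0;
  return {sum, resultLtLhs != rhsNegative};
}

// Signed subtract with overflow flag. Without overflow, the difference is
// less than lhs exactly when rhs is strictly positive.
OverflowResult ssubo(int32_t lhs, int32_t rhs) {
  int32_t diff = static_cast<int32_t>(static_cast<uint32_t>(lhs) -
                                      static_cast<uint32_t>(rhs));
  bool resultLtLhs = diff < lhs;
  bool rhsPositive = rhs > 0;
  return {diff, resultLtLhs != rhsPositive};
}
```

  The check needs only two compares and an xor, which is why it maps directly onto generic compare instructions.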
* GlobalISel: Implement fewerElementsVector for G_BUILD_VECTOR | Matt Arsenault | 2019-10-09 | 1 | -1/+10
  Turn it into a G_CONCAT_VECTORS of G_BUILD_VECTOR.
  llvm-svn: 374252
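  The shape of this transform can be modelled with ordinary containers: split the element list into narrower pieces, build each piece separately, then concatenate. The sketch below is a conceptual model only; `Vec`, `buildNarrowVectors` and `concatVectors` are hypothetical names, and it assumes the element count divides evenly by the narrow width.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Model of building several narrow vectors from one wide element list.
// Assumes elts.size() is a multiple of narrowElts.
std::vector<Vec> buildNarrowVectors(const Vec &elts, std::size_t narrowElts) {
  assert(narrowElts > 0 && elts.size() % narrowElts == 0);
  std::vector<Vec> parts;
  for (std::size_t i = 0; i < elts.size(); i += narrowElts) {
    auto first = elts.begin() + static_cast<std::ptrdiff_t>(i);
    auto last = first + static_cast<std::ptrdiff_t>(narrowElts);
    parts.emplace_back(first, last); // one narrow "build vector"
  }
  return parts;
}

// Model of concatenating the narrow vectors back into the wide result.
Vec concatVectors(const std::vector<Vec> &parts) {
  Vec out;
  for (const Vec &p : parts)
    out.insert(out.end(), p.begin(), p.end());
  return out;
}
```

  Concatenating the pieces reproduces the original element order, so the wide result is unchanged while each piece has a narrower type.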
* AMDGPU/GlobalISel: Clamp G_SITOFP/G_UITOFP sources | Matt Arsenault | 2019-10-07 | 1 | -3/+6
  llvm-svn: 373989
* GlobalISel: Partially implement lower for G_INSERT | Matt Arsenault | 2019-10-07 | 1 | -7/+3
  llvm-svn: 373946
* AMDGPU/GlobalISel: Widen 16-bit G_MERGE_VALUEs sources | Matt Arsenault | 2019-10-07 | 1 | -18/+29
  Continue making a mess of merge/unmerge legality.
  llvm-svn: 373942
* AMDGPU/GlobalISel: Lower G_ATOMIC_CMPXCHG_WITH_SUCCESS | Matt Arsenault | 2019-10-06 | 1 | -0/+3
  llvm-svn: 373839
* GlobalISel: Partially implement lower for G_EXTRACT | Matt Arsenault | 2019-10-06 | 1 | -1/+13
  Turn into shift and truncate. Doesn't yet handle pointers.
  llvm-svn: 373838
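  The "shift and truncate" idea can be shown on a plain 64-bit integer. This is a scalar illustration rather than the LegalizerHelper implementation; `extractBits` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of an extract lowered to "shift, then truncate".
// Returns `width` bits of `src` starting at bit `offset`.
uint64_t extractBits(uint64_t src, unsigned offset, unsigned width) {
  assert(width >= 1 && offset < 64 && width <= 64 - offset);
  uint64_t shifted = src >> offset; // logical shift right by the offset
  if (width == 64)
    return shifted; // only reachable with offset == 0; nothing to drop
  return shifted & ((uint64_t{1} << width) - 1); // keep the low `width` bits
}
```

  For example, taking 16 bits at offset 8 from 0xAABBCCDD shifts to 0xAABBCC and truncates to 0xBBCC.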
* AMDGPU/GlobalISel: Fix using wrong addrspace for aperture | Matt Arsenault | 2019-10-04 | 1 | -1/+3
  This was always passing the destination flat address space, when it should be picking between the two valid source options.
  llvm-svn: 373716
* AMDGPU/GlobalISel: Fix mutationIsSane assert v8s8 and | Matt Arsenault | 2019-10-03 | 1 | -2/+3
  This would try to do FewerElements to v9s8
  llvm-svn: 373635
* AMDGPU/GlobalISel: Expand G_BITCAST legality | Matt Arsenault | 2019-10-03 | 1 | -4/+1
  llvm-svn: 373567
* AMDGPU/GlobalISel: Use getIntrinsicID helper | Matt Arsenault | 2019-10-02 | 1 | -1/+1
  llvm-svn: 373417
* AMDGPU/GlobalISel: Legalize 1024-bit G_BUILD_VECTOR | Matt Arsenault | 2019-10-02 | 1 | -4/+6
  This will be needed to support AGPR operations.
  llvm-svn: 373413
* AMDGPU/GlobalISel: Increase max legal size to 1024 | Matt Arsenault | 2019-10-01 | 1 | -8/+8
  There are 1024 bit register classes defined for AGPRs. Additionally OpenCL defines vectors up to 16 x i64, and this helps those tests legalize.
  llvm-svn: 373350
* AMDGPU/GlobalISel: Select s1 src G_SITOFP/G_UITOFP | Matt Arsenault | 2019-10-01 | 1 | -2/+2
  llvm-svn: 373298
* AMDGPU/GlobalISel: Avoid creating shift of 0 in arg lowering | Matt Arsenault | 2019-10-01 | 1 | -3/+8
  This is sort of papering over the fact that we don't run a combiner anywhere, but avoiding creating 2 instructions in the first place is easy.
  llvm-svn: 373293
* AMDGPU/GlobalISel: Select G_UADDO/G_USUBO | Matt Arsenault | 2019-10-01 | 1 | -1/+2
  llvm-svn: 373288
* GlobalISel: Implement widenScalar for G_SITOFP/G_UITOFP sources | Matt Arsenault | 2019-10-01 | 1 | -3/+9
  Legalize 16-bit G_SITOFP/G_UITOFP for AMDGPU.
  llvm-svn: 373287
* AMDGPU/GlobalISel: Legalize G_GLOBAL_VALUE | Matt Arsenault | 2019-10-01 | 1 | -8/+98
  Handle other cases besides LDS. Mostly a straight port of the existing handling, without the intermediate custom nodes.
  llvm-svn: 373286
* Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" | Matt Arsenault | 2019-09-19 | 1 | -0/+60
  This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297).
  This was missing one switch to getTargetConstant in an untested case.
  llvm-svn: 372338
* Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" | Hans Wennborg | 2019-09-19 | 1 | -60/+0
  This broke the Chromium build, causing it to fail with e.g.
    fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>
  See llvm-commits thread of r372285 for details.
  This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit.
  > Encode them directly as an imm argument to G_INTRINSIC*.
  >
  > Since intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU.
  >
  > This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time.
  >
  > SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them.
  >
  > Most of the work here is to clean up target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory.
  >
  > The GlobalISelEmitter needs to treat timm as a special case of a leaf node, similar to MachineBasicBlock operands. This should also enable handling of patterns for some G_* instructions with immediates, like G_FENCE or G_EXTRACT.
  >
  > This does include a workaround for a crash in GlobalISelEmitter when ARM tries to use "imm" in an output with a "timm" pattern source.
  llvm-svn: 372314
* AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store.format | Matt Arsenault | 2019-09-19 | 1 | -3/+29
  This needs special handling due to some subtargets that have a nonstandard register layout for f16 vectors.
  Also reject some illegal types on other targets.
  llvm-svn: 372293
* AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store | Matt Arsenault | 2019-09-19 | 1 | -0/+34
  llvm-svn: 372292
* AMDGPU/GlobalISel: Legalize s1 source G_[SU]ITOFP | Matt Arsenault | 2019-09-16 | 1 | -1/+2
  llvm-svn: 371952
* AMDGPU/GlobalISel: Select S16->S32 fptoint | Matt Arsenault | 2019-09-16 | 1 | -1/+1
  llvm-svn: 371950
* AMDGPU/GlobalISel: Legalize s32->s16 G_SITOFP/G_UITOFP | Matt Arsenault | 2019-09-13 | 1 | -1/+1
  llvm-svn: 371811