summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Relax 32-bit SGPR register classMatt Arsenault2019-10-181-8/+9
| | | | | | | | | | | Mostly use SReg_32 instead of SReg_32_XM0 for arbitrary values. This will allow the register coalescer to do a better job eliminating copies to m0. For GlobalISel, as a terrible hack, use SGPR_32 for things that should use SCC until booleans are solved. llvm-svn: 375267
* AMDGPU/GlobalISel: Handle more G_INSERT casesMatt Arsenault2019-10-071-43/+1
| | | | | | | | | Start manually writing a table to get the subreg index. TableGen should probably generate this, but I'm not sure what it looks like in the arbitrary case where subregisters are allowed to not fully cover the super-registers. llvm-svn: 373947
* AMDGPU/GlobalISel: Use S_MOV_B64 for inline constantsMatt Arsenault2019-10-071-20/+27
| | | | | | | This hides some defects in SIFoldOperands when the immediates are split. llvm-svn: 373943
* AMDGPU/GlobalISel: Select more G_INSERT casesMatt Arsenault2019-10-071-20/+78
| | | | | | | | | | At minimum handle the s64 insert type, which are emitted in real cases during legalization. We really need TableGen to emit something to emit something like the inverse of composeSubRegIndices do determine the subreg index to use. llvm-svn: 373938
* GlobalISel: Add target pre-isel instructionsMatt Arsenault2019-10-071-1/+1
| | | | | | | | | | | | | | Allows targets to introduce regbankselectable pseudo-instructions. Currently the closet feature to this is an intrinsic. However this requires creating a public intrinsic declaration. This litters the public intrinsic namespace with operations we don't necessarily want to expose to IR producers, and would rather leave as private to the backend. Use a new instruction bit. A previous attempt tried to keep using enum value ranges, but it turned into a mess. llvm-svn: 373937
* AMDGPU/GlobalISel: Fall back on weird G_EXTRACT offsetsMatt Arsenault2019-10-061-2/+5
| | | | llvm-svn: 373842
* AMDGPU/GlobalISel: Select G_PTRTOINTMatt Arsenault2019-10-041-0/+1
| | | | llvm-svn: 373715
* [AMDGPU] Extend buffer intrinsics with swizzlingPiotr Sobczak2019-10-021-10/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on. The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged. There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 llvm-svn: 373491
* AMDGPU/GlobalISel: Use getIntrinsicID helperMatt Arsenault2019-10-021-1/+1
| | | | llvm-svn: 373417
* AMDGPU/GlobalISel: Select s1 src G_SITOFP/G_UITOFPMatt Arsenault2019-10-011-0/+46
| | | | llvm-svn: 373298
* AMDGPU/GlobalISel: Add support for init.exec intrinsicsMatt Arsenault2019-10-011-0/+9
| | | | | | | TThe existing wave32 behavior seems broken and incomplete, but this reproduces it. llvm-svn: 373296
* AMDGPU/GlobalISel: Select G_UADDO/G_USUBOMatt Arsenault2019-10-011-0/+44
| | | | llvm-svn: 373288
* AMDGPU/GlobalISel: Avoid getting MRI in every functionMatt Arsenault2019-09-281-220/+149
| | | | | | | Store it in AMDGPUInstructionSelector to avoid boilerplate in nearly every select function. llvm-svn: 373139
* [AMDGPU] Use std::make_tuple to make some toolchains happy againBjorn Pettersson2019-09-201-6/+6
| | | | | | | | | | | | | | | | | | | | | My toolchain stopped working (LLVM 8.0 , libstdc++ 5.4.0) after r372338. The same problem was seen in clang-cuda-build buildbots: clang-cuda-build/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:763:12: error: chosen constructor is explicit in copy-initialization return {Reg, 0, nullptr}; ^~~~~~~~~~~~~~~~~ /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here constexpr tuple(_UElements&&... __elements) ^ This commit adds explicit calls to std::make_tuple to work around the problem. llvm-svn: 372384
* Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"Matt Arsenault2019-09-191-12/+267
| | | | | | | | | This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338
* Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"Hans Wennborg2019-09-191-267/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC*. > > Since now intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. > > Most of the work here is to cleanup target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, simlar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_* instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372314
* AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store.formatMatt Arsenault2019-09-191-17/+91
| | | | | | | | | This needs special handling due to some subtargets that have a nonstandard register layout for f16 vectors Also reject some illegal types on other targets. llvm-svn: 372293
* AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.storeMatt Arsenault2019-09-191-0/+185
| | | | llvm-svn: 372292
* GlobalISel: Don't materialize immarg arguments to intrinsicsMatt Arsenault2019-09-191-12/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Encode them directly as an imm argument to G_INTRINSIC*. Since now intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to cleanup target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. The GlobalISelEmitter needs to treat timm as a special case of a leaf node, simlar to MachineBasicBlock operands. This should also enable handling of patterns for some G_* instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372285
* AMDGPU/GlobalISel: Fail select of G_INSERT non-32-bit sourceMatt Arsenault2019-09-161-4/+16
| | | | | | | This was producing an illegal copy which would hit an assert later. Error on selection for now until this is implemented. llvm-svn: 371993
* AMDGPU/GlobalISel: Fix assert on multi-return side effect intrinsicsMatt Arsenault2019-09-131-1/+1
| | | | | | llvm.amdgcn.else hits this. llvm-svn: 371812
* AMDGPU/GlobalISel: Select llvm.amdgcn.classMatt Arsenault2019-09-091-0/+18
| | | | | | Also fixes missing SubtargetPredicate on f16 class instructions. llvm-svn: 371436
* AMDGPU/GlobalISel: Select fmed3Matt Arsenault2019-09-091-0/+19
| | | | llvm-svn: 371435
* AMDGPU/GlobalISel: Select G_PTR_MASKMatt Arsenault2019-09-091-0/+65
| | | | llvm-svn: 371412
* AMDGPU/GlobalISel: Use known bits for selectionMatt Arsenault2019-09-091-8/+3
| | | | llvm-svn: 371409
* AMDGPU/GlobalISel: Try generated matcher before add/sub codeMatt Arsenault2019-09-091-4/+4
| | | | | | This will allow optimization patterns which fold adds away to work. llvm-svn: 371406
* AMDGPU/GlobalISel: Fix assert on load from constant addressMatt Arsenault2019-09-051-4/+4
| | | | llvm-svn: 371006
* AMDGPU/GlobalISel: Fix constraining scalar and/or/xorMatt Arsenault2019-08-281-8/+1
| | | | | | | If the result register already had a register class assigned, the sources may not have been properly constrained. llvm-svn: 370150
* Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVMDaniel Sanders2019-08-151-21/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned& Depends on D65919 Reviewers: arsenm, bogner, craig.topper, RKSimon Reviewed By: arsenm Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65962 llvm-svn: 369041
* [GlobalISel] Make the InstructionSelector instance non-const, allowing state ↵Amara Emerson2019-08-131-20/+16
| | | | | | | | | | | | | | | | to be maintained. Currently we can't keep any state in the selector object that we get from subtarget. As a result we have to plumb through all our variables through multiple functions. This change makes it non-const and adds a virtual init() method to allow further state to be captured for each target. AArch64 makes use of this in this patch to cache a call to hasFnAttribute() which is expensive to call, and is used on each selection of G_BRCOND. Differential Revision: https://reviews.llvm.org/D65984 llvm-svn: 368652
* Finish moving TargetRegisterInfo::isVirtualRegister() and friends to ↵Daniel Sanders2019-08-011-4/+4
| | | | | | llvm::Register as started by r367614. NFC llvm-svn: 367633
* AMDGPU/GlobalISel: Remove manual store select codeMatt Arsenault2019-08-011-50/+2
| | | | | | | This regresses the weird types that are newly treated as legal load types, but fixes incorrectly using flat instrucions on SI. llvm-svn: 367512
* AMDGPU/GlobalISel: Handle G_ATOMICRMW_FADDMatt Arsenault2019-08-011-0/+1
| | | | llvm-svn: 367509
* AMDGPU/GlobalISel: Allow selection of DS atomicrmwMatt Arsenault2019-08-011-4/+14
| | | | llvm-svn: 367507
* AMDGPU/GlobalISel: Select simple local storesMatt Arsenault2019-08-011-11/+18
| | | | llvm-svn: 367504
* AMDGPU/GlobalISel: Select local loadsMatt Arsenault2019-08-011-5/+79
| | | | llvm-svn: 367498
* AMDGPU/GlobalISel: Don't assume instruction can be erased when selecting extsMatt Arsenault2019-07-241-6/+8
| | | | | | | The G_ANYEXT handling can end up reaching selectCOPY, which mutates the instruction in place. llvm-svn: 366915
* AMDGPU/GlobalISel: Remove unnecessary codeMatt Arsenault2019-07-221-4/+0
| | | | | | | The minnum/maxnum case are dead, and the cvt is handled by the default. llvm-svn: 366685
* AMDGPU/GlobalISel: Select private loadsMatt Arsenault2019-07-161-1/+135
| | | | llvm-svn: 366248
* AMDGPU/GlobalISel: Select flat storesMatt Arsenault2019-07-161-2/+4
| | | | llvm-svn: 366246
* AMDGPU/GlobalISel: Select flat loadsMatt Arsenault2019-07-161-44/+52
| | | | | | | | Now that the patterns use the new PatFrag address space support, the only blocker to importing most load patterns is the addressing mode complex patterns. llvm-svn: 366237
* AMDGPU/GlobalISel: Fix test failures in release buildMatt Arsenault2019-07-161-2/+5
| | | | | | | | | | | | Apparently the check for legal instructions during instruction select does not happen without an asserts build, so these would successfully select in release, and fail in debug. Make s16 and/or/xor legal. These can just be selected directly to the 32-bit operation, as is already done in SelectionDAG, so just make them legal. llvm-svn: 366210
* AMDGPU/GlobalISel: Select G_AND/G_OR/G_XORMatt Arsenault2019-07-151-0/+65
| | | | llvm-svn: 366121
* AMDGPU/GlobalISel: Don't constrain source register of VCC copiesMatt Arsenault2019-07-151-0/+20
| | | | | | | | | | | | | This is a hack until I come up with a better way of dealing with the pseudo-register banks used for boolean values. If the use instruction constrains the register, the selector for the def instruction won't see that the bank was VCC. A 1-bit SReg_32 is could ambiguously have been SCCRegBank or VCCRegBank in wave32. This is necessary to successfully select branches with and and/or/xor condition. llvm-svn: 366120
* AMDGPU/GlobalISel: Fix selecting vcc->vcc bank copiesMatt Arsenault2019-07-151-10/+12
| | | | | | | | | The extra test change is correct, although how it arrives there is a bug that needs work. With wave32, the test for isVCC ambiguously reports true for an SCC or VCC source. A new allocatable pseudo register class for SCC may be necesssary. llvm-svn: 366119
* AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCCMatt Arsenault2019-07-151-0/+4
| | | | llvm-svn: 366118
* AMDGPU/GlobalISel: Fix handling of sgpr (not scc bank) s1 to VCCMatt Arsenault2019-07-151-17/+23
| | | | | | This was emitting a copy from a 32-bit register to a 64-bit. llvm-svn: 366117
* AMDGPU/GlobalISel: Fix G_ICMP for wave32Matt Arsenault2019-07-151-2/+2
| | | | llvm-svn: 366114
* AMDGPU/GlobalISel: Handle llvm.amdgcn.if.breakMatt Arsenault2019-07-151-0/+25
| | | | llvm-svn: 366102
* AMDGPU/GlobalISel: Select llvm.amdgcn.end.cfMatt Arsenault2019-07-151-0/+14
| | | | llvm-svn: 366099
OpenPOWER on IntegriCloud