summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU/GlobalISel: Assume VGPR for G_FRAME_INDEXMatt Arsenault2019-10-021-1/+7
| | | | | | | | | In principle this should behave as any other constant. However eliminateFrameIndex currently assumes a VALU use and uses a vector shift. Work around this by selecting to VGPR for now until eliminateFrameIndex is fixed. llvm-svn: 373415
* AMDGPU/GlobalISel: Private loads always use VGPRsMatt Arsenault2019-10-021-4/+6
| | | | llvm-svn: 373414
* AMDGPU/GlobalISel: Legalize 1024-bit G_BUILD_VECTORMatt Arsenault2019-10-021-4/+6
| | | | | | This will be needed to support AGPR operations. llvm-svn: 373413
* AMDGPU/GlobalISel: Fix RegBankSelect for 1024-bit valuesMatt Arsenault2019-10-021-29/+35
| | | | llvm-svn: 373412
* [AMDGPU] separate accounting for agprsStanislav Mekhanoshin2019-10-023-7/+52
| | | | | | | | | Account and report agprs separately on gfx908. Other targets do not change the reporting. Differential Revision: https://reviews.llvm.org/D68307 llvm-svn: 373411
* AMDGPU: Fix an out of date assert in addressing FrameIndexChangpeng Fang2019-10-011-3/+2
| | | | | | | | | | Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D67574 llvm-svn: 373404
* AMDGPU/SILoadStoreOptimizer: Add helper functions for working with CombineInfoTom Stellard2019-10-011-205/+244
| | | | | | | | | | | | | | | | | | Summary: This is a refactoring that will make future improvements to this pass easier. This change should not change the behavior of the pass. Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin Reviewed By: nhaehnle, vpykhtin Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65496 llvm-svn: 373366
* AMDGPU/GlobalISel: Increase max legal size to 1024Matt Arsenault2019-10-013-10/+13
| | | | | | | | There are 1024 bit register classes defined for AGPRs. Additionally OpenCL defines vectors up to 16 x i64, and this helps those tests legalize. llvm-svn: 373350
* [AMDGPU] Add VerifyScheduling support.Jay Foad2019-10-011-0/+18
| | | | | | | | | | | | | | | | Summary: This is cut and pasted from the corresponding GenericScheduler functions. Reviewers: arsenm, atrick, tstellar, vpykhtin Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68264 llvm-svn: 373346
* AMDGPU/GlobalISel: Select s1 src G_SITOFP/G_UITOFPMatt Arsenault2019-10-014-8/+55
| | | | llvm-svn: 373298
* AMDGPU/GlobalISel: Add support for init.exec intrinsicsMatt Arsenault2019-10-016-20/+42
| | | | | | | TThe existing wave32 behavior seems broken and incomplete, but this reproduces it. llvm-svn: 373296
* AMDGPU/GlobalISel: Allow scc/vcc alternative mappings for s1 constantsMatt Arsenault2019-10-011-1/+15
| | | | llvm-svn: 373295
* AMDGPU/GlobalISel: Avoid creating shift of 0 in arg loweringMatt Arsenault2019-10-011-3/+8
| | | | | | | | This is sort of papering over the fact that we don't run a combiner anywhere, but avoiding creating 2 instructions in the first place is easy. llvm-svn: 373293
* TLI: Remove DAG argument from getRegisterByNameMatt Arsenault2019-10-012-6/+6
| | | | | | | | | | | Replace with the MachineFunction. X86 is the only user, and only uses it for the function. This removes one obstacle from using this in GlobalISel. The other is the more tolerable EVT argument. The X86 use of the function seems questionable to me. It checks hasFP, before frame lowering. llvm-svn: 373292
* AMDGPU/GlobalISel: Select G_UADDO/G_USUBOMatt Arsenault2019-10-013-1/+47
| | | | llvm-svn: 373288
* GlobalISel: Implement widenScalar for G_SITOFP/G_UITOFP sourcesMatt Arsenault2019-10-011-3/+9
| | | | | | Legalize 16-bit G_SITOFP/G_UITOFP for AMDGPU. llvm-svn: 373287
* AMDGPU/GlobalISel: Legalize G_GLOBAL_VALUEMatt Arsenault2019-10-013-8/+105
| | | | | | | Handle other cases besides LDS. Mostly a straight port of the existing handling, without the intermediate custom nodes. llvm-svn: 373286
* [AMDGPU] SIFoldOperands should not fold register acrocc the EXEC definitionAlexander Timofeev2019-09-301-0/+7
| | | | | | | | Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D67662 llvm-svn: 373221
* [Alignment][NFC] Remove AllocaInst::setAlignment(unsigned)Guillaume Chatelet2019-09-301-2/+2
| | | | | | | | | | | | | | | | | Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jholewinski, arsenm, jvesely, nhaehnle, eraman, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68141 llvm-svn: 373207
* AMDGPU/GlobalISel: Fix select for v2s16 and/or/xorMatt Arsenault2019-09-301-15/+17
| | | | llvm-svn: 373180
* AMDGPU/GlobalISel: Avoid getting MRI in every functionMatt Arsenault2019-09-282-221/+156
| | | | | | | Store it in AMDGPUInstructionSelector to avoid boilerplate in nearly every select function. llvm-svn: 373139
* [AMDGPU][MC] Corrected parsing of registersDmitry Preobrazhensky2019-09-271-144/+208
| | | | | | | | | | | | | Summary of changes: refactored code for better readability and future improvements; fixed bug 41281: https://bugs.llvm.org/show_bug.cgi?id=41281 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D65224 llvm-svn: 373094
* [Alignment][NFC] Remove unneeded llvm:: scoping on Align typesGuillaume Chatelet2019-09-274-7/+7
| | | | llvm-svn: 373081
* Remove the AliasAnalysis argument in function areMemAccessesTriviallyDisjointChangpeng Fang2019-09-262-4/+2
| | | | | | | | | | Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D58360 llvm-svn: 373024
* [AMDGPU] copy OtherPredicates from pseudo to VOP3_RealStanislav Mekhanoshin2019-09-261-0/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D68102 llvm-svn: 373015
* [TargetLowering] Make allowsMemoryAccess methode virtual.Thomas Raoux2019-09-262-6/+7
| | | | | | | | | | | Rename old function to explicitly show that it cares only about alignment. The new allowsMemoryAccess call the function related to alignment by default and can be overridden by target to inform whether the memory access is legal or not. Differential Revision: https://reviews.llvm.org/D67121 llvm-svn: 372935
* [AMDGPU] gfx10 v_fmac_f16 operand foldingStanislav Mekhanoshin2019-09-251-8/+15
| | | | | | | | Fold immediates into v_fmac_f16. Differential Revision: https://reviews.llvm.org/D68037 llvm-svn: 372906
* [TargetInstrInfo] Let findCommutedOpIndices take const MachineInstr&Simon Pilgrim2019-09-252-2/+3
| | | | | | | | | | Neither the base implementation of findCommutedOpIndices nor any in-tree target modifies the instruction passed in and there is no reason why they would in the future. Committed on behalf of @hvdijk (Harald van Dijk) Differential Revision: https://reviews.llvm.org/D66138 llvm-svn: 372882
* [Dominators][AMDGPU] Don't use virtual exit node in ↵Jakub Kuderski2019-09-251-10/+10
| | | | | | | | | | | | | | | | | | | | | | | findNearestCommonDominator. Cleanup MachinePostDominators. Summary: This patch fixes a bug that originated from passing a virtual exit block (nullptr) to `MachinePostDominatorTee::findNearestCommonDominator` and resulted in assertion failures inside its callee. It also applies a small cleanup to the class. The patch introduces a new function in PDT that given a list of `MachineBasicBlock`s finds their NCD. The new overload of `findNearestCommonDominator` handles virtual root correctly. Note that similar handling of virtual root nodes is not necessary in (forward) `DominatorTree`s, as right now they don't use virtual roots. Reviewers: tstellar, tpr, nhaehnle, arsenm, NutshellySima, grosser, hliao Reviewed By: hliao Subscribers: hliao, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits Tags: #amdgpu, #llvm Differential Revision: https://reviews.llvm.org/D67974 llvm-svn: 372874
* Add tracing in pickNodeFromQueue.Jay Foad2019-09-251-0/+1
| | | | | | | This matches GenericScheduler::pickNodeFromQueue, from which this function was mostly cut and pasted. llvm-svn: 372829
* [NFC] Add { } to silence compiler warning [-Wmissing-braces].Huihui Zhang2019-09-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | /local/mnt/workspace/huihuiz/llvm-comm-git-2/llvm-project/llvm/lib/Object/MachOObjectFile.cpp:2731:7: warning: suggest braces around initialization of subobject [-Wmissing-braces] "i386", "x86_64", "x86_64h", "armv4t", "arm", "armv5e", ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ { 1 warning generated. /local/mnt/workspace/huihuiz/llvm-comm-git-2/llvm-project/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:355:46: warning: suggest braces around initialization of subobject [-Wmissing-braces] return addMappingFromTable<1>(MI, MRI, { 0 }, Table); ^ {} 1 warning generated. /local/mnt/workspace/huihuiz/llvm-comm-git-2/llvm-project/llvm/tools/llvm-objcopy/ELF/Object.cpp:400:57: warning: suggest braces around initialization of subobject [-Wmissing-braces] static constexpr std::array<uint8_t, 4> ZlibGnuMagic = {'Z', 'L', 'I', 'B'}; ^~~~~~~~~~~~~~~~~~ { } 1 warning generated. llvm-svn: 372811
* [AMDGPU][MC] Corrected handling of relocatable expressionsDmitry Preobrazhensky2019-09-231-11/+20
| | | | | | | | | | See bug 43359: https://bugs.llvm.org//show_bug.cgi?id=43359 Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D67829 llvm-svn: 372622
* [AMDGPU] isSDNodeAlwaysUniform - silence static analyzer ↵Simon Pilgrim2019-09-221-3/+2
| | | | | | | | dyn_cast<LoadSDNode> null dereference warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<LoadSDNode> directly and if not assert will fire for us. llvm-svn: 372528
* AMDGPUPrintfRuntimeBinding - silence static analyzer null dereference ↵Simon Pilgrim2019-09-221-3/+2
| | | | | | warnings. NFCI. llvm-svn: 372501
* AMDGPU/GlobalISel: Allow selection of scalar min/maxMatt Arsenault2019-09-211-4/+4
| | | | | | | | | I believe all of the uniform/divergent pattern predicates are redundant and can be removed. The uniformity bit already influences the register class, and nothhing has broken when I've removed this and others. llvm-svn: 372450
* Use llvm::StringLiteral instead of StringRef in few placesFangrui Song2019-09-202-19/+7
| | | | llvm-svn: 372395
* [AMDGPU] Use std::make_tuple to make some toolchains happy againBjorn Pettersson2019-09-201-6/+6
| | | | | | | | | | | | | | | | | | | | | My toolchain stopped working (LLVM 8.0 , libstdc++ 5.4.0) after r372338. The same problem was seen in clang-cuda-build buildbots: clang-cuda-build/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:763:12: error: chosen constructor is explicit in copy-initialization return {Reg, 0, nullptr}; ^~~~~~~~~~~~~~~~~ /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here constexpr tuple(_UElements&&... __elements) ^ This commit adds explicit calls to std::make_tuple to work around the problem. llvm-svn: 372384
* [AMDGPU] fixed underflow in getOccupancyWithNumVGPRsStanislav Mekhanoshin2019-09-191-1/+1
| | | | | | | | | The function could return zero if an extreme number or registers were used. Minimal possible occupancy is 1. Differential Revision: https://reviews.llvm.org/D67771 llvm-svn: 372350
* Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"Matt Arsenault2019-09-1915-188/+912
| | | | | | | | | This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338
* Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"Hans Wennborg2019-09-1915-912/+188
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC*. > > Since now intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. > > Most of the work here is to cleanup target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, simlar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_* instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372314
* AMDGPU/SILoadStoreOptimizer: Add const to more functionsTom Stellard2019-09-191-12/+12
| | | | | | | | | | | | Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65901 llvm-svn: 372298
* AMDGPU/GlobalISel: RegBankSelect llvm.amdgcn.ds.swizzleMatt Arsenault2019-09-191-0/+1
| | | | llvm-svn: 372297
* AMDGPU/GlobalISel: RegBankSelect tbuffer load/storeMatt Arsenault2019-09-191-6/+14
| | | | | | These have the same operand structure as the non-t buffer operations. llvm-svn: 372296
* AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store.formatMatt Arsenault2019-09-193-20/+122
| | | | | | | | | This needs special handling due to some subtargets that have a nonstandard register layout for f16 vectors Also reject some illegal types on other targets. llvm-svn: 372293
* AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.storeMatt Arsenault2019-09-197-11/+455
| | | | llvm-svn: 372292
* AMDGPU/GlobalISel: RegBankSelect struct buffer load/storeMatt Arsenault2019-09-191-0/+22
| | | | llvm-svn: 372291
* AMDGPU/GlobalISel: RegBankSelect llvm.amdgcn.raw.buffer.{load|store}Matt Arsenault2019-09-192-1/+52
| | | | llvm-svn: 372290
* AMDGPU/GlobalISel: Attempt to RegBankSelect image intrinsicsMatt Arsenault2019-09-192-5/+104
| | | | | | Images should always have 2 consecutive, mandatory SGPR arguments. llvm-svn: 372289
* AMDGPU/GlobalISel: Fix RegBankSelect G_SMULH/G_UMULH pre-gfx9Matt Arsenault2019-09-192-3/+11
| | | | | | The scalar versions were only introduced in gfx9. llvm-svn: 372286
* GlobalISel: Don't materialize immarg arguments to intrinsicsMatt Arsenault2019-09-199-171/+160
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Encode them directly as an imm argument to G_INTRINSIC*. Since now intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to cleanup target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. The GlobalISelEmitter needs to treat timm as a special case of a leaf node, simlar to MachineBasicBlock operands. This should also enable handling of patterns for some G_* instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372285
OpenPOWER on IntegriCloud