summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.scaleMatt Arsenault2019-06-181-0/+14
| | | | llvm-svn: 363667
* [AMDGPU] Speed up live-in virtual register set computaion in ↵Valery Pykhtin2019-06-184-5/+80
| | | | | | | | GCNScheduleDAGMILive. Differential revision: https://reviews.llvm.org/D62401 llvm-svn: 363661
* [AMDGPU] Use custom inserter for gfx10 VOP2bStanislav Mekhanoshin2019-06-171-1/+3
| | | | | | | | | This is part of the approved D63204 pending parent revision. This small change is in fact a part of the VOP2b legalization which does not technically belong to wave32 support, so extracted separately. llvm-svn: 363625
* [AMDGPU] Propagate function attributes thru bitcastsStanislav Mekhanoshin2019-06-171-3/+4
| | | | | | | | | AMDGPUPropagateAttributes will not work on function bitcatsts, so move AMDGPUFixFunctionBitcasts before it. Differential Revision: https://reviews.llvm.org/D63455 llvm-svn: 363614
* AMDGPU/GFX10: Don't generate s_code_end padding in the asm-printerNicolai Haehnle2019-06-171-1/+7
| | | | | | | | | | | | | | | | | | | | | | Summary: The purpose of the padding is to guard against stale code being fetched into the instruction cache by the lowest level prefetching. We're generating relocatable ELF here, and so the padding should arguably be added by the linker. This is in fact what Mesa does. This also fixes multi-part shaders for Mesa. Change-Id: I6bfede58f20e9f337762ccf39ef9e0e263e69e82 Reviewers: arsenm, rampitec, t-tye Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63427 llvm-svn: 363602
* [AMDGPU] gfx1010 wavefrontsize intrinsic foldingStanislav Mekhanoshin2019-06-173-16/+59
| | | | | | Differential Revision: https://reviews.llvm.org/D63206 llvm-svn: 363588
* [AMDGPU] Pass to propagate ABI attributes from kernels to the functionsStanislav Mekhanoshin2019-06-174-4/+356
| | | | | | | | | | | | | | | | The pass works in two modes: Mode 1: Just set attributes starting from kernels. This can work at the very beginning of opt and llc pipeline, but cannot clone functions because it must be a function pass. Mode 2: Actually clone functions for new attributes. This can only work after all function passes in the opt pipeline because it has to be a module pass. Differential Revision: https://reviews.llvm.org/D63208 llvm-svn: 363586
* GlobalISel: Verify intrinsicsMatt Arsenault2019-06-171-26/+34
| | | | | | | | I keep using the wrong instruction when manually writing tests. This really needs to check the number of operands, but I don't see an easy way to do that right now. llvm-svn: 363579
* AMDGPU/GlobalISel: Account for multiple defs when finding intrinsic IDMatt Arsenault2019-06-171-2/+1
| | | | llvm-svn: 363578
* [AMDGPU] gfx1010 wave32 metadataStanislav Mekhanoshin2019-06-178-2/+85
| | | | | | Differential Revision: https://reviews.llvm.org/D63207 llvm-svn: 363577
* AMDGPU/GlobalISel: Implement select for G_ICMP and G_SELECTTom Stellard2019-06-173-1/+195
| | | | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60640 llvm-svn: 363576
* AMDGPU: Ignore subtarget for InferAddressSpacesMatt Arsenault2019-06-171-2/+1
| | | | | | | Even if the target doesn't have flat instructions, addrspace(0) is still flat. It just happens to not work. llvm-svn: 363561
* AMDGPU/GlobalISel: Fix default mapping for non-register operandsMatt Arsenault2019-06-171-1/+5
| | | | | | Tests will be in future commits when new intrinsics are handled here. llvm-svn: 363559
* AMDGPU: Cleanup custom PseudoSourceValue definitionsMatt Arsenault2019-06-171-16/+23
| | | | | | | Use separate enums for each kind, avoid repeating overloads, and add missing classof implementation. llvm-svn: 363558
* Fix clang -Wcovered-switch-default after stack-id change by D60137Fangrui Song2019-06-171-8/+7
| | | | llvm-svn: 363543
* Describe stack-id as an enumSander de Smalen2019-06-174-10/+16
| | | | | | | | | | | | | | | | | This patch changes MIR stack-id from an integer to an enum, and adds printing/parsing support for this in MIR files. The default stack-id '0' is now renamed to 'default'. This should make MIR tests that have stack objects with different stack-ids more descriptive. It also clarifies code operating on StackID. Reviewers: arsenm, thegameg, qcolombet Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D60137 llvm-svn: 363533
* AMDGPU: Prepare for explicit absolute relocations in code generationNicolai Haehnle2019-06-164-8/+24
| | | | | | | | | | | | | | | | | Summary: We will use absolute relocations for LDS symbols. Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61492 llvm-svn: 363517
* AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0Nicolai Haehnle2019-06-163-10/+19
| | | | | | | | | | | | | | | | | | | | | Summary: Instead of encoding a high-word of 0 using a fake TargetGlobalAddress, just use a literal target constant. This simplifies some subsequent changes. The generated assembly is now more explicit about the kind of relocation that is to be used. Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61491 llvm-svn: 363516
* AMDGPU/GFX10: Support DLC bit in llvm.amdgcn.s.buffer.load intrinsicNicolai Haehnle2019-06-164-13/+22
| | | | | | | | | | | | | | | Summary: Change-Id: Ie4c971462a7749740938c687144e77441dac2539 Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62486 Change-Id: Iae59523edd75c74918d2118df6571a7b671717a0 llvm-svn: 363514
* [AMDGPU] gfx10 conditional registers handlingStanislav Mekhanoshin2019-06-1618-211/+599
| | | | | | | | | This is cpp source part of wave32 support, excluding overriden getRegClass(). Differential Revision: https://reviews.llvm.org/D63351 llvm-svn: 363513
* Reapply "GlobalISel: Avoid producing Illegal copies in RegBankSelect"Matt Arsenault2019-06-151-3/+71
| | | | | | | This reapplies r363410, avoiding null dereference if there is no AltRegBank. llvm-svn: 363478
* Revert "GlobalISel: Avoid producing Illegal copies in RegBankSelect"Mitch Phillips2019-06-141-71/+3
| | | | | | | | | | | This patch breaks UBSan build bots. See https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild for a guide as to how to reproduce the error. This reverts commit c2864c0de07efb5451d32d27a7d4ff2984830929. This reverts rL363410. llvm-svn: 363476
* AMDGPU: Avoid most waitcnts before callsMatt Arsenault2019-06-141-66/+90
| | | | | | | | | | | Currently you get extra waits, because waits are inserted for the register dependencies of the call, and the function prolog waits on everything. Currently waits are still inserted on returns. It may make sense to not do this, and wait in the caller instead. llvm-svn: 363465
* AMDGPU: Fix dropping memref for ds append/consumeMatt Arsenault2019-06-141-1/+3
| | | | | | | | | The way SelectionDAG treats memory operands is very frustrating, and by default drops them unless a property is set on the pattern. There is no pattern for manually selected instructions, so this requires manually setting them. llvm-svn: 363455
* AMDGPU: Set isTrap on S_TRAPMatt Arsenault2019-06-141-1/+4
| | | | | | | This seems to only be used for generating some kind of documentation, but might as well set it. llvm-svn: 363454
* [AMDGPU] Don't constrain callees with inlinehint from inlining on MaxBB checkValery Pykhtin2019-06-141-1/+1
| | | | | | | | | | | | | | Summary: Function bodies marked inline in an opencl source are eliminated but MaxBB check may prevent inlining them leaving undefined references. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, Anastasia, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63337 llvm-svn: 363418
* [AMDGPU] gfx1010 BoolReg definition. NFC.Stanislav Mekhanoshin2019-06-142-0/+32
| | | | | | | | | | | Earlier commit has added AMDGPUOperand::isBoolReg(). Turns out gcc issues warning about unused function since D63204 is not yet submitted. Added NFC part of D63204 to have a use of that function and mute the warning. llvm-svn: 363416
* GlobalISel: Avoid producing Illegal copies in RegBankSelectMatt Arsenault2019-06-141-3/+71
| | | | | | | | | | | | | | | | | | | | | | Avoid producing illegal register bank copies for reg_sequence and phi. The default implementation assumes it is possible to pick any operand's bank and use that for the result, introducing a copy for operands with a different bank. This does not check for illegal copies. It is not legal to introduce a VGPR->SGPR copy, so any VGPR operand requires the result to be a VGPR. The changes in getInstrMappingImpl aren't strictly necessary, since AMDGPU now just bypasses this for reg_sequence/phi. This could be replaced with an assert in case other targets run into this. It is currently responsible for producing the error for unsatisfiable copies, but this will be better served with a verifier check. For phis, for now assume any undetermined operands must be VGPRs. Eventually, this needs to be able to defer mapping these operations. This also does not yet have a way to check for whether the block is in a divergent region. llvm-svn: 363410
* AMDGPU: Fix input chain when gluing copies to m0Matt Arsenault2019-06-141-2/+5
| | | | | | | I don't think this was causing any observable issues, but was making reading the DAG dump confusing. llvm-svn: 363389
* AMDGPU: Refactor to prepare for manually selecting more intrinsicsMatt Arsenault2019-06-141-9/+19
| | | | llvm-svn: 363385
* AMDGPU: Fix printing trailing whitespace after s_endpgmMatt Arsenault2019-06-142-2/+2
| | | | llvm-svn: 363384
* AMDGPU: Fix missing constMatt Arsenault2019-06-142-2/+2
| | | | llvm-svn: 363383
* [AMDGPU] gfx1011/gfx1012 targetsStanislav Mekhanoshin2019-06-147-0/+147
| | | | | | Differential Revision: https://reviews.llvm.org/D63307 llvm-svn: 363344
* [AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32Stanislav Mekhanoshin2019-06-135-24/+66
| | | | | | Differential Revision: https://reviews.llvm.org/D63301 llvm-svn: 363339
* [AMDGPU] gfx1010 AMDGPUSetCCOp definitionStanislav Mekhanoshin2019-06-131-1/+1
| | | | | | | It was missing from D63293 and breaks in a debug tablegen w/o this part. llvm-svn: 363323
* [AMDGPU] gfx1010 base changes for wave32Stanislav Mekhanoshin2019-06-1312-25/+209
| | | | | | Differential Revision: https://reviews.llvm.org/D63293 llvm-svn: 363299
* [AMDGPU] ImmArg and SourceOfDivergence for permlane/dppStanislav Mekhanoshin2019-06-131-0/+5
| | | | | | | | | Added missing ImmArg and SourceOfDivergence to the crosslane intrinsics. Differential Revision: https://reviews.llvm.org/D63216 llvm-svn: 363276
* [AMDGPU][MC] Enabled constant expressions as operands of s_getreg/s_setregDmitry Preobrazhensky2019-06-135-107/+174
| | | | | | | | | | See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D61125 llvm-svn: 363255
* [AMDGPU] gfx1010 dpp16 and dpp8Stanislav Mekhanoshin2019-06-1214-34/+585
| | | | | | Differential Revision: https://reviews.llvm.org/D63203 llvm-svn: 363186
* [AMDGPU] gfx1010 premlane instructionsStanislav Mekhanoshin2019-06-127-1/+142
| | | | | | Differential Revision: https://reviews.llvm.org/D63202 llvm-svn: 363185
* [TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests ↵Simon Pilgrim2019-06-125-17/+20
| | | | | | | | | | | | | | (PR42123) As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179
* [TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI.Simon Pilgrim2019-06-111-5/+6
| | | | | | As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075. llvm-svn: 363048
* Revert CMake: Make most target symbols hidden by defaultTom Stellard2019-06-116-6/+6
| | | | | | | | | | | | | | | This reverts r362990 (git commit 374571301dc8e9bc9fdd1d70f86015de198673bd) This was causing linker warnings on Darwin: ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)' from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol 'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&), std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings. llvm-svn: 363028
* AtomicExpand: Don't crash on non-0 allocaMatt Arsenault2019-06-111-0/+1
| | | | | | | This now produces garbage on AMDGPU with a call to an nonexistent, anonymous libcall but won't assert. llvm-svn: 363022
* AMDGPU: Expand < 32-bit atomicsMatt Arsenault2019-06-111-0/+2
| | | | | | Also fix AtomicExpand asserting on atomicrmw fadd/fsub. llvm-svn: 363021
* CMake: Make most target symbols hidden by defaultTom Stellard2019-06-106-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l 36221 nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439 llvm-svn: 362990
* [AMDGPU] Optimize image_[load|store]_mipPiotr Sobczak2019-06-104-0/+43
| | | | | | | | | | | | | | | | | | Summary: Replace image_load_mip/image_store_mip with image_load/image_store if lod is 0. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63073 llvm-svn: 362957
* AMDGPU: Force skips around trapsMatt Arsenault2019-06-071-1/+1
| | | | llvm-svn: 362852
* [AMDGPU] Constrain the AMDGPU inliner on maximum number of basic blocks in a ↵Valery Pykhtin2019-06-071-1/+15
| | | | | | | | caller function (compile time performance) Differential revision: https://reviews.llvm.org/D62917 llvm-svn: 362789
* AMDGPU: Don't count mask branch pseudo towards skip thresholdMatt Arsenault2019-06-071-10/+8
| | | | llvm-svn: 362761
OpenPOWER on IntegriCloud