path: root/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
Commit message | Author | Age | Files | Lines
* AMDGPU: Improve llvm.round.f64 lowering for CI+ (Matt Arsenault, 2019-12-30; 1 file, -1/+1)
  The path already used for f16/f32 works a lot better when v_trunc_f64 is
  available. A sketch follows.
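A minimal sketch of IR that exercises this path; the kernel name and arguments are illustrative, not from the commit:

    declare double @llvm.round.f64(double)

    define amdgpu_kernel void @round_f64(double addrspace(1)* %out, double %x) {
      ; On CI+ this now follows the f16/f32 lowering path, which is built
      ; around v_trunc_f64.
      %r = call double @llvm.round.f64(double %x)
      store double %r, double addrspace(1)* %out
      ret void
    }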
* AMDGPU: Select global atomicrmw fadd (Matt Arsenault, 2019-11-06; 1 file, -1/+0)
  This only works if there is no use of the return value. A sketch follows.
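A minimal sketch, assuming a gfx908-style subtarget where the global FP atomic has no returning form; the names are illustrative and other selection conditions may also apply:

    define amdgpu_kernel void @global_fadd(float addrspace(1)* %ptr, float %v) {
      ; The result is discarded, so selection to the no-return global
      ; atomic add instruction is possible.
      %old = atomicrmw fadd float addrspace(1)* %ptr, float %v monotonic
      ret void
    }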
* AMDGPU: Select basic interp directly from intrinsics (Matt Arsenault, 2019-10-21; 1 file, -3/+0)
  llvm-svn: 375457
* AMDGPU: Move SelectFlatOffset back into AMDGPUISelDAGToDAG (Matt Arsenault, 2019-10-11; 1 file, -4/+0)
  llvm-svn: 374495
* AMDGPU/GlobalISel: Implement LDS G_GLOBAL_VALUE (Matt Arsenault, 2019-09-09; 1 file, -0/+1)
  Handle the simple case that lowers to a constant.
  llvm-svn: 371424
* AMDGPU: Remove pointless wrapper nodes for init.exec intrinsics (Matt Arsenault, 2019-09-09; 1 file, -2/+0)
  llvm-svn: 371364
* AMDGPU: Remove unused custom node definition (Matt Arsenault, 2019-09-01; 1 file, -2/+0)
  llvm-svn: 370603
* AMDGPU: Combine directly on mul24 intrinsics (Matt Arsenault, 2019-08-27; 1 file, -0/+1)
  The problem these are supposed to work around can occur before the
  intrinsics are lowered into the nodes. Try to directly simplify them so
  they are matched before the bit assert operations can be optimized out.
  A sketch follows.
  llvm-svn: 369994
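A minimal sketch of the kind of input the combine can now simplify; the 24-bit masking is illustrative:

    declare i32 @llvm.amdgcn.mul.u24(i32, i32)

    define i32 @mul24_demo(i32 %a, i32 %b) {
      ; Operands are already narrowed to 24 bits, so the combine can fold
      ; on the intrinsic itself, before it becomes a MUL_U24 node.
      %a24 = and i32 %a, 16777215
      %b24 = and i32 %b, 16777215
      %m = call i32 @llvm.amdgcn.mul.u24(i32 %a24, i32 %b24)
      ret i32 %m
    }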
* Re-commit: [AMDGPU] Use S_DENORM_MODE for gfx10 (Austin Kerbow, 2019-08-06; 1 file, -0/+3)
  Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10.
  Reviewers: arsenm, rampitec
  Reviewed By: arsenm, rampitec
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65620
  llvm-svn: 367969
* Revert "[AMDGPU] Use S_DENORM_MODE for gfx10"Dmitri Gribenko2019-08-051-3/+0
| | | | | | | This reverts commit r367882. It broke the test MC/Disassembler/AMDGPU/gfx10_dasm_all.txt. llvm-svn: 367904
* [AMDGPU] Use S_DENORM_MODE for gfx10 (Austin Kerbow, 2019-08-05; 1 file, -0/+3)
  Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10.
  Reviewers: arsenm, rampitec
  Reviewed By: arsenm, rampitec
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65620
  llvm-svn: 367882
* AMDGPU: add missing llvm.amdgcn.{raw,struct}.buffer.atomic.{inc,dec} (Nicolai Haehnle, 2019-08-05; 1 file, -0/+2)
  Summary: Wrapping increment/decrement. These aren't exposed by many APIs...
  A sketch follows.
  Change-Id: I1df25c7889de5a5ba76468ad8e8a2597efa9af6c
  Reviewers: arsenm, tpr, dstuttard
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, jfb, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65283
  llvm-svn: 367821
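A sketch of the raw variant, assuming the inc/dec overloads share the (vdata, rsrc, voffset, soffset, cachepolicy) shape of the other raw buffer atomics; verify the exact signature against IntrinsicsAMDGPU.td:

    declare i32 @llvm.amdgcn.raw.buffer.atomic.inc.i32(i32, <4 x i32>, i32, i32, i32)

    define i32 @wrapping_inc(<4 x i32> inreg %rsrc, i32 %voffset) {
      ; Wrapping increment: the old value is returned, and the stored
      ; value wraps to zero once it reaches the bound passed in vdata.
      %old = call i32 @llvm.amdgcn.raw.buffer.atomic.inc.i32(i32 42, <4 x i32> %rsrc, i32 %voffset, i32 0, i32 0)
      ret i32 %old
    }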
* [AMDGPU] gfx908 atomic fadd and atomic pk_fadd (Stanislav Mekhanoshin, 2019-07-11; 1 file, -0/+4)
  Differential Revision: https://reviews.llvm.org/D64435
  llvm-svn: 365717
* [X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it (Craig Topper, 2019-07-09; 1 file, -1/+2)
  Basically the problem is that X86 doesn't set the Fast flag from
  allowsMemoryAccess on certain CPUs due to slow unaligned memory subtarget
  features. This prevents bitcasts from being folded into loads and stores.
  But all vector loads and stores of the same width are the same cost on
  X86. This patch merges the allowsMemoryAccess call into
  isLoadBitCastBeneficial to allow X86 to skip it.
  Differential Revision: https://reviews.llvm.org/D64295
  llvm-svn: 365549
* AMDGPU: Write LDS objects out as global symbols in code generation (Nicolai Haehnle, 2019-06-25; 1 file, -0/+1)
  Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section
  index introduced with a previous change. The linker is then expected to
  resolve relocations, which are also emitted. Initially disabled for HSA
  and PAL environments until they have caught up in terms of linker and
  runtime loader.
  Some notes:
  - The llvm.amdgcn.groupstaticsize intrinsic can no longer be lowered to a
    constant at compile time, which means some tests can no longer be
    applied. The current "solution" is a terrible hack, but the intrinsic
    isn't used by Mesa, so we can keep it for now.
  - We no longer know the full LDS size per kernel at compile time, which
    means that we can no longer generate a relevant error message at compile
    time. It would be possible to add a check for the size of individual
    variables, but ultimately the linker will have to perform the final
    check.
  Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275
  Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin
  Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61494
  llvm-svn: 364297
* [SelectionDAG][x86] limit post-legalization store merging by type (Sanjay Patel, 2019-06-04; 1 file, -1/+1)
  The proposal in D62498 showed that x86 would benefit from vector store
  splitting, but that may conflict with the generic DAG combiner's store
  merging transforms. Add memory type to the existing TLI hook that enables
  the merging transforms, so we can limit those changes to scalars only for
  x86.
  llvm-svn: 362507
* [AMDGPU] gfx1010 VMEM and SMEM implementation (Stanislav Mekhanoshin, 2019-04-30; 1 file, -0/+4)
  Differential Revision: https://reviews.llvm.org/D61330
  llvm-svn: 359621
* [AMDGPU] Implemented dwordx3 variants of buffer/tbuffer load/store intrinsics (Tim Renouf, 2019-03-22; 1 file, -1/+0)
  Now we have vec3 MVTs, this commit implements dwordx3 variants of the
  buffer intrinsics. On gfx6, a dwordx3 buffer load intrinsic is implemented
  as a dwordx4 instruction, and a dwordx3 buffer store intrinsic is not
  supported. We need to support the dwordx3 load intrinsic because it is
  generated by subtarget-unaware code in InstCombine.
  Differential Revision: https://reviews.llvm.org/D58904
  Change-Id: I016729d8557b98a52f529638ae97c340a5922a4e
  llvm-svn: 356755
* [AMDGPU] Support for v3i32/v3f32 (Tim Renouf, 2019-03-21; 1 file, -0/+14)
  Added support for dwordx3 for most load/store types, but not DS, and not
  intrinsics yet. SI (gfx6) does not have dwordx3 instructions, so they are
  not enabled there. Some of this patch is from Matt Arsenault, also of AMD.
  Differential Revision: https://reviews.llvm.org/D58902
  Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9
  llvm-svn: 356659
* [AMDGPU] Add buffer/load 8/16 bit overloaded intrinsics (Ryan Taylor, 2019-03-19; 1 file, -0/+6)
  Summary: Add buffer store/load 8/16 overloaded intrinsics for buffer,
  raw_buffer and struct_buffer. A sketch follows.
  Change-Id: I166a29f071b2ff4e4683fb0392564b1f223ac61d
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59265
  llvm-svn: 356465
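A sketch of one of the new overloads, assuming it shares the (rsrc, voffset, soffset, cachepolicy) shape of the existing raw buffer loads:

    declare i8 @llvm.amdgcn.raw.buffer.load.i8(<4 x i32>, i32, i32, i32)

    define i32 @load_byte(<4 x i32> inreg %rsrc, i32 %voffset) {
      ; A single-byte load, zero-extended; previously only 32-bit and
      ; wider element types were exposed through these intrinsics.
      %b = call i8 @llvm.amdgcn.raw.buffer.load.i8(<4 x i32> %rsrc, i32 %voffset, i32 0, i32 0)
      %r = zext i8 %b to i32
      ret i32 %r
    }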
* [TargetLowering] Add code size information on isFPImmLegal. NFC (Adhemerval Zanella, 2019-03-18; 1 file, -1/+2)
  This allows better code size for aarch64 floating point materialization
  in a future patch.
  Reviewers: evandro
  Differential Revision: https://reviews.llvm.org/D58690
  llvm-svn: 356389
* AMDGPU: Move d16 load matching to preprocess step (Matt Arsenault, 2019-03-08; 1 file, -0/+7)
  When matching half of the build_vector to a load, there could still be a
  hidden dependency on the other half of the build_vector the pattern
  wouldn't detect. If there was an additional chain dependency on the other
  value, a cycle could be introduced.
  I don't think a tablegen pattern is capable of matching the necessary
  conditions, so move this into PreprocessISelDAG. Check isPredecessorOf
  for the other value to avoid a cycle. This has a warning that it's
  expensive, so this should probably be moved into an MI pass eventually
  that will have more freedom to reorder instructions to help match this.
  That is currently complicated by the lack of a computeKnownBits type
  mechanism for the selected function.
  llvm-svn: 355731
* Adjust documentation for git migration. (James Y Knight, 2019-01-29; 1 file, -8/+8)
  This fixes most references to the paths:
    llvm.org/svn/
    llvm.org/git/
    llvm.org/viewvc/
    github.com/llvm-mirror/
    github.com/llvm-project/
    reviews.llvm.org/diffusion/
  to instead point to https://github.com/llvm/llvm-project.
  This is *not* a trivial substitution, because additionally, all the
  checkout instructions had to be migrated to instruct users on how to use
  the monorepo layout, setting LLVM_ENABLE_PROJECTS instead of checking out
  various projects into various subdirectories.
  I've attempted to not change any scripts here, only documentation. The
  scripts will have to be addressed separately.
  Additionally, I've deleted one document which appeared to be outdated and
  unneeded: lldb/docs/building-with-debug-llvm.txt
  Differential Revision: https://reviews.llvm.org/D57330
  llvm-svn: 352514
* [AMDGPU] Add intrinsics for 16 bit interpolation (Tim Corringham, 2019-01-28; 1 file, -0/+3)
  Summary: Added the intrinsics llvm.amdgcn.interp.p1.f16() and
  llvm.amdgcn.interp.p2.f16() and related LIT test. The p1 intrinsic
  generates code appropriate for both 16 and 32 bank LDS. A sketch follows.
  Reviewers: #amdgpu, dstuttard, arsenm, tpr
  Reviewed By: #amdgpu, arsenm
  Subscribers: jvesely, mgorny, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46754
  llvm-svn: 352357
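A sketch of how the pair might be chained; the (i, attr_chan, attr, high, m0) parameter shape is an assumption to check against the intrinsic definitions:

    declare float @llvm.amdgcn.interp.p1.f16(float, i32, i32, i1, i32)
    declare half @llvm.amdgcn.interp.p2.f16(float, float, i32, i32, i1, i32)

    define amdgpu_ps half @interp_f16(float %i, float %j, i32 inreg %m0) {
      ; First stage of f16 attribute interpolation; the second stage then
      ; consumes the p1 result.
      %p1 = call float @llvm.amdgcn.interp.p1.f16(float %i, i32 0, i32 0, i1 false, i32 %m0)
      %p2 = call half @llvm.amdgcn.interp.p2.f16(float %p1, float %j, i32 0, i32 0, i1 false, i32 %m0)
      ret half %p2
    }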
* Codegen support for atomicrmw fadd/fsub (Matt Arsenault, 2019-01-22; 1 file, -1/+0)
  llvm-svn: 351851
* Update the file headers across all of the LLVM projects in the monorepo (Chandler Carruth, 2019-01-19; 1 file, -4/+3)
  ... to reflect the new license. We understand that people may be surprised
  that we're moving the header entirely to discuss the new license. We
  checked this carefully with the Foundation's lawyer and we believe this
  is the correct approach. Essentially, all code in the project is now made
  available by the LLVM project under our new license, so you will see that
  the license headers include that license only. Some of our contributors
  have contributed code under our old license, and accordingly, we have
  retained a copy of our old license notice in the top-level files in each
  project and repository.
  llvm-svn: 351636
* AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap (Marek Olsak, 2019-01-16; 1 file, -0/+1)
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D52944
  llvm-svn: 351351
* DAG: Change behavior of fminnum/fmaxnum nodes (Matt Arsenault, 2018-10-22; 1 file, -0/+1)
  Introduce new versions that follow the IEEE semantics to help with
  legalization that may need quieted inputs. There are some regressions
  from inserting unnecessary canonicalizes when these are matched from fast
  math fcmp + select, which should be fixed in a future commit.
  llvm-svn: 344914
* AMDGPU: Expand atomicrmw nand in IR (Matt Arsenault, 2018-10-02; 1 file, -0/+2)
  A sketch follows.
  llvm-svn: 343559
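A minimal sketch; since no AMDGPU instruction implements nand directly, the AtomicExpand pass rewrites the operation into a compare-exchange retry loop in IR before selection:

    define i32 @nand_demo(i32 addrspace(1)* %ptr, i32 %v) {
      ; Expanded in IR into a load + cmpxchg loop rather than selected
      ; to a native atomic instruction.
      %old = atomicrmw nand i32 addrspace(1)* %ptr, i32 %v seq_cst
      ret i32 %old
    }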
* AMDGPU: Remove remnants of old address space mapping (Matt Arsenault, 2018-08-31; 1 file, -6/+0)
  llvm-svn: 341165
* [AMDGPU] Add support for multi-dword s.buffer.load intrinsic (Tim Renouf, 2018-08-25; 1 file, -0/+1)
  Summary: Patch by Marek Olsak and David Stuttard, both of AMD.
  This adds a new amdgcn intrinsic supporting s.buffer.load, in particular
  multiple dword variants. These are convenient to use from some front-end
  implementations. Also modified the existing llvm.SI.load.const intrinsic
  to common up the underlying implementation. This modification also
  requires that we can lower to non-uniform loads correctly by splitting
  larger dword variants into sizes supported by the non-uniform versions of
  the load. A sketch follows.
  V2: Addressed minor review comments.
  V3: i1 glc is now i32 cachepolicy for consistency with buffer and tbuffer
  intrinsics, plus fixed formatting issue.
  V4: Added glc test.
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D51098
  Change-Id: I83a6e00681158bb243591a94a51c7baa445f169b
  llvm-svn: 340684
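A sketch of a multi-dword variant, assuming the (rsrc, offset, cachepolicy) shape implied by the V3 note above:

    declare <4 x float> @llvm.amdgcn.s.buffer.load.v4f32(<4 x i32>, i32, i32)

    define amdgpu_ps <4 x float> @sbuffer_load4(<4 x i32> inreg %rsrc, i32 inreg %offset) {
      ; Four dwords in one scalar load when the offset is uniform; split
      ; into non-uniform loads otherwise.
      %data = call <4 x float> @llvm.amdgcn.s.buffer.load.v4f32(<4 x i32> %rsrc, i32 %offset, i32 0)
      ret <4 x float> %data
    }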
* AMDGPU: Fix not respecting byval alignment in call frame setup (Matt Arsenault, 2018-08-22; 1 file, -1/+0)
  This was hackily adding in the 4 bytes reserved for the callee's emergency
  stack slot. Treat it like a normal stack allocation so we get the correct
  alignment padding behavior. This fixes an inconsistency between the caller
  and callee.
  llvm-svn: 340396
* AMDGPU: Custom lower fexp (Matt Arsenault, 2018-08-16; 1 file, -1/+2)
  This will allow the library to just use __builtin_expf directly without
  expanding this itself. Note f64 still won't work because there is no exp
  instruction for it. A sketch follows.
  llvm-svn: 339902
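A minimal sketch of the f32 case that now gets a custom lowering; the function name is illustrative:

    declare float @llvm.exp.f32(float)

    define float @exp_demo(float %x) {
      ; Custom-lowered on AMDGPU rather than expanded by the generic
      ; legalizer; f64 still has no hardware exp instruction to target.
      %r = call float @llvm.exp.f32(float %x)
      ret float %r
    }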
* AMDGPU: Address todo for handling 1/(2 pi) (Matt Arsenault, 2018-08-15; 1 file, -0/+2)
  llvm-svn: 339814
* DAG: Enhance isKnownNeverNaN (Matt Arsenault, 2018-08-03; 1 file, -0/+5)
  Add a parameter for testing specifically for sNaNs - at least one
  instruction pattern on AMDGPU needs to check specifically for this. Also
  handle more cases, and add a target hook for custom nodes, similar to the
  hooks for known bits.
  llvm-svn: 338910
* Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering"Matt Arsenault2018-07-201-2/+5
| | | | | | Reverts r337079 with fix for msan error. llvm-svn: 337535
* [AMDGPU] Support a fdot2 pattern. (Farhana Aleen, 2018-07-16; 1 file, -0/+1)
  Summary: Optimize fma((float)S0.x, (float)S1.x,
                       fma((float)S0.y, (float)S1.y, z))
                   -> fdot2((v2f16)S0, (v2f16)S1, (float)z)
  A sketch follows.
  Author: FarhanaAleen
  Reviewed By: rampitec, b-sumner
  Subscribers: AMDGPU
  Differential Revision: https://reviews.llvm.org/D49146
  llvm-svn: 337198
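A minimal sketch of IR matching this shape; the nested fmas over fpext'ed halves of v2f16 vectors are what the combine can rewrite into a dot product on subtargets with the feature:

    declare float @llvm.fma.f32(float, float, float)

    define float @dot2_pattern(<2 x half> %s0, <2 x half> %s1, float %z) {
      ; fma(ext(s0.x), ext(s1.x), fma(ext(s0.y), ext(s1.y), z))
      %s0x = extractelement <2 x half> %s0, i32 0
      %s0y = extractelement <2 x half> %s0, i32 1
      %s1x = extractelement <2 x half> %s1, i32 0
      %s1y = extractelement <2 x half> %s1, i32 1
      %e0x = fpext half %s0x to float
      %e0y = fpext half %s0y to float
      %e1x = fpext half %s1x to float
      %e1y = fpext half %s1y to float
      %inner = call float @llvm.fma.f32(float %e0y, float %e1y, float %z)
      %outer = call float @llvm.fma.f32(float %e0x, float %e1x, float %inner)
      ret float %outer
    }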
* Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering"Evgeniy Stepanov2018-07-141-5/+2
| | | | | | | | | | | | | | | | | | | | | | This reverts commit r337021. WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7 #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3 #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3 #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18 #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23 #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5 #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5 #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3 #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26 [...] Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv' #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192 llvm-svn: 337079
* AMDGPU: Fix handling of alignment padding in DAG argument lowering (Matt Arsenault, 2018-07-13; 1 file, -2/+5)
  This was completely broken if there was ever a struct argument, as this
  information is thrown away during the argument analysis. The offsets as
  passed in to LowerFormalArguments are not useful, as they partially
  depend on the legalized result register type, and they don't consider
  the alignment in the first place. Ignore the Ins array, and instead
  figure out from the raw IR type what we need to do. This seems to fix
  the padding computation if the DAG lowering is forced (and stops breaking
  arguments following padded arguments if the arguments were only partially
  lowered in the IR).
  llvm-svn: 337021
* AMDGPU: Refactor Subtarget classes (Tom Stellard, 2018-07-11; 1 file, -3/+3)
  Summary: This is a follow-up to r335942.
  - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
  - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
  - Merge R600Subtarget::Generation and GCNSubtarget::Generation into
    AMDGPUSubtarget::Generation.
  Reviewers: arsenm, jvesely
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D49037
  llvm-svn: 336851
* AMDGPU: Separate R600 and GCN TableGen files (Tom Stellard, 2018-06-28; 1 file, -3/+4)
  Summary: We now have two sets of generated TableGen files, one for R600
  and one for GCN, so each sub-target now has its own tables of
  instructions, registers, ISel patterns, etc. This should help reduce
  compile time since each sub-target now only has to consider information
  that is specific to itself. This will also help prevent the R600
  sub-target from slowing down new features for GCN, like disassembler
  support, GlobalISel, etc.
  Reviewers: arsenm, nhaehnle, jvesely
  Reviewed By: arsenm
  Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46365
  llvm-svn: 335942
* AMDGPU: Remove MFI::ABIArgOffset (Matt Arsenault, 2018-06-28; 1 file, -1/+1)
  We have too many mechanisms for tracking the various offsets used for
  kernel arguments, so remove one. There's still a lot of confusion with
  these because there are two different "implicit" argument areas located
  at the beginning and end of the kernarg segment.
  Additionally, the offset was determined based on the memory size of the
  split element types. This would break in a future commit where v3i32 is
  decomposed into separate i32 pieces.
  llvm-svn: 335830
* [AMDGPU] Convert rcp to rcp_iflag (Stanislav Mekhanoshin, 2018-06-27; 1 file, -0/+2)
  If a source of an rcp instruction is the result of any conversion from
  an integer, convert it into an rcp_iflag instruction. No FP exception
  can ever happen, except division by zero, if a single-precision rcp
  argument is a representation of an integral number. A sketch follows.
  Differential Revision: https://reviews.llvm.org/D48569
  llvm-svn: 335742
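A minimal sketch of an input that qualifies; producing the reciprocal through the amdgcn rcp intrinsic is one way to reach the RCP node this combine inspects:

    declare float @llvm.amdgcn.rcp.f32(float)

    define float @rcp_of_int(i32 %n) {
      ; The rcp source comes from an integer conversion, so it can become
      ; the IFLAG form: division by zero is the only FP exception possible.
      %f = sitofp i32 %n to float
      %r = call float @llvm.amdgcn.rcp.f32(float %f)
      ret float %r
    }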
* AMDGPU: Remove old-style image intrinsics (Nicolai Haehnle, 2018-06-21; 1 file, -84/+0)
  Summary: This also removes the need for atomic pseudo instructions, since
  we select the correct encoding directly in SITargetLowering::lowerImage
  for dimension-aware image intrinsics. Mesa uses dimension-aware image
  intrinsics since commit a9a7993441.
  Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a
  Reviewers: arsenm, rampitec, mareko, tpr, b-sumner
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48167
  llvm-svn: 335231
* AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLowering (Tom Stellard, 2018-06-13; 1 file, -2/+0)
  Summary: The code that handles ISD::Register and ISD::CopyFromReg assumes
  the target is amdgcn, so this is broken on r600. We don't need this
  analysis on r600 anyway, so we can safely move it to SITargetLowering.
  Reviewers: alex-t, arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46298
  llvm-svn: 334607
* [AMDGPU] DAG combine to produce V_PERM_B32 (Stanislav Mekhanoshin, 2018-06-12; 1 file, -0/+1)
  Differential Revision: https://reviews.llvm.org/D48099
  llvm-svn: 334559
* AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMP (Tom Stellard, 2018-05-24; 1 file, -1/+0)
  Summary: We don't generate AMDGPUISD::CLAMP for R600 now that
  llvm.AMDGPU.clamp is gone.
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47181
  llvm-svn: 333153
* AMDGPU: Move AMDGPUTargetLowering::isFPExtFoldable() into SITargetLowering (Tom Stellard, 2018-05-22; 1 file, -1/+0)
  Summary: This is always false for R600.
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47180
  llvm-svn: 333016
* AMDGPU: Custom lower v4i16/v4f16 vector operations (Matt Arsenault, 2018-05-16; 1 file, -0/+4)
  Avoids stack access. Also handle the extract-hi-elt pattern from
  truncate + shift to avoid a couple of test regressions.
  llvm-svn: 332453
* AMDGPU: Add combine for trunc of bitcast from build_vector (Matt Arsenault, 2018-05-09; 1 file, -0/+1)
  If the truncate is only accessing the first element of the vector, we
  can use the original source value. This helps with some combine ordering
  issues after operations are lowered to integer operations between
  bitcasts of build_vector. In particular, it stops unnecessarily
  materializing the unused top half of a vector in some cases.
  A sketch follows.
  llvm-svn: 331909
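A minimal sketch of the pattern; only element 0 survives the truncate, so the whole sequence can fold to %a:

    define i32 @trunc_bitcast(i32 %a, i32 %b) {
      ; build_vector -> bitcast -> trunc, where the trunc only reads the
      ; low 32 bits, i.e. vector element 0.
      %v0 = insertelement <2 x i32> undef, i32 %a, i32 0
      %v1 = insertelement <2 x i32> %v0, i32 %b, i32 1
      %cast = bitcast <2 x i32> %v1 to i64
      %t = trunc i64 %cast to i32
      ret i32 %t
    }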