summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* [AMDGPU] Add buffer/load 8/16 bit overloaded intrinsicsRyan Taylor2019-03-191-0/+22
| | | | | | | | | | | | | | | Summary: Add buffer store/load 8/16 overloaded intrinsics for buffer, raw_buffer and struct_buffer Change-Id: I166a29f071b2ff4e4683fb0392564b1f223ac61d Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59265 llvm-svn: 356465
* [TargetLowering] Add code size information on isFPImmLegal. NFCAdhemerval Zanella2019-03-181-1/+2
| | | | | | | | | | | This allows better code size for aarch64 floating point materialization in a future patch. Reviewers: evandro Differential Revision: https://reviews.llvm.org/D58690 llvm-svn: 356389
* [AMDGPU] Prepare for introduction of v3 and v5 MVTsTim Renouf2019-03-171-3/+4
| | | | | | | | | | | | | | | | | | | AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This commit does not add them, but makes preparatory changes: * Fixed assumptions of power-of-2 vector type in kernel arg handling, and added v5 kernel arg tests and v3/v5 shader arg tests. * Added v5 tests for cost analysis. * Added vec3/vec5 arg test cases. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58928 Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd llvm-svn: 356342
* AMDGPU: Move d16 load matching to preprocess stepMatt Arsenault2019-03-081-0/+6
| | | | | | | | | | | | | | | | | When matching half of the build_vector to a load, there could still be a hidden dependency on the other half of the build_vector the pattern wouldn't detect. If there was an additional chain dependency on the other value, a cycle could be introduced. I don't think a tablegen pattern is capable of matching the necessary conditions, so move this into PreprocessISelDAG. Check isPredecessorOf for the other value to avoid a cycle. This has a warning that it's expensive, so this should probably be moved into an MI pass eventually that will have more freedom to reorder instructions to help match this. That is currently complicated by the lack of a computeKnownBits type mechanism for the selected function. llvm-svn: 355731
* AMDGPU: Fix crashes in invalid call casesMatt Arsenault2019-02-281-4/+3
| | | | | | | We have to at least tolerate calls to kernels, possibly with a mismatched calling convention on the callsite. llvm-svn: 355049
* Implementation of asm-goto support in LLVMCraig Topper2019-02-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | This patch accompanies the RFC posted here: http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html This patch adds a new CallBr IR instruction to support asm-goto inline assembly like gcc as used by the linux kernel. This instruction is both a call instruction and a terminator instruction with multiple successors. Only inline assembly usage is supported today. This also adds a new INLINEASM_BR opcode to SelectionDAG and MachineIR to represent an INLINEASM block that is also considered a terminator instruction. There will likely be more bug fixes and optimizations to follow this, but we felt it had reached a point where we would like to switch to an incremental development model. Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii Differential Revision: https://reviews.llvm.org/D53765 llvm-svn: 353563
* AMDGPU: Fix assert on trunc from bitcast of build_vectorMatt Arsenault2019-02-051-1/+1
| | | | | | | | | The v2i64 argument is lowered to a bitcast of v4i32 build_vector. This would then attempt to use the i32-element as the source of the vector truncate. This really would need to collect 2 elements from the build_vector to produce the intended truncate. llvm-svn: 353202
* [AMDGPU] Add intrinsics for 16 bit interpolationTim Corringham2019-01-281-0/+3
| | | | | | | | | | | | | | | | | | | Summary: Added the intrinsics llvm.amdgcn.interp.p1.f16() and llvm.amdgcn.interp.p2.f16() and related LIT test. The p1 intrinsic generates code appropriate for both 16 and 32 bank LDS. Reviewers: #amdgpu, dstuttard, arsenm, tpr Reviewed By: #amdgpu, arsenm Subscribers: jvesely, mgorny, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46754 llvm-svn: 352357
* Codegen support for atomicrmw fadd/fsubMatt Arsenault2019-01-221-3/+7
| | | | llvm-svn: 351851
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* AMDGPU: Remove llvm.SI.load.constMatt Arsenault2019-01-181-1/+0
| | | | | | | It's taken 3 years, but now all of the old AMDGPU and SI intrinsics are finally gone llvm-svn: 351586
* AMDGPU: Add llvm.amdgcn.ds.ordered.add & swapMarek Olsak2019-01-161-0/+1
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52944 llvm-svn: 351351
* [TargetLowering][AMDGPU] Remove the SimplifyDemandedBits function that takes ↵Craig Topper2019-01-071-30/+52
| | | | | | | | | | | | | | a User and OpIdx. Stop using it in AMDGPU target for simplifyI24. As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when its not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions. Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node. This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simplifications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input. Differential Revision: https://reviews.llvm.org/D56087 llvm-svn: 350560
* [AMDGPU] Always use the version of computeKnownBits that returns a value. NFCI.Simon Pilgrim2018-12-211-14/+7
| | | | | | Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version. llvm-svn: 349912
* [x86] allow vector load narrowing with multi-use valuesSanjay Patel2018-11-101-1/+4
| | | | | | | | | | | | | | | | | | | | | | This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins. I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs. The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided. Differential Revision: https://reviews.llvm.org/D54073 llvm-svn: 346595
* AMDGPU: Fix assertion with bitcast from i64 constant to v4i16Matt Arsenault2018-11-021-3/+4
| | | | llvm-svn: 345922
* Check shouldReduceLoadWidth from SimplifySetCCStanislav Mekhanoshin2018-10-311-0/+12
| | | | | | | | | | | | SimplifySetCC could shrink a load without checking for profitability or legality of such shink with a target. Added checks to prevent shrinking of aligned scalar loads in AMDGPU below dword as scalar engine does not support it. Differential Revision: https://reviews.llvm.org/D53846 llvm-svn: 345778
* DAG: Change behavior of fminnum/fmaxnum nodesMatt Arsenault2018-10-221-0/+8
| | | | | | | | | | | Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit. llvm-svn: 344914
* AMDGPU: Expand atomicrmw nand in IRMatt Arsenault2018-10-021-0/+7
| | | | llvm-svn: 343559
* AMDGPU: Expand vector canonicalizesMatt Arsenault2018-09-181-0/+1
| | | | llvm-svn: 342439
* AMDGPU: Remove remnants of old address space mappingMatt Arsenault2018-08-311-4/+3
| | | | llvm-svn: 341165
* [AMDGPU] Add support for multi-dword s.buffer.load intrinsicTim Renouf2018-08-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Patch by Marek Olsak and David Stuttard, both of AMD. This adds a new amdgcn intrinsic supporting s.buffer.load, in particular multiple dword variants. These are convenient to use from some front-end implementations. Also modified the existing llvm.SI.load.const intrinsic to common up the underlying implementation. This modification also requires that we can lower to non-uniform loads correctly by splitting larger dword variants into sizes supported by the non-uniform versions of the load. V2: Addressed minor review comments. V3: i1 glc is now i32 cachepolicy for consistency with buffer and tbuffer intrinsics, plus fixed formatting issue. V4: Added glc test. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51098 Change-Id: I83a6e00681158bb243591a94a51c7baa445f169b llvm-svn: 340684
* AMDGPU: Fix not respecting byval alignment in call frame setupMatt Arsenault2018-08-221-2/+1
| | | | | | | | | This was hackily adding in the 4-bytes reserved for the callee's emergency stack slot. Treat it like a normal stack allocation so we get the correct alignment padding behavior. This fixes an inconsistency between the caller and callee. llvm-svn: 340396
* AMDGPU: Custom lower fexpMatt Arsenault2018-08-161-0/+32
| | | | | | | | This will allow the library to just use __builtin_expf directly without expanding this itself. Note f64 still won't work because there is no exp instruction for it. llvm-svn: 339902
* AMDGPU: Fold fneg into fmed3Matt Arsenault2018-08-151-0/+11
| | | | llvm-svn: 339821
* AMDGPU: Address todo for handling 1/(2 pi)Matt Arsenault2018-08-151-6/+23
| | | | llvm-svn: 339814
* AMDGPU: More canonicalized operationsMatt Arsenault2018-08-101-1/+11
| | | | llvm-svn: 339464
* AMDGPU: cvt_pk_rtz_f16 canonicalizesMatt Arsenault2018-08-061-1/+2
| | | | llvm-svn: 339078
* AMDGPU: Treat more custom operations as canonicalizingMatt Arsenault2018-08-061-1/+4
| | | | | | | | | | | | | | Everything should quiet, and I think everything should flush. I assume the min3/med3/max3 follow the same rules as regular min/max for flushing, which should at least be conservatively correct. There are still more operations that need to be handled. llvm-svn: 339065
* DAG: Enhance isKnownNeverNaNMatt Arsenault2018-08-031-0/+83
| | | | | | | | | | | | Add a parameter for testing specifically for sNaNs - at least one instruction pattern on AMDGPU needs to check specifically for this. Also handle more cases, and add a target hook for custom nodes, similar to the hooks for known bits. llvm-svn: 338910
* AMDGPU: Make fneg combine handle fcanonicalizeMatt Arsenault2018-07-301-0/+2
| | | | llvm-svn: 338243
* DAG: Add calling convention argument to calling convention funcsMatt Arsenault2018-07-281-4/+3
| | | | | | | | This seems like a pretty glaring omission, and AMDGPU wants to treat kernels differently from other calling conventions. llvm-svn: 338194
* AMDGPU: Stop trying to extend arguments for cloverMatt Arsenault2018-07-281-5/+1
| | | | | | | This was trying to replace i8/i16 arguments with i32, which was broken and no longer necessary. llvm-svn: 338193
* Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering"Matt Arsenault2018-07-201-75/+108
| | | | | | Reverts r337079 with fix for msan error. llvm-svn: 337535
* [AMDGPU] [AMDGPU] Support a fdot2 pattern.Farhana Aleen2018-07-161-0/+1
| | | | | | | | | | | | | | | Summary: Optimize fma((float)S0.x, (float)S1.x fma((float)S0.y, (float)S1.y, z)) -> fdot2((v2f16)S0, (v2f16)S1, (float)z) Author: FarhanaAleen Reviewed By: rampitec, b-sumner Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D49146 llvm-svn: 337198
* Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering"Evgeniy Stepanov2018-07-141-108/+75
| | | | | | | | | | | | | | | | | | | | | | This reverts commit r337021. WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7 #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3 #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3 #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18 #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23 #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5 #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5 #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3 #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26 [...] Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv' #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192 llvm-svn: 337079
* AMDGPU: Fix handling of alignment padding in DAG argument loweringMatt Arsenault2018-07-131-75/+108
| | | | | | | | | | | | | | | | | This was completely broken if there was ever a struct argument, as this information is thrown away during the argument analysis. The offsets as passed in to LowerFormalArguments are not useful, as they partially depend on the legalized result register type, and they don't consider the alignment in the first place. Ignore the Ins array, and instead figure out from the raw IR type what we need to do. This seems to fix the padding computation if the DAG lowering is forced (and stops breaking arguments following padded arguments if the arguments were only partially lowered in the IR) llvm-svn: 337021
* AMDGPU: Fix assert in truncate combine with vectorsMatt Arsenault2018-07-121-1/+1
| | | | | | | The piece above probably has the same problem, but I need to try to come up with a test for it. llvm-svn: 336935
* AMDGPU: Refactor Subtarget classesTom Stellard2018-07-111-5/+5
| | | | | | | | | | | | | | | | | Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation. Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 llvm-svn: 336851
* AMDGPU: Fix UBSan error caused by r335942Tom Stellard2018-07-061-1/+2
| | | | | | | | | | | | | | Summary: Fixes PR38071. Reviewers: arsenm, dstenb Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48979 llvm-svn: 336448
* AMDGPU/GlobalISel: Implement custom kernel arg loweringMatt Arsenault2018-07-051-2/+2
| | | | | | | | | | | | | Avoid using allocateKernArg / AssignFn. We do not want any of the type splitting properties of normal calling convention lowering. For now at least this exists alongside the IR argument lowering pass. This is necessary to handle struct padding correctly while some arguments are still skipped by the IR argument lowering pass. llvm-svn: 336373
* AMDGPU: Separate R600 and GCN TableGen filesTom Stellard2018-06-281-52/+5
| | | | | | | | | | | | | | | | | | | | | Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc. Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46365 llvm-svn: 335942
* AMDGPU: Remove MFI::ABIArgOffsetMatt Arsenault2018-06-281-3/+7
| | | | | | | | | | | | | | We have too many mechanisms for tracking the various offsets used for kernel arguments, so remove one. There's still a lot of confusion with these because there are two different "implicit" argument areas located at the beginning and end of the kernarg segment. Additionally, the offset was determined based on the memory size of the split element types. This would break in a future commit where v3i32 is decomposed into separate i32 pieces. llvm-svn: 335830
* [AMDGPU] Convert rcp to rcp_iflagStanislav Mekhanoshin2018-06-271-10/+18
| | | | | | | | | | | If a source of rcp instruction is a result of any conversion from an integer convert it into rcp_iflag instruction. No FP exception can ever happen except division by zero if a single precision rcp argument is a representation of an integral number. Differential Revision: https://reviews.llvm.org/D48569 llvm-svn: 335742
* AMDGPU: Remove old-style image intrinsicsNicolai Haehnle2018-06-211-76/+0
| | | | | | | | | | | | | | | | | | | | Summary: This also removes the need for atomic pseudo instructions, since we select the correct encoding directly in SITargetLowering::lowerImage for dimension-aware image intrinsics. Mesa uses dimension-aware image intrinsics since commit a9a7993441. Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a Reviewers: arsenm, rampitec, mareko, tpr, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48167 llvm-svn: 335231
* AMDGPU: Make v4i16/v4f16 legalMatt Arsenault2018-06-151-2/+16
| | | | | | | Some image loads return these, and it's awkward working around them not being legal. llvm-svn: 334835
* [AMDGPU] Corrected computeKnownBits for V_PERM_B32Stanislav Mekhanoshin2018-06-131-7/+8
| | | | | | Differential Revision: https://reviews.llvm.org/D48133 llvm-svn: 334640
* AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLoweringTom Stellard2018-06-131-69/+0
| | | | | | | | | | | | | | | | | | Summary: The code that handles ISD:Register and ISD::CopyFromReg assumes the target is amdgcn, so this is broken on r600. We don't need this analysis on r600 anyway so we can safely move it to SITargetLowering. Reviewers: alex-t, arsenm, nhaehnle Reviewed By: arsenm Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46298 llvm-svn: 334607
* [AMDGPU] DAG combine to produce V_PERM_B32Stanislav Mekhanoshin2018-06-121-0/+29
| | | | | | Differential Revision: https://reviews.llvm.org/D48099 llvm-svn: 334559
* AMDGPU: Error on LDS global address in functionsMatt Arsenault2018-06-081-1/+9
| | | | | | | These won't work as expected now, so error on them to avoid wasting time debugging this in the future. llvm-svn: 334269
OpenPOWER on IntegriCloud