path: root/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
Commit log (newest first). Each entry shows: subject (author, date; files changed, lines removed/added), followed by the commit message.
* AMDGPU/GlobalISel: Select flat loads (Matt Arsenault, 2019-07-16; 1 file changed, -12/+5)
  Now that the patterns use the new PatFrag address space support, the only blocker to importing most load patterns is the addressing mode complex patterns.
  llvm-svn: 366237
* [AMDGPU] use v32f32 for 3 mfma intrinsics (Stanislav Mekhanoshin, 2019-07-12; 1 file changed, -0/+9)
  These should really use v32f32, but were defined as v32i32 due to the lack of the v32f32 type.
  Differential Revision: https://reviews.llvm.org/D64667
  llvm-svn: 365972
* [AMDGPU] gfx908 mfma support (Stanislav Mekhanoshin, 2019-07-11; 1 file changed, -0/+1)
  Differential Revision: https://reviews.llvm.org/D64584
  llvm-svn: 365824
* Remove some redundant code from r290372 and improve a comment. (Jay Foad, 2019-07-11; 1 file changed, -5/+3)
  llvm-svn: 365741
* [AMDGPU] gfx908 atomic fadd and atomic pk_fadd (Stanislav Mekhanoshin, 2019-07-11; 1 file changed, -0/+4)
  Differential Revision: https://reviews.llvm.org/D64435
  llvm-svn: 365717
* [X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it (Craig Topper, 2019-07-09; 1 file changed, -4/+9)
  Basically the problem is that X86 doesn't set the Fast flag from allowsMemoryAccess on certain CPUs due to slow unaligned memory subtarget features. This prevents bitcasts from being folded into loads and stores. But all vector loads and stores of the same width are the same cost on X86. This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it.
  Differential Revision: https://reviews.llvm.org/D64295
  llvm-svn: 365549
* [AMDGPU] Packed thread ids in function call ABI (Stanislav Mekhanoshin, 2019-06-28; 1 file changed, -3/+13)
  Differential Revision: https://reviews.llvm.org/D63851
  llvm-svn: 364619
* AMDGPU: Write LDS objects out as global symbols in code generation (Nicolai Haehnle, 2019-06-25; 1 file changed, -0/+10)
  Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section index introduced with a previous change. The linker is then expected to resolve relocations, which are also emitted. Initially disabled for HSA and PAL environments until they have caught up in terms of linker and runtime loader.
  Some notes:
  - The llvm.amdgcn.groupstaticsize intrinsic can no longer be lowered to a constant at compile time, which means some tests can no longer be applied. The current "solution" is a terrible hack, but the intrinsic isn't used by Mesa, so we can keep it for now.
  - We no longer know the full LDS size per kernel at compile time, which means that we can no longer generate a relevant error message at compile time. It would be possible to add a check for the size of individual variables, but ultimately the linker will have to perform the final check.
  Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275
  Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin
  Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61494
  llvm-svn: 364297
* [TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123) (Simon Pilgrim, 2019-06-12; 1 file changed, -2/+4)
  As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.
  Differential Revision: https://reviews.llvm.org/D63075
  llvm-svn: 363179
* AtomicExpand: Don't crash on non-0 alloca (Matt Arsenault, 2019-06-11; 1 file changed, -0/+1)
  This now produces garbage on AMDGPU with a call to a nonexistent, anonymous libcall but won't assert.
  llvm-svn: 363022
* AMDGPU: Expand < 32-bit atomics (Matt Arsenault, 2019-06-11; 1 file changed, -0/+2)
  Also fix AtomicExpand asserting on atomicrmw fadd/fsub.
  llvm-svn: 363021
* [AMDGPU] Increases available SGPR for Calling Convention (Ryan Taylor, 2019-05-15; 1 file changed, -2/+2)
  Summary: SGPR in CC can be either hw initialized or set by other chained shaders, and so this increases the SGPR count available to CC to 105.
  Change-Id: I3dfadc750fe4a3e2bd07117a2899fd13f3e2fef3
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61261
  llvm-svn: 360778
* [AMDGPU] Reapplied BFE canonicalization from D60462 (Simon Pilgrim, 2019-05-08; 1 file changed, -11/+25)
  This was committed in rL358887 but reverted in rL360066 due to an x86 regression; really it should have been pre-committed instead of being part of the SimplifyDemandedBits bitcast patch.
  llvm-svn: 360263
* Revert r359392 and r358887 (Craig Topper, 2019-05-06; 1 file changed, -25/+11)
  Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead" and "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling".
  Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list. Removing the CodeGenOnly instructions changed how fneg is handled during fast-isel with sse/sse2: we're now emitting fsub -0.0, x instead of moving to the integer domain (in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast-isel table no longer contains an entry for (f32/f64 bitcast (i32/i64)), so the target-independent fneg code fails. The use of fsub changes the behavior of NaN with respect to -O2 codegen, which will always use a pxor.
  NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work there. I'll file a separate PR for that and add test cases.
  Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR, though I wouldn't be surprised if that bug can still be hit independent of that.
  This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again.
  llvm-svn: 360066
* [AMDGPU] gfx1010 VMEM and SMEM implementation (Stanislav Mekhanoshin, 2019-04-30; 1 file changed, -0/+55)
  Differential Revision: https://reviews.llvm.org/D61330
  llvm-svn: 359621
* [TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling (Simon Pilgrim, 2019-04-22; 1 file changed, -11/+25)
  This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.
  The AMDGPU backend needed an extra (srl (and x, (c1 << c2)), c2) -> (and (srl x, c2), c1) combine to encourage BFE creation. I investigated putting this in DAGCombine, but it caused a lot of noise on other targets - some improvements, some regressions.
  The X86 changes are all definite wins.
  Differential Revision: https://reviews.llvm.org/D60462
  llvm-svn: 358887
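  As an aside, the combine quoted above rests on a plain shift/mask identity for logical (unsigned) shifts. The standalone C++ check below is an editorial illustration of that identity, not part of the patch or of LLVM's code:

    // Verifies (x & (c1 << c2)) >> c2 == (x >> c2) & c1 for unsigned 32-bit
    // values, i.e. the DAG identity (srl (and x, (c1 << c2)), c2) ->
    // (and (srl x, c2), c1) that the BFE-oriented combine relies on.
    #include <cassert>
    #include <cstdint>

    int main() {
      const uint32_t xs[] = {0u, 0x12345678u, 0xDEADBEEFu, 0xFFFFFFFFu};
      const uint32_t c1s[] = {0x1Fu, 0xFFu, 0x3FFu, 0xFFFFu};
      for (uint32_t x : xs)
        for (uint32_t c1 : c1s)
          for (uint32_t c2 = 0; c2 < 32; ++c2) {
            uint32_t masked_then_shifted = (x & (c1 << c2)) >> c2; // mask, then shift
            uint32_t shifted_then_masked = (x >> c2) & c1;         // shift, then mask
            assert(masked_then_shifted == shifted_then_masked);
          }
      return 0;
    }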
* [AMDGPU] Avoid DAG combining assert with fneg(fadd(A,0)) (Tim Renouf, 2019-04-18; 1 file changed, -0/+10)
  fneg combining attempts to turn it into fadd(fneg(A), fneg(0)), but creating the new fadd folds to just fneg(A). When A has multiple uses, this confuses it and you get an assert. Fixed.
  Differential Revision: https://reviews.llvm.org/D60633
  Change-Id: I0ddc9b7286abe78edc0cd8d734fdeb05ff09821c
  llvm-svn: 358640
* [AMDGPU] Implemented dwordx3 variants of buffer/tbuffer load/store intrinsics (Tim Renouf, 2019-03-22; 1 file changed, -1/+0)
  Now we have vec3 MVTs, this commit implements dwordx3 variants of the buffer intrinsics. On gfx6, a dwordx3 buffer load intrinsic is implemented as a dwordx4 instruction, and a dwordx3 buffer store intrinsic is not supported. We need to support the dwordx3 load intrinsic because it is generated by subtarget-unaware code in InstCombine.
  Differential Revision: https://reviews.llvm.org/D58904
  Change-Id: I016729d8557b98a52f529638ae97c340a5922a4e
  llvm-svn: 356755
* [AMDGPU] Added v5i32 and v5f32 register classes (Tim Renouf, 2019-03-22; 1 file changed, -2/+15)
  They are not used by anything yet, but a subsequent commit will start using them for image ops that return 5 dwords.
  Differential Revision: https://reviews.llvm.org/D58903
  Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271
  llvm-svn: 356735
* [AMDGPU] Support for v3i32/v3f32 (Tim Renouf, 2019-03-21; 1 file changed, -13/+98)
  Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet. SI (gfx6) does not have dwordx3 instructions, so they are not enabled there.
  Some of this patch is from Matt Arsenault, also of AMD.
  Differential Revision: https://reviews.llvm.org/D58902
  Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9
  llvm-svn: 356659
* [AMDGPU] Add buffer/load 8/16 bit overloaded intrinsics (Ryan Taylor, 2019-03-19; 1 file changed, -0/+22)
  Summary: Add buffer store/load 8/16 overloaded intrinsics for buffer, raw_buffer and struct_buffer.
  Change-Id: I166a29f071b2ff4e4683fb0392564b1f223ac61d
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59265
  llvm-svn: 356465
* [TargetLowering] Add code size information on isFPImmLegal. NFC (Adhemerval Zanella, 2019-03-18; 1 file changed, -1/+2)
  This allows better code size for aarch64 floating point materialization in a future patch.
  Reviewers: evandro
  Differential Revision: https://reviews.llvm.org/D58690
  llvm-svn: 356389
* [AMDGPU] Prepare for introduction of v3 and v5 MVTs (Tim Renouf, 2019-03-17; 1 file changed, -3/+4)
  AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This commit does not add them, but makes preparatory changes:
  * Fixed assumptions of power-of-2 vector type in kernel arg handling, and added v5 kernel arg tests and v3/v5 shader arg tests.
  * Added v5 tests for cost analysis.
  * Added vec3/vec5 arg test cases.
  Some of this patch is from Matt Arsenault, also of AMD.
  Differential Revision: https://reviews.llvm.org/D58928
  Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd
  llvm-svn: 356342
* AMDGPU: Move d16 load matching to preprocess step (Matt Arsenault, 2019-03-08; 1 file changed, -0/+6)
  When matching half of the build_vector to a load, there could still be a hidden dependency on the other half of the build_vector the pattern wouldn't detect. If there was an additional chain dependency on the other value, a cycle could be introduced.
  I don't think a tablegen pattern is capable of matching the necessary conditions, so move this into PreprocessISelDAG. Check isPredecessorOf for the other value to avoid a cycle. This has a warning that it's expensive, so this should probably be moved into an MI pass eventually that will have more freedom to reorder instructions to help match this. That is currently complicated by the lack of a computeKnownBits type mechanism for the selected function.
  llvm-svn: 355731
* AMDGPU: Fix crashes in invalid call cases (Matt Arsenault, 2019-02-28; 1 file changed, -4/+3)
  We have to at least tolerate calls to kernels, possibly with a mismatched calling convention on the callsite.
  llvm-svn: 355049
* Implementation of asm-goto support in LLVM (Craig Topper, 2019-02-08; 1 file changed, -0/+1)
  This patch accompanies the RFC posted here: http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html
  This patch adds a new CallBr IR instruction to support asm-goto inline assembly like gcc as used by the linux kernel. This instruction is both a call instruction and a terminator instruction with multiple successors. Only inline assembly usage is supported today.
  This also adds a new INLINEASM_BR opcode to SelectionDAG and MachineIR to represent an INLINEASM block that is also considered a terminator instruction.
  There will likely be more bug fixes and optimizations to follow this, but we felt it had reached a point where we would like to switch to an incremental development model.
  Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii
  Differential Revision: https://reviews.llvm.org/D53765
  llvm-svn: 353563
* AMDGPU: Fix assert on trunc from bitcast of build_vector (Matt Arsenault, 2019-02-05; 1 file changed, -1/+1)
  The v2i64 argument is lowered to a bitcast of v4i32 build_vector. This would then attempt to use the i32 element as the source of the vector truncate. This really would need to collect 2 elements from the build_vector to produce the intended truncate.
  llvm-svn: 353202
* [AMDGPU] Add intrinsics for 16 bit interpolation (Tim Corringham, 2019-01-28; 1 file changed, -0/+3)
  Summary: Added the intrinsics llvm.amdgcn.interp.p1.f16() and llvm.amdgcn.interp.p2.f16() and related LIT test. The p1 intrinsic generates code appropriate for both 16 and 32 bank LDS.
  Reviewers: #amdgpu, dstuttard, arsenm, tpr
  Reviewed By: #amdgpu, arsenm
  Subscribers: jvesely, mgorny, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46754
  llvm-svn: 352357
* Codegen support for atomicrmw fadd/fsub (Matt Arsenault, 2019-01-22; 1 file changed, -3/+7)
  llvm-svn: 351851
* Update the file headers across all of the LLVM projects in the monorepo to reflect the new license (Chandler Carruth, 2019-01-19; 1 file changed, -4/+3)
  We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
  Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
  llvm-svn: 351636
* AMDGPU: Remove llvm.SI.load.const (Matt Arsenault, 2019-01-18; 1 file changed, -1/+0)
  It's taken 3 years, but now all of the old AMDGPU and SI intrinsics are finally gone.
  llvm-svn: 351586
* AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap (Marek Olsak, 2019-01-16; 1 file changed, -0/+1)
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D52944
  llvm-svn: 351351
* [TargetLowering][AMDGPU] Remove the SimplifyDemandedBits function that takes a User and OpIdx. Stop using it in AMDGPU target for simplifyI24. (Craig Topper, 2019-01-07; 1 file changed, -30/+52)
  As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when it's not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do, like turning ZEXT/SEXT into AEXT, would result in an increase in instructions.
  Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node.
  This patch changes AMDGPU's simplifyI24 to use GetDemandedBits to handle the multiple use simplifications, and then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use.
  Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input.
  Differential Revision: https://reviews.llvm.org/D56087
  llvm-svn: 350560
* [AMDGPU] Always use the version of computeKnownBits that returns a value. NFCI. (Simon Pilgrim, 2018-12-21; 1 file changed, -14/+7)
  Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.
  llvm-svn: 349912
* [x86] allow vector load narrowing with multi-use values (Sanjay Patel, 2018-11-10; 1 file changed, -1/+4)
  This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins.
  I've structured this to be no-functional-change-intended for any target except for x86, because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs.
  The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided.
  Differential Revision: https://reviews.llvm.org/D54073
  llvm-svn: 346595
* AMDGPU: Fix assertion with bitcast from i64 constant to v4i16 (Matt Arsenault, 2018-11-02; 1 file changed, -3/+4)
  llvm-svn: 345922
* Check shouldReduceLoadWidth from SimplifySetCC (Stanislav Mekhanoshin, 2018-10-31; 1 file changed, -0/+12)
  SimplifySetCC could shrink a load without checking for profitability or legality of such a shrink with a target. Added checks to prevent shrinking of aligned scalar loads in AMDGPU below dword, as the scalar engine does not support it.
  Differential Revision: https://reviews.llvm.org/D53846
  llvm-svn: 345778
* DAG: Change behavior of fminnum/fmaxnum nodes (Matt Arsenault, 2018-10-22; 1 file changed, -0/+8)
  Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs.
  There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select, which should be fixed in a future commit.
  llvm-svn: 344914
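  For context, the IEEE-754 minNum/maxNum semantics referred to here return the non-NaN operand when exactly one input is a quiet NaN. C's fmin/fmax follow the same rule, so the small C++ snippet below illustrates the semantics the IEEE variants are meant to model; it is background, not a depiction of the new DAG nodes themselves:

    #include <cassert>
    #include <cmath>
    #include <limits>

    int main() {
      const double qnan = std::numeric_limits<double>::quiet_NaN();
      // With exactly one quiet-NaN operand, the other operand is returned.
      assert(std::fmin(qnan, 1.0) == 1.0);
      assert(std::fmax(2.0, qnan) == 2.0);
      // Only when both operands are NaN is the result NaN.
      assert(std::isnan(std::fmin(qnan, qnan)));
      return 0;
    }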
* AMDGPU: Expand atomicrmw nand in IR (Matt Arsenault, 2018-10-02; 1 file changed, -0/+7)
  llvm-svn: 343559
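  atomicrmw nand generally has no single hardware instruction, so expanding it in IR typically means rewriting it as a compare-exchange loop that computes ~(old & operand). The C++ sketch below is an analogue of that general shape, offered as an assumption about the usual expansion strategy rather than the exact IR this commit makes AMDGPU emit:

    #include <atomic>
    #include <cstdint>

    // CAS-loop analogue of "atomicrmw nand": repeatedly read the old value,
    // compute ~(old & operand), and try to swap it in until the exchange
    // succeeds. Returns the old value, as atomicrmw does.
    uint32_t atomic_nand(std::atomic<uint32_t> &obj, uint32_t operand) {
      uint32_t old = obj.load(std::memory_order_relaxed);
      while (!obj.compare_exchange_weak(old, ~(old & operand),
                                        std::memory_order_seq_cst,
                                        std::memory_order_relaxed)) {
        // 'old' is refreshed with the current value on failure; retry.
      }
      return old;
    }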
* AMDGPU: Expand vector canonicalizes (Matt Arsenault, 2018-09-18; 1 file changed, -0/+1)
  llvm-svn: 342439
* AMDGPU: Remove remnants of old address space mapping (Matt Arsenault, 2018-08-31; 1 file changed, -4/+3)
  llvm-svn: 341165
* [AMDGPU] Add support for multi-dword s.buffer.load intrinsic (Tim Renouf, 2018-08-25; 1 file changed, -0/+1)
  Summary: Patch by Marek Olsak and David Stuttard, both of AMD.
  This adds a new amdgcn intrinsic supporting s.buffer.load, in particular multiple dword variants. These are convenient to use from some front-end implementations. Also modified the existing llvm.SI.load.const intrinsic to common up the underlying implementation.
  This modification also requires that we can lower to non-uniform loads correctly by splitting larger dword variants into sizes supported by the non-uniform versions of the load.
  V2: Addressed minor review comments.
  V3: i1 glc is now i32 cachepolicy for consistency with buffer and tbuffer intrinsics, plus fixed formatting issue.
  V4: Added glc test.
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D51098
  Change-Id: I83a6e00681158bb243591a94a51c7baa445f169b
  llvm-svn: 340684
* AMDGPU: Fix not respecting byval alignment in call frame setup (Matt Arsenault, 2018-08-22; 1 file changed, -2/+1)
  This was hackily adding in the 4 bytes reserved for the callee's emergency stack slot. Treat it like a normal stack allocation so we get the correct alignment padding behavior. This fixes an inconsistency between the caller and callee.
  llvm-svn: 340396
* AMDGPU: Custom lower fexp (Matt Arsenault, 2018-08-16; 1 file changed, -0/+32)
  This will allow the library to just use __builtin_expf directly without expanding this itself. Note f64 still won't work because there is no exp instruction for it.
  llvm-svn: 339902
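  A common way to custom-lower fexp on hardware that only provides a base-2 exponential instruction is the identity e^x = 2^(x * log2(e)). The snippet below just checks that identity numerically; it is a sketch of the general approach, not a claim about the exact sequence this commit emits:

    #include <cassert>
    #include <cmath>

    int main() {
      // e^x == 2^(x * log2(e)); a target with only a base-2 exp instruction
      // can lower fexp this way (modulo precision/denormal details).
      const float log2e = 1.4426950408889634f;
      const float xs[] = {-4.0f, -1.0f, 0.0f, 0.5f, 3.0f};
      for (float x : xs) {
        float ref = std::exp(x);
        float via_exp2 = std::exp2(x * log2e);
        assert(std::fabs(ref - via_exp2) <=
               1e-4f * std::fmax(1.0f, std::fabs(ref)));
      }
      return 0;
    }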
* AMDGPU: Fold fneg into fmed3 (Matt Arsenault, 2018-08-15; 1 file changed, -0/+11)
  llvm-svn: 339821
* AMDGPU: Address todo for handling 1/(2 pi) (Matt Arsenault, 2018-08-15; 1 file changed, -6/+23)
  llvm-svn: 339814
* AMDGPU: More canonicalized operations (Matt Arsenault, 2018-08-10; 1 file changed, -1/+11)
  llvm-svn: 339464
* AMDGPU: cvt_pk_rtz_f16 canonicalizes (Matt Arsenault, 2018-08-06; 1 file changed, -1/+2)
  llvm-svn: 339078
* AMDGPU: Treat more custom operations as canonicalizing (Matt Arsenault, 2018-08-06; 1 file changed, -1/+4)
  Everything should quiet, and I think everything should flush. I assume the min3/med3/max3 follow the same rules as regular min/max for flushing, which should at least be conservatively correct.
  There are still more operations that need to be handled.
  llvm-svn: 339065
* DAG: Enhance isKnownNeverNaN (Matt Arsenault, 2018-08-03; 1 file changed, -0/+83)
  Add a parameter for testing specifically for sNaNs - at least one instruction pattern on AMDGPU needs to check specifically for this.
  Also handle more cases, and add a target hook for custom nodes, similar to the hooks for known bits.
  llvm-svn: 338910
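  Background on the sNaN/qNaN distinction mentioned here: in the usual IEEE-754 binary32 encoding, a NaN has all exponent bits set and a nonzero mantissa, and the top mantissa bit distinguishes quiet (set) from signaling (clear) NaNs. The helper below classifies a float's bit pattern that way; it is an editorial illustration of the encoding, not the LLVM isKnownNeverNaN API:

    #include <cstdint>
    #include <cstring>

    enum class NanKind { NotNaN, Quiet, Signaling };

    // Classify a binary32 value from its raw bits: exponent all-ones plus a
    // nonzero mantissa means NaN; mantissa bit 22 set means quiet NaN.
    NanKind classifyNaN(float f) {
      uint32_t bits;
      std::memcpy(&bits, &f, sizeof(bits));
      const uint32_t exponent = (bits >> 23) & 0xFFu;
      const uint32_t mantissa = bits & 0x7FFFFFu;
      if (exponent != 0xFFu || mantissa == 0)
        return NanKind::NotNaN; // finite value or +/- infinity
      return (mantissa & 0x400000u) ? NanKind::Quiet : NanKind::Signaling;
    }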