path: root/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
Commit message [Author, Date, Files, Lines]
* AMDGPU: Fix not using v_cvt_f16_[iu]16 [Matt Arsenault, 2020-01-07, 1 file, -8/+31]
  We weren't treating i16->f16 casts as legal on targets with these instructions, and were always using a pair of casts through i32.
* [TargetLowering][AMDGPU] Make scalarizeVectorLoad return a pair of SDValues instead of creating a MERGE_VALUES node. NFCI [Craig Topper, 2019-12-30, 1 file, -6/+11]
  This allows us to clean up some places that were peeking through the MERGE_VALUES node after the call. By returning the SDValues directly, we can clean that up.
  Unfortunately, there are several call sites in AMDGPU that wanted the MERGE_VALUES and now need to create their own.
* AMDGPU: Improve llvm.round.f64 lowering for CI+ [Matt Arsenault, 2019-12-30, 1 file, -3/+4]
  The path already used for f16/f32 works a lot better when v_trunc_f64 is available.
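The f16/f32 path rounds through trunc. Below is a minimal Python sketch of that formula. It assumes llvm.round rounds half away from zero and handles finite inputs only, ignoring NaN and the sign of zero. `round_via_trunc` is a hypothetical name, not an LLVM function.

```python
import math


def round_via_trunc(x: float) -> float:
    """Round half away from zero using only trunc, fabs, copysign and a select.

    Formula sketched here:
        round(x) = trunc(x) + (fabs(x - trunc(x)) >= 0.5 ? copysign(1.0, x) : 0.0)
    Finite inputs only; NaN, infinities and signed zeros are not handled.
    """
    t = float(math.trunc(x))      # plays the role of v_trunc_*
    frac = abs(x - t)             # magnitude of the discarded fraction (exact)
    adjust = math.copysign(1.0, x) if frac >= 0.5 else 0.0
    return t + adjust
```

For example, `round_via_trunc(2.5)` gives `3.0` and `round_via_trunc(-2.5)` gives `-3.0`. Python's built-in `round` would give `2` and `-2` (round-half-even).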
* Fix whitespace. [Jay Foad, 2019-12-16, 1 file, -2/+2]
* Fix for AMDGPU MUL_I24 known bits calculation [Jay Foad, 2019-12-16, 1 file, -9/+8]
  Summary: At present, the code calculating known bits of AMDGPU MUL_I24 confuses the concepts of "non-negative number" and "positive number". In some situations, it results in incorrect code. I have a case where the optimizer replaces the result of calculating MUL_I24(-5, 0) with -8.
  Reviewers: foad, arsenm
  Reviewed By: arsenm
  Subscribers: foad, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Patch by Eugene Kuznetsov.
  Differential Revision: https://reviews.llvm.org/D70367
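The following Python model shows how mixing up "non-negative" and "positive" goes wrong for MUL_I24(-5, 0). It is a simplified model, not llvm::KnownBits. It tracks only three sign facts per operand and assumes the exact product is representable, so overflow is ignored. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sign:
    """Facts known about a value's sign (hypothetical model, not llvm::KnownBits)."""
    non_negative: bool = False  # known >= 0
    negative: bool = False      # known < 0
    positive: bool = False      # known > 0 (strictly)


def sign_of_constant(v: int) -> Sign:
    return Sign(non_negative=v >= 0, negative=v < 0, positive=v > 0)


def mul_sign_buggy(a: Sign, b: Sign) -> Sign:
    # Uses "non-negative" where "positive" is required, so negative * 0
    # is wrongly reported as negative.
    nonneg = (a.non_negative and b.non_negative) or (a.negative and b.negative)
    neg = (a.negative and b.non_negative) or (a.non_negative and b.negative)
    return Sign(non_negative=nonneg, negative=neg)


def mul_sign_fixed(a: Sign, b: Sign) -> Sign:
    # The product is >= 0 when the signs agree (zero included).
    nonneg = (a.non_negative and b.non_negative) or (a.negative and b.negative)
    # The product is < 0 only when one side is negative and the other strictly positive.
    neg = (a.negative and b.positive) or (a.positive and b.negative)
    # The product is > 0 only when both sides are nonzero with the same sign.
    pos = (a.positive and b.positive) or (a.negative and b.negative)
    return Sign(non_negative=nonneg, negative=neg, positive=pos)
```

With `-5` and `0`, the buggy rule claims the product is negative, which would let a known-bits user set high bits to one. The fixed rule makes no sign claim at all, which matches the actual result, 0.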
* [NFC] Use EVT instead of bool for getSetCCInverse() [Alex Richardson, 2019-12-13, 1 file, -2/+2]
  Summary: The use of a boolean isInteger flag (generally initialized using VT.isInteger()) caused errors in our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). In our backend, pointers use a separate ValueType (iFATPTR) and therefore .isInteger() returns false. This meant that getSetCCInverse() was using the floating-point variant and generated incorrect code for us: `(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.
  Committing this change will significantly reduce our merge conflicts for each upstream merge.
  Reviewers: spatel, bogner
  Reviewed By: bogner
  Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70917
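Whether the inversion follows the integer or the floating-point rule matters, and a small Python model of condition-code inversion shows why. The numeric codes and bit layout (E=1, G=2, L=4, U=8, N=16) are modeled on `ISD::CondCode` in LLVM's `ISDOpcodes.h`. Treat them as an assumption and check the header before relying on them. `invert_setcc` is a hypothetical helper.

```python
from enum import IntEnum


class CC(IntEnum):
    # Subset of condition codes; bit layout N U L G E (assumed from ISDOpcodes.h).
    SETOGE = 3     # ordered and >=
    SETUGE = 11    # unordered or >=  (unsigned >= for integers)
    SETULT = 12    # unordered or <   (unsigned <  for integers)
    SETGE = 19     # signed >=
    SETLT = 20     # signed <
    SETTRUE2 = 23


def invert_setcc(code: int, is_integer_like: bool) -> int:
    op = int(code)
    if is_integer_like:
        op ^= 0b0111   # flip L, G, E; for integers the U bit means "unsigned" and stays
    else:
        op ^= 0b1111   # also flip U: !(a < b) on floats is "unordered or >="
    if op > CC.SETTRUE2:
        op &= ~0b1000  # N and U must not both be set
    return op
```

Inverting an unsigned pointer compare such as `a <u b` (SETULT) gives SETUGE under the integer rule. Under the floating-point rule, which is what an `isInteger()` of false selects, it gives SETOGE.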
* AMDGPU: Refactor treatment of denormal mode [Matt Arsenault, 2019-11-19, 1 file, -2/+8]
  Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget; it should instead be moved into a separate function attribute.
  This patch is still NFC. The denormal mode remains as a subtarget feature for now, but this makes the necessary changes to switch to using an attribute.
* AMDGPU: Change boolean content type to 0 or 1 [Matt Arsenault, 2019-11-15, 1 file, -3/+0]
  The usage of target boolean checks is overly inflexible, since sext and zext of a compare are equally cheap. The choice is arbitrary, but using 0/1 is to some degree the path of least resistance, since that's what most targets use. This enables a few combines that don't bother to support ZeroOrNegativeOneBooleanContent.
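The "equally cheap" argument can be checked directly, because converting between the two conventions costs one operation in either direction. The Python illustration below uses hypothetical helper names and models scalar booleans only.

```python
def materialize(cond: bool, zero_or_one: bool) -> int:
    """Integer value a compare produces under a boolean-content convention.

    zero_or_one=True models ZeroOrOneBooleanContent (true == 1);
    False models ZeroOrNegativeOneBooleanContent (true == -1, all bits set).
    """
    if not cond:
        return 0
    return 1 if zero_or_one else -1


def sext_from_zero_or_one(b: int) -> int:
    # 0/1 -> 0/-1 is a single negate (0 - b).
    return -b


def zext_from_zero_or_neg_one(b: int) -> int:
    # 0/-1 -> 0/1 is a single AND with 1.
    return b & 1
```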
* AMDGPU: Select global atomicrmw fadd [Matt Arsenault, 2019-11-06, 1 file, -1/+0]
  This only works if there is no use of the return value.
* [amdgpu] Fix known bits computation on `MUL_I24`/`MUL_U24`. [Michael Liao, 2019-11-01, 1 file, -0/+3]
  Reviewers: arsenm, rampitec
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D69735
* AMDGPU: Select basic interp directly from intrinsics [Matt Arsenault, 2019-10-21, 1 file, -5/+13]
  llvm-svn: 375457
* [Alignment] Migrate Attribute::getWith(Stack)Alignment [Guillaume Chatelet, 2019-10-15, 1 file, -1/+1]
  Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type: https://reviews.llvm.org/D64790
  Reviewers: courbet, jdoerfert
  Reviewed By: courbet
  Subscribers: arsenm, jvesely, nhaehnle, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D68792
  llvm-svn: 374884
* AMDGPU: Move SelectFlatOffset back into AMDGPUISelDAGToDAG [Matt Arsenault, 2019-10-11, 1 file, -48/+0]
  llvm-svn: 374495
* [AMDGPU] Use math constants defined in MathExtras (NFC) [Evandro Menezes, 2019-10-09, 1 file, -24/+4]
  Use the new math constants in `MathExtras.h`.
  Differential revision: https://reviews.llvm.org/D68285
  llvm-svn: 374208
* [TargetLowering] Make allowsMemoryAccess method virtual. [Thomas Raoux, 2019-09-26, 1 file, -2/+3]
  Rename the old function to explicitly show that it cares only about alignment. The new allowsMemoryAccess calls the alignment-related function by default and can be overridden by a target to indicate whether the memory access is legal or not.
  Differential Revision: https://reviews.llvm.org/D67121
  llvm-svn: 372935
* [AMDGPU] isSDNodeAlwaysUniform - silence static analyzer dyn_cast<LoadSDNode> null dereference warning. NFCI. [Simon Pilgrim, 2019-09-22, 1 file, -3/+2]
  The static analyzer is warning about a potential null dereference, but we should be able to use cast<LoadSDNode> directly and, if not, the assert will fire for us.
  llvm-svn: 372528
* [SVE][MVT] Fixed-length vector MVT ranges [Graham Hunter, 2019-09-17, 1 file, -1/+1]
  * Reordered MVT simple types to group scalable vector types together.
  * New range functions in MachineValueType.h to only iterate over the fixed-length int/fp vector types.
  * Stopped backends which don't support scalable vector types from iterating over scalable types.
  Reviewers: sdesmalen, greened
  Reviewed By: greened
  Differential Revision: https://reviews.llvm.org/D66339
  llvm-svn: 372099
* AMDGPU/GlobalISel: Implement LDS G_GLOBAL_VALUE [Matt Arsenault, 2019-09-09, 1 file, -1/+1]
  Handle the simple case that lowers to a constant.
  llvm-svn: 371424
* AMDGPU: Remove pointless wrapper nodes for init.exec intrinsics [Matt Arsenault, 2019-09-09, 1 file, -2/+0]
  llvm-svn: 371364
* AMDGPU: Fix emitting multiple stack loads for stack passed workitems [Matt Arsenault, 2019-09-05, 1 file, -1/+15]
  The same stack slot is loaded for each workitem ID, and for each use. Nothing prevents you from creating multiple fixed stack objects with the same offsets, so this was creating a load for each unique frame index, despite them being the same offset.
  Re-use the same frame index so the loads are CSEable.
  llvm-svn: 371148
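The fix amounts to looking up an existing fixed object by offset before creating a new one. The toy Python model below illustrates the idea. The class and method names are hypothetical, and LLVM's actual code works on MachineFrameInfo and may differ in detail. The negative indices for fixed objects mirror LLVM's convention, but that detail is not load-bearing here.

```python
from typing import Dict


class FixedStackObjects:
    """Toy model of fixed stack objects (hypothetical; not MachineFrameInfo)."""

    def __init__(self) -> None:
        self.offsets: Dict[int, int] = {}  # frame index -> byte offset
        self._next_index = -1              # fixed objects get negative indices

    def create_fixed_object(self, offset: int) -> int:
        # Always makes a new object, even if one already exists at this offset.
        fi = self._next_index
        self._next_index -= 1
        self.offsets[fi] = offset
        return fi

    def get_or_create_fixed_object(self, offset: int) -> int:
        # Re-use an existing object at the same offset so that loads from it
        # refer to the same frame index and can be CSE'd.
        for fi, off in self.offsets.items():
            if off == offset:
                return fi
        return self.create_fixed_object(offset)
```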
* AMDGPU: Remove unused custom node definition [Matt Arsenault, 2019-09-01, 1 file, -2/+0]
  llvm-svn: 370603
* AMDGPU: Combine directly on mul24 intrinsics [Matt Arsenault, 2019-08-27, 1 file, -3/+27]
  The problem these are supposed to work around can occur before the intrinsics are lowered into the nodes. Try to directly simplify them so they are matched before the bit assert operations can be optimized out.
  llvm-svn: 369994
* [MVT] Add v16f16 and v32f16 vectors. [Craig Topper, 2019-08-21, 1 file, -0/+4]
  I might look at improving PR43065, which will require being able to mark 256-bit and 512-bit vectors of f16 as Legal.
  Differential Revision: https://reviews.llvm.org/D66515
  llvm-svn: 369565
* MVT: Add v3i16/v3f16 vectors [Matt Arsenault, 2019-08-15, 1 file, -0/+5]
  AMDGPU has some buffer intrinsics which theoretically could use this. Some of the generated tables include the 3- and 4-element vector versions of these rounded to 64 bits, which is ambiguous. Add these to help the tables disambiguate them.
  The assertion change is for the path odd-sized vectors now take for R600: v3i16 is widened to v4i16, which then needs to be promoted to v4i32.
  llvm-svn: 369038
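The R600 path described above is a two-step type legalization, sketched in Python below. It assumes widening to the next power-of-two element count followed by promotion to 32-bit elements. It models only the sequence described in this commit message and is not LLVM's type legalizer. `legalize_steps` is a hypothetical name.

```python
def legalize_steps(num_elts: int, elt_bits: int, min_elt_bits: int = 32) -> list:
    """Hypothetical model: widen to a power-of-two element count, then promote elements."""

    def name(n: int, bits: int) -> str:
        return f"v{n}i{bits}"

    steps = [name(num_elts, elt_bits)]
    widened = 1 << (num_elts - 1).bit_length()  # next power of two >= num_elts
    if widened != num_elts:
        steps.append(name(widened, elt_bits))   # e.g. v3i16 -> v4i16
    if elt_bits < min_elt_bits:
        steps.append(name(widened, min_elt_bits))  # e.g. v4i16 -> v4i32
    return steps
```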
* Re-commit: [AMDGPU] Use S_DENORM_MODE for gfx10 [Austin Kerbow, 2019-08-06, 1 file, -0/+1]
  Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10.
  Reviewers: arsenm, rampitec
  Reviewed By: arsenm, rampitec
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65620
  llvm-svn: 367969
* Revert "[AMDGPU] Use S_DENORM_MODE for gfx10" [Dmitri Gribenko, 2019-08-05, 1 file, -1/+0]
  This reverts commit r367882. It broke the test MC/Disassembler/AMDGPU/gfx10_dasm_all.txt.
  llvm-svn: 367904
* [AMDGPU] Use S_DENORM_MODE for gfx10 [Austin Kerbow, 2019-08-05, 1 file, -0/+1]
  Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10.
  Reviewers: arsenm, rampitec
  Reviewed By: arsenm, rampitec
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65620
  llvm-svn: 367882
* AMDGPU: add missing llvm.amdgcn.{raw,struct}.buffer.atomic.{inc,dec} [Nicolai Haehnle, 2019-08-05, 1 file, -0/+2]
  Summary: Wrapping increment/decrement. These aren't exposed by many APIs...
  Change-Id: I1df25c7889de5a5ba76468ad8e8a2597efa9af6c
  Reviewers: arsenm, tpr, dstuttard
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, jfb, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65283
  llvm-svn: 367821
* Fix MSVC warning about extending a uint32_t shift result to uint64_t. NFCI. [Simon Pilgrim, 2019-07-23, 1 file, -2/+2]
  llvm-svn: 366808
* AMDGPU: Decompose all values to 32-bit pieces for calling conventions [Matt Arsenault, 2019-07-19, 1 file, -74/+0]
  This is the more natural lowering, and presents more opportunities to reduce 64-bit ops to 32-bit. This should also help avoid issues graphics shaders have had with 64-bit values, and simplify argument lowering in globalisel.
  llvm-svn: 366578
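Decomposing into 32-bit pieces means, for example, that an i64 argument travels as two 32-bit values. Below is a Python sketch of the split and the re-join. It assumes the low piece comes first and that the width is a multiple of 32. It illustrates the idea and is not the calling-convention code.

```python
def split_to_dwords(value: int, bits: int) -> list:
    """Split an unsigned `bits`-wide value into 32-bit pieces, low piece first."""
    assert bits % 32 == 0 and 0 <= value < (1 << bits)
    return [(value >> (32 * i)) & 0xFFFFFFFF for i in range(bits // 32)]


def join_dwords(pieces: list) -> int:
    """Inverse of split_to_dwords: reassemble the pieces, low piece first."""
    value = 0
    for i, piece in enumerate(pieces):
        value |= piece << (32 * i)
    return value
```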
* AMDGPU/GlobalISel: Select flat loads [Matt Arsenault, 2019-07-16, 1 file, -12/+5]
  Now that the patterns use the new PatFrag address space support, the only blocker to importing most load patterns is the addressing mode complex patterns.
  llvm-svn: 366237
* [AMDGPU] use v32f32 for 3 mfma intrinsics [Stanislav Mekhanoshin, 2019-07-12, 1 file, -0/+9]
  These should really use v32f32, but were defined as v32i32 due to the lack of the v32f32 type.
  Differential Revision: https://reviews.llvm.org/D64667
  llvm-svn: 365972
* [AMDGPU] gfx908 mfma support [Stanislav Mekhanoshin, 2019-07-11, 1 file, -0/+1]
  Differential Revision: https://reviews.llvm.org/D64584
  llvm-svn: 365824
* Remove some redundant code from r290372 and improve a comment. [Jay Foad, 2019-07-11, 1 file, -5/+3]
  llvm-svn: 365741
* [AMDGPU] gfx908 atomic fadd and atomic pk_fadd [Stanislav Mekhanoshin, 2019-07-11, 1 file, -0/+4]
  Differential Revision: https://reviews.llvm.org/D64435
  llvm-svn: 365717
* [X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it [Craig Topper, 2019-07-09, 1 file, -4/+9]
  Basically the problem is that X86 doesn't set the Fast flag from allowsMemoryAccess on certain CPUs due to slow unaligned memory subtarget features. This prevents bitcasts from being folded into loads and stores. But all vector loads and stores of the same width are the same cost on X86.
  This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it.
  Differential Revision: https://reviews.llvm.org/D64295
  llvm-svn: 365549
* [AMDGPU] Packed thread ids in function call ABI [Stanislav Mekhanoshin, 2019-06-28, 1 file, -3/+13]
  Differential Revision: https://reviews.llvm.org/D63851
  llvm-svn: 364619
* AMDGPU: Write LDS objects out as global symbols in code generation [Nicolai Haehnle, 2019-06-25, 1 file, -0/+10]
  Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section index introduced with a previous change. The linker is then expected to resolve relocations, which are also emitted.
  Initially disabled for HSA and PAL environments until they have caught up in terms of linker and runtime loader.
  Some notes:
  - The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered to a constant at compile time, which means some tests can no longer be applied. The current "solution" is a terrible hack, but the intrinsic isn't used by Mesa, so we can keep it for now.
  - We no longer know the full LDS size per kernel at compile time, which means that we can no longer generate a relevant error message at compile time. It would be possible to add a check for the size of individual variables, but ultimately the linker will have to perform the final check.
  Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275
  Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin
  Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61494
  llvm-svn: 364297
* [TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123) [Simon Pilgrim, 2019-06-12, 1 file, -2/+4]
  As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.
  This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.
  If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.
  Differential Revision: https://reviews.llvm.org/D63075
  llvm-svn: 363179
* AtomicExpand: Don't crash on non-0 alloca [Matt Arsenault, 2019-06-11, 1 file, -0/+1]
  This now produces garbage on AMDGPU with a call to a nonexistent, anonymous libcall, but won't assert.
  llvm-svn: 363022
* AMDGPU: Expand < 32-bit atomics [Matt Arsenault, 2019-06-11, 1 file, -0/+2]
  Also fix AtomicExpand asserting on atomicrmw fadd/fsub.
  llvm-svn: 363021
* [AMDGPU] Increases available SGPR for Calling Convention [Ryan Taylor, 2019-05-15, 1 file, -2/+2]
  Summary: SGPRs in the CC can be either hardware-initialized or set by other chained shaders, so this increases the SGPR count available to the CC to 105.
  Change-Id: I3dfadc750fe4a3e2bd07117a2899fd13f3e2fef3
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61261
  llvm-svn: 360778
* [AMDGPU] Reapplied BFE canonicalization from D60462 [Simon Pilgrim, 2019-05-08, 1 file, -11/+25]
  This was committed in rL358887 but reverted in rL360066 due to an x86 regression; really it should have been pre-committed instead of being part of the SimplifyDemandedBits bitcast patch.
  llvm-svn: 360263
* Revert r359392 and r358887 [Craig Topper, 2019-05-06, 1 file, -25/+11]
  Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead"
  Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"
  Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list.
  Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead of moving to the integer domain (in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer contains an entry for (f32/f64 bitcast (i32/i64)), so the target-independent fneg code fails. The use of fsub changes the behavior of nan with respect to -O2 codegen, which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work there. I'll file a separate PR for that and add test cases.
  Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887, which exposed that PR. Though I wouldn't be surprised if that bug can still be hit independent of that.
  This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again.
  llvm-svn: 360066
* [AMDGPU] gfx1010 VMEM and SMEM implementation [Stanislav Mekhanoshin, 2019-04-30, 1 file, -0/+55]
  Differential Revision: https://reviews.llvm.org/D61330
  llvm-svn: 359621
* [TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling [Simon Pilgrim, 2019-04-22, 1 file, -11/+25]
  This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.
  The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl x, c2), c1) combine to encourage BFE creation. I investigated putting this in DAGCombine but it caused a lot of noise on other targets: some improvements, some regressions.
  The X86 changes are all definite wins.
  Differential Revision: https://reviews.llvm.org/D60462
  llvm-svn: 358887
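The AMDGPU combine in this commit is a plain bit identity, and the Python check below verifies it on 32-bit values. It assumes `c1 << c2` does not overflow 32 bits. When `c1` is a low mask of `width` ones, the rewritten form is exactly an unsigned bitfield extract, which is presumably why it encourages BFE selection. The function names are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def before(x: int, c1: int, c2: int) -> int:
    # (srl (and x, c1 << c2), c2) on 32-bit values
    return (x & ((c1 << c2) & MASK32)) >> c2


def after(x: int, c1: int, c2: int) -> int:
    # (and (srl x, c2), c1)
    return (x >> c2) & c1


def bfe_u32(x: int, offset: int, width: int) -> int:
    # Unsigned bitfield extract: `width` bits of x starting at bit `offset`.
    return (x >> offset) & ((1 << width) - 1)
```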
* [AMDGPU] Avoid DAG combining assert with fneg(fadd(A,0)) [Tim Renouf, 2019-04-18, 1 file, -0/+10]
  fneg combining attempts to turn it into fadd(fneg(A), fneg(0)), but creating the new fadd folds to just fneg(A). When A has multiple uses, this confuses it and you get an assert. Fixed.
  Differential Revision: https://reviews.llvm.org/D60633
  Change-Id: I0ddc9b7286abe78edc0cd8d734fdeb05ff09821c
  llvm-svn: 358640
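The new fadd folds away because fneg(0.0) is -0.0. Under IEEE 754 round-to-nearest, adding -0.0 leaves every non-NaN value unchanged, while adding +0.0 does not, since it turns -0.0 into +0.0. The short Python check below verifies those two floating-point facts and says nothing about the DAG combiner itself.

```python
import math


def is_negative_zero(x: float) -> bool:
    return x == 0.0 and math.copysign(1.0, x) < 0


values = [-0.0, 0.0, 1.5, -2.25, 1e-310, -math.inf, math.inf]

# x + (-0.0) returns x unchanged, including the sign of zero.
adding_neg_zero_is_identity = all(
    (x + -0.0 == x) and (is_negative_zero(x + -0.0) == is_negative_zero(x))
    for x in values
)

# x + (+0.0) is not an identity: -0.0 + 0.0 == +0.0.
adding_pos_zero_is_identity = all(
    is_negative_zero(x + 0.0) == is_negative_zero(x) for x in values
)
```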
* [AMDGPU] Implemented dwordx3 variants of buffer/tbuffer load/store intrinsics [Tim Renouf, 2019-03-22, 1 file, -1/+0]
  Now that we have vec3 MVTs, this commit implements dwordx3 variants of the buffer intrinsics.
  On gfx6, a dwordx3 buffer load intrinsic is implemented as a dwordx4 instruction, and a dwordx3 buffer store intrinsic is not supported.
  We need to support the dwordx3 load intrinsic because it is generated by subtarget-unaware code in InstCombine.
  Differential Revision: https://reviews.llvm.org/D58904
  Change-Id: I016729d8557b98a52f529638ae97c340a5922a4e
  llvm-svn: 356755
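The gfx6 load strategy is easy to model over a byte buffer: load four dwords and drop the last. The Python sketch below assumes little-endian dwords and a buffer that actually contains the extra dword. The function names are hypothetical. It also shows that the reverse trick would not work for stores, because a 4-dword store also writes the neighboring dword. That is consistent with the store variant being unsupported, though the commit message does not state the reason.

```python
import struct


def load_dwordx4(buf: bytes, offset: int) -> tuple:
    # Four little-endian 32-bit words starting at `offset`.
    return struct.unpack_from("<4I", buf, offset)


def load_dwordx3_via_x4(buf: bytes, offset: int) -> tuple:
    # Emulate a 3-dword load with a 4-dword load and discard the extra element.
    return load_dwordx4(buf, offset)[:3]


def store_dwordx4(buf: bytearray, offset: int, values) -> None:
    # A 4-dword store writes all 16 bytes, including the dword after a vec3.
    struct.pack_into("<4I", buf, offset, *values)
```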
* [AMDGPU] Added v5i32 and v5f32 register classes [Tim Renouf, 2019-03-22, 1 file, -2/+15]
  They are not used by anything yet, but a subsequent commit will start using them for image ops that return 5 dwords.
  Differential Revision: https://reviews.llvm.org/D58903
  Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271
  llvm-svn: 356735
* [AMDGPU] Support for v3i32/v3f32 [Tim Renouf, 2019-03-21, 1 file, -13/+98]
  Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet.
  SI (gfx6) does not have dwordx3 instructions, so they are not enabled there.
  Some of this patch is from Matt Arsenault, also of AMD.
  Differential Revision: https://reviews.llvm.org/D58902
  Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9
  llvm-svn: 356659