summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Enable the scavenger for large framesMatt Arsenault2019-03-271-5/+14
| | | | | | | Another test is needed for the case where the scavenge fail, but there's another issue with that which needs an additional fix. llvm-svn: 357093
* AMDGPU: Add additional MIR tests for exec mask optimizationsMatt Arsenault2019-03-271-3/+11
| | | | | | | | | | Also includes one example of how this transform is unsound. This isn't verifying the copies are used in the control flow intrinisic patterns. Also add option to disable exec mask opt pass. Since this pass is unsound, it may be useful to turn it off until it is fixed. llvm-svn: 357091
* AMDGPU: Skip debug_instr when collapsing end_cfMatt Arsenault2019-03-271-3/+8
| | | | | | | Based on how these are inserted, I doubt this was causing a problem in practice. llvm-svn: 357090
* AMDGPU: Fix missing scc implicit def on s_andn2_b64_termMatt Arsenault2019-03-271-18/+13
| | | | | | | Introduce new helper class to copy properties directly from the base instruction. llvm-svn: 357089
* AMDGPU: Don't hardcode num defs for MUBUF instructionsMatt Arsenault2019-03-271-2/+2
| | | | | | | This shouldn't change anything since the no-ret atomics are selected later. llvm-svn: 357084
* AMDGPU: wave_barrier is not isBarrierMatt Arsenault2019-03-271-1/+0
| | | | | | | This is not a control flow instruction, so should not be marked as isBarrier. This fixes a verifier error if followed by unreachable. llvm-svn: 357081
* AMDGPU: Fix areLoadsFromSameBasePtr for DS atomicsMatt Arsenault2019-03-271-4/+11
| | | | | | The offset operand index is different for atomics. llvm-svn: 357073
* Revert of 357063 [AMDGPU][MC] Corrected handling of tied src for atomic ↵Dmitry Preobrazhensky2019-03-271-7/+7
| | | | | | | return MUBUF opcodes Reason: the change was mistakenly committed before review llvm-svn: 357066
* [AMDGPU][MC] Corrected handling of tied src for atomic return MUBUF opcodesDmitry Preobrazhensky2019-03-271-7/+7
| | | | | | | | | | See bug 40917: https://bugs.llvm.org/show_bug.cgi?id=40917 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D59305 llvm-svn: 357063
* Revert "AMDGPU: Scavenge register instead of findUnusedReg"Matt Arsenault2019-03-251-1/+1
| | | | | | | | This reverts r356149. This is crashing on rocBLAS. llvm-svn: 356958
* AMDGPU: Remove unnecessary check for isFullCopyMatt Arsenault2019-03-251-1/+1
| | | | | | | Subregister indexes are not used for physical register operands, so isFullCopy is implied by the physical register check. llvm-svn: 356956
* AMDGPU: Set hasSideEffects 0 on _term instructionsMatt Arsenault2019-03-251-0/+3
| | | | | | | | These were defaulting to true, but they are just wrappers around bit operations. This avoids regressions in the exec mask optimization passes in a future commit. llvm-svn: 356952
* AMDGPU: Add support for cross address space synchronization scopesKonstantin Zhuravlyov2019-03-253-32/+101
| | | | | | Differential Revision: https://reviews.llvm.org/D59517 llvm-svn: 356946
* AMDGPU: Preserve LiveIntervals in WQMMatt Arsenault2019-03-251-0/+2
| | | | | | This seems to already be done, but wasn't marked. llvm-svn: 356922
* [AliasAnalysis] Second prototype to cache BasicAA / anyAA state.Alina Sbirlea2019-03-222-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Adding contained caching to AliasAnalysis. BasicAA is currently the only one using it. AA changes: - This patch is pulling the caches from BasicAAResults to AAResults, meaning the getModRefInfo call benefits from the IsCapturedCache as well when in "batch mode". - All AAResultBase implementations add the QueryInfo member to all APIs. AAResults APIs maintain wrapper APIs such that all alias()/getModRefInfo call sites are unchanged. - AA now provides a BatchAAResults type as a wrapper to AAResults. It keeps the AAResults instance and a QueryInfo instantiated to batch mode. It delegates all work to the AAResults instance with the batched QueryInfo. More API wrappers may be needed in BatchAAResults; only the minimum needed is currently added. MemorySSA changes: - All walkers are now templated on the AA used (AliasAnalysis=AAResults or BatchAAResults). - At build time, we optimize uses; now we create a local walker (lives only as long as OptimizeUses does) using BatchAAResults. - All Walkers have an internal AA and only use that now, never the AA in MemorySSA. The Walkers receive the AA they will use when built. - The walker we use for queries after the build is instantiated on AliasAnalysis and is built after building MemorySSA and setting AA. - All static methods doing walking are now templated on AliasAnalysisType if they are used both during build and after. If used only during build, the method now only takes a BatchAAResults. If used only after build, the method now takes an AliasAnalysis. Subscribers: sanjoy, arsenm, jvesely, nhaehnle, jlebar, george.burgess.iv, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59315 llvm-svn: 356783
* [AMDGPU] Use three- and five-dword result type in image opsTim Renouf2019-03-222-15/+12
| | | | | | | | | | | | | | | | Some image ops return three or five dwords. Previously, we modeled that with a 4 or 8 dword register class. The register allocator could cleverly spot that some subregs were dead and allocate something else there, but that caused the de-optimization that waitcnt insertion would think that the result was used immediately. This commit allows such an image op to have a result with a three or five dword result, avoiding the above de-optimization. Differential Revision: https://reviews.llvm.org/D58905 Change-Id: I3651211bbd7ed22721ee7b9fefd7bcc60a809d8b llvm-svn: 356757
* [AMDGPU] Implemented dwordx3 variants of buffer/tbuffer load/store intrinsicsTim Renouf2019-03-226-24/+68
| | | | | | | | | | | | | | | Now we have vec3 MVTs, this commit implements dwordx3 variants of the buffer intrinsics. On gfx6, a dwordx3 buffer load intrinsic is implemented as a dwordx4 instruction, and a dwordx3 buffer store intrinsic is not supported. We need to support the dwordx3 load intrinsic because it is generated by subtarget-unaware code in InstCombine. Differential Revision: https://reviews.llvm.org/D58904 Change-Id: I016729d8557b98a52f529638ae97c340a5922a4e llvm-svn: 356755
* [AMDGPU] Added v5i32 and v5f32 register classesTim Renouf2019-03-2210-4/+144
| | | | | | | | | | They are not used by anything yet, but a subsequent commit will start using them for image ops that return 5 dwords. Differential Revision: https://reviews.llvm.org/D58903 Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271 llvm-svn: 356735
* [AMDGPU] Support for v3i32/v3f32Tim Renouf2019-03-2112-52/+279
| | | | | | | | | | | | | | | Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet. SI (gfx6) does not have dwordx3 instructions, so they are not enabled there. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58902 Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9 llvm-svn: 356659
* [AMDGPU] Do not generate spurious PAL metadataTim Renouf2019-03-202-6/+10
| | | | | | | | | | | | My previous fix rL356591 "[AMDGPU] Added MsgPack format PAL metadata" accidentally caused a spurious PAL metadata .note record to be emitted for any AMDGPU output. That caused failures in the lld test amdgpu-relocs.s. Fixed. Differential Revision: https://reviews.llvm.org/D59613 Change-Id: Ie04a2aaae890dcd490f22c89edf9913a77ce070e llvm-svn: 356621
* [AMDGPU] Fix dependency on `BinaryFormat`Michael Liao2019-03-201-1/+1
| | | | | | | | | | | | Summary: - The linking is broken when this library is built as shared one. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59610 llvm-svn: 356617
* AMDGPU: Don't look for constant in insert/extract_vector_elt regbankselectMatt Arsenault2019-03-201-44/+19
| | | | | | | | | The constantness shouldn't change the register bank choice. We also don't need to restrict this to only indexing VGPRs, since it's possible to index SGPRs (but SelectionDAG made using this difficult). Allow directly indexing SGPRs when appropriate. llvm-svn: 356611
* [AMDGPU] Fix clamp bit DAG operandMichael Liao2019-03-201-5/+8
| | | | | | | | | | | | | | | | | | Summary: - Should use `targetconstant` instead of `constant` operand for clamp bit, which is expected as an immediate operand. Under certain conditions, such as a common `i1 false` constant is used in other place and selected before the instruction with clamp bit, register operand may be added instead of immediate one. Use `targetcosntant` to enforce that. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59608 llvm-svn: 356608
* AMDHSA: Fix COMPUTE_PGM_RSRC2.USER_SGPR calculation when parsing ISA assemblyKonstantin Zhuravlyov2019-03-201-7/+7
| | | | | | | | It must match https://llvm.org/docs/AMDGPUUsage.html#initial-kernel-execution-state Differential Revision: https://reviews.llvm.org/D59570 llvm-svn: 356603
* [AMDGPU] Added MsgPack format PAL metadataTim Renouf2019-03-205-55/+636
| | | | | | | | | | | | | | Summary: PAL metadata now supports both the old linear reg=val pairs format and the new MsgPack format. The MsgPack format uses YAML as its textual representation. On output to YAML, a mnemonic name is provided for some hardware registers. Differential Revision: https://reviews.llvm.org/D57028 Change-Id: I2bbaabaaca4b3574f7e03b80fbef7c7a69d06a94 llvm-svn: 356591
* [AMDGPU] Factored PAL metadata handling out into its own classTim Renouf2019-03-208-129/+345
| | | | | | | | | | | | | | | | | | | | | | Summary: This commit introduces a new AMDGPUPALMetadata class that: * is inside the AMDGPU target; * keeps an in-memory representation of PAL metadata; * provides a method to read the frontend-supplied metadata from LLVM IR; * provides methods for the asm printer to set metadata items; * provides methods to write the metadata as a binary blob to put in a .note record or as an asm directive; * provides a method to read the metadata as a binary blob from a .note record. Because llvm-readobj cannot call directly into a target, I had to remove llvm-readobj's ability to dump PAL metadata, pending a resolution to https://reviews.llvm.org/D52821 Differential Revision: https://reviews.llvm.org/D57027 Change-Id: I756dc830894fcb6850324cdcfa87c0120eb2cf64 llvm-svn: 356582
* [AMDGPU][MC] Corrected checks for DS offset0 rangeDmitry Preobrazhensky2019-03-201-1/+1
| | | | | | | | | | See bug 40889: https://bugs.llvm.org/show_bug.cgi?id=40889 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D59313 llvm-svn: 356576
* [AMDGPU][MC][GFX9] Added support of operands shared_base, shared_limit, ↵Dmitry Preobrazhensky2019-03-206-8/+76
| | | | | | | | | | | | private_base, private_limit, pops_exiting_wave_id See bug 39297: https://bugs.llvm.org/show_bug.cgi?id=39297 Reviewers: artem.tamazov, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D59290 llvm-svn: 356561
* [AMDGPU] Allow MIMG with no uses in adjustWritemask in iselDavid Stuttard2019-03-201-0/+4
| | | | | | | | | | | | | | | | | | | Summary: If an MIMG instruction has managed to get through to adjustWritemask in isel but has no uses (and doesn't enable TFC) then prevent an assertion by not attempting to adjust the writemask. The instruction will be removed anyway. Change-Id: I9a5dba6bafe1f35ac99c1b73df390936e2ac27a7 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58964 llvm-svn: 356540
* CodeGen: Refactor regallocator command line and target selectionMatt Arsenault2019-03-191-6/+6
| | | | | | | | | | This will allow targets more flexibility to replace the register allocator core passes. In a future commit, AMDGPU will run the core register assignment passes twice, and will also want to disallow using the standard -regalloc option. llvm-svn: 356506
* [AMDGPU] Add buffer/load 8/16 bit overloaded intrinsicsRyan Taylor2019-03-196-1/+170
| | | | | | | | | | | | | | | Summary: Add buffer store/load 8/16 overloaded intrinsics for buffer, raw_buffer and struct_buffer Change-Id: I166a29f071b2ff4e4683fb0392564b1f223ac61d Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59265 llvm-svn: 356465
* [AMDGPU] Ban i8 min3 promotion.Neil Henning2019-03-191-3/+3
| | | | | | | | | | | | | | | | I found this really weird WWM-related case whereby through the WWM transformations our isel lowering was trying to promote 2 min's into a min3 for the i8 type, which our hardware doesn't support. The new min3_i8.ll test case would previously spew the error: PromoteIntegerResult #0: t69: i8 = SMIN3 t70, Constant:i8<0>, t68 Before the simple fix to our isel lowering to not do it for i8 MVT's. Differential Revision: https://reviews.llvm.org/D59543 llvm-svn: 356464
* [AMDGPU] Enable code selection using `s_mul_hi_u32`/`s_mul_hi_i32`.Michael Liao2019-03-182-2/+10
| | | | | | | | | | Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59501 llvm-svn: 356405
* [AMDGPU] Asm/disasm clamp modifier on vop3 int arithmeticTim Renouf2019-03-186-33/+69
| | | | | | | | | | | | | | Allow the clamp modifier on vop3 int arithmetic instructions in assembly and disassembly. This involved adding a clamp operand to the affected instructions in MIR and MC, and thus having to fix up several places in codegen and MIR tests. Differential Revision: https://reviews.llvm.org/D59267 Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e llvm-svn: 356399
* [AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiersTim Renouf2019-03-189-46/+127
| | | | | | | | | | | | | | | | | This commit allows v_cndmask_b32_e64 with abs, neg source modifiers on src0, src1 to be assembled and disassembled. This does appear to be allowed, even though they are floating point modifiers and the operand type is b32. To do this, I added src0_modifiers and src1_modifiers to the MachineInstr, which involved fixing up several places in codegen and mir tests. Differential Revision: https://reviews.llvm.org/D59191 Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea llvm-svn: 356398
* [MsgPack][AMDGPU] Fix unflushed raw_string_ostream bugs on windows expensive ↵Tim Renouf2019-03-181-2/+4
| | | | | | | | | | | | | checks bot This fixes a couple of unflushed raw_string_ostream bugs in recent commits that only show up on a bot building on windows with expensive checks. Differential Revision: https://reviews.llvm.org/D59396 Change-Id: I9c6208325503b3ee0786b4b688e13fc24a15babf llvm-svn: 356394
* [TargetLowering] Add code size information on isFPImmLegal. NFCAdhemerval Zanella2019-03-182-2/+4
| | | | | | | | | | | This allows better code size for aarch64 floating point materialization in a future patch. Reviewers: evandro Differential Revision: https://reviews.llvm.org/D58690 llvm-svn: 356389
* [AMDGPU] Add an experimental buffer fat pointer address space.Neil Henning2019-03-185-20/+26
| | | | | | | | | | | | Add an experimental buffer fat pointer address space that is currently unhandled in the backend. This commit reserves address space 7 as a non-integral pointer repsenting the 160-bit fat pointer (128-bit buffer descriptor + 32-bit offset) that is heavily used in graphics workloads using the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D58957 llvm-svn: 356373
* AMDGPU: Partially fix default device for HSAMatt Arsenault2019-03-173-3/+8
| | | | | | | | | | | | | | | | | | There are a few different issues, mostly stemming from using generation based checks for anything instead of subtarget features. Stop adding flat-address-space as a feature for HSA, as it should only be a device property. This was incorrectly allowing flat instructions to select for SI. Increase the default generation for HSA to avoid the encoding error when emitting objects. This has some other side effects from various checks which probably should be separate subtarget features (in the cost model and for dealing with the DS offset folding issue). Partial fix for bug 41070. It should probably be an error to try using amdhsa without flat support. llvm-svn: 356347
* [AMDGPU] Prepare for introduction of v3 and v5 MVTsTim Renouf2019-03-171-3/+4
| | | | | | | | | | | | | | | | | | | AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This commit does not add them, but makes preparatory changes: * Fixed assumptions of power-of-2 vector type in kernel arg handling, and added v5 kernel arg tests and v3/v5 shader arg tests. * Added v5 tests for cost analysis. * Added vec3/vec5 arg test cases. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58928 Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd llvm-svn: 356342
* AMDGPU: Fix a SIAnnotateControlFlow issue when there are multiple backedges.Changpeng Fang2019-03-151-2/+11
| | | | | | | | | | | | | | | | | | | | | | Summary: At the exit of the loop, the compiler uses a register to remember and accumulate the number of threads that have already exited. When all active threads exit the loop, this register is used to restore the exec mask, and the execution continues for the post loop code. When there is a "continue" in the loop, the compiler made a mistake to reset the register to 0 when the "continue" backedge is taken. This will result in some threads not executing the post loop code as they are supposed to. This patch fixed the issue. Reviewers: nhaehnle, arsenm Differential Revision: https://reviews.llvm.org/D59312 llvm-svn: 356298
* [AMDGPU] Fix SGPR fixing through SCC chainingMichael Liao2019-03-152-12/+18
| | | | | | | | | | | | | | | | | | Summary: - During the fixing of SGPR copying from VGPR, ensure users of SCC is properly propagated, i.e. * only propagate through live def of SCC, * skip the SCC-def inst itself, and * stop the propagation on the other SCC-def inst after checking its SCC-use first. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59362 llvm-svn: 356258
* MIR: Allow targets to serialize MachineFunctionInfoMatt Arsenault2019-03-146-7/+188
| | | | | | | | | | | | | | | | | | This has been a very painful missing feature that has made producing reduced testcases difficult. In particular the various registers determined for stack access during function lowering were necessary to avoid undefined register errors in a large percentage of cases. Implement a subset of the important fields that need to be preserved for AMDGPU. Most of the changes are to support targets parsing register fields and properly reporting errors. The biggest sort-of bug remaining is for fields that can be initialized from the IR section will be overwritten by a default initialized machineFunctionInfo section. Another remaining bug is the machineFunctionInfo section is still printed even if empty. llvm-svn: 356215
* AMDGPU: Correct type for waitcnt debug flagMatt Arsenault2019-03-141-2/+2
| | | | llvm-svn: 356206
* AMDGPU: Scavenge register instead of findUnusedRegMatt Arsenault2019-03-141-1/+1
| | | | llvm-svn: 356149
* AMDGPU: Don't add unnecessary convergent attributesMatt Arsenault2019-03-141-19/+2
| | | | | | These are redundant with the intrinsic declaration. llvm-svn: 356143
* [AMDGPU] Silence gcc 7 warningsStanislav Mekhanoshin2019-03-135-38/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D59330 llvm-svn: 356100
* [AMDGPU] Switched HSA metadata to use MsgPackDocumentTim Renouf2019-03-134-132/+116
| | | | | | | | | | | | Summary: MsgPackDocument is the lighter-weight replacement for MsgPackTypes. This commit switches AMDGPU HSA metadata processing to use MsgPackDocument instead of MsgPackTypes. Differential Revision: https://reviews.llvm.org/D57024 Change-Id: I0751668013abe8c87db01db1170831a76079b3a6 llvm-svn: 356081
* IR: Add immarg attributeMatt Arsenault2019-03-121-27/+11
| | | | | | | | | | | | | | | | | This indicates an intrinsic parameter is required to be a constant, and should not be replaced with a non-constant value. Add the attribute to all AMDGPU and generic intrinsics that comments indicate it should apply to. I scanned other target intrinsics, but I don't see any obvious comments indicating which arguments are intended to be only immediates. This breaks one questionable testcase for the autoupgrade. I'm unclear on whether the autoupgrade is supposed to really handle declarations which were never valid. The verifier fails because the attributes now refer to a parameter past the end of the argument list. llvm-svn: 355981
* [AMDGPU] Add support for immediate operand for S_ENDPGMDavid Stuttard2019-03-129-10/+81
| | | | | | | | | | | | | | | | | Summary: Add support for immediate operand in S_ENDPGM Change-Id: I0c56a076a10980f719fb2a8f16407e9c301013f6 Reviewers: alexshap Subscribers: qcolombet, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, eraman, arphaman, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59213 llvm-svn: 355902
OpenPOWER on IntegriCloud