summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* [AMDGPU] Do not generate spurious PAL metadataTim Renouf2019-03-202-6/+10
| | | | | | | | | | | | My previous fix rL356591 "[AMDGPU] Added MsgPack format PAL metadata" accidentally caused a spurious PAL metadata .note record to be emitted for any AMDGPU output. That caused failures in the lld test amdgpu-relocs.s. Fixed. Differential Revision: https://reviews.llvm.org/D59613 Change-Id: Ie04a2aaae890dcd490f22c89edf9913a77ce070e llvm-svn: 356621
* [AMDGPU] Fix dependency on `BinaryFormat`Michael Liao2019-03-201-1/+1
| | | | | | | | | | | | Summary: - The linking is broken when this library is built as shared one. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59610 llvm-svn: 356617
* AMDGPU: Don't look for constant in insert/extract_vector_elt regbankselectMatt Arsenault2019-03-201-44/+19
| | | | | | | | | The constantness shouldn't change the register bank choice. We also don't need to restrict this to only indexing VGPRs, since it's possible to index SGPRs (but SelectionDAG made using this difficult). Allow directly indexing SGPRs when appropriate. llvm-svn: 356611
* [AMDGPU] Fix clamp bit DAG operandMichael Liao2019-03-201-5/+8
| | | | | | | | | | | | | | | | | | Summary: - Should use `targetconstant` instead of `constant` operand for clamp bit, which is expected as an immediate operand. Under certain conditions, such as a common `i1 false` constant is used in other place and selected before the instruction with clamp bit, register operand may be added instead of immediate one. Use `targetcosntant` to enforce that. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59608 llvm-svn: 356608
* AMDHSA: Fix COMPUTE_PGM_RSRC2.USER_SGPR calculation when parsing ISA assemblyKonstantin Zhuravlyov2019-03-201-7/+7
| | | | | | | | It must match https://llvm.org/docs/AMDGPUUsage.html#initial-kernel-execution-state Differential Revision: https://reviews.llvm.org/D59570 llvm-svn: 356603
* [AMDGPU] Added MsgPack format PAL metadataTim Renouf2019-03-205-55/+636
| | | | | | | | | | | | | | Summary: PAL metadata now supports both the old linear reg=val pairs format and the new MsgPack format. The MsgPack format uses YAML as its textual representation. On output to YAML, a mnemonic name is provided for some hardware registers. Differential Revision: https://reviews.llvm.org/D57028 Change-Id: I2bbaabaaca4b3574f7e03b80fbef7c7a69d06a94 llvm-svn: 356591
* [AMDGPU] Factored PAL metadata handling out into its own classTim Renouf2019-03-208-129/+345
| | | | | | | | | | | | | | | | | | | | | | Summary: This commit introduces a new AMDGPUPALMetadata class that: * is inside the AMDGPU target; * keeps an in-memory representation of PAL metadata; * provides a method to read the frontend-supplied metadata from LLVM IR; * provides methods for the asm printer to set metadata items; * provides methods to write the metadata as a binary blob to put in a .note record or as an asm directive; * provides a method to read the metadata as a binary blob from a .note record. Because llvm-readobj cannot call directly into a target, I had to remove llvm-readobj's ability to dump PAL metadata, pending a resolution to https://reviews.llvm.org/D52821 Differential Revision: https://reviews.llvm.org/D57027 Change-Id: I756dc830894fcb6850324cdcfa87c0120eb2cf64 llvm-svn: 356582
* [AMDGPU][MC] Corrected checks for DS offset0 rangeDmitry Preobrazhensky2019-03-201-1/+1
| | | | | | | | | | See bug 40889: https://bugs.llvm.org/show_bug.cgi?id=40889 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D59313 llvm-svn: 356576
* [AMDGPU][MC][GFX9] Added support of operands shared_base, shared_limit, ↵Dmitry Preobrazhensky2019-03-206-8/+76
| | | | | | | | | | | | private_base, private_limit, pops_exiting_wave_id See bug 39297: https://bugs.llvm.org/show_bug.cgi?id=39297 Reviewers: artem.tamazov, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D59290 llvm-svn: 356561
* [AMDGPU] Allow MIMG with no uses in adjustWritemask in iselDavid Stuttard2019-03-201-0/+4
| | | | | | | | | | | | | | | | | | | Summary: If an MIMG instruction has managed to get through to adjustWritemask in isel but has no uses (and doesn't enable TFC) then prevent an assertion by not attempting to adjust the writemask. The instruction will be removed anyway. Change-Id: I9a5dba6bafe1f35ac99c1b73df390936e2ac27a7 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58964 llvm-svn: 356540
* CodeGen: Refactor regallocator command line and target selectionMatt Arsenault2019-03-191-6/+6
| | | | | | | | | | This will allow targets more flexibility to replace the register allocator core passes. In a future commit, AMDGPU will run the core register assignment passes twice, and will also want to disallow using the standard -regalloc option. llvm-svn: 356506
* [AMDGPU] Add buffer/load 8/16 bit overloaded intrinsicsRyan Taylor2019-03-196-1/+170
| | | | | | | | | | | | | | | Summary: Add buffer store/load 8/16 overloaded intrinsics for buffer, raw_buffer and struct_buffer Change-Id: I166a29f071b2ff4e4683fb0392564b1f223ac61d Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59265 llvm-svn: 356465
* [AMDGPU] Ban i8 min3 promotion.Neil Henning2019-03-191-3/+3
| | | | | | | | | | | | | | | | I found this really weird WWM-related case whereby through the WWM transformations our isel lowering was trying to promote 2 min's into a min3 for the i8 type, which our hardware doesn't support. The new min3_i8.ll test case would previously spew the error: PromoteIntegerResult #0: t69: i8 = SMIN3 t70, Constant:i8<0>, t68 Before the simple fix to our isel lowering to not do it for i8 MVT's. Differential Revision: https://reviews.llvm.org/D59543 llvm-svn: 356464
* [AMDGPU] Enable code selection using `s_mul_hi_u32`/`s_mul_hi_i32`.Michael Liao2019-03-182-2/+10
| | | | | | | | | | Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59501 llvm-svn: 356405
* [AMDGPU] Asm/disasm clamp modifier on vop3 int arithmeticTim Renouf2019-03-186-33/+69
| | | | | | | | | | | | | | Allow the clamp modifier on vop3 int arithmetic instructions in assembly and disassembly. This involved adding a clamp operand to the affected instructions in MIR and MC, and thus having to fix up several places in codegen and MIR tests. Differential Revision: https://reviews.llvm.org/D59267 Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e llvm-svn: 356399
* [AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiersTim Renouf2019-03-189-46/+127
| | | | | | | | | | | | | | | | | This commit allows v_cndmask_b32_e64 with abs, neg source modifiers on src0, src1 to be assembled and disassembled. This does appear to be allowed, even though they are floating point modifiers and the operand type is b32. To do this, I added src0_modifiers and src1_modifiers to the MachineInstr, which involved fixing up several places in codegen and mir tests. Differential Revision: https://reviews.llvm.org/D59191 Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea llvm-svn: 356398
* [MsgPack][AMDGPU] Fix unflushed raw_string_ostream bugs on windows expensive ↵Tim Renouf2019-03-181-2/+4
| | | | | | | | | | | | | checks bot This fixes a couple of unflushed raw_string_ostream bugs in recent commits that only show up on a bot building on windows with expensive checks. Differential Revision: https://reviews.llvm.org/D59396 Change-Id: I9c6208325503b3ee0786b4b688e13fc24a15babf llvm-svn: 356394
* [TargetLowering] Add code size information on isFPImmLegal. NFCAdhemerval Zanella2019-03-182-2/+4
| | | | | | | | | | | This allows better code size for aarch64 floating point materialization in a future patch. Reviewers: evandro Differential Revision: https://reviews.llvm.org/D58690 llvm-svn: 356389
* [AMDGPU] Add an experimental buffer fat pointer address space.Neil Henning2019-03-185-20/+26
| | | | | | | | | | | | Add an experimental buffer fat pointer address space that is currently unhandled in the backend. This commit reserves address space 7 as a non-integral pointer repsenting the 160-bit fat pointer (128-bit buffer descriptor + 32-bit offset) that is heavily used in graphics workloads using the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D58957 llvm-svn: 356373
* AMDGPU: Partially fix default device for HSAMatt Arsenault2019-03-173-3/+8
| | | | | | | | | | | | | | | | | | There are a few different issues, mostly stemming from using generation based checks for anything instead of subtarget features. Stop adding flat-address-space as a feature for HSA, as it should only be a device property. This was incorrectly allowing flat instructions to select for SI. Increase the default generation for HSA to avoid the encoding error when emitting objects. This has some other side effects from various checks which probably should be separate subtarget features (in the cost model and for dealing with the DS offset folding issue). Partial fix for bug 41070. It should probably be an error to try using amdhsa without flat support. llvm-svn: 356347
* [AMDGPU] Prepare for introduction of v3 and v5 MVTsTim Renouf2019-03-171-3/+4
| | | | | | | | | | | | | | | | | | | AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This commit does not add them, but makes preparatory changes: * Fixed assumptions of power-of-2 vector type in kernel arg handling, and added v5 kernel arg tests and v3/v5 shader arg tests. * Added v5 tests for cost analysis. * Added vec3/vec5 arg test cases. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58928 Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd llvm-svn: 356342
* AMDGPU: Fix a SIAnnotateControlFlow issue when there are multiple backedges.Changpeng Fang2019-03-151-2/+11
| | | | | | | | | | | | | | | | | | | | | | Summary: At the exit of the loop, the compiler uses a register to remember and accumulate the number of threads that have already exited. When all active threads exit the loop, this register is used to restore the exec mask, and the execution continues for the post loop code. When there is a "continue" in the loop, the compiler made a mistake to reset the register to 0 when the "continue" backedge is taken. This will result in some threads not executing the post loop code as they are supposed to. This patch fixed the issue. Reviewers: nhaehnle, arsenm Differential Revision: https://reviews.llvm.org/D59312 llvm-svn: 356298
* [AMDGPU] Fix SGPR fixing through SCC chainingMichael Liao2019-03-152-12/+18
| | | | | | | | | | | | | | | | | | Summary: - During the fixing of SGPR copying from VGPR, ensure users of SCC is properly propagated, i.e. * only propagate through live def of SCC, * skip the SCC-def inst itself, and * stop the propagation on the other SCC-def inst after checking its SCC-use first. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59362 llvm-svn: 356258
* MIR: Allow targets to serialize MachineFunctionInfoMatt Arsenault2019-03-146-7/+188
| | | | | | | | | | | | | | | | | | This has been a very painful missing feature that has made producing reduced testcases difficult. In particular the various registers determined for stack access during function lowering were necessary to avoid undefined register errors in a large percentage of cases. Implement a subset of the important fields that need to be preserved for AMDGPU. Most of the changes are to support targets parsing register fields and properly reporting errors. The biggest sort-of bug remaining is for fields that can be initialized from the IR section will be overwritten by a default initialized machineFunctionInfo section. Another remaining bug is the machineFunctionInfo section is still printed even if empty. llvm-svn: 356215
* AMDGPU: Correct type for waitcnt debug flagMatt Arsenault2019-03-141-2/+2
| | | | llvm-svn: 356206
* AMDGPU: Scavenge register instead of findUnusedRegMatt Arsenault2019-03-141-1/+1
| | | | llvm-svn: 356149
* AMDGPU: Don't add unnecessary convergent attributesMatt Arsenault2019-03-141-19/+2
| | | | | | These are redundant with the intrinsic declaration. llvm-svn: 356143
* [AMDGPU] Silence gcc 7 warningsStanislav Mekhanoshin2019-03-135-38/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D59330 llvm-svn: 356100
* [AMDGPU] Switched HSA metadata to use MsgPackDocumentTim Renouf2019-03-134-132/+116
| | | | | | | | | | | | Summary: MsgPackDocument is the lighter-weight replacement for MsgPackTypes. This commit switches AMDGPU HSA metadata processing to use MsgPackDocument instead of MsgPackTypes. Differential Revision: https://reviews.llvm.org/D57024 Change-Id: I0751668013abe8c87db01db1170831a76079b3a6 llvm-svn: 356081
* IR: Add immarg attributeMatt Arsenault2019-03-121-27/+11
| | | | | | | | | | | | | | | | | This indicates an intrinsic parameter is required to be a constant, and should not be replaced with a non-constant value. Add the attribute to all AMDGPU and generic intrinsics that comments indicate it should apply to. I scanned other target intrinsics, but I don't see any obvious comments indicating which arguments are intended to be only immediates. This breaks one questionable testcase for the autoupgrade. I'm unclear on whether the autoupgrade is supposed to really handle declarations which were never valid. The verifier fails because the attributes now refer to a parameter past the end of the argument list. llvm-svn: 355981
* [AMDGPU] Add support for immediate operand for S_ENDPGMDavid Stuttard2019-03-129-10/+81
| | | | | | | | | | | | | | | | | Summary: Add support for immediate operand in S_ENDPGM Change-Id: I0c56a076a10980f719fb2a8f16407e9c301013f6 Reviewers: alexshap Subscribers: qcolombet, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, eraman, arphaman, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59213 llvm-svn: 355902
* Use bitset for assembler predicatesStanislav Mekhanoshin2019-03-113-8/+11
| | | | | | | | | | | | | | AMDGPU target run out of Subtarget feature flags hitting the limit of 64. AssemblerPredicates uses at most uint64_t for their representation. At the same time CodeGen has exhausted this a long time ago and switched to a FeatureBitset with the current limit of 192 bits. This patch completes transition to the bitset for feature bits extending it to asm matcher and MC code emitter. Differential Revision: https://reviews.llvm.org/D59002 llvm-svn: 355839
* [AMDGPU] Mark enum types in SIDefines.h as unsignedStanislav Mekhanoshin2019-03-114-21/+21
| | | | | | | | MSVC issues some warnings about signed/unsigned comparison. Differential Revision: https://reviews.llvm.org/D59171 llvm-svn: 355836
* AMDGPU: Move d16 load matching to preprocess stepMatt Arsenault2019-03-0810-195/+348
| | | | | | | | | | | | | | | | | When matching half of the build_vector to a load, there could still be a hidden dependency on the other half of the build_vector the pattern wouldn't detect. If there was an additional chain dependency on the other value, a cycle could be introduced. I don't think a tablegen pattern is capable of matching the necessary conditions, so move this into PreprocessISelDAG. Check isPredecessorOf for the other value to avoid a cycle. This has a warning that it's expensive, so this should probably be moved into an MI pass eventually that will have more freedom to reorder instructions to help match this. That is currently complicated by the lack of a computeKnownBits type mechanism for the selected function. llvm-svn: 355731
* DAG: Don't try to cluster loads with tied inputsMatt Arsenault2019-03-081-45/+0
| | | | | | | | | | | | | | | | | | | | | This avoids breaking possible value dependencies when sorting loads by offset. AMDGPU has some load instructions that write into the high or low bits of the destination register, and have a tied input for the other input bits. These can easily have the same base pointer, but be a swizzle so the high address load needs to come first. This was inserting glue forcing the opposite ordering, producing a cycle the InstrEmitter would assert on. It may be potentially expensive to look for the dependency between the other loads, so just skip any where this could happen. Fixes bug 40936 by reverting r351379, which added a hacky attempt to fix this by adding chains in this case, which I think was just working around broken glue before the InstrEmitter. The core of the patch is re-implementing the fix for that problem. llvm-svn: 355728
* AMDGPU: Don't bother checking the chain in areLoadsFromSameBasePtrMatt Arsenault2019-03-081-15/+0
| | | | | | | This is only called in contexts that are verifying the chain itself, and the query itself is only asking about the address. llvm-svn: 355723
* AMDGPU: Correct DS implementation of areLoadsFromSameBasePtrMatt Arsenault2019-03-081-4/+4
| | | | | | | | | This was checking the wrong operands for the base register and the offsets. The indexes are shifted by the number of output registers from the machine instruction definition, and the chain is moved to the end. llvm-svn: 355722
* [AMDGPU] V_CVT_F32_UBYTE{0,1,2,3} are full rate instructionsCarl Ritson2019-03-081-3/+4
| | | | | | | | | | | | | | | | Summary: Fix a bug in the scheduling model where V_CVT_F32_UBYTE{0,1,2,3} are incorrectly marked as quarter rate instructions. Reviewers: arsenm, rampitec Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59091 llvm-svn: 355671
* AMDHSA: Code object v3 updatesKonstantin Zhuravlyov2019-03-071-4/+13
| | | | | | | | | - Copy kernel symbol attributes into kernel descriptor attributes - Make sure kernel symbol's visibility is not "higher" than protected Differential Revision: https://reviews.llvm.org/D59057 llvm-svn: 355630
* AMDGPU: Handle "uniform-work-group-size" attribute (fix for RADV)Aakanksha Patil2019-03-072-6/+66
| | | | | | | | | | A previous patch for "uniform-work-group-size" attribute was found to break some RADV and possibly radeon SI tests and had to be retracted. This patch fixes that. Differential Revision: http://reviews.llvm.org/D58993 llvm-svn: 355574
* [AMDGPU] Add support for 64 bit buffer atomic artihmetic instructionsRyan Taylor2019-03-062-23/+31
| | | | | | | | | | | | | | | | Summary: This adds support for 64 bit buffer atomic arithmetic instructions but does not include cmpswap as that depends on a fix to the way the register pairs are handled Change-Id: Ib207ea65fb69487ccad5066ea647ae8ddfe2ce61 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58918 llvm-svn: 355520
* AMDGPU: Preserve undef flag when expanding SI_IFMatt Arsenault2019-03-051-2/+2
| | | | | | Fixes undefined value verifier error. llvm-svn: 355426
* [AMDGPU] Fix DPP operand order in atomic optimizerCarl Ritson2019-03-051-1/+1
| | | | | | | | | | | | | | | | | | | Summary: Ensure order of operands in DPP atomic optimizer final WWM step is appropriate for sub instructions. Change-Id: I631d050e1c00a3b4bc7c11a90437064403c4cf30 Reviewers: sheredom, tpr Reviewed By: sheredom Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58900 llvm-svn: 355394
* [AMDGPU] Omit KILL instructions from hazard recognizerDavid Stuttard2019-03-051-3/+2
| | | | | | | | | | | | | | | | | | Summary: In some cases the KILL was causing a hazard to be introduced as these were scheduled into hazard slots, but don't result in an instruction. KILL shouldn't be considered for hazard recognition. Change-Id: Ib6d2a2160f8c94cd0ce611ab198c7e4f46aeffcf Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58898 llvm-svn: 355384
* [AMDGPU] Implement AMDGPUMCInstrAnalysisScott Linder2019-03-052-0/+33
| | | | | | | | | Implement MCInstrAnalysis for AMDGPU, with default implementations save for `evaluateBranch`. Differential Revision: https://reviews.llvm.org/D58400 llvm-svn: 355373
* [AMDGPU][MC] Enable lds_direct operand for v_readfirstlane_b32, ↵Dmitry Preobrazhensky2019-03-047-47/+110
| | | | | | | | | | | | v_readlane_b32 and v_writelane_b32 See bug 40662: https://bugs.llvm.org/show_bug.cgi?id=40662 Reviewers: artem.tamazov, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D58713 llvm-svn: 355312
* [AMDGPU] Mark ds instructions as meybeAtomicStanislav Mekhanoshin2019-03-011-0/+1
| | | | | | | | | | | | These were not recognized as potential atomics by memory legalizer. The test was working not because legalizer did a right thing, but because it has skipped all these instructions. When I have fixed DS desciption test started to fail because region address has changed from 4 to 2 a while ago. Differential Revision: https://reviews.llvm.org/D58802 llvm-svn: 355179
* AMDGPU/GlobalISel: Implement select for G_INSERTTom Stellard2019-03-012-0/+31
| | | | | | | | | | | | Re-commit r344310. Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53116 llvm-svn: 355159
* AMDGPU/GlobalISel: Implement select for G_EXTRACTTom Stellard2019-02-283-0/+32
| | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49714 llvm-svn: 355156
* AMDGPU: Fix typoMatt Arsenault2019-02-281-1/+1
| | | | llvm-svn: 355056
OpenPOWER on IntegriCloud