path: root/llvm/test/CodeGen/AMDGPU
Commit message | Author | Age | Files | Lines
...
* [AMDGPU] Add more test cases of D59608. | Michael Liao | 2019-04-02 | 3 | -0/+110
  Summary:
    - Add more test cases.
  Reviewers: arsenm
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60071
  llvm-svn: 357442
* [AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure. | Neil Henning | 2019-04-01 | 9 | -206/+197
  This change incorporates an effort by Connor Abbot to change how we deal with WWM operations potentially trashing valid values in inactive lanes.
  Previously, the SIFixWWMLiveness pass would work out which registers were being trashed within WWM regions, and ensure that the register allocator did not have any values it was depending on resident in those registers if the WWM section would trash them. This worked perfectly well, but would sometimes cause severe register pressure when the WWM section resided before divergent control flow (or at least that is where I mostly observed it).
  This fix instead runs through the WWM sections and pre-allocates some registers for WWM. It then reserves these registers so that the register allocator cannot use them. This results in a significant register saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just this change!).
  Differential Revision: https://reviews.llvm.org/D59295
  llvm-svn: 357400
* AMDGPU: Remove dx10-clamp from subtarget features | Matt Arsenault | 2019-03-29 | 3 | -4/+214
  Since this can be set with s_setreg*, it should not be a subtarget property. Set a default based on the calling convention, and introduce a new amdgpu-dx10-clamp attribute to override this if desired.
  Also introduce a new amdgpu-ieee attribute to match.
  The values need to match to allow inlining. I think it is OK for the caller's dx10-clamp attribute to override the callee, but there doesn't appear to be the infrastructure to do this currently without defining the attribute in the generic Attributes.td.
  Eventually the calling convention lowering will need to insert a mode switch somewhere for these.
  llvm-svn: 357302
* [DAGCombine] Prune unused nodes. | Nirav Dave | 2019-03-29 | 13 | -713/+606
  Summary: Nodes that have no uses are eventually pruned when they are selected from the worklist. Record nodes newly added to the worklist or DAG and perform pruning after every combine attempt.
  Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight
  Reviewed By: jyknight
  Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D58070
  llvm-svn: 357283
* AMDGPU: Make sram-ecc off by default for Vega20 | Konstantin Zhuravlyov | 2019-03-29 | 6 | -10/+9
  Differential Revision: https://reviews.llvm.org/D59718
  llvm-svn: 357247
* AMDGPU/GlobalISel: Insert waterfall loop for vector indexing | Matt Arsenault | 2019-03-29 | 1 | -12/+91
  The register index can only really be an SGPR. Lie that a VGPR index is legal, and then rewrite the instruction in a waterfall loop to handle the index.
  llvm-svn: 357235
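As a rough illustration of the waterfall-loop idea mentioned in this commit, the following is a toy Python model and not the GlobalISel code. The function name `waterfall_groups` and the list-based lane model are hypothetical: each pass takes the index held by the first still-active lane as a uniform (scalar) value, handles every lane that shares it, and retires those lanes.

```python
def waterfall_groups(indices, exec_mask):
    """Toy waterfall loop over per-lane (VGPR-like) index values.

    indices[lane]   -- the divergent index held by each lane
    exec_mask[lane] -- True if the lane is active on entry
    Returns a list of (uniform_index, lanes) pairs, one per loop iteration.
    """
    remaining = {lane for lane, on in enumerate(exec_mask) if on}
    iterations = []
    while remaining:
        first = min(remaining)        # models reading the first active lane
        uniform = indices[first]      # now usable as a scalar (SGPR-like) index
        group = sorted(l for l in remaining if indices[l] == uniform)
        iterations.append((uniform, group))  # the indexed op would run here
        remaining -= set(group)       # retire the lanes just handled
    return iterations


if __name__ == "__main__":
    print(waterfall_groups([2, 0, 2, 1], [True, True, True, True]))
```

The loop runs once per distinct index value among the active lanes, so a fully uniform index costs a single iteration.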
* AMDGPU: Make exec mask optimizations more resistant to block splits | Matt Arsenault | 2019-03-28 | 1 | -9/+116
  Also improve the check for SALU instructions to also ignore implicit_def and other fake instructions.
  llvm-svn: 357170
* [SelectionDAG] Add 2 tests for selection across basic blocks | Piotr Sobczak | 2019-03-28 | 1 | -0/+39
  Summary: Add tests for selection across basic block boundary:
    * one test containing a buffer load, where part of the offset computation is placed in the predecessor of the load
    * similar test, but containing two buffer loads and shared computations
  Please note that the behaviour being tested will be updated in a subsequent commit.
  This commit was extracted from https://reviews.llvm.org/D59535.
  Reviewers: RKSimon
  Reviewed By: RKSimon
  Subscribers: jvesely, nhaehnle, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59690
  llvm-svn: 357149
* [LegalizeVectorTypes] Allow single loads and stores for more short vectors | Justin Bogner | 2019-03-27 | 3 | -27/+29
  When lowering a load or store for TypeWidenVector, the type legalizer would use a single load or store if the associated integer type was legal or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable. (See https://reviews.llvm.org/rL236528 for reference.)
  This applies that behaviour to vector types. If the vector type is TypePromoteInteger, the element type is going to be TypePromoteInteger as well, which leads to a single promoting load rather than N individual promoting loads. For instance, if we have a v3i1, we would now have a load of v4i1 instead of 3 loads of i1.
  Patch by Guillaume Marques. Thanks!
  Differential Revision: https://reviews.llvm.org/D56201
  llvm-svn: 357120
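The v3i1 example in this commit can be illustrated with a small sketch. This is a toy model only, not the legalizer's logic: the helper names `widened_lanes` and `load_plan` are hypothetical, and widening to the next power of two is an assumption made here to match the v3i1 -> v4i1 case in the message.

```python
def widened_lanes(num_elts):
    """Smallest power of two >= num_elts (assumed widening rule)."""
    width = 1
    while width < num_elts:
        width *= 2
    return width


def load_plan(num_elts, elt_type, single_load):
    """Return the list of loads a toy legalizer would emit for a short vector."""
    if single_load:
        # One load of the widened vector type, e.g. v3i1 -> one load of v4i1.
        return ["v%d%s" % (widened_lanes(num_elts), elt_type)]
    # Otherwise one scalar load per element, e.g. three loads of i1.
    return [elt_type] * num_elts


if __name__ == "__main__":
    print(load_plan(3, "i1", single_load=False))
    print(load_plan(3, "i1", single_load=True))
```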
* RegPressure: Fix crash on blocks with only dbg_value | Matt Arsenault | 2019-03-27 | 1 | -0/+135
  If there were only dbg_values in the block, recede would hit the beginning of the block and try to use the dbg_value as a real instruction.
  llvm-svn: 357105
* [GlobalISel] Fix legalizer artifact combiner from crashing with invalid dead instructions. | Amara Emerson | 2019-03-27 | 2 | -8/+5
  The artifact combiners push instructions which have been marked for deletion onto a list for the legalizer to deal with on return. However, for trunc(ext) combines the combiner routine recursively calls itself. When it does this the dead instructions list may not be empty, and the other combiners don't expect to be dealing with essentially invalid MIR (multiple vreg defs etc).
  This change fixes it by ensuring that the dead instructions are processed on entry into tryCombineInstruction.
  As a result, this fix exposed a few places in tests where G_TRUNC instructions were not being deleted even though they were dead.
  Differential Revision: https://reviews.llvm.org/D59892
  llvm-svn: 357101
* Reapply "AMDGPU: Scavenge register instead of findUnusedReg" | Matt Arsenault | 2019-03-27 | 1 | -0/+44
  This reapplies r356149, using the correct overload of findUnusedReg which passes the current iterator.
  This worked most of the time, because the scavenger iterator was moved at the end of the frame index loop in PEI. This would fail if the spill was the first instruction. This was further hidden by the fact that the scavenger wasn't passed in for normal frame index elimination.
  llvm-svn: 357098
* AMDGPU: Add testcase I meant to merge into r357093 | Matt Arsenault | 2019-03-27 | 1 | -0/+37
  llvm-svn: 357097
* [PeepholeOpt] Don't stop simplifying copies on sequence of subregs | Quentin Colombet | 2019-03-27 | 1 | -0/+34
  This patch removes an overly conservative check that would prevent simplifying copies when the value we were tracking would go through several subregister indices.
  Indeed, the intent of this check was to not track values whenever we have to compose subregisters, but actually what the check was doing was bailing anytime we see a second subreg, even if that second subreg would actually be the new source of truth (as opposed to a part of that subreg).
  Differential Revision: https://reviews.llvm.org/D59891
  llvm-svn: 357095
* AMDGPU: Enable the scavenger for large frames | Matt Arsenault | 2019-03-27 | 1 | -14/+10
  Another test is needed for the case where the scavenge fails, but there's another issue with that which needs an additional fix.
  llvm-svn: 357093
* AMDGPU: Add additional MIR tests for exec mask optimizations | Matt Arsenault | 2019-03-27 | 3 | -7/+726
  Also includes one example of how this transform is unsound. This isn't verifying the copies are used in the control flow intrinsic patterns.
  Also add option to disable exec mask opt pass. Since this pass is unsound, it may be useful to turn it off until it is fixed.
  llvm-svn: 357091
* AMDGPU: Skip debug_instr when collapsing end_cf | Matt Arsenault | 2019-03-27 | 1 | -0/+110
  Based on how these are inserted, I doubt this was causing a problem in practice.
  llvm-svn: 357090
* AMDGPU: Fix missing scc implicit def on s_andn2_b64_term | Matt Arsenault | 2019-03-27 | 4 | -21/+21
  Introduce new helper class to copy properties directly from the base instruction.
  llvm-svn: 357089
* MIR: Freeze reserved regs after parsing everything | Matt Arsenault | 2019-03-27 | 4 | -7/+20
  The AMDGPU implementation of getReservedRegs depends on MachineFunctionInfo fields that are parsed from the YAML section. This was reserving the wrong register since it was setting the reserved regs before parsing the correct one.
  Some tests were relying on the default reserved set for the assumed default calling convention.
  llvm-svn: 357083
* AMDGPU: wave_barrier is not isBarrier | Matt Arsenault | 2019-03-27 | 1 | -0/+12
  This is not a control flow instruction, so should not be marked as isBarrier. This fixes a verifier error if followed by unreachable.
  llvm-svn: 357081
* AMDGPU: Fix areLoadsFromSameBasePtr for DS atomics | Matt Arsenault | 2019-03-27 | 1 | -0/+17
  The offset operand index is different for atomics.
  llvm-svn: 357073
* AMDGPU: Make collapse-endcf test more useful | Matt Arsenault | 2019-03-25 | 1 | -6/+20
  Without a VALU instruction in the return block, these were mostly testing the path to delete exec mask code before s_endpgm rather than the end cf handling.
  llvm-svn: 356955
* AMDGPU: Add support for cross address space synchronization scopes | Konstantin Zhuravlyov | 2019-03-25 | 15 | -402/+2517
  Differential Revision: https://reviews.llvm.org/D59517
  llvm-svn: 356946
* MISched: Don't schedule regions with 0 instructions | Matt Arsenault | 2019-03-25 | 1 | -0/+115
  I think this is correct, but may not necessarily be the correct fix for the assertion I'm really trying to solve.
  If a scheduling region was found that only has dbg_value instructions, the RegPressure tracker would end up in an inconsistent state because it would skip over any debug instructions and point to an instruction outside of the scheduling region. It may still be possible for this to happen if there are some real schedulable instructions between dbg_values, but I haven't managed to break this.
  The testcase is extremely sensitive and I'm not sure how to make it more resistant to future scheduler changes that would avoid stressing this situation.
  llvm-svn: 356926
* [AMDGPU] Use three- and five-dword result type in image ops | Tim Renouf | 2019-03-22 | 3 | -18/+18
  Some image ops return three or five dwords. Previously, we modeled that with a 4 or 8 dword register class. The register allocator could cleverly spot that some subregs were dead and allocate something else there, but that caused the de-optimization that waitcnt insertion would think that the result was used immediately.
  This commit allows such an image op to have a three or five dword result, avoiding the above de-optimization.
  Differential Revision: https://reviews.llvm.org/D58905
  Change-Id: I3651211bbd7ed22721ee7b9fefd7bcc60a809d8b
  llvm-svn: 356757
* [AMDGPU] Implemented dwordx3 variants of buffer/tbuffer load/store intrinsics | Tim Renouf | 2019-03-22 | 4 | -0/+188
  Now that we have vec3 MVTs, this commit implements dwordx3 variants of the buffer intrinsics.
  On gfx6, a dwordx3 buffer load intrinsic is implemented as a dwordx4 instruction, and a dwordx3 buffer store intrinsic is not supported. We need to support the dwordx3 load intrinsic because it is generated by subtarget-unaware code in InstCombine.
  Differential Revision: https://reviews.llvm.org/D58904
  Change-Id: I016729d8557b98a52f529638ae97c340a5922a4e
  llvm-svn: 356755
* [AMDGPU] Added v5i32 and v5f32 register classes | Tim Renouf | 2019-03-22 | 2 | -0/+77
  They are not used by anything yet, but a subsequent commit will start using them for image ops that return 5 dwords.
  Differential Revision: https://reviews.llvm.org/D58903
  Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271
  llvm-svn: 356735
* GlobalISel: Fix RegBankSelect for REG_SEQUENCE | Matt Arsenault | 2019-03-21 | 1 | -0/+140
  The AArch64 test was broken since the result register already had a set register class, so this test was a no-op. The mapping verify call would fail because the result size is not the same as the inputs like in a copy or phi.
  The AMDGPU testcases are half broken and introduce illegal VGPR->SGPR copies which need much more work to handle correctly (same for phis), but add them as a baseline.
  llvm-svn: 356713
* [AMDGPU] Support for v3i32/v3f32 | Tim Renouf | 2019-03-21 | 19 | -111/+384
  Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet.
  SI (gfx6) does not have dwordx3 instructions, so they are not enabled there.
  Some of this patch is from Matt Arsenault, also of AMD.
  Differential Revision: https://reviews.llvm.org/D58902
  Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9
  llvm-svn: 356659
* Allow machine dce to remove uses in the same instruction | Stanislav Mekhanoshin | 2019-03-20 | 1 | -0/+55
  Machine DCE cannot remove a dead definition if there are non-dbg uses. A use however can be in the same instruction:
    dead %0 = INST %0
  Such instructions are sometimes created by the Detect dead lanes pass. Allow this instruction to be deleted despite the use if the only use belongs to the same instruction.
  Differential Revision: https://reviews.llvm.org/D59565
  llvm-svn: 356619
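The rule described in this commit can be sketched with a toy instruction model. This is a minimal illustration, not the Machine DCE pass: the `Inst` class and `is_dead_def` helper are hypothetical names, and registers are modelled as plain strings.

```python
from dataclasses import dataclass


@dataclass
class Inst:
    name: str
    defs: list            # registers written, e.g. ["%0"]
    uses: list            # registers read, e.g. ["%0"]
    is_debug: bool = False


def is_dead_def(reg, def_inst, insts):
    """True if every non-debug use of reg is inside def_inst itself."""
    return all(
        inst is def_inst
        for inst in insts
        if not inst.is_debug and reg in inst.uses
    )


if __name__ == "__main__":
    # Models "dead %0 = INST %0": the only use is in the defining instruction.
    self_use = Inst("INST", defs=["%0"], uses=["%0"])
    print(is_dead_def("%0", self_use, [self_use]))
```

Debug uses are skipped, matching the message's "non-dbg uses" wording; any real use in another instruction keeps the definition alive.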
* AMDGPU: Don't look for constant in insert/extract_vector_elt regbankselect | Matt Arsenault | 2019-03-20 | 2 | -75/+134
  The constantness shouldn't change the register bank choice. We also don't need to restrict this to only indexing VGPRs, since it's possible to index SGPRs (but SelectionDAG made using this difficult). Allow directly indexing SGPRs when appropriate.
  llvm-svn: 356611
* [AMDGPU] Fix clamp bit DAG operand | Michael Liao | 2019-03-20 | 1 | -0/+22
  Summary:
    - Should use `targetconstant` instead of `constant` operand for clamp bit, which is expected as an immediate operand. Under certain conditions, such as when a common `i1 false` constant is used in another place and selected before the instruction with the clamp bit, a register operand may be added instead of an immediate one. Use `targetconstant` to enforce that.
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59608
  llvm-svn: 356608
* [AMDGPU] Added MsgPack format PAL metadata | Tim Renouf | 2019-03-20 | 8 | -0/+157
  Summary: PAL metadata now supports both the old linear reg=val pairs format and the new MsgPack format.
  The MsgPack format uses YAML as its textual representation. On output to YAML, a mnemonic name is provided for some hardware registers.
  Differential Revision: https://reviews.llvm.org/D57028
  Change-Id: I2bbaabaaca4b3574f7e03b80fbef7c7a69d06a94
  llvm-svn: 356591
* [AMDGPU] Factored PAL metadata handling out into its own class | Tim Renouf | 2019-03-20 | 1 | -1/+2
  Summary: This commit introduces a new AMDGPUPALMetadata class that:
    * is inside the AMDGPU target;
    * keeps an in-memory representation of PAL metadata;
    * provides a method to read the frontend-supplied metadata from LLVM IR;
    * provides methods for the asm printer to set metadata items;
    * provides methods to write the metadata as a binary blob to put in a .note record or as an asm directive;
    * provides a method to read the metadata as a binary blob from a .note record.
  Because llvm-readobj cannot call directly into a target, I had to remove llvm-readobj's ability to dump PAL metadata, pending a resolution to https://reviews.llvm.org/D52821
  Differential Revision: https://reviews.llvm.org/D57027
  Change-Id: I756dc830894fcb6850324cdcfa87c0120eb2cf64
  llvm-svn: 356582
* [AMDGPU] Allow MIMG with no uses in adjustWritemask in isel | David Stuttard | 2019-03-20 | 1 | -0/+22
  Summary: If an MIMG instruction has managed to get through to adjustWritemask in isel but has no uses (and doesn't enable TFC) then prevent an assertion by not attempting to adjust the writemask.
  The instruction will be removed anyway.
  Change-Id: I9a5dba6bafe1f35ac99c1b73df390936e2ac27a7
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D58964
  llvm-svn: 356540
* [AMDGPU] Add buffer/load 8/16 bit overloaded intrinsics | Ryan Taylor | 2019-03-19 | 6 | -0/+383
  Summary: Add buffer store/load 8/16 overloaded intrinsics for buffer, raw_buffer and struct_buffer
  Change-Id: I166a29f071b2ff4e4683fb0392564b1f223ac61d
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59265
  llvm-svn: 356465
* [AMDGPU] Ban i8 min3 promotion. | Neil Henning | 2019-03-19 | 3 | -0/+370
  I found this really weird WWM-related case whereby through the WWM transformations our isel lowering was trying to promote 2 min's into a min3 for the i8 type, which our hardware doesn't support.
  The new min3_i8.ll test case would previously spew the error:
    PromoteIntegerResult #0: t69: i8 = SMIN3 t70, Constant:i8<0>, t68
  before the simple fix to our isel lowering to not do it for i8 MVTs.
  Differential Revision: https://reviews.llvm.org/D59543
  llvm-svn: 356464
* [AMDGPU] Enable code selection using `s_mul_hi_u32`/`s_mul_hi_i32`. | Michael Liao | 2019-03-18 | 1 | -0/+6
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59501
  llvm-svn: 356405
* Fix flat-error-unsupported-gpu-hsa test | Alexandre Ganea | 2019-03-18 | 1 | -1/+1
  Differential Revision: https://reviews.llvm.org/D59505
  llvm-svn: 356400
* [AMDGPU] Asm/disasm clamp modifier on vop3 int arithmetic | Tim Renouf | 2019-03-18 | 13 | -157/+157
  Allow the clamp modifier on vop3 int arithmetic instructions in assembly and disassembly.
  This involved adding a clamp operand to the affected instructions in MIR and MC, and thus having to fix up several places in codegen and MIR tests.
  Differential Revision: https://reviews.llvm.org/D59267
  Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e
  llvm-svn: 356399
* [AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiers | Tim Renouf | 2019-03-18 | 12 | -68/+68
  This commit allows v_cndmask_b32_e64 with abs, neg source modifiers on src0, src1 to be assembled and disassembled.
  This does appear to be allowed, even though they are floating point modifiers and the operand type is b32.
  To do this, I added src0_modifiers and src1_modifiers to the MachineInstr, which involved fixing up several places in codegen and mir tests.
  Differential Revision: https://reviews.llvm.org/D59191
  Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea
  llvm-svn: 356398
* [AMDGPU] Add an experimental buffer fat pointer address space. | Neil Henning | 2019-03-18 | 3 | -3/+59
  Add an experimental buffer fat pointer address space that is currently unhandled in the backend. This commit reserves address space 7 as a non-integral pointer representing the 160-bit fat pointer (128-bit buffer descriptor + 32-bit offset) that is heavily used in graphics workloads using the AMDGPU backend.
  Differential Revision: https://reviews.llvm.org/D58957
  llvm-svn: 356373
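The 160-bit size quoted in this commit is just 128 + 32. The sketch below packs the two parts into one integer to make the arithmetic concrete. It is illustrative only: the function names are hypothetical, and the field order (descriptor in the high bits) is an assumption made here, not something the commit message specifies.

```python
DESC_BITS = 128    # buffer descriptor width, from the commit message
OFFSET_BITS = 32   # offset width, from the commit message


def pack_fat_pointer(descriptor, offset):
    """Pack a descriptor and an offset into one 160-bit integer (assumed layout)."""
    if not 0 <= descriptor < (1 << DESC_BITS):
        raise ValueError("descriptor does not fit in 128 bits")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset does not fit in 32 bits")
    return (descriptor << OFFSET_BITS) | offset


def unpack_fat_pointer(value):
    """Split a packed value back into (descriptor, offset)."""
    return value >> OFFSET_BITS, value & ((1 << OFFSET_BITS) - 1)


if __name__ == "__main__":
    packed = pack_fat_pointer((1 << DESC_BITS) - 1, (1 << OFFSET_BITS) - 1)
    print(packed.bit_length())  # widest possible value: DESC_BITS + OFFSET_BITS
```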
* AMDGPU: Partially fix default device for HSA | Matt Arsenault | 2019-03-17 | 2 | -1/+19
  There are a few different issues, mostly stemming from using generation based checks for anything instead of subtarget features. Stop adding flat-address-space as a feature for HSA, as it should only be a device property. This was incorrectly allowing flat instructions to select for SI.
  Increase the default generation for HSA to avoid the encoding error when emitting objects. This has some other side effects from various checks which probably should be separate subtarget features (in the cost model and for dealing with the DS offset folding issue).
  Partial fix for bug 41070. It should probably be an error to try using amdhsa without flat support.
  llvm-svn: 356347
* [AMDGPU] Prepare for introduction of v3 and v5 MVTs | Tim Renouf | 2019-03-17 | 4 | -4/+417
  AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This commit does not add them, but makes preparatory changes:
    * Fixed assumptions of power-of-2 vector type in kernel arg handling, and added v5 kernel arg tests and v3/v5 shader arg tests.
    * Added v5 tests for cost analysis.
    * Added vec3/vec5 arg test cases.
  Some of this patch is from Matt Arsenault, also of AMD.
  Differential Revision: https://reviews.llvm.org/D58928
  Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd
  llvm-svn: 356342
* [AMDGPU] Regenerate some f16/i16 tests. | Simon Pilgrim | 2019-03-17 | 3 | -231/+1224
  Prep work for D51589
  llvm-svn: 356340
* AMDGPU: Fix a SIAnnotateControlFlow issue when there are multiple backedges. | Changpeng Fang | 2019-03-15 | 1 | -0/+61
  Summary: At the exit of the loop, the compiler uses a register to remember and accumulate the number of threads that have already exited. When all active threads exit the loop, this register is used to restore the exec mask, and the execution continues for the post loop code.
  When there is a "continue" in the loop, the compiler mistakenly reset the register to 0 when the "continue" backedge was taken. This results in some threads not executing the post loop code as they are supposed to.
  This patch fixes the issue.
  Reviewers: nhaehnle, arsenm
  Differential Revision: https://reviews.llvm.org/D59312
  llvm-svn: 356298
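The failure mode described in this commit can be mimicked with a very small simulation. This is a toy model under loud assumptions, not the SIAnnotateControlFlow pass: lanes are plain integers, `exit_iter` and `continue_iters` are hypothetical inputs, and the "register" is a Python set of lanes that have left the loop.

```python
def lanes_after_loop(exit_iter, continue_iters, reset_on_continue):
    """Return the set of lanes that would run the post-loop code.

    exit_iter[lane]   -- iteration on which that lane leaves the loop
    continue_iters    -- iterations on which the "continue" backedge is taken
    reset_on_continue -- True models the described bug (accumulator cleared)
    """
    active = set(range(len(exit_iter)))
    exited = set()                      # accumulates lanes that have exited
    iteration = 0
    while active:
        leaving = {l for l in active if exit_iter[l] == iteration}
        exited |= leaving
        active -= leaving
        if active and reset_on_continue and iteration in continue_iters:
            exited = set()              # bug: earlier exits are forgotten
        iteration += 1
    return exited                       # used to restore the exec mask


if __name__ == "__main__":
    print(sorted(lanes_after_loop([0, 1, 2], {0}, reset_on_continue=True)))
    print(sorted(lanes_after_loop([0, 1, 2], {0}, reset_on_continue=False)))
```

In this model, lane 0 exits before the "continue" backedge is taken, so clearing the accumulator drops it from the restored mask; without the reset, all three lanes reach the post-loop code.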
* [AMDGPU] Fix SGPR fixing through SCC chaining | Michael Liao | 2019-03-15 | 2 | -0/+32
  Summary:
    - During the fixing of SGPR copying from VGPR, ensure users of SCC are properly propagated, i.e.
      * only propagate through live def of SCC,
      * skip the SCC-def inst itself, and
      * stop the propagation on the other SCC-def inst after checking its SCC-use first.
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59362
  llvm-svn: 356258
* MIR: Allow targets to serialize MachineFunctionInfo | Matt Arsenault | 2019-03-14 | 6 | -1/+44
  This has been a very painful missing feature that has made producing reduced testcases difficult. In particular the various registers determined for stack access during function lowering were necessary to avoid undefined register errors in a large percentage of cases. Implement a subset of the important fields that need to be preserved for AMDGPU.
  Most of the changes are to support targets parsing register fields and properly reporting errors. The biggest sort-of bug remaining is that fields that can be initialized from the IR section will be overwritten by a default initialized machineFunctionInfo section. Another remaining bug is that the machineFunctionInfo section is still printed even if empty.
  llvm-svn: 356215
* GlobalISel: Use multiple returns for intrinsic structs | Matt Arsenault | 2019-03-14 | 1 | -0/+27
  This is consistent with what SelectionDAG does and is much easier to work with than the extract sequence with an artificial wide register.
  For the AMDGPU control flow intrinsics, this was producing an s128 for the i64, i1 tuple return. Any legalization that should apply to a real s128 value would badly obscure the direct values that need to be seen.
  llvm-svn: 356147
* [AMDGPU] Switched HSA metadata to use MsgPackDocument | Tim Renouf | 2019-03-13 | 6 | -1356/+1400
  Summary: MsgPackDocument is the lighter-weight replacement for MsgPackTypes. This commit switches AMDGPU HSA metadata processing to use MsgPackDocument instead of MsgPackTypes.
  Differential Revision: https://reviews.llvm.org/D57024
  Change-Id: I0751668013abe8c87db01db1170831a76079b3a6
  llvm-svn: 356081