summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Adjust the chain for loads writing to the HI part of a register.Changpeng Fang2019-01-161-0/+141
| | | | | | | | | | | | | | Summary: For these loads that write to the HI part of a register, we should chain them to the op that writes to the LO part of the register to maintain the appropriate order. Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D56454 llvm-svn: 351379
* AMDGPU: Add llvm.amdgcn.ds.ordered.add & swapMarek Olsak2019-01-162-0/+141
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52944 llvm-svn: 351351
* AMDGPU: Raise the priority of MAD24 in instruction selection.Changpeng Fang2019-01-151-0/+26
| | | | | | | | | | | | | | | | | Summary: We have seen performance regression when v_add3 is generated. The major reason is that the v_mad pattern is broken when v_add3 is generated. We also see the register pressure increased. While we could not properly estimate register pressure during instruction selection, we can give mad a higher priority. In this work, we raise the priority for mad24 in selection and resolve the performance regression. Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D56745 llvm-svn: 351273
* Remove irrelevant references to legacy git repositories fromJames Y Knight2019-01-151-2/+2
| | | | | | | | | compiler identification lines in test-cases. (Doing so only because it's then easier to search for references which are actually important and need fixing.) llvm-svn: 351200
* AMDGPU: Add a fast path for icmp.i1(src, false, NE)Marek Olsak2019-01-151-0/+18
| | | | | | | | | | | | | | | | | Summary: This allows moving the condition from the intrinsic to the standard ICmp opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic is an identity for retrieving the SGPR mask. And we can also get the mask from and i1, or i1, xor i1. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52060 llvm-svn: 351150
* [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd tryDavid Stuttard2019-01-145-16/+685
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 This re-submit of the change also includes a slight modification in SIISelLowering.cpp to work-around a compiler bug for the powerpc_le platform that caused a buildbot failure on a previous submission. Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda Work around for ppcle compiler bug Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b llvm-svn: 351054
* [AMDGPU] Fix dwordx3/southern-islands failures.Neil Henning2019-01-106-24/+46
| | | | | | | | | | | This commit fixes the dwordx3/southern-islands failures that were found in bugzilla https://bugs.llvm.org/show_bug.cgi?id=40129, by not generating the dwordx3 variants of load/store instructions that were added to the ISA after southern islands. Differential Revision: https://reviews.llvm.org/D56434 llvm-svn: 350838
* Revert "[AMDGPU] Fix DPP combiner"Valery Pykhtin2019-01-093-525/+328
| | | | | | This reverts commit e3e2923a39cbec3b3bc3a7d3f0e9a77a4115080e, svn revision rL350721 llvm-svn: 350730
* [AMDGPU] Fix DPP combinerValery Pykhtin2019-01-093-328/+525
| | | | | | | | | | | | | | Fixed issue with identity values and other cases, f32/f16 identity values to be added later. fma/mac instructions is disabled for now. Test is fully reworked, added comments. Other fixes: 1. dpp move with uses and old reg initializer should be in the same BB. 2. bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Othervise the old register value is checked for identity. 3. Added add, subrev, and, or instructions to the old folding function. 4. Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user. Differential revision: https://reviews.llvm.org/D55444 llvm-svn: 350721
* RegisterCoalescer: Assume CR_Replace for SubRangeJoinMatt Arsenault2019-01-081-0/+118
| | | | | | | | | | | | | | | | | Currently it's possible for following check on V.WriteLanes (which is not really meaningful during SubRangeJoin) to pass for one half of the pair, and then fall through to to one of the impossible or unresolved states. This then fails as inconsistent on the other half. During the main range join, the check between V.WriteLanes and OtherV.ValidLanes must have passed, meaning this should be a CR_Replace. Fixes most of the testcases in bugs 39542 and 39602 llvm-svn: 350678
* RegisterCoalescer: Defer clearing implicit_def lanesMatt Arsenault2019-01-081-0/+57
| | | | | | | | | | | | | | We can't go back and recover the lanes if it turns out the implicit_def really can't be erased. Assume all lanes are valid if an unresolved conflict is encountered. There aren't any tests where this seems to matter either way, but this seems like a safer option. Fixes bug 39602 llvm-svn: 350676
* AMDGPU/GlobalISel: Introduce vcc reg bankMatt Arsenault2019-01-0814-86/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | I'm not entirely sure this is the correct thing to do with the global isel philosophy, but I think this is necessary to handle how differently SGPRs are used normally vs. from a condition. For example, it makes sense to allow a copy from a VGPR to an SGPR, but it makes no sense to allow a copy from VGPRs to SGPRs used as select mask. This avoids regbankselecting strange code with a truncate feeding directly into a condition field. Now a copy is forced from sgpr(s1) to vcc, which is more sensible to handle. Some of these issues could probably avoided with making enough operations resulting in i1 illegal. I think we can't avoid this register bank for legality. For example, an i1 and where one source is from a truncate, and one source is a compare needs some kind of copy inserted to make sure both are in condition registers. llvm-svn: 350611
* AMDGPU/GlobalISel: Legalize concat_vectorsMatt Arsenault2019-01-081-0/+129
| | | | llvm-svn: 350598
* RegBankSelect: Fix copy insertion point for terminatorsMatt Arsenault2019-01-082-0/+203
| | | | | | | | | | | | | | | If a copy was needed to handle the condition of brcond, it was being inserted before the defining instruction. Add tests for iterator edge cases. I find the existing code here suspect for the case where it's looking for terminators that modify the register. It's going to insert a copy in the middle of the terminators, which isn't allowed (it might be necessary to have a COPY_terminator if anybody actually needs this). Also legalize brcond for AMDGPU. llvm-svn: 350595
* AMDGPU/GlobalISel: Disallow VGPR->SCC copiesMatt Arsenault2019-01-084-8/+16
| | | | | | | This fixes using scalar adds when only the carry in is a VGPR using greedy regbankselect. llvm-svn: 350593
* AMDGPU/GlobalISel: RegBankSelect for carry-inMatt Arsenault2019-01-084-0/+595
| | | | | | | | I'm not sure we should be allowing the truncate to s1 for the inputs. It may be necessary to create a new VCC reg bank. llvm-svn: 350592
* AMDGPU/GlobalISel: RegBankSelect for add/sub with carry outMatt Arsenault2019-01-084-0/+275
| | | | llvm-svn: 350589
* AMDGPU/GlobalISel: InstrMapping for G_UNMERGE_VALUESMatt Arsenault2019-01-081-0/+38
| | | | llvm-svn: 350588
* [TargetLowering][AMDGPU] Remove the SimplifyDemandedBits function that takes ↵Craig Topper2019-01-071-1/+1
| | | | | | | | | | | | | | a User and OpIdx. Stop using it in AMDGPU target for simplifyI24. As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when its not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions. Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node. This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simplifications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input. Differential Revision: https://reviews.llvm.org/D56087 llvm-svn: 350560
* AMDGPU: test for uniformity of branch instruction, not its conditionRhys Perry2019-01-071-0/+97
| | | | | | | | | | | | | | | | | Summary: If a divergent branch instruction is marked as divergent by propagation rule 2 in DivergencePropagator::exploreSyncDependency() and its condition is uniform, that branch would incorrectly be assumed to be uniform. Reviewers: arsenm, tstellar Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D56331 llvm-svn: 350532
* AMDGPU: Remove VS/SV mappings from selectMatt Arsenault2019-01-071-101/+69
| | | | | | These would violate the constant bus restriction llvm-svn: 350517
* Regenerate test.Simon Pilgrim2019-01-071-64/+390
| | | | | | Prep work towards enabling SimplifyDemandedBits vector support for TRUNCATE as discussed on D56118. llvm-svn: 350513
* Added single use check to ShrinkDemandedConstantStanislav Mekhanoshin2019-01-051-0/+20
| | | | | | | | | Fixes cvt_f32_ubyte combine. performCvtF32UByteNCombine() could shrink source node to demanded bits only even if there are other uses. Differential Revision: https://reviews.llvm.org/D56289 llvm-svn: 350475
* [AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression.Alexander Timofeev2019-01-031-10/+10
| | | | | | | | | | | | | | | | | | | | | | Detailed description: SIFoldOperands::foldInstOperand iterates over the operand uses calling the function that changes def-use iteratorson the way. As a result loop exits immediately when def-use iterator is changed. Hence, the operand is folded to the very first use instruction only. This makes VGPR live along the whole basic block and increases register pressure significantly. The performance drop observed in SHOC DeviceMemory test is caused by this bug. Proposed fix: collect uses to separate container for further processing in another loop. Testing: make check-llvm SHOC performance test. Reviewers: rampitec, ronlieb Differential Revision: https://reviews.llvm.org/D56161 llvm-svn: 350350
* [AMDGPU] Change section name with metadata accessPiotr Sobczak2019-01-031-12/+12
| | | | | | | | | | | | | | | | | | | | | | | Summary: The commit rL348922 introduced a means to set Metadata section kind for a global variable, if its explicit section name was prefixed with ".AMDGPU.metadata.". This patch changes that prefix to ".AMDGPU.comment.", as "metadata" in the section name might lead to ambiguity with metadata used by AMD PAL runtime. Change-Id: Idd4748800d6fe801441d91595fc21e5a4171e668 Reviewers: kzhuravl Reviewed By: kzhuravl Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D56197 llvm-svn: 350292
* [AMDGPU] Handle OR as operand of raw load/storePiotr Sobczak2019-01-022-8/+88
| | | | | | | | | | | | | | | | | | Summary: Use isBaseWithConstantOffset() which handles OR as an operand to llvm.amdgcn.raw.buffer.load and llvm.amdgcn.raw.buffer.store. Change-Id: Ifefb9dc5ded8710d333df07ab1900b230e33539a Reviewers: nhaehnle, mareko, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55999 llvm-svn: 350208
* [AMDGPU] Regenerate i64 shift tests.Simon Pilgrim2018-12-261-19/+86
| | | | | | To show codegen diff due to a future SimplifyDemandedBits patch. llvm-svn: 350065
* [DAGCombiner] allow narrowing of add followed by truncateSanjay Patel2018-12-222-5/+5
| | | | | | | | | | | | | | | trunc (add X, C ) --> add (trunc X), C' If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type. This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine). This change used to show regressions for x86, but those are gone after D55494. This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic) that does almost the same thing. Differential Revision: https://reviews.llvm.org/D55866 llvm-svn: 350006
* AMDGPU: Don't peel of the offset if the resulting base could possibly be ↵Changpeng Fang2018-12-213-33/+87
| | | | | | | | | | | | | | | | | | | negative in Indirect addressing. Summary: Don't peel of the offset if the resulting base could possibly be negative in Indirect addressing. This is because the M0 field is of unsigned. This patch achieves the similar goal as https://reviews.llvm.org/D55241, but keeps the optimization if the base is known unsigned. Reviewers: arsemn Differential Revision: https://reviews.llvm.org/D55568 llvm-svn: 349951
* AMDGPU/GlobalISel: RegBankSelect for amdgcn.wqm.voteMatt Arsenault2018-12-211-0/+56
| | | | llvm-svn: 349882
* AMDGPU/GlobalISel: RegBankSelect for some fp opsMatt Arsenault2018-12-216-0/+174
| | | | llvm-svn: 349880
* AMDGPU/GlobalISel: Redo legality for build_vectorMatt Arsenault2018-12-211-0/+585
| | | | | | | | | | It seems better to avoid using the callback if possible since there are coverage assertions which are disabled if this is used. Also fix missing tests. Only test the legal cases since it seems legalization for build_vector is quite lacking. llvm-svn: 349878
* AMDGPU: Make i1/i64/v2i32 and/or/xor legalMatt Arsenault2018-12-206-30/+335
| | | | | | | The 64-bit types do depend on the register bank, but that's another issue to deal with later. llvm-svn: 349716
* AMDGPU/GlobalISel: Fix ValueMapping tables for i1Matt Arsenault2018-12-201-2/+57
| | | | | | | This was incorrectly selecting SGPR for any i1 values, e.g. G_TRUNC to i1 from a VGPR was still an SGPR. llvm-svn: 349715
* AMDGPU/GlobalISel: RegBankSelect for fp conversionsMatt Arsenault2018-12-204-0/+110
| | | | llvm-svn: 349709
* AMDGPU/GlobalISel: Legality/regbankselect for atomicrmw/atomic_cmpxchgMatt Arsenault2018-12-2024-0/+1395
| | | | llvm-svn: 349708
* AMDGPU: Add patterns for v4i16/v4f16 -> v4i16/v4f16 bitcastsRhys Perry2018-12-191-0/+35
| | | | | | | | | | | | Reviewers: arsenm, tstellar Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55058 llvm-svn: 349694
* AMDGPU: Use an ABS32_LO relocation for SCRATCH_RSRC_DWORD1Nicolai Haehnle2018-12-191-0/+4
| | | | | | | | | | | | | | | | | | | Summary: Using HI here makes no logical sense, since the dword is only 32 bits to begin with. Current Mesa master does not look at the relocation type at all, so this change is fine. Future Mesa will rely on this, however. Change-Id: I91085707834c4ac0370926602b93c94b90e44cb1 Reviewers: arsenm, rampitec, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55369 llvm-svn: 349620
* AMDGPU/InsertWaitcnts: Update VGPR/SGPR bounds when brackets are mergedCarl Ritson2018-12-191-8/+64
| | | | | | | | | | | | | | | | | | Summary: Fix an issue where VGPR/SGPR bounds are not properly extended when brackets are merged. This manifests as missing waitcnt insertions when multiple brackets are forwarded to a successor block and the first forward has lower VGPR/SGPR bounds. Irreducible loop test has been extended based on a CTS failure detected for GFX9. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D55602 llvm-svn: 349611
* AMDGPU/GlobalISel: Regbankselect for fsubMatt Arsenault2018-12-191-0/+69
| | | | llvm-svn: 349608
* [AMDGPU] Removed the unnecessary operand size-check-assert from ↵Farhana Aleen2018-12-181-0/+36
| | | | | | | | | | processBaseWithConstOffset(). Summary: 32bit operand sizes are guaranteed by the opcode check AMDGPU::V_ADD_I32_e64 and AMDGPU::V_ADDC_U32_e64. Therefore, we don't any additional operand size-check-assert. Author: FarhanaAleen llvm-svn: 349529
* AMDGPU: Legalize/regbankselect frame_indexMatt Arsenault2018-12-181-0/+23
| | | | llvm-svn: 349468
* AMDGPU: Legalize/regbankselect fmaMatt Arsenault2018-12-182-0/+183
| | | | llvm-svn: 349467
* AMDGPU/GlobalISel: Legalize/regbankselect fneg/fabs/fsubMatt Arsenault2018-12-185-0/+156
| | | | llvm-svn: 349463
* [AMDGPU] Promote constant offset to the immediate by finding a new base with ↵Farhana Aleen2018-12-142-0/+639
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 13bit constant offset from the nearby instructions. Summary: Promote constant offset to immediate by recomputing the relative 13bit offset from nearby instructions. E.g. s_movk_i32 s0, 0x1800 v_add_co_u32_e32 v0, vcc, s0, v2 v_addc_co_u32_e32 v1, vcc, 0, v6, vcc s_movk_i32 s0, 0x1000 v_add_co_u32_e32 v5, vcc, s0, v2 v_addc_co_u32_e32 v6, vcc, 0, v6, vcc global_load_dwordx2 v[5:6], v[5:6], off global_load_dwordx2 v[0:1], v[0:1], off => s_movk_i32 s0, 0x1000 v_add_co_u32_e32 v5, vcc, s0, v2 v_addc_co_u32_e32 v6, vcc, 0, v6, vcc global_load_dwordx2 v[5:6], v[5:6], off global_load_dwordx2 v[0:1], v[5:6], off offset:2048 Author: FarhanaAleen Reviewed By: arsenm, rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D55539 llvm-svn: 349196
* [RegAllocGreedy] IMPLICIT_DEF values shouldn't prefer registersJohn Brawn2018-12-141-6/+15
| | | | | | | | | | | It costs nothing to spill an IMPLICIT_DEF value (the only spill code that's generated is a KILL of the value), so when creating split constraints if the live-out value is IMPLICIT_DEF the exit constraint should be DontCare instead of PrefReg. Differential Revision: https://reviews.llvm.org/D55652 llvm-svn: 349151
* Revert r348971: [AMDGPU] Support for "uniform-work-group-size" attributeAakanksha Patil2018-12-137-198/+24
| | | | | | This patch breaks RADV (and probably RadeonSI as well) llvm-svn: 349084
* AMDGPU/GlobalISel: Legalize/regbankselect block_addrMatt Arsenault2018-12-132-0/+57
| | | | llvm-svn: 349081
* AMDGPU/GlobalISel: Legalize f64 fadd/fmulMatt Arsenault2018-12-132-2/+28
| | | | llvm-svn: 349014
* AMDGPU/GlobalISel: RegBankSelect some simple operationsMatt Arsenault2018-12-139-0/+279
| | | | llvm-svn: 349012
OpenPOWER on IntegriCloud