path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
...
* AMDGPU/GlobalISel: Widen small uaddo/usubo | Matt Arsenault | 2019-01-26 | 1 | -1/+2
  llvm-svn: 352294
* AMDGPU/GlobalISel: Remove leftover setAction | Matt Arsenault | 2019-01-25 | 1 | -11/+8
  Also move G_GEP actions together.
  llvm-svn: 352168
* AMDGPU/GlobalISel: Scalarize add/sub | Matt Arsenault | 2019-01-25 | 1 | -3/+1
  llvm-svn: 352167
* GlobalISel: fewerElementsVector for more cast types | Matt Arsenault | 2019-01-25 | 1 | -3/+6
  llvm-svn: 352166
* GlobalISel: fewerElementsVector for a few more trivial ops | Matt Arsenault | 2019-01-25 | 1 | -5/+5
  llvm-svn: 352165
* AMDGPU/GlobalISel: Legalize smulh/umulh and scalarize mul | Matt Arsenault | 2019-01-25 | 2 | -1/+6
  llvm-svn: 352162
* GlobalISel: Support fewerElementsVector for icmp/fcmp | Matt Arsenault | 2019-01-25 | 1 | -6/+8
  Also legalize 64-bit compares for AMDGPU
  llvm-svn: 352157
* GlobalISel: Implement fewerElementsVector for extensions | Matt Arsenault | 2019-01-25 | 1 | -2/+7
  llvm-svn: 352155
* GlobalISel: Add convenience mutations to scalarize | Matt Arsenault | 2019-01-25 | 1 | -29/+9
  llvm-svn: 352143
* RegBankSelect: Support some more complex part mappings | Matt Arsenault | 2019-01-24 | 3 | -0/+208
  llvm-svn: 352123
* [AMDGPU] With XNACK, cannot clause a load with result coalesced with operand | Tim Renouf | 2019-01-23 | 1 | -0/+11
  Summary:
  With XNACK, an smem load whose result is coalesced with an operand (thus it overwrites its own operand) cannot appear in a clause, because some other instruction might XNACK and restart the whole clause.
  The clause breaker already realized that an smem that overwrites an operand cannot appear in a clause, and broke the clause. The problem that this commit fixes is that the SIFormMemoryClauses optimization formed a bundle with early clobber, which caused the earlier code that set up the coalesced operand to be removed as dead.
  Differential Revision: https://reviews.llvm.org/D57008
  Change-Id: I703c4d5b0bf7d6060222bec491f45c18bb3c0016
  llvm-svn: 351950
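The clause rule described in this commit message can be modelled in a few lines. This is only a minimal sketch, not the SIFormMemoryClauses or clause-breaker code: `Inst`, `overwritesOwnOperand`, and `canJoinClause` are hypothetical names, registers are plain integers, and the sketch assumes the restriction applies only when XNACK is enabled, as the commit title suggests.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical model of a load: one destination register and its source
// operand registers. Not an LLVM type.
struct Inst {
  int Def;
  std::vector<int> Uses;
};

// True if the load's result is coalesced with (overwrites) one of its operands.
bool overwritesOwnOperand(const Inst &I) {
  return std::find(I.Uses.begin(), I.Uses.end(), I.Def) != I.Uses.end();
}

// Assumption: with XNACK the whole clause may be restarted, so a load that
// has already clobbered its own operand must not be part of a clause.
bool canJoinClause(const Inst &I, bool XNACKEnabled) {
  return !(XNACKEnabled && overwritesOwnOperand(I));
}
```

In this model a load such as `Inst{1, {1, 2}}` is rejected when XNACK is enabled, while a load whose destination is distinct from its operands is always accepted.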
* AMDGPU/GlobalISel: Start selectively legalizing 16-bit operations | Matt Arsenault | 2019-01-22 | 1 | -4/+9
  It might be a bit nicer to use the fancy .legalIf and co. predicates, but this was requiring more boilerplate and disables the coverage assertions.
  llvm-svn: 351886
* AMDGPU/GlobalISel: Handle legality/regbanks for 32/64-bit shifts | Matt Arsenault | 2019-01-22 | 2 | -2/+5
  llvm-svn: 351884
* GlobalISel: Allow shift amount to be a different type | Matt Arsenault | 2019-01-22 | 1 | -0/+2
  For AMDGPU the shift amount is never 64-bit, and this needs to use a 32-bit shift. X86 uses i8, but seemed to be hacking around this before.
  llvm-svn: 351882
* GlobalISel: Implement widen for extract_vector_elt elt type | Matt Arsenault | 2019-01-22 | 1 | -3/+16
  llvm-svn: 351871
* GlobalISel: Implement fewerElementsVector for basic FP ops | Matt Arsenault | 2019-01-22 | 1 | -20/+28
  llvm-svn: 351866
* AMDGPU/GlobalISel: Remove vectors from legal constant types | Matt Arsenault | 2019-01-22 | 1 | -1/+1
  llvm-svn: 351859
* GlobalISel: Support narrowing zextload/sextload | Matt Arsenault | 2019-01-22 | 1 | -0/+18
  llvm-svn: 351856
* Codegen support for atomicrmw fadd/fsub | Matt Arsenault | 2019-01-22 | 7 | -17/+56
  llvm-svn: 351851
* AMDGPU/GlobalISel: Legalize more fp<->int conversions | Matt Arsenault | 2019-01-22 | 1 | -10/+4
  llvm-svn: 351767
* [AMDGPU] Fixed hazard recognizer to walk predecessors | Stanislav Mekhanoshin | 2019-01-21 | 4 | -32/+126
  Fixes two problems with GCNHazardRecognizer:
  1. It only scans up to 5 instructions emitted earlier.
  2. It does not take control flow into account. An earlier instruction from the previous basic block is not necessarily a predecessor. At the same time a real predecessor block is not scanned.
  The patch provides a way to distinguish between scheduler and hazard recognizer mode. It is OK to work with emitted instructions in the scheduler because we do not really know what will be emitted later and in what order. However, when the pass works as a hazard recognizer the schedule is already finalized, and we have full access to the instructions for the whole function, so we can properly traverse predecessors and their instructions.
  Differential Revision: https://reviews.llvm.org/D56923
  llvm-svn: 351759
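The predecessor traversal this message describes can be illustrated with a small standalone model. The sketch below is not the GCNHazardRecognizer code: `Block`, `walkBack`, and `instructionsSinceHazard` are hypothetical names, instructions are reduced to a "produces the hazard" flag, and every instruction is assumed to cost one wait state. It scans the current block backwards and then follows real CFG predecessors rather than layout order, returning the smallest number of intervening instructions on any path.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <map>
#include <vector>

// Hypothetical CFG model, not an LLVM type.
struct Block {
  std::vector<bool> IsHazard; // one flag per instruction, in program order
  std::vector<int> Preds;     // indices of predecessor blocks
};

const int NotFound = INT_MAX;

// Scans block BB backwards from just before index End, then recurses into its
// predecessors. Seen counts the non-hazard instructions already passed.
// BestEntry remembers the smallest Seen with which each block was entered, so
// loops terminate and a closer path can still improve on an earlier visit.
int walkBack(const std::vector<Block> &CFG, int BB, int End, int Seen,
             int Limit, std::map<int, int> &BestEntry) {
  const Block &B = CFG[BB];
  for (int I = End - 1; I >= 0; --I) {
    if (B.IsHazard[I])
      return Seen;
    if (++Seen >= Limit)
      return NotFound; // far enough away that the hazard no longer matters
  }
  int Best = NotFound;
  for (int P : B.Preds) {
    auto It = BestEntry.find(P);
    if (It != BestEntry.end() && It->second <= Seen)
      continue; // already explored from an equal or closer starting point
    BestEntry[P] = Seen;
    int PredEnd = static_cast<int>(CFG[P].IsHazard.size());
    Best = std::min(Best, walkBack(CFG, P, PredEnd, Seen, Limit, BestEntry));
  }
  return Best;
}

// Number of instructions between the nearest earlier hazard instruction and
// position Pos of block BB, or NotFound if none lies within Limit instructions.
int instructionsSinceHazard(const std::vector<Block> &CFG, int BB, int Pos,
                            int Limit) {
  assert(Limit > 0 && "search window must be positive");
  std::map<int, int> BestEntry;
  return walkBack(CFG, BB, Pos, 0, Limit, BestEntry);
}
```

Recording the best entry distance per block, rather than a plain visited set, is a design choice of this sketch: it keeps the result minimal when a block is reachable over paths of different lengths.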
* AMDGPU: Legalize more bitcasts | Matt Arsenault | 2019-01-20 | 1 | -5/+7
  llvm-svn: 351700
* AMDGPU/GlobalISel: Really legalize exts from i1 | Matt Arsenault | 2019-01-20 | 1 | -1/+2
  There is a combine that was hiding these tests not actually testing what they should be, although they were producing the expected end result.
  llvm-svn: 351698
* GlobalISel: Implement widenScalar for basic FP ops | Matt Arsenault | 2019-01-20 | 1 | -6/+8
  llvm-svn: 351696
* AMDGPU/GlobalISel: Legalize f32->f16 fptrunc | Matt Arsenault | 2019-01-20 | 1 | -1/+1
  llvm-svn: 351695
* AMDGPU/GlobalISel: Fix some crashes in g_unmerge_values/g_merge_values | Matt Arsenault | 2019-01-20 | 1 | -12/+73
  This was crashing in the predicate function assuming the value is a vector.
  Copy more of what AArch64 uses. This probably needs more refinement later, but I don't exactly understand what it means in some cases, particularly since any legalization for these seems to be missing.
  llvm-svn: 351693
* AMDGPU/GlobalISel: Regbank select for fpext | Matt Arsenault | 2019-01-20 | 1 | -0/+1
  llvm-svn: 351692
* AMDGPU/GlobalISel: Cleanup legality for extensions | Matt Arsenault | 2019-01-20 | 1 | -10/+6
  llvm-svn: 351691
* Update the file headers across all of the LLVM projects in the monorepo | Chandler Carruth | 2019-01-19 | 207 | -828/+621
  to reflect the new license.
  We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
  Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
  llvm-svn: 351636
* AMDGPU/GlobalISel: Legalize more types for select | Matt Arsenault | 2019-01-18 | 1 | -2/+4
  llvm-svn: 351599
* AMDGPU/GlobalISel: Legalize illegal g_constant | Matt Arsenault | 2019-01-18 | 1 | -4/+9
  llvm-svn: 351596
* AMDGPU: Remove llvm.SI.load.const | Matt Arsenault | 2019-01-18 | 10 | -198/+0
  It's taken 3 years, but now all of the old AMDGPU and SI intrinsics are finally gone
  llvm-svn: 351586
* [AMDGPU] Add some missing always-uniform values. | Neil Henning | 2019-01-18 | 1 | -0/+2
  This commit adds some missing intrinsics into the isAlwaysUniform list for the AMDGPU backend.
  Differential Revision: https://reviews.llvm.org/D56845
  llvm-svn: 351562
* [AMDGPU][MC][GFX8+][DISASSEMBLER] Corrected 1/2pi value for 64-bit operands | Dmitry Preobrazhensky | 2019-01-18 | 1 | -1/+1
  See bug 39332: https://bugs.llvm.org/show_bug.cgi?id=39332
  Reviewers: artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D56794
  llvm-svn: 351555
* [AMDGPU][MC] Disabled use of 2 different literals with SOP2/SOPC instructions | Dmitry Preobrazhensky | 2019-01-18 | 2 | -0/+41
  See bug 39319: https://bugs.llvm.org/show_bug.cgi?id=39319
  Reviewers: artem.tamazov, arsenm, rampitec
  Differential Revision: https://reviews.llvm.org/D56847
  llvm-svn: 351549
* AMDGPU: Adjust the chain for loads writing to the HI part of a register. | Changpeng Fang | 2019-01-16 | 1 | -0/+45
  Summary:
  For these loads that write to the HI part of a register, we should chain them to the op that writes to the LO part of the register to maintain the appropriate order.
  Reviewers: rampitec, arsenm
  Differential Revision: https://reviews.llvm.org/D56454
  llvm-svn: 351379
* AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap | Marek Olsak | 2019-01-16 | 12 | -11/+119
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D52944
  llvm-svn: 351351
* AMDGPU: Raise the priority of MAD24 in instruction selection. | Changpeng Fang | 2019-01-15 | 1 | -0/+2
  Summary:
  We have seen performance regression when v_add3 is generated. The major reason is that the v_mad pattern is broken when v_add3 is generated. We also see the register pressure increased. While we could not properly estimate register pressure during instruction selection, we can give mad a higher priority.
  In this work, we raise the priority for mad24 in selection and resolve the performance regression.
  Reviewers: rampitec
  Differential Revision: https://reviews.llvm.org/D56745
  llvm-svn: 351273
* AMDGPU: Add a fast path for icmp.i1(src, false, NE) | Marek Olsak | 2019-01-15 | 2 | -0/+10
  Summary:
  This allows moving the condition from the intrinsic to the standard ICmp opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic is an identity for retrieving the SGPR mask.
  And we can also get the mask from and i1, or i1, xor i1.
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D52060
  llvm-svn: 351150
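The identity this fast path relies on can be checked with a toy wave model. This is a sketch under stated assumptions, not the backend's lowering: `laneMask` and `icmpI1NE` are hypothetical helpers, a wave is a vector of at most 64 per-lane booleans, and the mask has lane i in bit i.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs one boolean per lane into a bit mask (lane i -> bit i), standing in
// for the SGPR mask mentioned in the commit message.
uint64_t laneMask(const std::vector<bool> &Lanes) {
  assert(Lanes.size() <= 64 && "model supports at most 64 lanes");
  uint64_t Mask = 0;
  for (std::size_t I = 0; I < Lanes.size(); ++I)
    if (Lanes[I])
      Mask |= uint64_t(1) << I;
  return Mask;
}

// Toy model of comparing a per-lane i1 against a uniform i1 with predicate NE
// and returning the wave-wide result mask.
uint64_t icmpI1NE(const std::vector<bool> &Src, bool Rhs) {
  std::vector<bool> Result(Src.size());
  for (std::size_t I = 0; I < Src.size(); ++I)
    Result[I] = Src[I] != Rhs;
  return laneMask(Result);
}
```

Since `x != false` equals `x` for a boolean, `icmpI1NE(Src, false)` always equals `laneMask(Src)` in this model, which is why the comparison itself carries no extra information.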
* [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try | David Stuttard | 2019-01-14 | 11 | -54/+545
  TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support.
  This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2.
  This change takes roughly 6 parts:
  1. Modify the instruction defs in tablegen to add new instruction variants that can accommodate the extra return values.
  2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE (where the bulk of the work for these instruction types is now done)
  3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used.
  4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value).
  5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support.
  6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly).
  There's an additional fix now to avoid a dmask=0
  For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe.
  Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one.
  The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows:
    %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
    %v.vec = extractvalue {<4 x float>, i32} %v, 0
    %v.err = extractvalue {<4 x float>, i32} %v, 1
  This re-submit of the change also includes a slight modification in SIISelLowering.cpp to work-around a compiler bug for the powerpc_le platform that caused a buildbot failure on a previous submission.
  Differential revision: https://reviews.llvm.org/D48826
  Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
  Work around for ppcle compiler bug
  Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b
  llvm-svn: 351054
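The dmask fix described in this message amounts to a small piece of arithmetic, sketched below. The names `ImageResult` and `adjustForTFE` are hypothetical and this is not the code in SIISelLowering.cpp; it only models the stated rule that each enabled dmask channel takes one VGPR, TFE adds one more, and a dmask of 0 is forced to 1 when TFE is on. LWE and the power-of-2 vector padding are deliberately left out.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical result of the adjustment: the dmask actually used and the
// number of VGPRs the instruction writes.
struct ImageResult {
  unsigned DMask;
  unsigned NumVGPRs;
};

ImageResult adjustForTFE(unsigned DMask, bool TFE) {
  assert(DMask <= 0xF && "dmask selects at most four channels");
  // The hardware assumes at least one data VGPR in addition to the TFE one,
  // so an empty dmask is forced to 1 when TFE is enabled.
  if (TFE && DMask == 0)
    DMask = 1;
  unsigned Channels = static_cast<unsigned>(std::bitset<4>(DMask).count());
  return {DMask, Channels + (TFE ? 1u : 0u)};
}
```

With this model, `adjustForTFE(0, true)` yields a dmask of 1 and two VGPRs, matching the "two vgpr result with tfe in the second one" from the message, and `adjustForTFE(0xF, true)` yields five VGPRs, matching the `{<4 x float>, i32}` aggregate in the IR example.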
* [AMDGPU] Fix dwordx3/southern-islands failures. | Neil Henning | 2019-01-10 | 2 | -4/+10
  This commit fixes the dwordx3/southern-islands failures that were found in bugzilla https://bugs.llvm.org/show_bug.cgi?id=40129, by not generating the dwordx3 variants of load/store instructions that were added to the ISA after southern islands.
  Differential Revision: https://reviews.llvm.org/D56434
  llvm-svn: 350838
* [opaque pointer types] Remove some calls to generic Type subtype accessors. | James Y Knight | 2019-01-10 | 1 | -3/+3
  That is, remove many of the calls to Type::getNumContainedTypes(), Type::subtypes(), and Type::getContainedType(N).
  I'm not intending to remove these accessors -- they are useful/necessary in some cases. However, removing the pointee type from pointers would potentially break some uses, and reducing the number of calls makes it easier to audit.
  llvm-svn: 350835
* [AMDGPU] Separate feature dot-insts | Stanislav Mekhanoshin | 2019-01-10 | 5 | -6/+22
  Differential Revision: https://reviews.llvm.org/D56524
  llvm-svn: 350793
* Revert "[AMDGPU] Fix DPP combiner" | Valery Pykhtin | 2019-01-09 | 3 | -171/+68
  This reverts commit e3e2923a39cbec3b3bc3a7d3f0e9a77a4115080e, svn revision rL350721
  llvm-svn: 350730
* [AMDGPU] Fix DPP combiner | Valery Pykhtin | 2019-01-09 | 3 | -68/+171
  Fixed issue with identity values and other cases, f32/f16 identity values to be added later. fma/mac instructions are disabled for now.
  Test is fully reworked, added comments.
  Other fixes:
  1. dpp move with uses and old reg initializer should be in the same BB.
  2. bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Otherwise the old register value is checked for identity.
  3. Added add, subrev, and, or instructions to the old folding function.
  4. Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user.
  Differential revision: https://reviews.llvm.org/D55444
  llvm-svn: 350721
* Remove check for single use in ShrinkDemandedConstant | Stanislav Mekhanoshin | 2019-01-09 | 1 | -2/+1
  This moves the check for single use out of the general ShrinkDemandedConstant and into the BE because of the AArch64 regression after D56289/rL350475.
  After several hours of experiments I did not come up with a testcase failing on any other targets if the check is not performed.
  Moreover, a direct call to ShrinkDemandedConstant is not really needed and is superseded by SimplifyDemandedBits.
  Differential Revision: https://reviews.llvm.org/D56406
  llvm-svn: 350684
* AMDGPU/GlobalISel: Introduce vcc reg bank | Matt Arsenault | 2019-01-08 | 3 | -42/+53
  I'm not entirely sure this is the correct thing to do with the global isel philosophy, but I think this is necessary to handle how differently SGPRs are used normally vs. from a condition.
  For example, it makes sense to allow a copy from a VGPR to an SGPR, but it makes no sense to allow a copy from VGPRs to SGPRs used as select mask.
  This avoids regbankselecting strange code with a truncate feeding directly into a condition field. Now a copy is forced from sgpr(s1) to vcc, which is more sensible to handle.
  Some of these issues could probably be avoided by making enough operations resulting in i1 illegal.
  I think we can't avoid this register bank for legality. For example, an i1 and where one source is from a truncate, and one source is a compare needs some kind of copy inserted to make sure both are in condition registers.
  llvm-svn: 350611
* AMDGPU/GlobalISel: Legalize concat_vectors | Matt Arsenault | 2019-01-08 | 1 | -0/+12
  llvm-svn: 350598
* RegBankSelect: Fix copy insertion point for terminators | Matt Arsenault | 2019-01-08 | 2 | -0/+28
  If a copy was needed to handle the condition of brcond, it was being inserted before the defining instruction.
  Add tests for iterator edge cases. I find the existing code here suspect for the case where it's looking for terminators that modify the register. It's going to insert a copy in the middle of the terminators, which isn't allowed (it might be necessary to have a COPY_terminator if anybody actually needs this).
  Also legalize brcond for AMDGPU.
  llvm-svn: 350595
* AMDGPU/GlobalISel: Disallow VGPR->SCC copies | Matt Arsenault | 2019-01-08 | 1 | -2/+8
  This fixes using scalar adds when only the carry in is a VGPR using greedy regbankselect.
  llvm-svn: 350593