summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* [AMDGPU] Clean up update_llc_test_checks CodeGen testsScott Linder2019-10-245-12/+9
| | | | | | | | | | | | | | Summary: Some tests have been hand edited without removing the update_llc_test_checks header, some have slightly outdated CHECK lines which still pass, and some have additional comments which update_llc_test_checks pushes towards the function body. Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69402
* [GlobalISel][AArch64][AMDGPU][X86] Teach LegalizationArtifactCombiner to ↵Craig Topper2019-10-2410-375/+333
| | | | | | | | combine trunc(g_constant). This allows X86 to properly form shift by immediate instructions since we require an 8-bit constant to match the imported SelectionDAG patterns.
* [AMDGPU] Fix mfma scheduling crashStanislav Mekhanoshin2019-10-241-0/+34
| | | | | | | An SUnit can be neither intruction not SDNode. It is all null if represents a nop. Fixed a crash on using SU->getInstr(). Differential Revision: https://reviews.llvm.org/D69395
* [AMDGPU] Skip additional folding on the same operand.Michael Liao2019-10-242-0/+42
| | | | | | | | | | Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69355
* [AMDGPU] Allow folding of sgpr to vgpr copyStanislav Mekhanoshin2019-10-232-19/+34
| | | | | | | | Potentially sgpr to sgpr copy should also be possible. That is however trickier because we may end up with a wrong register class at use because of xm0/xexec permutations. Differential Revision: https://reviews.llvm.org/D69280
* [AMDGPU] Updated fold-vgpr-copy.mir test. NFC.Stanislav Mekhanoshin2019-10-221-14/+9
|
* [AMDGPU] Allow tied operand subreg foldingStanislav Mekhanoshin2019-10-221-0/+16
| | | | | | Turns out it makes sense, contrarily to what comment said. Differential Revision: https://reviews.llvm.org/D69287
* AMDGPU/GlobalISel: Legalize fast unsafe FDIVAustin Kerbow2019-10-211-0/+798
| | | | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69231 llvm-svn: 375460
* AMDGPU: Use CopyToReg for interp intrinsic loweringMatt Arsenault2019-10-211-4/+4
| | | | | | | This doesn't use the default value, so doesn't benefit from the hack to help optimize it. llvm-svn: 375450
* AMDGPU: Erase redundant redefs of m0 in SIFoldOperandsMatt Arsenault2019-10-211-0/+366
| | | | | | | | | | | | | Only handle simple inter-block redefs of m0 to the same value. This avoids interference from redefs of m0 in SILoadStoreOptimzer. I was initially teaching that pass to ignore redefs of m0, but having them not exist beforehand is much simpler. This is in preparation for deleting the current special m0 handling in SIFixSGPRCopies to allow the register coalescer to handle the difficult cases. llvm-svn: 375449
* AMDGPU: Stop adding m0 implicit def to SGPR spillsMatt Arsenault2019-10-212-10/+10
| | | | | | | | r375293 removed the SGPR spilling with scalar stores path, so this is no longer necessary. This also always had the defect of adding the def even when this path wasn't in use. llvm-svn: 375448
* [AMDGPU] Select AGPR in PHI operand legalizationStanislav Mekhanoshin2019-10-211-1/+52
| | | | | | | | | | | | | | If a PHI defines AGPR legalize its operands to AGPR. At the moment we can get an AGPR PHI with VGPR operands. I am not aware of any problems as it seems to be handled gracefully in RA, but this is not right anyway. It also slightly decreases VGPR pressure in some cases because we do not have to a copy via VGPR. Differential Revision: https://reviews.llvm.org/D69206 llvm-svn: 375446
* [IR] Fix mayReadFromMemory() for writeonly callsYevgeny Rouban2019-10-211-1/+1
| | | | | | | | | | | | | | Current implementation of Instruction::mayReadFromMemory() returns !doesNotAccessMemory() which is !ReadNone. This does not take into account that the writeonly attribute also indicates that the call does not read from memory. The patch changes the predicate to !doesNotReadMemory() that reflects the intended behavior. Differential Revision: https://reviews.llvm.org/D69086 llvm-svn: 375389
* AMDGPU: Increase vcc liveness scan thresholdMatt Arsenault2019-10-205-14/+8
| | | | | | | Avoids a test regression in a future patch. Also add debug printing on this case, so I waste less time debugging folds in the future. llvm-svn: 375367
* AMDGPU: Split flat offsets that don't fit in DAGMatt Arsenault2019-10-209-373/+347
| | | | | | | | | | We handle it this way for some other address spaces. Since r349196, SILoadStoreOptimizer has been trying to do this. This is after SIFoldOperands runs, which can change the addressing patterns. It's simpler to just split this earlier. llvm-svn: 375366
* AMDGPU: Add baseline tests for flat offset splittingMatt Arsenault2019-10-202-0/+2916
| | | | llvm-svn: 375364
* AMDGPU: Don't error on calls to null or undefMatt Arsenault2019-10-202-0/+55
| | | | | | Calls to constants should probably be generally handled. llvm-svn: 375356
* AMDGPU: Remove optnone from a testMatt Arsenault2019-10-191-3/+1
| | | | | | | | | It's not clear why the test had this. I'm unable to break the original case with the original patch reverted with or without optnone. This avoids a failure in a future commit. llvm-svn: 375321
* LiveIntervals: Fix handleMoveUp with subreg def moving across a defMatt Arsenault2019-10-181-0/+134
| | | | | | | | | If a subregister def was moved across another subregister def and another use, the main range was not correctly updated. The end point of the moved interval ended too early and missed the use from theh other lanes in the subreg def. llvm-svn: 375300
* [AMDGPU] move PHI nodes to AGPR classStanislav Mekhanoshin2019-10-181-0/+29
| | | | | | | | | If all uses of a PHI are in AGPR register class we should avoid unneeded copies via VGPRs. Differential Revision: https://reviews.llvm.org/D69200 llvm-svn: 375297
* [AMDGPU] Remove -amdgpu-spill-sgpr-to-smem.Jay Foad2019-10-187-291/+9
| | | | | | | | | | | | | | Summary: The implementation was never completed and never used except in tests. Reviewers: arsenm, mareko Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69163 llvm-svn: 375293
* AMDGPU: Relax 32-bit SGPR register classMatt Arsenault2019-10-1898-880/+935
| | | | | | | | | | | Mostly use SReg_32 instead of SReg_32_XM0 for arbitrary values. This will allow the register coalescer to do a better job eliminating copies to m0. For GlobalISel, as a terrible hack, use SGPR_32 for things that should use SCC until booleans are solved. llvm-svn: 375267
* AMDGPU: Fix SMEM WAR hazard for gfx10 readlaneAustin Kerbow2019-10-181-0/+15
| | | | | | | | | | | | | | | | Summary: Hazard recognizer fails to see hazard with V_READLANE_B32_gfx10. Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69172 llvm-svn: 375265
* GlobalISel: Implement lower for G_SADDO/G_SSUBOMatt Arsenault2019-10-164-137/+278
| | | | | | | Port directly from SelectionDAG, minus the path using ISD::SADDSAT/ISD::SSUBSAT. llvm-svn: 375042
* [AMDGPU] Do not combine dpp mov reading physregsStanislav Mekhanoshin2019-10-161-0/+24
| | | | | | | | We cannot be sure physregs will stay unchanged. Differential Revision: https://reviews.llvm.org/D69065 llvm-svn: 375033
* [AMDGPU] Do not combine dpp with physreg defStanislav Mekhanoshin2019-10-161-0/+12
| | | | | | | | We will remove dpp mov along with the physreg def otherwise. Differential Revision: https://reviews.llvm.org/D69063 llvm-svn: 375030
* [AMDGPU] Fix-up cases where writelane has 2 SGPR operandsDavid Stuttard2019-10-162-8/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Even though writelane doesn't have the same constraints as other valu instructions it still can't violate the >1 SGPR operand constraint Due to later register propagation (e.g. fixing up vgpr operands via readfirstlane) changing writelane to only have a single SGPR is tricky. This implementation puts a new check after SIFixSGPRCopies that prevents multiple SGPRs being used in any writelane instructions. The algorithm used is to check for trivial copy prop of suitable constants into one of the SGPR operands and perform that if possible. If this isn't possible put an explicit copy of Src1 SGPR into M0 and use that instead (this is allowable for writelane as the constraint is for SGPR read-port and not constant-bus access). Reviewers: rampitec, tpr, arsenm, nhaehnle Reviewed By: rampitec, arsenm, nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, mgorny, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D51932 Change-Id: Ic7553fa57440f208d4dbc4794fc24345d7e0e9ea llvm-svn: 375004
* [AMDGPU] Extend the SI Load/Store optimizerPiotr Sobczak2019-10-162-0/+1644
| | | | | | | | | | | | | | | | | | | | | Summary: Extend the SI Load/Store optimizer to merge MIMG load instructions. Handle different flavours of image_load and image_sample instructions. When the instructions of the same subclass differ only in dmask, merge them and update dmask accordingly. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64911 llvm-svn: 374984
* AMDGPU: Fix infinite searches in SIFixSGPRCopiesAustin Kerbow2019-10-151-0/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Two conditions could lead to infinite loops when processing PHI nodes in SIFixSGPRCopies. The first condition involves a REG_SEQUENCE that uses registers defined by both a PHI and a COPY. The second condition arises when a physical register is copied to a virtual register which is then used in a PHI node. If the same virtual register is copied to the same physical register, the result is an endless loop. %0:sgpr_64 = COPY $sgpr0_sgpr1 %2 = PHI %0, %bb.0, %1, %bb.1 $sgpr0_sgpr1 = COPY %0 Reviewers: alex-t, rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68970 llvm-svn: 374944
* [AMDGPU] Support mov dpp with 64 bit operandsStanislav Mekhanoshin2019-10-153-6/+123
| | | | | | | | | | We define mov/update dpp intrinsics as overloaded but do not support i64, which is a practically useful type. Fix the selection and lowering. Differential Revision: https://reviews.llvm.org/D68673 llvm-svn: 374910
* [AMDGPU] Allow DPP combiner to work with REG_SEQUENCEStanislav Mekhanoshin2019-10-151-0/+156
| | | | | | Differential Revision: https://reviews.llvm.org/D68828 llvm-svn: 374908
* [update_mir_test_checks] Handle MI flags properlyRoman Tereshin2019-10-1411-165/+165
| | | | | | | | | | | | | | previously we would generate literal check lines w/ no reg-exps for vregs as MI flags (nsw, ninf, etc.) won't be recognized as a part of MI. Fixing that. Includes updating the MIR tests that suffered from the problem. Reviewed By: bogner Differential Revision: https://reviews.llvm.org/D68905 llvm-svn: 374829
* AMDGPU: Remove unnecessary IR from testMatt Arsenault2019-10-141-63/+9
| | | | llvm-svn: 374800
* [IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperatorCameron McInally2019-10-142-53/+53
| | | | | | | | Reapply r374240 with fix for Ocaml test, namely Bindings/OCaml/core.ml. Differential Revision: https://reviews.llvm.org/D61675 llvm-svn: 374782
* [AMDGPU] Come back patch for the 'Assign register class for cross block ↵Alexander Timofeev2019-10-1442-315/+354
| | | | | | | | | | | | | | | | | | | | | | | | values according to the divergence.' Detailed description: After https://reviews.llvm.org/D59990 submit several issues were discovered. Changes in common code were preserved but AMDGPU specific part was reverted to keep the backend working correctly. Discovered issues were addressed in the following commits: https://reviews.llvm.org/D67662 https://reviews.llvm.org/D67101 https://reviews.llvm.org/D63953 https://reviews.llvm.org/D63731 This change brings back AMDGPU specific changes. Reviewed by: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D68635 llvm-svn: 374767
* [AMDGPU] Use GCN prefix in dpp_combine.mir. NFC.Stanislav Mekhanoshin2019-10-111-82/+82
| | | | llvm-svn: 374607
* [AMDGPU] link dpp pseudos and real instructions on gfx10Stanislav Mekhanoshin2019-10-112-271/+5008
| | | | | | | | | | | | This defaults to zero fi operand, but we do not expose it anyway. Should we expose it later it needs to be added to the pseudo. This enables dpp combining on gfx10. Differential Revision: https://reviews.llvm.org/D68888 llvm-svn: 374604
* [GISel] Allow getConstantVRegVal() to return G_FCONSTANT values.Marcello Maggioni2019-10-101-22/+14
| | | | | | | | | | | | | | | | | | | | | In GISel we have both G_CONSTANT and G_FCONSTANT, but because in GISel we don't really have a concept of Float vs Int value the only difference between the two is where the data originates from. What both G_CONSTANT and G_FCONSTANT return is just a bag of bits with the constant representation in it. By making getConstantVRegVal() return G_FCONSTANTs bit representation as well we allow ConstantFold and other things to operate with G_FCONSTANT. Adding tests that show ConstantFolding to work on mixed G_CONSTANT and G_FCONSTANT sources. Differential Revision: https://reviews.llvm.org/D68739 llvm-svn: 374458
* [AMDGPU] Handle undef old operand in DPP combineStanislav Mekhanoshin2019-10-101-1/+12
| | | | | | | | It was missing an undef flag. Differential Revision: https://reviews.llvm.org/D68813 llvm-svn: 374455
* [AMDGPU] Fixed dpp_combine.mir with expensive checks. NFC.Stanislav Mekhanoshin2019-10-101-5/+7
| | | | llvm-svn: 374365
* Revert "[IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator"Dmitri Gribenko2019-10-102-53/+53
| | | | | | | This reverts commit r374240. It broke OCaml tests: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19014 llvm-svn: 374354
* AMDGPU: Use SGPR_128 instead of SReg_128 for vregsMatt Arsenault2019-10-1044-331/+331
| | | | | | | | | SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good. llvm-svn: 374284
* AMDGPU: Don't fold copies to physregsMatt Arsenault2019-10-091-1/+1
| | | | | | | | | | | In a future patch, this will help cleanup m0 handling. The register coalescer handles copies from a register that materializes an immediate, but doesn't handle move immediates itself. The virtual register uses will often be allocated to the same register, so there end up being no real copy. llvm-svn: 374257
* AMDGPU/GlobalISel: Fix crash on wide constant load with VGPR pointerMatt Arsenault2019-10-091-0/+46
| | | | | | | | | | This was ignoring the register bank of the input pointer, and isUniformMMO seems overly aggressive. This will now conservatively assume a VGPR in cases where the incoming bank hasn't been determined yet (i.e. is from a loop phi). llvm-svn: 374255
* GlobalISel: Implement fewerElementsVector for G_BUILD_VECTORMatt Arsenault2019-10-0928-672/+1284
| | | | | | Turn it into a G_CONCAT_VECTORS of G_BUILD_VECTOR. llvm-svn: 374252
* [AMDGPU] Fixed dpp combine of VOP1Stanislav Mekhanoshin2019-10-091-0/+23
| | | | | | | | | If original instruction did not have source modifiers they were not added to the new DPP instruction as well, even if needed. Differential Revision: https://reviews.llvm.org/D68729 llvm-svn: 374241
* [IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperatorCameron McInally2019-10-092-53/+53
| | | | | | | | Also update Clang to call Builder.CreateFNeg(...) for UnaryMinus. Differential Revision: https://reviews.llvm.org/D61675 llvm-svn: 374240
* AMDGPU: Fix i16 arithmetic pattern redundancyMatt Arsenault2019-10-0810-329/+367
| | | | | | | | | | | | | | There were 2 problems here. First, these patterns were duplicated to handle the inverted shift operands instead of using the commuted PatFrags. Second, the point of the zext folding patterns don't apply to the non-0ing high subtargets. They should be skipped instead of inserting the extension. The zeroing high code would be emitted when necessary anyway. This was also emitting unnecessary zexts in cases where the high bits were undefined. llvm-svn: 374092
* AMDGPU: Add offsets to MMO when lowering buffer intrinsicsTom Stellard2019-10-081-0/+414
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Without offsets on the MachineMemOperands (MMOs), MachineInstr::mayAlias() will return true for all reads and writes to the same resource descriptor. This leads to O(N^2) complexity in the MachineScheduler when analyzing dependencies of buffer loads and stores. It also limits the SILoadStoreOptimizer from merging more instructions. This patch reduces the compile time of one pathological compute shader from 12 seconds to 1 second. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65097 llvm-svn: 374087
* (Re)generate various tests. NFCAmaury Sechet2019-10-082-128/+997
| | | | llvm-svn: 374074
OpenPOWER on IntegriCloud