summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Run AMDGPUCodeGenPrepare after scalar optsMatt Arsenault2019-08-272-184/+170
| | | | | | | | | | | | | | | | | | | | The mul24 matching could interfere with SLSR and the other addressing mode related passes. This probably is not the optimal placement, but is an intermediate step. This should probably be moved after all the generic IR passes, particularly LSR. Moving this after LSR seems to help in some cases, and hurts others. As-is in this patch, in idiv-licm, it saves 1-2 instructions inside some of the loop bodies, but increases the number in others. Moving this later helps these loops. In the new lsr tests in mul24-pass-ordering, the intrinsic prevents introducing more instructions in the loop preheader, so moving this later ends up hurting them. This shouldn't be any worse than before the intrinsics were introduced in r366094, and LSR should probably be smarter. I think it's because it doesn't know the and inside the loop will be folded away. llvm-svn: 369991
* AMDGPU: Add baseline test for mul24 ordering issuesMatt Arsenault2019-08-241-0/+263
| | | | llvm-svn: 369858
* AMDGPU: Introduce a flag to disable mul24 intrinsic formationMatt Arsenault2019-08-241-54/+165
| | | | llvm-svn: 369856
* AMDGPU: Generate check linesMatt Arsenault2019-08-241-106/+345
| | | | | | | | Checking all the instructions will help catch LICM changes when passes are reordered. Also switch to using gfx9 since global stores make the relevant instructions more obvious. llvm-svn: 369855
* [AMDGPU] w/a for gfx908 mfma SrcC literal HW bugStanislav Mekhanoshin2019-08-232-6/+18
| | | | | | | | gfx908 ignores an mfma if SrcC is a literal. Differential Revision: https://reviews.llvm.org/D66670 llvm-svn: 369816
* [GlobalISel] Legalizer: Retry combining illegal artifacts as long as there ↵Volkan Keles2019-08-2325-536/+946
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | new artifacts Summary: Currently, Legalizer aborts if it’s unable to legalize artifacts. However, it’s possible to combine them after processing the rest of the instruction because the legalization is likely to generate more artifacts that allow ArtifactCombiner to combine away them. Instead, move illegal artifacts to another list called RetryList and wait until all of the instruction in InstList are legalized. After that, check if there is any new artifacts and try to combine them again if that’s the case. If not, abort. The idea is similar to D59339, but the approach is a bit different. This patch fixes the issue described above, but the legalizer still may be unable to handle some cases depending on when to legalize artifacts. So, in the long run, we probably need a different legalization strategy that handles this dependency in a better way. Reviewers: dsanders, aditya_nandakumar, qcolombet, arsenm, aemerson, paquette Reviewed By: dsanders Subscribers: jvesely, wdng, nhaehnle, rovka, javed.absar, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65894 llvm-svn: 369805
* [AMDGPU] Automatically generate various tests. NFCAmaury Sechet2019-08-236-748/+2521
| | | | llvm-svn: 369787
* [AMDGPU] gfx10 atomic optimizer changes.Jay Foad2019-08-236-233/+469
| | | | | | | | | | | | | | | | Summary: Add support for gfx10, where all DPP operations are confined to work within a single row of 16 lanes, and wave32. Reviewers: arsenm, sheredom, critson, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65644 llvm-svn: 369745
* GlobalISel: Don't create G_UADDE with constant false carry inMatt Arsenault2019-08-223-78/+66
| | | | | | | | The x86 tests are now broken (in paticular add-scalar.ll now hits the DAG fallback) due to not handling G_UADDO. The DAG x86 backend has a custom lowering for this, so that will need to be implemented. llvm-svn: 369673
* [MBP] Disable aggressive loop rotate in plain modeGuozhi Wei2019-08-2211-118/+111
| | | | | | | | | | Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse. To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true. Differential Revision: https://reviews.llvm.org/D65673 llvm-svn: 369664
* GlobalISel: Implement moreElementsVector for G_UNMERGE_VALUES sourcesMatt Arsenault2019-08-2124-349/+714
| | | | | | | | This is necessary for handling <3 x s16> on AMDGPU, assuming this should be handled as 2 separate legalization actions. The alternative would be for fewerElementsVector to handle 3->2. llvm-svn: 369547
* [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitionsAlexander Timofeev2019-08-211-1/+1
| | | | | | | Differential Revision: https://reviews.llvm.org/D63731 Reviewers: qcolombet, rampitec llvm-svn: 369532
* [test] Fix tests when run on windows after SVN r369426. NFC.Martin Storsjo2019-08-203-3/+3
| | | | | | | | | | | | | | When running tests on windows, invoking "llc -march=<arch>" will implicitly use windows as the target os, making these tests misbehave after this change. Fix the issue by using more specific -mtriple values instead of plain -march in these tests. This should hopefully fix buildbot failures like http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/9816. llvm-svn: 369443
* Revert "AMDGPU: Fix iterator error when lowering SI_END_CF"Matt Arsenault2019-08-203-165/+56
| | | | | | | This reverts r367500 and r369203. This is causing various test failures. llvm-svn: 369417
* AMDGPU: Fix iterator error when lowering SI_END_CFMatt Arsenault2019-08-181-1/+68
| | | | | | | If the instruction is the last in the block, there is no next instruction but the iteration still needs to look at the new block. llvm-svn: 369203
* AMDGPU: Reduce number of registers in testMatt Arsenault2019-08-141-5/+1
| | | | | | | Once the failure this is testing is fixed, this would fail due to using too many registers. llvm-svn: 368901
* [GlobalISel]: Fix lowering of G_Shuffle_vector where we pick up the wrong ↵Aditya Nandakumar2019-08-141-7/+34
| | | | | | | | source index https://reviews.llvm.org/D66182 llvm-svn: 368781
* [GlobalISel]: Fix lowering of G_SHUFFLE_VECTOR with scalar sourcesAditya Nandakumar2019-08-131-0/+20
| | | | | | https://reviews.llvm.org/D66171 llvm-svn: 368753
* [AMDGPU] Fix to 'Fold readlane from copy of SGPR or imm'Tim Renouf2019-08-131-2/+2
| | | | | | | | | That change (r363670) could leave a copy from vgpr to sgpr. Fixed. Differential Revision: https://reviews.llvm.org/D66133 Change-Id: I00c3fe6fda2e8e1e36f53195b881b1449c777ea4 llvm-svn: 368736
* GlobalISel: Partially implement fewerElementsVector G_UNMERGE_VALUESMatt Arsenault2019-08-1322-371/+1001
| | | | | | Odd sized vectors aren't handled yet. llvm-svn: 368713
* GlobalISel: Implement lower for G_SHUFFLE_VECTORMatt Arsenault2019-08-131-0/+257
| | | | llvm-svn: 368709
* [AMDGPU] Printf runtime binding passStanislav Mekhanoshin2019-08-121-0/+34
| | | | | | | | | | | This pass is a port of the according pass from the HSAIL compiler. It parses printf calls and setup runtime printf buffer. After that it copies printf arguments to the buffer and fills in module metadata for runtime. Differential Revision: https://reviews.llvm.org/D24035 llvm-svn: 368592
* Revert r368339 "[MBP] Disable aggressive loop rotate in plain mode"Hans Wennborg2019-08-1211-111/+118
| | | | | | | | | | | | | | | | | | It caused assertions to fire when building Chromium: lib/CodeGen/LiveDebugValues.cpp:331: bool {anonymous}::LiveDebugValues::OpenRangesSet::empty() const: Assertion `Vars.empty() == VarLocs.empty() && "open ranges are inconsistent"' failed. See https://crbug.com/992871#c3 for how to reproduce. > Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse. > > To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true. > > Differential Revision: https://reviews.llvm.org/D65673 llvm-svn: 368579
* [globalisel] Add G_SEXT_INREGDaniel Sanders2019-08-0910-62/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Targets often have instructions that can sign-extend certain cases faster than the equivalent shift-left/arithmetic-shift-right. Such cases can be identified by matching a shift-left/shift-right pair but there are some issues with this in the context of combines. For example, suppose you can sign-extend 8-bit up to 32-bit with a target extend instruction. %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity) %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_ASHR %2:_(s32), i32 1 would reasonably combine to: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 25 which no longer matches the special case. If your shifts and extend are equal cost, this would break even as a pair of shifts but if your shift is more expensive than the extend then it's cheaper as: %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8 %3:_(s32) = G_ASHR %2:_(s32), i32 1 It's possible to match the shift-pair in ISel and emit an extend and ashr. However, this is far from the only way to break this shift pair and make it hard to match the extends. Another example is that with the right known-zeros, this: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_MUL %2:_(s32), i32 2 can become: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 23 All upstream targets have been configured to lower it to the current G_SHL,G_ASHR pair but will likely want to make it legal in some cases to handle their faster cases. To follow-up: Provide a way to legalize based on the constant. At the moment, I'm thinking that the best way to achieve this is to provide the MI in LegalityQuery but that opens the door to breaking core principles of the legalizer (legality is not context sensitive). That said, it's worth noting that looking at other instructions and acting on that information doesn't violate this principle in itself. It's only a violation if, at the end of legalization, a pass that checks legality without being able to see the context would say an instruction might not be legal. That's a fairly subtle distinction so to give a concrete example, saying %2 in: %1 = G_CONSTANT 16 %2 = G_SEXT_INREG %0, %1 is legal is in violation of that principle if the legality of %2 depends on %1 being constant and/or being 16. However, legalizing to either: %2 = G_SEXT_INREG %0, 16 or: %1 = G_CONSTANT 16 %2:_(s32) = G_SHL %0, %1 %3:_(s32) = G_ASHR %2, %1 depending on whether %1 is constant and 16 does not violate that principle since both outputs are genuinely legal. Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61289 llvm-svn: 368487
* [MBP] Disable aggressive loop rotate in plain modeGuozhi Wei2019-08-0811-118/+111
| | | | | | | | | | Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse. To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true. Differential Revision: https://reviews.llvm.org/D65673 llvm-svn: 368339
* [StructurizeCFG] Enable -structurizecfg-relaxed-uniform-regions by defaultTim Renouf2019-08-061-2/+2
| | | | | | | | | | | D62198 introduced an option to relax the checks for hasOnlyUniformBranches. This commit turns the option on by default, for better code generation in some cases in AMDGPU. Differential Revision: https://reviews.llvm.org/D63198 Change-Id: I9cbff002a1e74d3b7eb96b4192dc8129936d537d llvm-svn: 368042
* Re-commit: [AMDGPU] Use S_DENORM_MODE for gfx10Austin Kerbow2019-08-061-16/+31
| | | | | | | | | | | | | | | | Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10. Reviewers: arsenm, rampitec Reviewed By: arsenm, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65620 llvm-svn: 367969
* Revert "[AMDGPU] Use S_DENORM_MODE for gfx10"Dmitri Gribenko2019-08-051-30/+15
| | | | | | | This reverts commit r367882. It broke the test MC/Disassembler/AMDGPU/gfx10_dasm_all.txt. llvm-svn: 367904
* [AMDGPU] Use S_DENORM_MODE for gfx10Austin Kerbow2019-08-051-15/+30
| | | | | | | | | | | | | | | | Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10. Reviewers: arsenm, rampitec Reviewed By: arsenm, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65620 llvm-svn: 367882
* AMDGPU/LoadStoreOptimizer: Set the correct offset whem merging MMOsTom Stellard2019-08-051-1/+9
| | | | | | | | | | | | | | | | | | Summary: This is a follow up to r367237. MachineFunction::getMachineMemOperand() adds the offset parameter to the existing offset instead of resetting it. So we need to reset the offset to the correct value after calling this function. Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65557 llvm-svn: 367881
* AMDGPU: Correct behavior of f16 buffer loadsMatt Arsenault2019-08-052-2/+146
| | | | | | | Don't assume format loads for f16. Also fixes support for targets without i16. llvm-svn: 367879
* AMDGPU: Correct behavior of f16/i16 non-format store intrinsicsMatt Arsenault2019-08-052-2/+117
| | | | | | | | | This was switching to use a format store for a non-format store for f16 types. Also fixes i16/f16 stores on targets without legal f16. The corresponding loads also need to be fixed. llvm-svn: 367872
* AMDGPU/GlobalISel: Alternative mappings for constantsMatt Arsenault2019-08-051-0/+34
| | | | | | | | Without context we assume SGPR. Allowing VGPR constants theoretically helps avoid a copy. This seems to not actually work now, and the choice isn't based on the use bank. llvm-svn: 367871
* AMDGPU: add missing llvm.amdgcn.{raw,struct}.buffer.atomic.{inc,dec}Nicolai Haehnle2019-08-052-2/+18
| | | | | | | | | | | | | | | | | Summary: Wrapping increment/decrement. These aren't exposed by many APIs... Change-Id: I1df25c7889de5a5ba76468ad8e8a2597efa9af6c Reviewers: arsenm, tpr, dstuttard Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65283 llvm-svn: 367821
* IR: print value numbers for unnamed function argumentsTim Northover2019-08-032-30/+30
| | | | | | | | | | For consistency with normal instructions and clarity when reading IR, it's best to print the %0, %1, ... names of function arguments in definitions. Also modifies the parser to accept IR in that form for obvious reasons. llvm-svn: 367755
* [AMDGPU] Regenerated saddo.ll test file for D47927Simon Pilgrim2019-08-021-20/+557
| | | | llvm-svn: 367698
* GlobalISel: Lower scalarizing unmerge of a vector to shiftsMatt Arsenault2019-08-0135-426/+917
| | | | | | | | | | | | | | AMDGPU sometimes has legal s16 and <2 x s16> operations, but all registers are really 32-bit. An unmerge destination really should ben widened to a 32-bit register. If widening a scalarizing vector with a target size that matches the vector size, bitcast to integer and extract the relevant bits with shifts. I'm not sure if this is the right place for this. This could arguably be part of widenScalar for the result. I also have a growing feeling that we're missing a bitcast legalize action. llvm-svn: 367604
* AMDGPU: Remove v0 workaround for DS_GWS_* instructionsMatt Arsenault2019-08-012-21/+21
| | | | | | | Any register should work for the src field since r366067, since the used value is not pulled from the expected encoding field. llvm-svn: 367598
* GlobalISel: Fix widenScalar for G_MERGE_VALUES to pointerMatt Arsenault2019-08-011-0/+16
| | | | | | | AMDGPU testcase isn't broken now, but will be in a future patch without this. llvm-svn: 367591
* [Testing] Fix tests that break with read-only checkoutsDavid Zarzycki2019-08-011-1/+1
| | | | | | Found with `mount --bind -o ro ...` on Linux. llvm-svn: 367519
* AMDGPU/GlobalISel: fix inst-select-load-local.mir in ↵Fangrui Song2019-08-011-4/+2
| | | | | | -DLLVM_ENABLE_ASSERTIONS=off builds after r367498 llvm-svn: 367514
* AMDGPU/GlobalISel: Fix flat load/store of pointer typesMatt Arsenault2019-08-014-96/+104
| | | | llvm-svn: 367513
* AMDGPU/GlobalISel: Remove manual store select codeMatt Arsenault2019-08-018-389/+372
| | | | | | | This regresses the weird types that are newly treated as legal load types, but fixes incorrectly using flat instrucions on SI. llvm-svn: 367512
* AMDGPU/GlobalISel: Select local atomic cmpxchgMatt Arsenault2019-08-011-0/+91
| | | | llvm-svn: 367511
* AMDGPU/GlobalISel: Handle G_ATOMICRMW_FADDMatt Arsenault2019-08-013-0/+153
| | | | llvm-svn: 367509
* AMDGPU/GlobalISel: Allow selection of DS atomicrmwMatt Arsenault2019-08-011-0/+83
| | | | llvm-svn: 367507
* AMDGPU/GlobalISel: Select simple local storesMatt Arsenault2019-08-011-0/+262
| | | | llvm-svn: 367504
* GlobalISel: moreElementsVector for G_LOAD/G_STOREMatt Arsenault2019-08-012-6/+61
| | | | | | | AMDGPU change and test is a placeholder until a future patch with complete handling. llvm-svn: 367503
* Reapply "AMDGPU: Split block for si_end_cf"Matt Arsenault2019-08-012-55/+97
| | | | | | This reverts commit r359363, reapplying r357634 llvm-svn: 367500
* AMDGPU/GlobalISel: Select local loadsMatt Arsenault2019-08-011-0/+906
| | | | llvm-svn: 367498
OpenPOWER on IntegriCloud