summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Introduce a flag to disable mul24 intrinsic formationMatt Arsenault2019-08-241-1/+7
| | | | llvm-svn: 369856
* Fix some accidental global initializers by using StringLiteral instead of ↵Benjamin Kramer2019-08-241-3/+3
| | | | | | StringRef llvm-svn: 369850
* [AMDGPU] Check for immediate SrcC in mfma in AsmParserStanislav Mekhanoshin2019-08-231-0/+4
| | | | | | Differential Revision: https://reviews.llvm.org/D66674 llvm-svn: 369819
* [AMDGPU] w/a for gfx908 mfma SrcC literal HW bugStanislav Mekhanoshin2019-08-231-3/+9
| | | | | | | | gfx908 ignores an mfma if SrcC is a literal. Differential Revision: https://reviews.llvm.org/D66670 llvm-svn: 369818
* [AMDGPU] w/a for gfx908 mfma SrcC literal HW bugStanislav Mekhanoshin2019-08-236-5/+30
| | | | | | | | gfx908 ignores an mfma if SrcC is a literal. Differential Revision: https://reviews.llvm.org/D66670 llvm-svn: 369816
* [AMDGPU] gfx10 atomic optimizer changes.Jay Foad2019-08-233-57/+152
| | | | | | | | | | | | | | | | Summary: Add support for gfx10, where all DPP operations are confined to work within a single row of 16 lanes, and wave32. Reviewers: arsenm, sheredom, critson, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65644 llvm-svn: 369745
* [MC] Minor cleanup to MCFixup::Kind handling. NFC.Sam Clegg2019-08-231-1/+1
| | | | | | | | | | Prefer `MCFixupKind` where possible and add getTargetKind() to convert to `unsigned` when needed rather than scattering cast operators around the place. Differential Revision: https://reviews.llvm.org/D59890 llvm-svn: 369720
* [MVT] Add v16f16 and v32f16 vectors.Craig Topper2019-08-211-0/+4
| | | | | | | | | I might look at improving PR43065 which will require being able to mark a 256 and 512 bit vector of f16 as Legal. Differential Revision: https://reviews.llvm.org/D66515 llvm-svn: 369565
* GlobalISel: Implement moreElementsVector for G_UNMERGE_VALUES sourcesMatt Arsenault2019-08-211-1/+1
| | | | | | | | This is necessary for handling <3 x s16> on AMDGPU, assuming this should be handled as 2 separate legalization actions. The alternative would be for fewerElementsVector to handle 3->2. llvm-svn: 369547
* [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitionsAlexander Timofeev2019-08-212-0/+16
| | | | | | | Differential Revision: https://reviews.llvm.org/D63731 Reviewers: qcolombet, rampitec llvm-svn: 369532
* Revert "AMDGPU: Fix iterator error when lowering SI_END_CF"Matt Arsenault2019-08-205-142/+28
| | | | | | | This reverts r367500 and r369203. This is causing various test failures. llvm-svn: 369417
* [AsmPrinter] Remove const qualifier from EmitBasicBlockStart.Karl-Johan Karlsson2019-08-202-4/+4
| | | | | | | | | | | | | | | | Overriders may want to modify state in it. AMDGPU wants to, but has to make its members mutable in order to do so. Besides, EmitBasicBlockEnd is not const, so why should Start be? Patch by Bevin Hansson. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D66341 llvm-svn: 369325
* AMDGPU: Fix iterator error when lowering SI_END_CFMatt Arsenault2019-08-181-4/+4
| | | | | | | If the instruction is the last in the block, there is no next instruction but the iteration still needs to look at the new block. llvm-svn: 369203
* AMDGPU: Disambiguate v3f16 format in load/store tablesMatt Arsenault2019-08-185-104/+119
| | | | | | | | | Currently the searchable tables report the number of dwords. These round to the same number for 3 and 4 component d16 instructions. Change this to report the number of elements so this isn't ambiguous. llvm-svn: 369202
* Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVMDaniel Sanders2019-08-1534-339/+338
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned& Depends on D65919 Reviewers: arsenm, bogner, craig.topper, RKSimon Reviewed By: arsenm Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65962 llvm-svn: 369041
* MVT: Add v3i16/v3f16 vectorsMatt Arsenault2019-08-152-0/+7
| | | | | | | | | | | | AMDGPU has some buffer intrinsics which theoretically could use this. Some of the generated tables include the 3 and 4 element vector versions of these rounded to 64-bits, which is ambiguous. Add these to help the table disambiguate these. Assertion change is for the path odd sized vectors now take for R600. v3i16 is widened to v4i16, which then needs to be promoted to v4i32. llvm-svn: 369038
* [llvm] Migrate llvm::make_unique to std::make_uniqueJonas Devlieghere2019-08-1513-35/+35
| | | | | | | | Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. llvm-svn: 369013
* InferAddressSpaces: Move target intrinsic handling to TTIMatt Arsenault2019-08-142-0/+45
| | | | | | | | I'm planning on handling intrinsics that will benefit from checking the address space enums. Don't bother moving the address collection for now, since those won't need th enums. llvm-svn: 368895
* [AMDGPU] Fix to 'Fold readlane from copy of SGPR or imm'Tim Renouf2019-08-131-0/+3
| | | | | | | | | That change (r363670) could leave a copy from vgpr to sgpr. Fixed. Differential Revision: https://reviews.llvm.org/D66133 Change-Id: I00c3fe6fda2e8e1e36f53195b881b1449c777ea4 llvm-svn: 368736
* GlobalISel: Partially implement fewerElementsVector G_UNMERGE_VALUESMatt Arsenault2019-08-131-1/+10
| | | | | | Odd sized vectors aren't handled yet. llvm-svn: 368713
* GlobalISel: Implement lower for G_SHUFFLE_VECTORMatt Arsenault2019-08-131-0/+3
| | | | llvm-svn: 368709
* [GlobalISel] Make the InstructionSelector instance non-const, allowing state ↵Amara Emerson2019-08-133-27/+22
| | | | | | | | | | | | | | | | to be maintained. Currently we can't keep any state in the selector object that we get from subtarget. As a result we have to plumb through all our variables through multiple functions. This change makes it non-const and adds a virtual init() method to allow further state to be captured for each target. AArch64 makes use of this in this patch to cache a call to hasFnAttribute() which is expensive to call, and is used on each selection of G_BRCOND. Differential Revision: https://reviews.llvm.org/D65984 llvm-svn: 368652
* [AMDGPU] Fix msan failure in printf loweringStanislav Mekhanoshin2019-08-131-5/+3
| | | | llvm-svn: 368645
* [AMDGPU] removed unused functions from printf loweringStanislav Mekhanoshin2019-08-121-21/+0
| | | | | | Differential Revision: https://reviews.llvm.org/D66117 llvm-svn: 368633
* [AMDGPU] Use PredicateControl in MIMGBaseOpcode. NFC.Stanislav Mekhanoshin2019-08-121-2/+2
| | | | | | | | | | | This is infrastructural, will be needed for future work. For some reason it was only used in MIMG_NoSampler, while needed everywere we use MIMGBaseOpcode if we want to use predicates. Differential Revision: https://reviews.llvm.org/D66115 llvm-svn: 368626
* [AMDGPU] Printf runtime binding passStanislav Mekhanoshin2019-08-124-0/+623
| | | | | | | | | | | This pass is a port of the according pass from the HSAIL compiler. It parses printf calls and setup runtime printf buffer. After that it copies printf arguments to the buffer and fills in module metadata for runtime. Differential Revision: https://reviews.llvm.org/D24035 llvm-svn: 368592
* [globalisel] Add G_SEXT_INREGDaniel Sanders2019-08-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Targets often have instructions that can sign-extend certain cases faster than the equivalent shift-left/arithmetic-shift-right. Such cases can be identified by matching a shift-left/shift-right pair but there are some issues with this in the context of combines. For example, suppose you can sign-extend 8-bit up to 32-bit with a target extend instruction. %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity) %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_ASHR %2:_(s32), i32 1 would reasonably combine to: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 25 which no longer matches the special case. If your shifts and extend are equal cost, this would break even as a pair of shifts but if your shift is more expensive than the extend then it's cheaper as: %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8 %3:_(s32) = G_ASHR %2:_(s32), i32 1 It's possible to match the shift-pair in ISel and emit an extend and ashr. However, this is far from the only way to break this shift pair and make it hard to match the extends. Another example is that with the right known-zeros, this: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_MUL %2:_(s32), i32 2 can become: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 23 All upstream targets have been configured to lower it to the current G_SHL,G_ASHR pair but will likely want to make it legal in some cases to handle their faster cases. To follow-up: Provide a way to legalize based on the constant. At the moment, I'm thinking that the best way to achieve this is to provide the MI in LegalityQuery but that opens the door to breaking core principles of the legalizer (legality is not context sensitive). That said, it's worth noting that looking at other instructions and acting on that information doesn't violate this principle in itself. It's only a violation if, at the end of legalization, a pass that checks legality without being able to see the context would say an instruction might not be legal. That's a fairly subtle distinction so to give a concrete example, saying %2 in: %1 = G_CONSTANT 16 %2 = G_SEXT_INREG %0, %1 is legal is in violation of that principle if the legality of %2 depends on %1 being constant and/or being 16. However, legalizing to either: %2 = G_SEXT_INREG %0, 16 or: %1 = G_CONSTANT 16 %2:_(s32) = G_SHL %0, %1 %3:_(s32) = G_ASHR %2, %1 depending on whether %1 is constant and 16 does not violate that principle since both outputs are genuinely legal. Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61289 llvm-svn: 368487
* Re-commit: [AMDGPU] Use S_DENORM_MODE for gfx10Austin Kerbow2019-08-067-16/+67
| | | | | | | | | | | | | | | | Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10. Reviewers: arsenm, rampitec Reviewed By: arsenm, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65620 llvm-svn: 367969
* [GlobalISel][CallLowering] Rename isArgumentHandler() -> ↵Amara Emerson2019-08-051-1/+1
| | | | | | | | | isIncomingArgumentHandler() Previous name and comment incorrectly implied it was just for formal arg handlers, which is not true. llvm-svn: 367945
* Revert "[AMDGPU] Use S_DENORM_MODE for gfx10"Dmitri Gribenko2019-08-057-67/+16
| | | | | | | This reverts commit r367882. It broke the test MC/Disassembler/AMDGPU/gfx10_dasm_all.txt. llvm-svn: 367904
* [AMDGPU] Use S_DENORM_MODE for gfx10Austin Kerbow2019-08-057-16/+67
| | | | | | | | | | | | | | | | Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10. Reviewers: arsenm, rampitec Reviewed By: arsenm, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65620 llvm-svn: 367882
* AMDGPU/LoadStoreOptimizer: Set the correct offset whem merging MMOsTom Stellard2019-08-051-1/+6
| | | | | | | | | | | | | | | | | | Summary: This is a follow up to r367237. MachineFunction::getMachineMemOperand() adds the offset parameter to the existing offset instead of resetting it. So we need to reset the offset to the correct value after calling this function. Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65557 llvm-svn: 367881
* AMDGPU: Correct behavior of f16 buffer loadsMatt Arsenault2019-08-053-43/+63
| | | | | | | Don't assume format loads for f16. Also fixes support for targets without i16. llvm-svn: 367879
* AMDGPU: Correct behavior of f16/i16 non-format store intrinsicsMatt Arsenault2019-08-052-30/+63
| | | | | | | | | This was switching to use a format store for a non-format store for f16 types. Also fixes i16/f16 stores on targets without legal f16. The corresponding loads also need to be fixed. llvm-svn: 367872
* AMDGPU/GlobalISel: Alternative mappings for constantsMatt Arsenault2019-08-051-1/+13
| | | | | | | | Without context we assume SGPR. Allowing VGPR constants theoretically helps avoid a copy. This seems to not actually work now, and the choice isn't based on the use bank. llvm-svn: 367871
* AMDGPU/GlobalISel: Don't reject shader typesMatt Arsenault2019-08-051-4/+0
| | | | | | | | | | | | I'm not sure what complications these present, but the current argument lowering is pretty much directly copied from the DAG lowering, so I assume these work as they should. No tests because I'm lazy and things are getting pretty close to the point where the existing calling-conventions.ll can be shared with SelectionDAG. llvm-svn: 367870
* [LLVM][Alignment] Introduce Alignment TypeGuillaume Chatelet2019-08-051-6/+6
| | | | | | | | | | | | | | | | | | | Summary: This is patch is part of a serie to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jfb, jakehehrlich Reviewed By: jfb Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65514 llvm-svn: 367828
* AMDGPU: add missing llvm.amdgcn.{raw,struct}.buffer.atomic.{inc,dec}Nicolai Haehnle2019-08-056-2/+32
| | | | | | | | | | | | | | | | | Summary: Wrapping increment/decrement. These aren't exposed by many APIs... Change-Id: I1df25c7889de5a5ba76468ad8e8a2597efa9af6c Reviewers: arsenm, tpr, dstuttard Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65283 llvm-svn: 367821
* Use MCRegister in MCRegisterInfo's interfacesDaniel Sanders2019-08-021-3/+3
| | | | | | | | | | | | | | | | | Summary: As part of this, define DenseMapInfo for MCRegister (and Register while I'm at it) Depends on D65599 Reviewers: arsenm Subscribers: MatzeB, qcolombet, jvesely, wdng, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65605 llvm-svn: 367719
* Fix up an unused variable warning caused by TRI->isVirtualRegister() -> ↵Daniel Sanders2019-08-021-0/+2
| | | | | | Register::isVirtualRegister() llvm-svn: 367637
* Finish moving TargetRegisterInfo::isVirtualRegister() and friends to ↵Daniel Sanders2019-08-0129-171/+149
| | | | | | llvm::Register as started by r367614. NFC llvm-svn: 367633
* GlobalISel: Lower scalarizing unmerge of a vector to shiftsMatt Arsenault2019-08-011-0/+1
| | | | | | | | | | | | | | AMDGPU sometimes has legal s16 and <2 x s16> operations, but all registers are really 32-bit. An unmerge destination really should ben widened to a 32-bit register. If widening a scalarizing vector with a target size that matches the vector size, bitcast to integer and extract the relevant bits with shifts. I'm not sure if this is the right place for this. This could arguably be part of widenScalar for the result. I also have a growing feeling that we're missing a bitcast legalize action. llvm-svn: 367604
* AMDGPU: Remove v0 workaround for DS_GWS_* instructionsMatt Arsenault2019-08-012-34/+4
| | | | | | | Any register should work for the src field since r366067, since the used value is not pulled from the expected encoding field. llvm-svn: 367598
* AMDGPU: Use tablegen pattern for sendmsg intrinsicsMatt Arsenault2019-08-013-17/+23
| | | | | | | Since this now emits a direct copy to m0, SIFixSGPRCopies has to handle a physical register. llvm-svn: 367593
* AMDGPU/SILoadStoreOptimizer: Make some functions constTom Stellard2019-08-011-6/+6
| | | | | | | | | | | | | | Reviewers: arsenm, pendingchaos, rampitec Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65316 llvm-svn: 367517
* AMDGPU/GlobalISel: Fix flat load/store of pointer typesMatt Arsenault2019-08-014-8/+13
| | | | llvm-svn: 367513
* AMDGPU/GlobalISel: Remove manual store select codeMatt Arsenault2019-08-012-58/+23
| | | | | | | This regresses the weird types that are newly treated as legal load types, but fixes incorrectly using flat instrucions on SI. llvm-svn: 367512
* AMDGPU/GlobalISel: Select local atomic cmpxchgMatt Arsenault2019-08-013-28/+13
| | | | llvm-svn: 367511
* AMDGPU/GlobalISel: Handle G_ATOMICRMW_FADDMatt Arsenault2019-08-014-0/+6
| | | | llvm-svn: 367509
* AMDGPU/GlobalISel: Allow selection of DS atomicrmwMatt Arsenault2019-08-014-6/+27
| | | | llvm-svn: 367507
OpenPOWER on IntegriCloud