path: root/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
Commit message | Author | Age | Files | Lines
* AMDGPU: Force skip over s_sendmsg and exp instructionsNicolai Haehnle2018-07-301-0/+30
| | | | | | | | | | | | | | | | | | | | | Summary: These instructions interact with hardware blocks outside the shader core, and they can have "scalar" side effects even when EXEC = 0. We don't want these scalar side effects to occur when all lanes want to skip these instructions, so always add the execz skip branch instruction for basic blocks that contain them. Also ensure that we skip scalar stores / atomics, though we don't code-gen those yet. Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48431 Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44 llvm-svn: 338235
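The rule described in this commit message can be sketched as a small predicate over a basic block. This is only an illustration under stated assumptions: `InstrKind`, `hasScalarSideEffects`, and `blockNeedsExecZSkip` are hypothetical names invented here and are not the LLVM API; the real change operates on `MachineInstr` opcodes.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical instruction categories; the real backend inspects opcodes.
enum class InstrKind { VectorALU, SendMsg, Export, ScalarStore, ScalarAtomic };

// Assumption for this sketch: these kinds can have effects outside the shader
// core (or on memory) even when every lane is disabled (EXEC = 0).
bool hasScalarSideEffects(InstrKind K) {
  return K == InstrKind::SendMsg || K == InstrKind::Export ||
         K == InstrKind::ScalarStore || K == InstrKind::ScalarAtomic;
}

// A block containing any such instruction always gets the execz skip branch.
bool blockNeedsExecZSkip(const std::vector<InstrKind> &Block) {
  return std::any_of(Block.begin(), Block.end(), hasScalarSideEffects);
}
```

A block made only of vector ALU instructions is harmless to execute with EXEC = 0, so in this sketch it does not force the branch; any other skip heuristic is outside the scope of the example.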
* AMDGPU: Fix code size for return_to_epilog pseudoMatt Arsenault2018-07-271-3/+3
| | | | llvm-svn: 338113
* AMDGPU: Refactor Subtarget classesTom Stellard2018-07-111-22/+24
| | | | | | | | | | | | | | | | | Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation. Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 llvm-svn: 336851
* AMDGPU: Separate R600 and GCN TableGen filesTom Stellard2018-06-281-8/+97
| | | | | | | | | | | | | | | | | | | | | Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc. Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46365 llvm-svn: 335942
* [AMDGPU] Construct memory clauses before RAStanislav Mekhanoshin2018-05-311-0/+16
    Memory clauses are formed into bundles in the presence of xnack. Their source operands are marked as early-clobber. This allows distinct source and destination registers to be allocated within a clause and prevents the hazard recognizer from breaking the clause with s_nop.

    Clauses are undone before the post-RA scheduler to allow some rescheduling, which will not break the clause since artificial edges are created in the DAG to keep memory operations together. Yet this allows better ILP in some cases.

    Differential Revision: https://reviews.llvm.org/D47511
    llvm-svn: 333691
* AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headersTom Stellard2018-05-221-0/+14
    Summary:
    MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instruction and register definitions, which are huge, so we only want to include them where needed.

    This will also make it easier if we want to split the R600 and GCN definitions into separate TableGen-generated files.

    I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too.

    Reviewers: arsenm, nhaehnle
    Reviewed By: nhaehnle
    Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
    Differential Revision: https://reviews.llvm.org/D46272
    llvm-svn: 332930
* [AMDGPU] Added checks for dpp_ctrl valueStanislav Mekhanoshin2018-05-081-0/+16
    - Report error for invalid dpp_ctrl values.
    - Changed the way it is reported, now the error will be emitted into asm and will work with release build as well.
    - Added dpp_ctrl value verifier for codegen.
    - Added symbolic constants for dpp_ctrl.

    Differential Revision: https://reviews.llvm.org/D46565
    llvm-svn: 331775
* Remove \brief commands from doxygen comments.Adrian Prantl2018-05-011-2/+2
    We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers in our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all.

    Patch produced by

      for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

    Differential Revision: https://reviews.llvm.org/D46290
    llvm-svn: 331272
* AMDGPU: Add Vega12 and Vega20Matt Arsenault2018-04-301-14/+26
    Changes by Matt Arsenault and Konstantin Zhuravlyov.

    llvm-svn: 331215
* [AMDGPU] Truncate packed inline constantStanislav Mekhanoshin2018-04-241-0/+7
    If a packed inline constant is sign-extended, it must be truncated after the shift. For example, the constant (0xH0000, 0xHBC00) is represented as 0xFFFFFFFFBC000000 in the IR because the immediate is sign-extended to 64 bits. After the value is shifted right by 16 for use as the low part with op_sel_hi, it becomes 0xFFFFFFFFBC00 and no longer qualifies as an inline constant.

    Fixed the error and added verification code. Without the fix, and with the verification in place, the bug causes pk_max_f16_literal.ll to fail.

    Differential Revision: https://reviews.llvm.org/D45987
    llvm-svn: 330752
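The arithmetic in this message can be checked with a short sketch. The two helpers below are hypothetical and are not the code from the patch; they only reproduce the bit manipulation the message describes, assuming a logical right shift of the 64-bit immediate.

```cpp
#include <cassert>
#include <cstdint>

// Shift the high half of a packed 2 x f16 immediate down by 16 bits without
// truncating. The sign-extension bits stay set above bit 15.
constexpr uint64_t shiftHighHalf(uint64_t Imm) { return Imm >> 16; }

// Same shift, followed by truncation to 16 bits, which discards the
// sign-extension bits and leaves only the f16 bit pattern.
constexpr uint16_t shiftAndTruncateHighHalf(uint64_t Imm) {
  return static_cast<uint16_t>(Imm >> 16);
}
```

For the value from the message, `shiftHighHalf(0xFFFFFFFFBC000000)` yields `0xFFFFFFFFBC00`, while `shiftAndTruncateHighHalf` yields `0xBC00`, the f16 encoding of -1.0.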
* AMDGPU: Move a flawed assert when spilling SGPRsMatt Arsenault2018-04-231-4/+0
| | | | | | | | It's possible to validly spill the frame offset register in a call sequence to a VGPR. There are definitely issues with SGPR spilling to memory, so move the assert later. llvm-svn: 330612
* AMDGPU: Assign enum name to stack IDMatt Arsenault2018-04-231-2/+2
| | | | | | | | | Also assert that it is correct for SGPRs. There is currently a bug where stack slot coloring replaces SGPR spill FIs with one with the default ID, which results in a more confusing assert later about a dead object. llvm-svn: 330607
* AMDGPU: Legalize the operand of SI_INIT_M0Nicolai Haehnle2018-04-201-0/+15
| | | | | | | | | | | | | | | | | | | | Summary: This fixes a case where the argument to a sendmsg intrinsic ends up in a VGPR, for whatever reason. The underlying performance issue is that a multiplication that can be an s_mul_i32 is instead needlessly generated as v_mul_u32_u24, but this is not addressed by this patch. Change-Id: I61fd4034314d5acdf6074632c30b65364dfa7328 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45826 llvm-svn: 330393
* [AMDGPU] Use packed literals with zero either lower or hi partStanislav Mekhanoshin2018-04-191-0/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D45790 llvm-svn: 330365
* Fix layering of MachineValueType.h by moving it from CodeGen to SupportDavid Blaikie2018-03-231-1/+1
| | | | | | | | | This is used by llvm tblgen as well as by LLVM Targets, so the only common place is Support for now. (maybe we need another target for these sorts of things - but for now I'm at least making them correct & we can make them better if/when people have strong feelings) llvm-svn: 328395
* AMDGPU: Don't leave dead illegal VGPR->SGPR copiesMatt Arsenault2018-03-191-0/+7
| | | | | | | | | Normally DCE kills these, but at -O0 these get left behind leaving suspicious looking illegal copies. Replace with IMPLICIT_DEF to avoid iterator issues. llvm-svn: 327842
* [AMDGPU] added writelane intrinsicTim Renouf2018-02-281-2/+26
| | | | | | | | | | | | | | | | | Summary: For use by LLPC SPV_AMD_shader_ballot extension. The v_writelane instruction was already implemented for use by SGPR spilling, but I had to add an extra dummy operand tied to the destination, to represent that all lanes except the selected one keep the old value of the destination register. .ll test changes were due to schedule changes caused by that new operand. Differential Revision: https://reviews.llvm.org/D42838 llvm-svn: 326353
* AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALUMarek Olsak2018-02-061-2/+8
| | | | | | | | Author: Bas Nieuwenhuizen https://reviews.llvm.org/D42881 llvm-svn: 324353
* AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9Marek Olsak2018-01-311-22/+31
| | | | | | | | | | | | | | | | Summary: This enables load merging into x2, x4, which is driven by inline offsets. 6500 shaders are affected: Code Size in affected shaders: -15.14 % Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D42078 llvm-svn: 323909
* MachineFunction: Return reference from getFunction(); NFCMatthias Braun2017-12-151-9/+9
| | | | | | The Function can never be nullptr so we can return a reference. llvm-svn: 320884
* [AMDGPU] SDWA: add support for PRESERVE into SDWA peephole.Sam Kolton2017-12-041-0/+22
| | | | | | | | | | | | Summary: Reviewers: arsenm, vpykhtin, rampitec Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D37817 llvm-svn: 319662
* AMDGPU: Use carry-less adds in FI eliminationMatt Arsenault2017-11-301-1/+4
| | | | llvm-svn: 319501
* AMDGPU: Use gfx9 carry-less add/sub instructionsMatt Arsenault2017-11-301-16/+73
| | | | llvm-svn: 319491
* AMDGPU: Consistently check for immediates in SIInstrInfo::FoldImmediateNicolai Haehnle2017-11-281-23/+22
| | | | | | | | | | | | | | | | | | | | | | | Summary: The PeepholeOptimizer pass calls this function solely based on checking DefMI->isMoveImmediate(), which only checks the MoveImm bit of the instruction description. So it's up to FoldImmediate itself to properly check that DefMI *actually* moves from an immediate. I don't have a separate test case for this, but the next patch introduces a test case which happens to crash without this change. This error is caught by the assertion in MachineOperand::getImm(). Change-Id: I88e7cdbcf54d75e1a296822e6fe5f9a5f095bbf8 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D40342 llvm-svn: 319155
* Fix a bunch more layering of CodeGen headers that are in TargetDavid Blaikie2017-11-171-2/+2
| | | | | | | | All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490
* AMDGPU: Replace i64 add/sub loweringMatt Arsenault2017-11-151-0/+73
| | | | | | | | | | | | | | | Use VOP3 add/addc like usual. This has some tradeoffs. Inline immediates fold a little better, but other constants are worse off. SIShrinkInstructions could be made smarter to handle these cases. This allows us to avoid selecting scalar adds where we need to track the carry in scc and replace its users. This makes it easier to use the carryless VALU adds. llvm-svn: 318340
* AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEMMarek Olsak2017-11-091-2/+32
    Summary:
    -5.3% code size in affected shaders.

    Changed stats only: 48486 shaders in 30489 tests

    Totals:
      SGPRS: 2086406 -> 2072430 (-0.67 %)
      VGPRS: 1626872 -> 1627960 (0.07 %)
      Spilled SGPRs: 7865 -> 7912 (0.60 %)
      Code Size: 60978060 -> 60188764 (-1.29 %) bytes
      Max Waves: 374530 -> 374342 (-0.05 %)

    Totals from affected shaders:
      SGPRS: 299664 -> 285688 (-4.66 %)
      VGPRS: 233844 -> 234932 (0.47 %)
      Spilled SGPRs: 3959 -> 4006 (1.19 %)
      Code Size: 14905272 -> 14115976 (-5.30 %) bytes
      Max Waves: 46202 -> 46014 (-0.41 %)

    Reviewers: arsenm, nhaehnle
    Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
    Differential Revision: https://reviews.llvm.org/D38915
    llvm-svn: 317750
* AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offsetMarek Olsak2017-10-311-0/+21
    Summary:
    Apps that benefit:
    - alien isolation
    - bioshock infinite
    - civilization: beyond earth
    - company of heroes 2
    - dirt showdown
    - dota 2
    - F1 2015
    - grid autosport
    - hitman
    - legend of grimrock
    - serious sam 3: bfe
    - shadow warrior
    - talos principle
    - total war: warhammer
    - UE4 demos: effects cave, elemental, sun temple

    Reviewers: arsenm, nhaehnle
    Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
    Differential Revision: https://reviews.llvm.org/D38914
    llvm-svn: 317038
* AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)Marek Olsak2017-10-241-0/+21
| | | | | | | | | | | | | | | | Summary: Kill the thread if operand 0 == false. llvm.amdgcn.wqm.vote can be applied to the operand. Also allow kill in all shader stages. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38544 llvm-svn: 316427
* AMDGPU: Fix not accounting for instruction size in bundlesMatt Arsenault2017-10-041-1/+14
| | | | | | | These were counted as 0. Fixes branch limit exceeded errors in some large programs. llvm-svn: 314944
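A minimal sketch of the idea behind this fix, assuming a simplified model in which a bundle is just a list of instructions with known encoded sizes. `Instr`, `Bundle`, and `bundleSizeInBytes` are hypothetical names for illustration, not the LLVM data structures.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for an instruction with a known encoded size.
struct Instr {
  unsigned SizeInBytes;
};

// Simplified stand-in for a bundle: an ordered group of instructions.
struct Bundle {
  std::vector<Instr> Members;
};

// The size of a bundle is the sum of its members' sizes, not 0.
unsigned bundleSizeInBytes(const Bundle &B) {
  unsigned Total = 0;
  for (const Instr &I : B.Members)
    Total += I.SizeInBytes;
  return Total;
}
```

Counting a bundle as 0 bytes makes every branch across it look shorter than it is, which is consistent with the "branch limit exceeded" failures the message mentions.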
* AMDGPU: VALU carry-in and v_cndmask condition cannot be EXECNicolai Haehnle2017-09-291-7/+16
    The hardware will only forward EXEC_LO; the high 32 bits will be zero.

    Additionally, inline constants do not work. At least,

        v_addc_u32_e64 v0, vcc, v0, v1, -1

    which could conceivably be used to combine (v0 + v1 + 1) into a single instruction, acts as if all carry-in bits are zero.

    The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine

        s_mov_b64 s[0:1], exec
        v_cndmask_b32_e64 v0, v1, v2, s[0:1]

    into

        v_mov_b32 v0, v3

    but it's not particularly high priority.

    Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.*

    llvm-svn: 314522
* AMDGPU: Fix crash on immediate operandMatt Arsenault2017-09-211-1/+5
| | | | | | | | We can have a v_mac with an immediate src0. We can still fold if it's an inline immediate, otherwise it already uses the constant bus. llvm-svn: 313852
* AMDGPU: Start selecting s_xnor_{b32, b64}Konstantin Zhuravlyov2017-09-181-0/+37
| | | | | | Differential Revision: https://reviews.llvm.org/D37981 llvm-svn: 313565
* Fix warnings in r313297.Jan Sjodin2017-09-141-3/+1
| | | | llvm-svn: 313302
* AMDGPU: Fix violating constant bus restrictionMatt Arsenault2017-09-141-4/+5
    You can't use madak/madmk if the instruction already uses an SGPR input.

    llvm-svn: 313298
* Add AddressSpace to PseudoSourceValue.Jan Sjodin2017-09-141-0/+19
| | | | | | Differential Revision: https://reviews.llvm.org/D35089 llvm-svn: 313297
* AMDGPU: Don't spill SP reg like a normal CSRMatt Arsenault2017-09-131-0/+4
| | | | llvm-svn: 313217
* Allow target to decide when to cluster loads/stores in mischedStanislav Mekhanoshin2017-09-131-0/+38
    When clustering loads or stores, MachineScheduler checks whether the base pointers point to the same memory. The check compares the base registers of the two memory instructions. This works fine when the instructions have a separate offset operand. If they require a fully calculated pointer, such instructions can never be clustered by this logic.

    Changed shouldClusterMemOps to accept base registers as well and let it decide what to do about it.

    Differential Revision: https://reviews.llvm.org/D37698
    llvm-svn: 313208
* [AMDGPU] Produce madak and madmk from the two-address passStanislav Mekhanoshin2017-09-111-0/+42
| | | | | | | | | | These two instructions are normally selected, but when the two address pass converts mac into mad we end up with the mad where we could have one of these. Differential Revision: https://reviews.llvm.org/D37389 llvm-svn: 312928
* [AMDGPU] Fix shouldClusterMemOps to process flat loadsStanislav Mekhanoshin2017-09-061-0/+4
| | | | | | | | Flat loads do not have vdata operand but have vdst instead. Differential Revision: https://reviews.llvm.org/D37502 llvm-svn: 312640
* [AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).Eugene Zelenko2017-08-081-25/+47
| | | | llvm-svn: 310328
* [AMDGPU] Implement llvm.amdgcn.set.inactive intrinsicConnor Abbott2017-08-041-0/+22
| | | | | | | | | | | | | | | | | | | Summary: This intrinsic lets us set inactive lanes to an identity value when implementing wavefront reductions. In combination with Whole Wavefront Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan. Lowering the intrinsic needs to happen post-RA so that RA knows that the destination isn't completely overwritten due to the EXEC shenanigans, so we need another pseudo-instruction to represent the un-lowered intrinsic. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34719 llvm-svn: 310088
* [AMDGPU] Add support for Whole Wavefront ModeConnor Abbott2017-08-041-0/+8
    Summary:
    Whole Wavefront Mode (WWM) is similar to WQM, except that all of the lanes are always enabled, regardless of control flow. This is required for implementing wavefront reductions in non-uniform control flow, where we need to use the inactive lanes to propagate intermediate results, so they need to be enabled. We need to propagate WWM to uses (unless they're explicitly marked as exact) so that they also propagate intermediate results correctly. We do the analysis and exec mask munging during the WQM pass, since there are interactions with WQM for things that require both WQM and WWM.

    For simplicity, WWM is entirely block-local -- blocks are never WWM on entry or exit of a block, and WWM is not propagated to the block level. This means that computations involving WWM cannot involve control flow, but we only ever plan to use WWM for a few limited purposes (none of which involve control flow) anyway.

    Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There isn't yet a way to turn WWM off -- that will be added in a future change.

    Finally, it turns out that turning on inactive lanes causes a number of problems with register allocation. While the best long-term solution seems like teaching LLVM's register allocator about predication, for now we need to add some hacks to prevent ourselves from getting into trouble due to constraints that aren't currently expressed in LLVM. For the gory details, see the comments at the top of SIFixWWMLiveness.cpp.

    Reviewers: arsenm, nhaehnle, tpr
    Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits
    Differential Revision: https://reviews.llvm.org/D35524
    llvm-svn: 310087
* [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQMConnor Abbott2017-08-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | Summary: Previously, we assumed that certain types of instructions needed WQM in pixel shaders, particularly DS instructions and image sampling instructions. This was ok because with OpenGL, the assumption was correct. But we want to start using DPP instructions for derivatives as well as other things, so the assumption that we can infer whether to use WQM based on the instruction won't continue to hold. This intrinsic lets frontends like Mesa indicate what things need WQM based on their knowledge of the API, rather than second-guessing them in the backend. We need to keep around the old method of enabling WQM, but eventually we should remove it once Mesa catches up. For now, this will let us use DPP instructions for computing derivatives correctly. Reviewers: arsenm, tpr, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D35167 llvm-svn: 310085
* AMDGPU: Pass special input registers to functionsMatt Arsenault2017-08-031-5/+4
| | | | llvm-svn: 309998
* AMDGPU: Make areMemAccessesTriviallyDisjoint more aware of segment flatMatt Arsenault2017-07-291-1/+1
| | | | | | | Checking the encoding is insufficient since now there can be global or scratch instructions. llvm-svn: 309472
* AMDGPU: Fix getMemOpBaseRegImmOfs for flat with offsetsMatt Arsenault2017-07-211-3/+13
| | | | llvm-svn: 308762
* Add an ID field to StackObjectsMatt Arsenault2017-07-201-0/+2
    On AMDGPU, SGPR spills are really spilled to another register. The spiller creates the spills to new frame index objects, which are used as placeholders. Each placeholder will eventually be replaced with a reference to a position in a VGPR to write to, and the frame index deleted.

    It is most likely not a real stack location that can be shared with another stack object.

    This is a problem when StackSlotColoring decides it should combine a frame index used for a normal VGPR spill with a real stack location and a frame index used for an SGPR.

    Add an ID field so that StackSlotColoring has a way of knowing the different frame index types are incompatible.

    llvm-svn: 308673
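The compatibility check this message motivates can be sketched as follows. The types and names (`StackID`, `StackObject`, `canShareSlot`) are hypothetical and chosen for illustration; they are not the LLVM `MachineFrameInfo` interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stack ID values: 0 for ordinary stack objects, 1 for the
// placeholder frame indexes created for SGPR spills.
enum class StackID : uint8_t { Default = 0, SGPRSpill = 1 };

// Simplified stand-in for a stack object carrying the new ID field.
struct StackObject {
  unsigned SizeInBytes;
  StackID ID;
};

// Assumption of this sketch: a slot-coloring pass may only merge two objects
// when their IDs match, so a real stack slot is never combined with an SGPR
// spill placeholder.
bool canShareSlot(const StackObject &A, const StackObject &B) {
  return A.ID == B.ID;
}
```

A real coloring pass also checks live ranges, sizes, and alignment; this sketch isolates only the ID comparison.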
* [AMDGPU] Do not insert an instruction into worklist twice in movetovaluAlfred Huang2017-07-141-12/+12
    moveToVALU() moves instructions to the vector ALU and visits every instruction in the use chain. We do not want the same node to be pushed to the visit worklist more than once.

    Differential Revision: https://reviews.llvm.org/D34726
    llvm-svn: 308039
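One common way to get this behaviour is a worklist that pairs a vector with a set, so an item already queued is not queued again. The class below is a hypothetical sketch using the standard library and integer IDs in place of instructions; it is not the data structure used in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Worklist that holds each item at most once at any given time.
class UniqueWorklist {
  std::vector<int> Items;          // preserves insertion order
  std::unordered_set<int> Queued;  // items currently in the worklist

public:
  // Returns true if the item was added, false if it was already queued.
  bool insert(int Id) {
    if (!Queued.insert(Id).second)
      return false;
    Items.push_back(Id);
    return true;
  }

  bool empty() const { return Items.empty(); }
  std::size_t size() const { return Items.size(); }

  // Removes and returns the most recently inserted item.
  int pop() {
    int Id = Items.back();
    Items.pop_back();
    Queued.erase(Id);
    return Id;
  }
};
```

In this sketch an item can be queued again after it has been popped; a variant that never revisits an item would simply skip the `Queued.erase` call.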
* [AMDGPU] Fix -Wimplicit-fallthrough warnings. NFCI.Simon Pilgrim2017-07-071-0/+2
| | | | llvm-svn: 307381