summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIInstrInfo.h
Commit message (Collapse)AuthorAgeFilesLines
...
* [AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructionsScott Linder2018-10-081-11/+21
| | | | | | | | | | | Emit a waterfall loop in the general case for a potentially-divergent Rsrc operand. When practical, avoid this by using Addr64 instructions. Recommits r341413 with changes to update the MachineDominatorTree when present. Differential Revision: https://reviews.llvm.org/D51742 llvm-svn: 343992
* [AMDGPU] Assert in getOpSize() there are no sub-dword subregsStanislav Mekhanoshin2018-10-031-1/+6
| | | | | | Differential Revision: https://reviews.llvm.org/D52769 llvm-svn: 343648
* [AMDGPU] Fixed SIInstrInfo::getOpSize to handle subregsStanislav Mekhanoshin2018-10-011-0/+5
| | | | | | | | | | Currently it returns incorrect operand size for a target independet node such as COPY if operand is a register with subreg. Instead of correct subreg size it returns a size of the whole superreg. Differential Revision: https://reviews.llvm.org/D52736 llvm-svn: 343508
* [AMDGPU] Preliminary patch for divergence driven instruction selection. ↵Alexander Timofeev2018-09-111-0/+3
| | | | | | | | | Immediate selection predicate changed Differential revision: https://reviews.llvm.org/D51734 Reviewers: rampitec llvm-svn: 341928
* Revert r341413Scott Linder2018-09-061-6/+0
| | | | | | Causes a regression in expensive checks. llvm-svn: 341589
* [AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructionsScott Linder2018-09-041-0/+6
| | | | | | | | | Emit a waterfall loop in the general case for a potentially-divergent Rsrc operand. When practical, avoid this by using Addr64 instructions. Differential Revision: https://reviews.llvm.org/D50982 llvm-svn: 341413
* AMDGPU: Shrink insts to fold immediatesMatt Arsenault2018-08-281-0/+3
| | | | | | | | | This needs to be done in the SSA fold operands pass to be effective, so there is a bit of overlap with SIShrinkInstructions but I don't think this is practically avoidable. llvm-svn: 340859
* AMDGPU: Move canShrink into TIIMatt Arsenault2018-08-281-0/+3
| | | | llvm-svn: 340855
* [AMDGPU] Add support for multi-dword s.buffer.load intrinsicTim Renouf2018-08-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Patch by Marek Olsak and David Stuttard, both of AMD. This adds a new amdgcn intrinsic supporting s.buffer.load, in particular multiple dword variants. These are convenient to use from some front-end implementations. Also modified the existing llvm.SI.load.const intrinsic to common up the underlying implementation. This modification also requires that we can lower to non-uniform loads correctly by splitting larger dword variants into sizes supported by the non-uniform versions of the load. V2: Addressed minor review comments. V3: i1 glc is now i32 cachepolicy for consistency with buffer and tbuffer intrinsics, plus fixed formatting issue. V4: Added glc test. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51098 Change-Id: I83a6e00681158bb243591a94a51c7baa445f169b llvm-svn: 340684
* [PSV] Update API to be able to use TargetCustom without UB.Marcello Maggioni2018-08-201-1/+1
| | | | | | | | | | | getTargetCustom() requires values for "Kind" in the constructor that are not in the PSVKind enum. Passing a value that is not inside an enum as an argument to a constructor of the type of the enum is UB. Changing to the underlying type of the enum would solve the UB Differential Revision: https://reviews.llvm.org/D50909 llvm-svn: 340200
* AMDGPU: Force skip over s_sendmsg and exp instructionsNicolai Haehnle2018-07-301-0/+3
| | | | | | | | | | | | | | | | | | | | | Summary: These instructions interact with hardware blocks outside the shader core, and they can have "scalar" side effects even when EXEC = 0. We don't want these scalar side effects to occur when all lanes want to skip these instructions, so always add the execz skip branch instruction for basic blocks that contain them. Also ensure that we skip scalar stores / atomics, though we don't code-gen those yet. Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48431 Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44 llvm-svn: 338235
* AMDGPU: Refactor Subtarget classesTom Stellard2018-07-111-3/+3
| | | | | | | | | | | | | | | | | Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation. Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 llvm-svn: 336851
* AMDGPU: Separate R600 and GCN TableGen filesTom Stellard2018-06-281-2/+14
| | | | | | | | | | | | | | | | | | | | | Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc. Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46365 llvm-svn: 335942
* AMDGPU: Turn D16 for MIMG instructions into a regular operandNicolai Haehnle2018-06-211-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This allows us to reduce the number of different machine instruction opcodes, which reduces the table sizes and helps flatten the TableGen multiclass hierarchies. We can do this because for each hardware MIMG opcode, we have a full set of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata and vaddr registers. Instead of having separate D16 machine instructions, a packed D16 instructions loading e.g. 4 components can simply use the same V2 opcode variant that non-D16 instructions use. We still require a TSFlag for D16 buffer instructions, because the D16-ness of buffer instructions is part of the opcode. Renaming the flag should help avoid future confusion. The one non-obvious code change is that for gather4 instructions, the disassembler can no longer automatically decide whether to use a V2 or a V4 variant. The existing logic which choose the correct variant for other MIMG instruction is extended to cover gather4 as well. As a bonus, some of the assembler error messages are now more helpful (e.g., complaining about a wrong data size instead of a non-existing instruction). While we're at it, delete a whole bunch of dead legacy TableGen code. Change-Id: I89b02c2841c06f95e662541433e597f5d4553978 Reviewers: arsenm, rampitec, kzhuravl, artem.tamazov, dp, rtaylor Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47434 llvm-svn: 335222
* AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headersTom Stellard2018-05-221-12/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction and register defintions, which are huge so we only want to include them where needed. This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files. I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46272 llvm-svn: 332930
* Remove \brief commands from doxygen comments.Adrian Prantl2018-05-011-20/+20
| | | | | | | | | | | | | | | | We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272
* [AMDGPU][MC] Added lds support for MUBUF instructionsDmitry Preobrazhensky2018-02-211-0/+3
| | | | | | | | | See bug 28234: https://bugs.llvm.org/show_bug.cgi?id=28234 Differential Revision: https://reviews.llvm.org/D43472 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 325676
* [NFC] fix trivial typos in commentsHiroshi Inoue2018-01-241-1/+1
| | | | | | "the the" -> "the" llvm-svn: 323302
* AMDGPU/SI: Add d16 support for image intrinsics.Changpeng Fang2018-01-181-0/+8
| | | | | | | | | | | | | Summary: This patch implements d16 support for image load, image store and image sample intrinsics. Reviewers: Matt, Brian. Differential Revision: https://reviews.llvm.org/D3991 llvm-svn: 322903
* AMDGPU: Use gfx9 carry-less add/sub instructionsMatt Arsenault2017-11-301-3/+4
| | | | llvm-svn: 319491
* AMDGPU: Replace list of SMEM buffer opcodesMatt Arsenault2017-11-171-0/+13
| | | | llvm-svn: 318506
* AMDGPU: Replace i64 add/sub loweringMatt Arsenault2017-11-151-1/+5
| | | | | | | | | | | | | | | Use VOP3 add/addc like usual. This has some tradeoffs. Inline immediates fold a little better, but other constants are worse off. SIShrinkInstructions could be made smarter to handle these cases. This allows us to avoid selecting scalar adds where we need to track the carry in scc and replace its users. This makes it easier to use the carryless VALU adds. llvm-svn: 318340
* AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEMMarek Olsak2017-11-091-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: -5.3% code size in affected shaders. Changed stats only: 48486 shaders in 30489 tests Totals: SGPRS: 2086406 -> 2072430 (-0.67 %) VGPRS: 1626872 -> 1627960 (0.07 %) Spilled SGPRs: 7865 -> 7912 (0.60 %) Code Size: 60978060 -> 60188764 (-1.29 %) bytes Max Waves: 374530 -> 374342 (-0.05 %) Totals from affected shaders: SGPRS: 299664 -> 285688 (-4.66 %) VGPRS: 233844 -> 234932 (0.47 %) Spilled SGPRs: 3959 -> 4006 (1.19 %) Code Size: 14905272 -> 14115976 (-5.30 %) bytes Max Waves: 46202 -> 46014 (-0.41 %) Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38915 llvm-svn: 317750
* AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)Marek Olsak2017-10-241-0/+3
| | | | | | | | | | | | | | | | Summary: Kill the thread if operand 0 == false. llvm.amdgcn.wqm.vote can be applied to the operand. Also allow kill in all shader stages. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38544 llvm-svn: 316427
* AMDGPU: Fix not accounting for instruction size in bundlesMatt Arsenault2017-10-041-0/+1
| | | | | | | These were counted as 0. Fixes branch limit exceeded errors in some large programs. llvm-svn: 314944
* AMDGPU: Start selecting s_xnor_{b32, b64}Konstantin Zhuravlyov2017-09-181-0/+3
| | | | | | Differential Revision: https://reviews.llvm.org/D37981 llvm-svn: 313565
* Fix warnings in r313297.Jan Sjodin2017-09-141-2/+2
| | | | llvm-svn: 313302
* Add AddresSpace to PseudoSourceValue.Jan Sjodin2017-09-141-0/+3
| | | | | | Differential Revision: https://reviews.llvm.org/D35089 llvm-svn: 313297
* Allow target to decide when to cluster loads/stores in mischedStanislav Mekhanoshin2017-09-131-1/+2
| | | | | | | | | | | | | | | | MachineScheduler when clustering loads or stores checks if base pointers point to the same memory. This check is done through comparison of base registers of two memory instructions. This works fine when instructions have separate offset operand. If they require a full calculated pointer such instructions can never be clustered according to such logic. Changed shouldClusterMemOps to accept base registers as well and let it decide what to do about it. Differential Revision: https://reviews.llvm.org/D37698 llvm-svn: 313208
* AMDGPU: Fold clamp modifier for packed instructionsMatt Arsenault2017-08-311-2/+14
| | | | llvm-svn: 312297
* [AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use ↵Eugene Zelenko2017-08-081-15/+30
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 310328
* AMDGPU: Make areMemAccessesTriviallyDisjoint more aware of segment flatMatt Arsenault2017-07-291-0/+8
| | | | | | | Checking the encoding is insufficient since now there can be global or scratch instructions. llvm-svn: 309472
* AMDGPU: Don't track lgkmcnt for global_/scratch_ instructionsMatt Arsenault2017-07-211-0/+4
| | | | llvm-svn: 308766
* [AMDGPU] Do not insert an instruction into worklist twice in movetovaluAlfred Huang2017-07-141-8/+11
| | | | | | | | | | In moveToVALU(), move to vector ALU is performed, all instrs in the use chain will be visited. We do not want the same node to be pushed to the visit worklist more than once. Differential Revision: https://reviews.llvm.org/D34726 llvm-svn: 308039
* AMDGPU: Add operand target flags serializationMatt Arsenault2017-07-021-0/+8
| | | | llvm-svn: 306995
* [AMDGPU] SDWA: merge VI and GFX9 pseudo instructionsSam Kolton2017-06-211-0/+3
| | | | | | | | | | | | Summary: Previously there were two separate pseudo instruction for SDWA on VI and on GFX9. Created one pseudo instruction that is union of both of them. Added verifier to check that operands conform either VI or GFX9. Reviewers: dp, arsenm, vpykhtin Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, artem.tamazov Differential Revision: https://reviews.llvm.org/D34026 llvm-svn: 305886
* Re-submit AMDGPUMachineCFGStructurizer.Jan Sjodin2017-05-151-1/+32
| | | | | | Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303111
* Revert 303091.Jan Sjodin2017-05-151-32/+1
| | | | llvm-svn: 303098
* Add AMDGPUMachineCFGStructurizer.Jan Sjodin2017-05-151-1/+32
| | | | | | Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303091
* Move size and alignment information of regclass to TargetRegisterInfoKrzysztof Parzyszek2017-04-241-2/+2
| | | | | | | | | | | | | | | 1. RegisterClass::getSize() is split into two functions: - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const; - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const; 2. RegisterClass::getAlignment() is replaced by: - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const; This will allow making those values depend on subtarget features in the future. Differential Revision: https://reviews.llvm.org/D31783 llvm-svn: 301221
* [AMDGPU] added SIInstrInfo::getAddNoCarry() helperStanislav Mekhanoshin2017-04-141-0/+9
| | | | | | | | Addressed rest of post submit comments from D31993. Differential Revision: https://reviews.llvm.org/D32057 llvm-svn: 300288
* [AMDGPU] SDWA Peephole: improve search for immediates in SDWA patternsSam Kolton2017-03-311-0/+2
| | | | | | | | | | | | | | | | | Previously compiler often extracted common immediates into specific register, e.g.: ``` %vreg0 = S_MOV_B32 0xff; %vreg2 = V_AND_B32_e32 %vreg0, %vreg1 %vreg4 = V_AND_B32_e32 %vreg0, %vreg3 ``` Because of this SDWA peephole failed to find SDWA convertible pattern. E.g. in previous example this could be converted into 2 SDWA src operands: ``` SDWA src: %vreg2 src_sel:BYTE_0 SDWA src: %vreg4 src_sel:BYTE_0 ``` With this change peephole check if operand is either immediate or register that is copy of immediate. llvm-svn: 299202
* [ADMGPU] SDWA peephole optimization pass.Sam Kolton2017-03-211-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: First iteration of SDWA peephole. This pass tries to combine several instruction into one SDWA instruction. E.g. it converts: ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1 V_ADD_I32_e32 %vreg2, %vreg0, %vreg3 V_LSHLREV_B32_e32 %vreg4, 16, %vreg2 ''' Into: ''' V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD ''' Pass structure: 1. Iterate over machine instruction in basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match machine instruction into either source or destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to source SDWA operand '''%vreg1 src_sel:WORD_1'''. 2. Iterate over found SDWA operands and find instruction that could be potentially coverted into SDWA. E.g. for source SDWA operand potential instruction are all instruction in this basic block that uses '''%vreg0''' 3. Iterate over all potential instructions and check if they can be converted into SDWA. 4. Convert instructions to SDWA. This review contains basic implementation of SDWA peephole pass. This pass requires additional testing fot both correctness and performance (no performance testing done). There are several ways this pass can be improved: 1. Make this pass work on whole function not only basic block. As I can see this can be done right now without changes to pass. 2. Introduce more SDWA patterns 3. Introduce mnemonics to limit when SDWA patterns should apply Reviewers: vpykhtin, alex-t, arsenm, rampitec Subscribers: wdng, nhaehnle, mgorny Differential Revision: https://reviews.llvm.org/D30038 llvm-svn: 298365
* AMDGPU: Fix broken condition in hazard recognizerMatt Arsenault2017-03-171-0/+8
| | | | | | Fixes bug 32248. llvm-svn: 298125
* AMDGPU: Support v2i16/v2f16 packed operationsMatt Arsenault2017-02-271-22/+3
| | | | llvm-svn: 296396
* AMDGPU: Don't fold immediate if clamp/omod are setMatt Arsenault2017-02-271-0/+1
| | | | | | | Doesn't fix any practical problems because clamp/omod are currently folded after peephole optimizer. llvm-svn: 296375
* AMDGPU: Add VOP3P instruction formatMatt Arsenault2017-02-271-0/+8
| | | | | | | | Add a few non-VOP3P but instructions related to packed. Includes hack with dummy operands for the benefit of the assembler llvm-svn: 296368
* AMDGPU: Fold FP clamp as modifier bitMatt Arsenault2017-02-221-0/+8
| | | | | | | | | | | The manual is unclear on the details of this. It's not clear to me if denormals are not allowed with clamp, or if that is only omod. Not allowing denorms for fp16 or fp64 isn't useful so I also question if that is really a restriction. Same with whether this is valid without IEEE mode enabled. llvm-svn: 295905
* AMDGPU: Implement early ifcvt target hooks.Matt Arsenault2017-01-251-0/+12
| | | | | | | | | | | | Leave early ifcvt disabled for now since there are some shader-db regressions. This causes some immediate improvements, but could be better. The cost checking that the pass does is based on critical path length for out of order CPUs which we do not want so it skips out on many cases we want. llvm-svn: 293016
* [AMDGPU] Add subtarget features for SDWA/DPPSam Kolton2017-01-201-0/+8
| | | | | | | | | | Reviewers: vpykhtin, artem.tamazov, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28900 llvm-svn: 292596
OpenPOWER on IntegriCloud