summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU] added writelane intrinsicTim Renouf2018-02-281-2/+26
| | | | | | | | | | | | | | | | | Summary: For use by LLPC SPV_AMD_shader_ballot extension. The v_writelane instruction was already implemented for use by SGPR spilling, but I had to add an extra dummy operand tied to the destination, to represent that all lanes except the selected one keep the old value of the destination register. .ll test changes were due to schedule changes caused by that new operand. Differential Revision: https://reviews.llvm.org/D42838 llvm-svn: 326353
* AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALUMarek Olsak2018-02-061-2/+8
| | | | | | | | Author: Bas Nieuwenhuizen https://reviews.llvm.org/D42881 llvm-svn: 324353
* AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9Marek Olsak2018-01-311-22/+31
| | | | | | | | | | | | | | | | Summary: This enables load merging into x2, x4, which is driven by inline offsets. 6500 shaders are affected: Code Size in affected shaders: -15.14 % Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D42078 llvm-svn: 323909
* MachineFunction: Return reference from getFunction(); NFCMatthias Braun2017-12-151-9/+9
| | | | | | The Function can never be nullptr so we can return a reference. llvm-svn: 320884
* [AMDGPU] SDWA: add support for PRESERVE into SDWA peephole.Sam Kolton2017-12-041-0/+22
| | | | | | | | | | | | Summary: Reviewers: arsenm, vpykhtin, rampitec Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D37817 llvm-svn: 319662
* AMDGPU: Use carry-less adds in FI eliminationMatt Arsenault2017-11-301-1/+4
| | | | llvm-svn: 319501
* AMDGPU: Use gfx9 carry-less add/sub instructionsMatt Arsenault2017-11-301-16/+73
| | | | llvm-svn: 319491
* AMDGPU: Consistently check for immediates in SIInstrInfo::FoldImmediateNicolai Haehnle2017-11-281-23/+22
| | | | | | | | | | | | | | | | | | | | | | | Summary: The PeepholeOptimizer pass calls this function solely based on checking DefMI->isMoveImmediate(), which only checks the MoveImm bit of the instruction description. So it's up to FoldImmediate itself to properly check that DefMI *actually* moves from an immediate. I don't have a separate test case for this, but the next patch introduces a test case which happens to crash without this change. This error is caught by the assertion in MachineOperand::getImm(). Change-Id: I88e7cdbcf54d75e1a296822e6fe5f9a5f095bbf8 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D40342 llvm-svn: 319155
* Fix a bunch more layering of CodeGen headers that are in TargetDavid Blaikie2017-11-171-2/+2
| | | | | | | | All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490
* AMDGPU: Replace i64 add/sub loweringMatt Arsenault2017-11-151-0/+73
| | | | | | | | | | | | | | | Use VOP3 add/addc like usual. This has some tradeoffs. Inline immediates fold a little better, but other constants are worse off. SIShrinkInstructions could be made smarter to handle these cases. This allows us to avoid selecting scalar adds where we need to track the carry in scc and replace its users. This makes it easier to use the carryless VALU adds. llvm-svn: 318340
* AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEMMarek Olsak2017-11-091-2/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: -5.3% code size in affected shaders. Changed stats only: 48486 shaders in 30489 tests Totals: SGPRS: 2086406 -> 2072430 (-0.67 %) VGPRS: 1626872 -> 1627960 (0.07 %) Spilled SGPRs: 7865 -> 7912 (0.60 %) Code Size: 60978060 -> 60188764 (-1.29 %) bytes Max Waves: 374530 -> 374342 (-0.05 %) Totals from affected shaders: SGPRS: 299664 -> 285688 (-4.66 %) VGPRS: 233844 -> 234932 (0.47 %) Spilled SGPRs: 3959 -> 4006 (1.19 %) Code Size: 14905272 -> 14115976 (-5.30 %) bytes Max Waves: 46202 -> 46014 (-0.41 %) Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38915 llvm-svn: 317750
* AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offsetMarek Olsak2017-10-311-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Apps that benefit: - alien isolation - bioshock infinite - civilization: beyond earth - company of heroes 2 - dirt showdown - dota 2 - F1 2015 - grid autosport - hitman - legend of grimrock - serious sam 3: bfe - shadow warrior - talos principle - total war: warhammer - UE4 demos: effects cave, elemental, sun temple Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38914 llvm-svn: 317038
* AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)Marek Olsak2017-10-241-0/+21
| | | | | | | | | | | | | | | | Summary: Kill the thread if operand 0 == false. llvm.amdgcn.wqm.vote can be applied to the operand. Also allow kill in all shader stages. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38544 llvm-svn: 316427
* AMDGPU: Fix not accounting for instruction size in bundlesMatt Arsenault2017-10-041-1/+14
| | | | | | | These were counted as 0. Fixes branch limit exceeded errors in some large programs. llvm-svn: 314944
* AMDGPU: VALU carry-in and v_cndmask condition cannot be EXECNicolai Haehnle2017-09-291-7/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | The hardware will only forward EXEC_LO; the high 32 bits will be zero. Additionally, inline constants do not work. At least, v_addc_u32_e64 v0, vcc, v0, v1, -1 which could conceivably be used to combine (v0 + v1 + 1) into a single instruction, acts as if all carry-in bits are zero. The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine s_mov_b64 s[0:1], exec v_cndmask_b32_e64 v0, v1, v2, s[0:1] into v_mov_b32 v0, v3 but it's not particularly high priority. Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.* llvm-svn: 314522
* AMDGPU: Fix crash on immediate operandMatt Arsenault2017-09-211-1/+5
| | | | | | | | We can have a v_mac with an immediate src0. We can still fold if it's an inline immediate, otherwise it already uses the constant bus. llvm-svn: 313852
* AMDGPU: Start selecting s_xnor_{b32, b64}Konstantin Zhuravlyov2017-09-181-0/+37
| | | | | | Differential Revision: https://reviews.llvm.org/D37981 llvm-svn: 313565
* Fix warnings in r313297.Jan Sjodin2017-09-141-3/+1
| | | | llvm-svn: 313302
* AMDGPU: Fix violating constant bus restrictionMatt Arsenault2017-09-141-4/+5
| | | | | | You can't use madmk/madmk if it already uses an SGPR input. llvm-svn: 313298
* Add AddresSpace to PseudoSourceValue.Jan Sjodin2017-09-141-0/+19
| | | | | | Differential Revision: https://reviews.llvm.org/D35089 llvm-svn: 313297
* AMDGPU: Don't spill SP reg like a normal CSRMatt Arsenault2017-09-131-0/+4
| | | | llvm-svn: 313217
* Allow target to decide when to cluster loads/stores in mischedStanislav Mekhanoshin2017-09-131-0/+38
| | | | | | | | | | | | | | | | MachineScheduler when clustering loads or stores checks if base pointers point to the same memory. This check is done through comparison of base registers of two memory instructions. This works fine when instructions have separate offset operand. If they require a full calculated pointer such instructions can never be clustered according to such logic. Changed shouldClusterMemOps to accept base registers as well and let it decide what to do about it. Differential Revision: https://reviews.llvm.org/D37698 llvm-svn: 313208
* [AMDGPU] Produce madak and madmk from the two-address passStanislav Mekhanoshin2017-09-111-0/+42
| | | | | | | | | | These two instructions are normally selected, but when the two address pass converts mac into mad we end up with the mad where we could have one of these. Differential Revision: https://reviews.llvm.org/D37389 llvm-svn: 312928
* [AMDGPU] Fix shouldClusterMemOps to process flat loadsStanislav Mekhanoshin2017-09-061-0/+4
| | | | | | | | Flat loads do not have vdata operand but have vdst instead. Differential Revision: https://reviews.llvm.org/D37502 llvm-svn: 312640
* [AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use ↵Eugene Zelenko2017-08-081-25/+47
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 310328
* [AMDGPU] Implement llvm.amdgcn.set.inactive intrinsicConnor Abbott2017-08-041-0/+22
| | | | | | | | | | | | | | | | | | | Summary: This intrinsic lets us set inactive lanes to an identity value when implementing wavefront reductions. In combination with Whole Wavefront Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan. Lowering the intrinsic needs to happen post-RA so that RA knows that the destination isn't completely overwritten due to the EXEC shenanigans, so we need another pseudo-instruction to represent the un-lowered intrinsic. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34719 llvm-svn: 310088
* [AMDGPU] Add support for Whole Wavefront ModeConnor Abbott2017-08-041-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Whole Wavefront Wode (WWM) is similar to WQM, except that all of the lanes are always enabled, regardless of control flow. This is required for implementing wavefront reductions in non-uniform control flow, where we need to use the inactive lanes to propagate intermediate results, so they need to be enabled. We need to propagate WWM to uses (unless they're explicitly marked as exact) so that they also propagate intermediate results correctly. We do the analysis and exec mask munging during the WQM pass, since there are interactions with WQM for things that require both WQM and WWM. For simplicity, WWM is entirely block-local -- blocks are never WWM on entry or exit of a block, and WWM is not propagated to the block level. This means that computations involving WWM cannot involve control flow, but we only ever plan to use WWM for a few limited purposes (none of which involve control flow) anyways. Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There isn't yet a way to turn WWM off -- that will be added in a future change. Finally, it turns out that turning on inactive lanes causes a number of problems with register allocation. While the best long-term solution seems like teaching LLVM's register allocator about predication, for now we need to add some hacks to prevent ourselves from getting into trouble due to constraints that aren't currently expressed in LLVM. For the gory details, see the comments at the top of SIFixWWMLiveness.cpp. Reviewers: arsenm, nhaehnle, tpr Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D35524 llvm-svn: 310087
* [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQMConnor Abbott2017-08-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | Summary: Previously, we assumed that certain types of instructions needed WQM in pixel shaders, particularly DS instructions and image sampling instructions. This was ok because with OpenGL, the assumption was correct. But we want to start using DPP instructions for derivatives as well as other things, so the assumption that we can infer whether to use WQM based on the instruction won't continue to hold. This intrinsic lets frontends like Mesa indicate what things need WQM based on their knowledge of the API, rather than second-guessing them in the backend. We need to keep around the old method of enabling WQM, but eventually we should remove it once Mesa catches up. For now, this will let us use DPP instructions for computing derivatives correctly. Reviewers: arsenm, tpr, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D35167 llvm-svn: 310085
* AMDGPU: Pass special input registers to functionsMatt Arsenault2017-08-031-5/+4
| | | | llvm-svn: 309998
* AMDGPU: Make areMemAccessesTriviallyDisjoint more aware of segment flatMatt Arsenault2017-07-291-1/+1
| | | | | | | Checking the encoding is insufficient since now there can be global or scratch instructions. llvm-svn: 309472
* AMDGPU: Fix getMemOpBaseRegImmOfs for flat with offsetsMatt Arsenault2017-07-211-3/+13
| | | | llvm-svn: 308762
* Add an ID field to StackObjectsMatt Arsenault2017-07-201-0/+2
| | | | | | | | | | | | | | | | | | | | | On AMDGPU SGPR spills are really spilled to another register. The spiller creates the spills to new frame index objects, which is used as a placeholder. This will eventually be replaced with a reference to a position in a VGPR to write to and the frame index deleted. It is most likely not a real stack location that can be shared with another stack object. This is a problem when StackSlotColoring decides it should combine a frame index used for a normal VGPR spill with a real stack location and a frame index used for an SGPR. Add an ID field so that StackSlotColoring has a way of knowing the different frame index types are incompatible. llvm-svn: 308673
* [AMDGPU] Do not insert an instruction into worklist twice in movetovaluAlfred Huang2017-07-141-12/+12
| | | | | | | | | | In moveToVALU(), move to vector ALU is performed, all instrs in the use chain will be visited. We do not want the same node to be pushed to the visit worklist more than once. Differential Revision: https://reviews.llvm.org/D34726 llvm-svn: 308039
* [AMDGPU] Fix -Wimplicit-fallthrough warnings. NFCI.Simon Pilgrim2017-07-071-0/+2
| | | | llvm-svn: 307381
* AMDGPU: Add operand target flags serializationMatt Arsenault2017-07-021-0/+18
| | | | llvm-svn: 306995
* [AMDGPU] SDWA: several fixes for V_CVT and VOPC instructionsSam Kolton2017-06-271-5/+9
| | | | | | | | | | | | | | Summary: 1. Instruction V_CVT_U32_F32 allow omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks if SDWA pseudo instruction has OMod operand and then copy it. 2. There were several problems with support of VOPC instructions in SDWA peephole pass. Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye Differential Revision: https://reviews.llvm.org/D34626 llvm-svn: 306413
* AMDGPU: M0 operands to spill/restore opcodes are deadNicolai Haehnle2017-06-271-2/+2
| | | | | | | | | | | | | | | | | Summary: With scalar stores, M0 is clobbered and therefore marked as implicitly defined. However, it is also dead. This fixes an assertion when the Greedy Register Allocator decides to optimize a spill/restore pair away again (via tryHintsRecoloring). Reviewers: arsenm Subscribers: qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33319 llvm-svn: 306375
* [AMDGPU] SDWA: add support for GFX9 in peephole passSam Kolton2017-06-221-4/+4
| | | | | | | | | | | | | | | | Summary: Added support based on merged SDWA pseudo instructions. Now peephole allow one scalar operand, omod and clamp modifiers. Added several subtarget features for GFX9 SDWA. This diff also contains changes from D34026. Depends D34026 Reviewers: vpykhtin, rampitec, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34241 llvm-svn: 305986
* [AMDGPU] SDWA: merge VI and GFX9 pseudo instructionsSam Kolton2017-06-211-2/+69
| | | | | | | | | | | | Summary: Previously there were two separate pseudo instruction for SDWA on VI and on GFX9. Created one pseudo instruction that is union of both of them. Added verifier to check that operands conform either VI or GFX9. Reviewers: dp, arsenm, vpykhtin Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, artem.tamazov Differential Revision: https://reviews.llvm.org/D34026 llvm-svn: 305886
* AMDGPU: Don't add same implicit use multiple timesMatt Arsenault2017-06-121-4/+2
| | | | | | | For the last component, the same register use was added as an implicit use and another implicit kill use. llvm-svn: 305205
* AMDGPU: Verify that flat offsets aren't used pre-GFX9Matt Arsenault2017-06-121-2/+11
| | | | | | | For convenience the operand is always present in the instruction, but it isn't valid to use except on GFX9. llvm-svn: 305200
* Sort the remaining #include lines in include/... and lib/....Chandler Carruth2017-06-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787
* AMDGPU/GlobalISel: Mark 32-bit float constants as legalTom Stellard2017-05-261-0/+4
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33212 llvm-svn: 304003
* AMDGPU: Use appropriate soffset for spillingMatt Arsenault2017-05-171-7/+7
| | | | | | | This needs to be the frame offset register, and not the global scratch wave offset register. For kernels, these are the same. llvm-svn: 303287
* AMDGPUCodeGen: Fix warnings in r303111. [-Wunused-variable]NAKAMURA Takumi2017-05-161-2/+2
| | | | llvm-svn: 303137
* Re-submit AMDGPUMachineCFGStructurizer.Jan Sjodin2017-05-151-9/+301
| | | | | | Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303111
* Revert 303091.Jan Sjodin2017-05-151-301/+9
| | | | llvm-svn: 303098
* Add AMDGPUMachineCFGStructurizer.Jan Sjodin2017-05-151-9/+301
| | | | | | Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303091
* Move size and alignment information of regclass to TargetRegisterInfoKrzysztof Parzyszek2017-04-241-18/+20
| | | | | | | | | | | | | | | 1. RegisterClass::getSize() is split into two functions: - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const; - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const; 2. RegisterClass::getAlignment() is replaced by: - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const; This will allow making those values depend on subtarget features in the future. Differential Revision: https://reviews.llvm.org/D31783 llvm-svn: 301221
* AMDGPU: Move v_readlane lane select from VGPR to SGPRNicolai Haehnle2017-04-241-0/+13
| | | | | | | | | | | | | | | | | Summary: Fix a compiler bug when the lane select happens to end up in a VGPR. Clarify the semantic of the corresponding intrinsic to be that of the corresponding GLSL: the lane select must be uniform across a wave front, otherwise results are undefined. Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D32343 llvm-svn: 301197
OpenPOWER on IntegriCloud