path: root/llvm/lib/Target/AMDGPU/SIInstrInfo.h
Commit log for this path. Each entry below shows the commit message, author, date, files changed, and lines removed/added.
* [AMDGPU] Fix getInstrLatency() always returning 1 (Stanislav Mekhanoshin, 2020-01-14, 1 file, -0/+2)
  We do not have an InstrItinerary, so the generic getInstrLatency() always defaulted to returning 1 cycle. We need to use TargetSchedModel instead to compute an instruction's latency. Differential Revision: https://reviews.llvm.org/D72655
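  A minimal sketch of the approach this entry describes: querying latency through TargetSchedModel rather than through itineraries. The helper name is made up for illustration, and the init/computeInstrLatency calls follow the general LLVM CodeGen API rather than this commit's actual code.

```cpp
// Sketch: compute a MachineInstr's latency from the subtarget's scheduling model
// instead of InstrItineraryData (which AMDGPU does not provide).
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/TargetSchedule.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"

unsigned getLatencyViaSchedModel(const llvm::TargetSubtargetInfo &STI,
                                 const llvm::MachineInstr &MI) {
  llvm::TargetSchedModel SchedModel;
  SchedModel.init(&STI);                      // bind the per-subtarget MCSchedModel
  return SchedModel.computeInstrLatency(&MI); // no itineraries required
}
```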
* Let targets adjust operand latency of bundles (Stanislav Mekhanoshin, 2020-01-10, 1 file, -1/+1)
  This reverts the AMDGPU DAG mutation implemented in D72487 and gives a more general way of adjusting BUNDLE operand latency. It also replaces FixBundleLatencyMutation with an adjustSchedDependency callback in the AMDGPU backend, fixing not only successor latencies but predecessors' as well. Differential Revision: https://reviews.llvm.org/D72535
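  To illustrate the mechanism in general terms, here is a hedged sketch of the kind of fix-up an adjustSchedDependency-style callback performs. This is not the commit's code; the real hook is a virtual on the subtarget whose parameter list has changed across LLVM releases, so the free function below only mirrors its shape.

```cpp
// Sketch: bump the latency on a scheduling edge when the defining instruction is a
// BUNDLE, instead of leaving the generic (incorrect) latency in place.
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/ScheduleDAG.h"

void adjustBundleDep(llvm::SUnit *Def, llvm::SUnit *Use, llvm::SDep &Dep,
                     unsigned BundleLatency) {
  (void)Use; // a real callback may also inspect the use side
  const llvm::MachineInstr *DefMI = Def->getInstr();
  if (DefMI && DefMI->isBundle())
    Dep.setLatency(BundleLatency);
}
```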
* [AMDGPU] Fix bundle scheduling (Stanislav Mekhanoshin, 2020-01-09, 1 file, -0/+4)
  Bundles coming to the scheduler were considered free, i.e. zero latency. Fixed. Differential Revision: https://reviews.llvm.org/D72487
* AMDGPU: Use ImmLeaf for inline immediate predicates (Matt Arsenault, 2020-01-06, 1 file, -0/+4)
* TII: Fix using Register for a subregister index argument (Matt Arsenault, 2019-12-27, 1 file, -1/+1)
* [AMDGPU][GFX10] Disabled v_movrel*[sdwa|dpp] opcodes in codegen (Dmitry Preobrazhensky, 2019-11-20, 1 file, -0/+4)
  These opcodes use indirect register addressing so they need special handling by codegen (currently missing).
  Reviewers: vpykhtin, arsenm, rampitec
  Differential Revision: https://reviews.llvm.org/D70400
* Use MCRegister in copyPhysReg (Matt Arsenault, 2019-11-11, 1 file, -1/+1)
* AMDGPU: Disallow spill folding with m0 copies (Matt Arsenault, 2019-10-30, 1 file, -0/+7)
  readlane and writelane instructions are not allowed to use m0 as the data operand, so spilling them is tricky and would require an intermediate SGPR. Constrain the virtual register class in this case to disallow the inline spiller from folding the m0 operand directly into the spill instruction. I copied this hack from AArch64, which has the same problem for $sp.
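  A hedged sketch of the constraint described here. The helper name, the surrounding context (a fold hook inspecting the data operand), and the choice of register class are illustrative assumptions, not the commit's actual code.

```cpp
// Sketch: exclude m0 from the data operand's register class so the inline spiller
// cannot fold a stack slot directly into a readlane/writelane spill.
#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"

void constrainLaneDataOperand(llvm::MachineRegisterInfo &MRI,
                              const llvm::MachineOperand &DataOp,
                              const llvm::TargetRegisterClass *SGPR32NoM0RC) {
  // SGPR32NoM0RC stands in for a 32-bit SGPR class with m0 removed.
  if (DataOp.isReg() && DataOp.getReg().isVirtual())
    MRI.constrainRegClass(DataOp.getReg(), SGPR32NoM0RC);
}
```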
* AMDGPU: Split flat offsets that don't fit in DAG (Matt Arsenault, 2019-10-20, 1 file, -0/+2)
  We handle it this way for some other address spaces. Since r349196, SILoadStoreOptimizer has been trying to do this. This is after SIFoldOperands runs, which can change the addressing patterns. It's simpler to just split this earlier. llvm-svn: 375366
* Prune two MachineInstr.h includes, fix up deps (Reid Kleckner, 2019-10-19, 1 file, -1/+1)
  MachineInstr.h included AliasAnalysis.h, which includes a world of IR constructs mostly unneeded in CodeGen. Prune it. Same for DebugInfoMetadata.h. Noticed with -ftime-trace. llvm-svn: 375311
* [AMDGPU] Support mov dpp with 64 bit operands (Stanislav Mekhanoshin, 2019-10-15, 1 file, -0/+8)
  We define mov/update dpp intrinsics as overloaded but do not support i64, which is a practically useful type. Fix the selection and lowering. Differential Revision: https://reviews.llvm.org/D68673 llvm-svn: 374910
* Remove the AliasAnalysis argument in function areMemAccessesTriviallyDisjoint (Changpeng Fang, 2019-09-26, 1 file, -2/+1)
  Reviewers: arsenm
  Differential Revision: https://reviews.llvm.org/D58360 llvm-svn: 373024
* [TargetInstrInfo] Let findCommutedOpIndices take const MachineInstr& (Simon Pilgrim, 2019-09-25, 1 file, -1/+1)
  Neither the base implementation of findCommutedOpIndices nor any in-tree target modifies the instruction passed in, and there is no reason why they would in the future. Committed on behalf of @hvdijk (Harald van Dijk). Differential Revision: https://reviews.llvm.org/D66138 llvm-svn: 372882
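  A hedged sketch of the resulting hook shape: the instruction is taken by const reference. The ExampleInstrInfo class is hypothetical, and the parameter names are assumptions based on the usual LLVM convention.

```cpp
// Sketch (declaration only): findCommutedOpIndices after the const change.
#include "llvm/CodeGen/TargetInstrInfo.h"

class ExampleInstrInfo : public llvm::TargetInstrInfo {
public:
  // Before: the MachineInstr parameter was non-const even though no target
  // implementation ever modified it; after: it is const.
  bool findCommutedOpIndices(const llvm::MachineInstr &MI, unsigned &SrcOpIdx1,
                             unsigned &SrcOpIdx2) const override;
};
```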
* [AMDGPU] Added MI bit IsDOT (Stanislav Mekhanoshin, 2019-09-17, 1 file, -0/+8)
  NFC, needed for a future commit. Differential Revision: https://reviews.llvm.org/D67669 llvm-svn: 372151
* [AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Fixed (Alexander Timofeev, 2019-09-17, 1 file, -0/+13)
  Differential Revision: https://reviews.llvm.org/D67101
  Reviewers: rampitec, vpykhtin
  llvm-svn: 372086
* Revert for: [AMDGPU]: PHI Elimination hooks added for custom COPY insertion. (Alexander Timofeev, 2019-09-13, 1 file, -11/+0)
  llvm-svn: 371873
* [AMDGPU]: PHI Elimination hooks added for custom COPY insertion. (Alexander Timofeev, 2019-09-10, 1 file, -0/+11)
  Reviewers: rampitec, vpykhtin
  Differential Revision: https://reviews.llvm.org/D67101
  llvm-svn: 371508
* AMDGPU: Don't use frame virtual registers (Matt Arsenault, 2019-08-29, 1 file, -0/+6)
  SGPR spills aren't really handled after SILowerSGPRSpills. In order to directly control what happens if the scavenger needs to spill, the scavenger needs to be used directly. There is an alternative to spilling in these contexts anyway, since the frame register can be incremented and restored. This does present another possible issue if spilling is needed for the unused carry-out when an add is needed. I think this can be avoided by using a scalar add (although that clobbers SCC, which happens anyway). llvm-svn: 370281
* AMDGPU/GlobalISel: Select flat loads (Matt Arsenault, 2019-07-16, 1 file, -0/+6)
  Now that the patterns use the new PatFrag address space support, the only blocker to importing most load patterns is the addressing mode complex patterns. llvm-svn: 366237
* [AMDGPU] Fix DPP combiner check for exec modification (Jay Foad, 2019-07-12, 1 file, -5/+11)
  Summary: r363675 changed the exec modification helper function, now called execMayBeModifiedBeforeUse, so that if no UseMI is specified it checks all instructions in the basic block, even beyond the last use. That meant that the DPP combiner no longer worked in any basic block that ended with a control flow instruction, and in particular it didn't work on code sequences generated by the atomic optimizer. Fix it by reinstating the old behaviour but in a new helper function execMayBeModifiedBeforeAnyUse, and limiting the number of instructions scanned.
  Reviewers: arsenm, vpykhtin
  Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64393 llvm-svn: 365910
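  For context, a hedged sketch of the two helpers this entry contrasts. The real declarations live in SIInstrInfo.h, but the parameter names, and whether the register was still an unsigned rather than a Register at this revision, are assumptions.

```cpp
// Sketch: "BeforeUse" stops scanning at a known use; "BeforeAnyUse" scans forward
// from the def across all uses, with a cap on how many instructions it inspects.
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/Register.h"

bool execMayBeModifiedBeforeUse(const llvm::MachineRegisterInfo &MRI,
                                llvm::Register VReg,
                                const llvm::MachineInstr &DefMI,
                                const llvm::MachineInstr &UseMI);
bool execMayBeModifiedBeforeAnyUse(const llvm::MachineRegisterInfo &MRI,
                                   llvm::Register VReg,
                                   const llvm::MachineInstr &DefMI);
```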
* [AMDGPU] gfx908 scheduling (Stanislav Mekhanoshin, 2019-07-11, 1 file, -0/+8)
  Differential Revision: https://reviews.llvm.org/D64590 llvm-svn: 365826
* [AMDGPU] gfx908 mAI instructions, MC part (Stanislav Mekhanoshin, 2019-07-09, 1 file, -0/+8)
  Differential Revision: https://reviews.llvm.org/D64446 llvm-svn: 365563
* AMDGPU: Fold frame index into MUBUF (Matt Arsenault, 2019-06-24, 1 file, -0/+5)
  This matters for byval uses outside of the entry block, which appear as copies. Previously, the only folding done was during selection, which could not see the underlying frame index. For any uses outside the entry block, the frame index was materialized in the entry block relative to the global scratch wave offset. This may produce worse code in cases where the offset ends up not fitting in the MUBUF offset field. A better heuristic would be helpful for extreme frames. llvm-svn: 364185
* [AMDGPU] hazard recognizer for fp atomic to s_denorm_mode (Stanislav Mekhanoshin, 2019-06-21, 1 file, -0/+8)
  This requires 3 wait states unless there is a wait or VALU in between. Differential Revision: https://reviews.llvm.org/D63619 llvm-svn: 364074
* [AMDGPU] gfx1010 core wave32 changes (Stanislav Mekhanoshin, 2019-06-20, 1 file, -0/+9)
  Differential Revision: https://reviews.llvm.org/D63204 llvm-svn: 363934
* AMDGPU: Change API for checking for exec modification (Matt Arsenault, 2019-06-18, 1 file, -5/+8)
  Invert the name and return value to better reflect the imprecise nature. Force passing in the DefMI, since it's known in the 2 users and could possibly fail for an arbitrary vreg. Allow specifying a specific user instruction. Scan through use instructions, instead of use operands. Add scan thresholds instead of searching infinitely. Stop using a set to track seen uses. I didn't understand this usage, or why it would not check the last use. I don't think the use list has any particular order. llvm-svn: 363675
* AMDGPU: Prepare for explicit absolute relocations in code generation (Nicolai Haehnle, 2019-06-16, 1 file, -2/+5)
  Summary: We will use absolute relocations for LDS symbols.
  Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325
  Reviewers: arsenm, rampitec
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61492 llvm-svn: 363517
* [AMDGPU] gfx10 conditional registers handling (Stanislav Mekhanoshin, 2019-06-16, 1 file, -0/+2)
  This is the C++ source part of wave32 support, excluding the overridden getRegClass(). Differential Revision: https://reviews.llvm.org/D63351 llvm-svn: 363513
* AMDGPU: Fix missing const (Matt Arsenault, 2019-06-14, 1 file, -1/+1)
  llvm-svn: 363383
* AMDGPU: Fix using 2 different enums for same operand flags (Matt Arsenault, 2019-06-05, 1 file, -7/+4)
  These enums are really for the same namespace of flags set on arbitrary MachineOperands, so merge them to avoid value collisions. llvm-svn: 362640
* [AMDGPU] gfx1010 VOPC implementation (Stanislav Mekhanoshin, 2019-04-26, 1 file, -0/+11)
  Differential Revision: https://reviews.llvm.org/D61208 llvm-svn: 359358
* [CodeGen] Add "const" to MachineInstr::mayAliasBjorn Pettersson2019-04-191-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | Summary: The basic idea here is to make it possible to use MachineInstr::mayAlias also when the MachineInstr is const (or the "Other" MachineInstr is const). The addition of const in MachineInstr::mayAlias then rippled down to the need for adding const in several other places, such as TargetTransformInfo::getMemOperandWithOffset. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60856 llvm-svn: 358744
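  A hedged usage sketch of what the added const enables: querying mayAlias from code that only holds const MachineInstr references. The exact type of the first parameter (AliasAnalysis*/AAResults*/BatchAAResults*) has shifted across LLVM versions, so treat this as an assumption rather than the precise signature at this revision.

```cpp
// Sketch: with a const mayAlias, a query helper can take const references throughout.
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/CodeGen/MachineInstr.h"

bool accessesMayOverlap(llvm::AAResults *AA, const llvm::MachineInstr &MIa,
                        const llvm::MachineInstr &MIb) {
  return MIa.mayAlias(AA, MIb, /*UseTBAA=*/true);
}
```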
* AMDGPU: Make exec mask optimizations more resistant to block splits (Matt Arsenault, 2019-03-28, 1 file, -0/+4)
  Also improve the check for SALU instructions to also ignore implicit_def and other fake instructions. llvm-svn: 357170
* [AMDGPU] Fix SGPR fixing through SCC chaining (Michael Liao, 2019-03-15, 1 file, -3/+3)
  Summary: During the fixing of SGPR copying from VGPR, ensure users of SCC are properly propagated, i.e.:
  * only propagate through live defs of SCC,
  * skip the SCC-def inst itself, and
  * stop the propagation on the other SCC-def inst after checking its SCC-use first.
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59362 llvm-svn: 356258
* [AMDGPU] Fix DPP combiner (Valery Pykhtin, 2019-02-08, 1 file, -0/+6)
  Differential revision: https://reviews.llvm.org/D55444
  * dpp move with uses and old reg initializer should be in the same BB.
  * bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Otherwise the old register value is checked for identity.
  * Added add, subrev, and, or instructions to the old folding function.
  * Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user.
  The pass is still disabled by default. llvm-svn: 353513
* [AMDGPU] Fix a weird WWM intrinsic issue. (Neil Henning, 2019-01-29, 1 file, -4/+0)
  I found a really strange WWM issue through a very convoluted shader that essentially boils down to a bug in SIInstrInfo, where canReadVGPR did not correctly identify that WWM is like a copy and can have a VGPR as its source. Differential Revision: https://reviews.llvm.org/D56002 llvm-svn: 352500
* [AMDGPU] Fixed hazard recognizer to walk predecessors (Stanislav Mekhanoshin, 2019-01-21, 1 file, -1/+1)
  Fixes two problems with GCNHazardRecognizer:
  1. It only scans up to 5 instructions emitted earlier.
  2. It does not take control flow into account. An earlier instruction from the previous basic block is not necessarily a predecessor. At the same time a real predecessor block is not scanned.
  The patch provides a way to distinguish between scheduler and hazard recognizer mode. It is OK to work with emitted instructions in the scheduler because we do not really know what will be emitted later and in what order. However, when the pass works as a hazard recognizer the schedule is already finalized, and we have full access to the instructions for the whole function, so we can properly traverse predecessors and their instructions. Differential Revision: https://reviews.llvm.org/D56923 llvm-svn: 351759
* Update the file headers across all of the LLVM projects in the monorepo (Chandler Carruth, 2019-01-19, 1 file, -4/+3)
  to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap (Marek Olsak, 2019-01-16, 1 file, -0/+2)
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D52944 llvm-svn: 351351
* Revert "[AMDGPU] Fix DPP combiner"Valery Pykhtin2019-01-091-6/+0
| | | | | | This reverts commit e3e2923a39cbec3b3bc3a7d3f0e9a77a4115080e, svn revision rL350721 llvm-svn: 350730
* [AMDGPU] Fix DPP combiner (Valery Pykhtin, 2019-01-09, 1 file, -0/+6)
  Fixed issue with identity values and other cases; f32/f16 identity values to be added later. fma/mac instructions are disabled for now. Test is fully reworked, added comments. Other fixes:
  1. dpp move with uses and old reg initializer should be in the same BB.
  2. bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Otherwise the old register value is checked for identity.
  3. Added add, subrev, and, or instructions to the old folding function.
  4. Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user.
  Differential revision: https://reviews.llvm.org/D55444 llvm-svn: 350721
* [AMDGPU] Add new Mode Register pass (Tim Corringham, 2018-12-10, 1 file, -0/+8)
  A new pass to manage the Mode register. Currently this just manages the floating point double precision rounding requirements, but is intended to be easily extended to encompass all Mode register settings. The immediate motivation comes from the requirement to use the round-to-zero rounding mode for the 16 bit interpolation instructions, where the rounding mode setting is shared between 16 and 64 bit operations. llvm-svn: 348754
* [AMDGPU] Split 64-Bit XNOR to 64-Bit NOT/XOR (Graham Sellers, 2018-12-01, 1 file, -0/+3)
  The identity ~(x ^ y) == (~x ^ y) == (x ^ ~y) allows XNOR (XOR/NOT) to turn into NOT/XOR. Handling this case with its own split means we can make the NOT remain in the scalar unit. Previously, we split 64-bit XNOR into two 32-bit XNOR, then lowered. Now, we get three instructions (s_not, v_xor, v_xor) rather than four in the case where either of the sources is a scalar 64-bit. Add test cases to xnor.ll to attempt XNOR Vx, Sy and XNOR Sx, Vy. Also adding a test that uses the opposite identity such that (~x ^ y) on the scalar unit (or vector for gfx906) can generate XNOR. This already worked, but I didn't see a test for it. Differential: https://reviews.llvm.org/D55071 llvm-svn: 348075
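  A quick, self-contained check of the bitwise identity this split relies on (not compiler code; just a sanity check of the algebra).

```cpp
// ~(x ^ y) == (~x) ^ y == x ^ (~y): the NOT can be pushed onto either XOR operand,
// which is what lets the NOT stay on the scalar unit while the XOR goes vector.
#include <cassert>
#include <cstdint>

int main() {
  uint64_t x = 0x123456789abcdef0ULL;
  uint64_t y = 0x0fedcba987654321ULL;
  assert(~(x ^ y) == ((~x) ^ y));
  assert(~(x ^ y) == (x ^ (~y)));
  return 0;
}
```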
* AMDGPU: Divergence-driven selection of scalar buffer load intrinsics (Nicolai Haehnle, 2018-11-30, 1 file, -2/+0)
  Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads.
  Change-Id: I170e6816323beb1348677b358c9d380865cd1a19
  Reviewers: arsenm, alex-t, rampitec, tpr
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 348050
* [AMDGPU] Combine DPP mov with use instructions (VOP1/2/3) (Valery Pykhtin, 2018-11-30, 1 file, -1/+31)
  Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses. Differential revision: https://reviews.llvm.org/D53762 llvm-svn: 347993
* [AMDGPU] Add and update scalar instructions (Graham Sellers, 2018-11-29, 1 file, -0/+8)
  This patch adds support for S_ANDN2, S_ORN2 32-bit and 64-bit instructions and adds splits to move them to the vector unit (for which there is no equivalent instruction). It modifies the way that the more complex scalar instructions are lowered to vector instructions by first breaking them down to sequences of simpler scalar instructions, which are then lowered through the existing code paths. The pattern for S_XNOR has also been updated to apply inversion to one input rather than the output of the XOR, as the result is equivalent and may allow leaving the NOT instruction on the scalar unit. New tests for NAND, NOR, ANDN2 and ORN2 have been added, and existing tests now hit the new instructions (and have been modified accordingly). Differential: https://reviews.llvm.org/D54714 llvm-svn: 347877
* [CodeGen][NFC] Make `TII::getMemOpBaseImmOfs` return a base operand (Francis Visoiu Mistrih, 2018-11-28, 1 file, -5/+4)
  Currently, instructions doing memory accesses through a base operand that is not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`. This means that functions such as `TII::shouldClusterMemOps` will bail out on instructions using an FI as a base instead of a register. The goal of this patch is to refactor all this to return a base operand instead of a base register. Then in a separate patch, I will add FI support to the mem op clustering in the MachineScheduler. Differential Revision: https://reviews.llvm.org/D54846 llvm-svn: 347746
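  A hedged sketch of the "base operand instead of base register" shape this entry describes. The declaration below is an illustrative stand-in, not the exact LLVM hook of that revision; in later LLVM the corresponding TargetInstrInfo query is named getMemOperandWithOffset.

```cpp
// Sketch (illustrative declaration): report the base as a MachineOperand so that
// frame-index bases, not just register bases, can be analyzed by callers such as
// the mem-op clustering logic.
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include <cstdint>

bool getMemOperandWithOffsetSketch(const llvm::MachineInstr &MI,
                                   const llvm::MachineOperand *&BaseOp,
                                   int64_t &Offset,
                                   const llvm::TargetRegisterInfo *TRI);
```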
* [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST (Ron Lieberman, 2018-11-16, 1 file, -0/+3)
  Add a pass to fixup various vector ISel issues. Currently we handle converting GLOBAL_{LOAD|STORE}_* and GLOBAL_Atomic_* instructions into their _SADDR variants. This involves feeding the sreg into the saddr field of the new instruction. llvm-svn: 347008
* Revert "AMDGPU: Divergence-driven selection of scalar buffer load intrinsics"Nicolai Haehnle2018-11-071-0/+2
| | | | | | | | This reverts commit r344696 for now (except for some test additions). See https://bugs.freedesktop.org/show_bug.cgi?id=108611. llvm-svn: 346364
* AMDGPU: Divergence-driven selection of scalar buffer load intrinsics (Nicolai Haehnle, 2018-10-17, 1 file, -2/+0)
  Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads.
  Change-Id: I170e6816323beb1348677b358c9d380865cd1a19
  Reviewers: arsenm, alex-t, rampitec, tpr
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 344696