path: root/llvm/lib/Target/AMDGPU/SIInstrInfo.h
Commit message | Author | Age | Files | Lines
...
* [AMDGPU] Prevent spills before exec mask is restoredStanislav Mekhanoshin2017-01-201-0/+2
| | | | | | | | | | | | | Inline spiller can decide to move a spill as early as possible in the basic block. It will skip phis and label, but we also need to make sure it skips instructions in the basic block prologue which restore exec mask. Added isPositionLike callback in TargetInstrInfo to detect instructions which shall be skipped in addition to common phis, labels etc. Differential Revision: https://reviews.llvm.org/D27997 llvm-svn: 292554
* AMDGPU: Fix asan errors when folding operandsMatt Arsenault2016-12-101-2/+2
| | | | | | | This was failing when trying to fold immediates into operand 1 of a phi, which only has one statically known operand. llvm-svn: 289337
* AMDGPU: Fix handling of 16-bit immediatesMatt Arsenault2016-12-101-4/+85
| | | | | | | | | | | | | | | | | | Since 32-bit instructions with 32-bit input immediate behavior are used to materialize 16-bit constants in 32-bit registers for 16-bit instructions, determining the legality based on the size is incorrect. Change operands to have the size specified in the type. Also adds a workaround for a disassembler bug that produces an immediate MCOperand for an operand that is supposed to be OPERAND_REGISTER. The assembler appears to accept out of bounds immediates and truncates them, but this seems to be an issue for 32-bit already. llvm-svn: 289306
* AMDGPU: Refactor exp instructionsMatt Arsenault2016-12-051-0/+8
| | | | | | | | | | | | | | | Structure the definitions a bit more like the other classes. The main change here is to split EXP with the done bit set to a separate opcode, so we can set mayLoad = 1 so that it won't be reordered before the other exp stores, since this has the special constraint that if the done bit is set then this should be the last exp in the shader. Previously all exp instructions were inferred to have unmodeled side effects. llvm-svn: 288695
* AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies passTom Stellard2016-11-161-0/+6
| | | | | | | | | | | | | | | | | | | | | | Summary: 1. Don't try to copy values to and from the same register class. 2. Replace copies of registers with immediate values with v_mov/s_mov instructions. The main purpose of this change is to make MachineSink do a better job of determining when it is beneficial to split a critical edge, since the pass assumes that copies will become move instructions. This prevents a regression in uniform-cfg.ll if we enable critical edge splitting for AMDGPU. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23408 llvm-svn: 287131
* AMDGPU: Workaround for instruction size with literalsMatt Arsenault2016-11-011-0/+8
| | | | | | | | | | Instructions with a 32-bit base encoding with an optional 32-bit literal encoded after them report their size as 4 for the disassembler. Consider these when computing the MachineInstr size. This fixes problems caused by size estimate consistency in BranchRelaxation. llvm-svn: 285743
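The size rule described in this commit message can be sketched as follows. This is a minimal model under stated assumptions, not the code from the commit: `EncodedInst` and `instSizeInBytes` are hypothetical names, and the only facts taken from the message are that the base encoding is reported as 4 bytes and that an optional 32-bit literal may follow it.

```cpp
#include <cassert>

// Hypothetical model, not the LLVM API: an instruction is a base encoding,
// optionally followed by one 32-bit literal constant.
struct EncodedInst {
  unsigned BaseSizeInBytes; // size reported for the base encoding (4 here)
  bool HasLiteral;          // true if a 32-bit literal follows the base encoding
};

// Bytes the instruction occupies in the instruction stream: the base
// encoding plus 4 bytes when a trailing literal is present.
unsigned instSizeInBytes(const EncodedInst &I) {
  assert(I.BaseSizeInBytes % 4 == 0 && "encodings are built from 32-bit words");
  return I.BaseSizeInBytes + (I.HasLiteral ? 4u : 0u);
}
```

A pass that sums block sizes to decide whether a branch offset is in range would underestimate distances if it used the base size alone, which is why the literal has to be counted.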
* AMDGPU/SI: Don't use non-0 waitcnt values when waiting on Flat instructionsTom Stellard2016-10-281-0/+2
| | | | | | | | | | | | Summary: Flat instructions can return out of order, so we always need to wait for all the outstanding flat operations. Reviewers: tony-tye, arsenm Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl Differential Revision: https://reviews.llvm.org/D25998 llvm-svn: 285479
* AMDGPU: Add definitions for scalar store instructionsMatt Arsenault2016-10-281-0/+10
| | | | | | | | | | Also add glc bit to the scalar loads since they exist on VI and change the caching behavior. This currently has an assembler bug where the glc bit is incorrectly accepted on SI/CI which do not have it. llvm-svn: 285463
* [AMDGPU] Emit 32-bit lo/hi got and pc relative variant kinds for external and global address space variablesKonstantin Zhuravlyov2016-10-141-1/+12
| | | | | | | | Differential Revision: https://reviews.llvm.org/D25562 llvm-svn: 284196
* BranchRelaxation: Support expanding unconditional branchesMatt Arsenault2016-10-061-0/+24
| | | | | | | AMDGPU needs to expand unconditional branches in a new block with an indirect branch. llvm-svn: 283464
* AMDGPU: Use SOPK compare instructionsMatt Arsenault2016-09-161-0/+11
| | | | llvm-svn: 281780
* Finish renaming remaining analyzeBranch functionsMatt Arsenault2016-09-141-2/+2
| | | | llvm-svn: 281535
* Revert "AMDGPU: Use SOPK compare instructions"Matt Arsenault2016-09-141-11/+0
| | | | | | Accidentally committed llvm-svn: 281514
* AMDGPU: Use SOPK compare instructionsMatt Arsenault2016-09-141-0/+11
| | | | llvm-svn: 281513
* Make analyzeBranch family of instruction names consistentMatt Arsenault2016-09-141-1/+1
| | | | | | | analyzeBranch was renamed to use lowercase first, rename the related set to match. llvm-svn: 281506
* AArch64: Use TTI branch functions in branch relaxationMatt Arsenault2016-09-141-2/+4
| | | | | | | | | The main change is to return the code size from InsertBranch/RemoveBranch. Patch mostly by Tim Northover llvm-svn: 281505
* AMDGPU: Implement is{LoadFrom|StoreTo}FrameIndexMatt Arsenault2016-09-101-0/+16
| | | | llvm-svn: 281128
* AMDGPU: Support commuting with immediate in src0Matt Arsenault2016-09-081-1/+10
| | | | llvm-svn: 280970
* AMDGPU: Fix not estimating MBB operand sizes correctlyMatt Arsenault2016-08-131-0/+6
| | | | llvm-svn: 278590
* AMDGPU: Stay in WQM for non-intrinsic storesNicolai Haehnle2016-08-021-0/+8
| | | | | | | | | | | | | | | | | | | | | | | Summary: Two types of stores are possible in pixel shaders: stores to memory that are explicitly requested at the API level, and stores that are an implementation detail of register spilling or lowering of arrays. For the first kind of store, we must ensure that helper pixels have no effect and hence WQM must be disabled. The second kind of store must always be executed, because the written value may be loaded again in a way that is relevant for helper pixels as well -- and there are no externally visible effects anyway. This is a candidate for the 3.9 release branch. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D22675 llvm-svn: 277504
* TargetInstrInfo: add virtual function getInstSizeInBytesSjoerd Meijer2016-07-291-1/+1
| | | | | | | | | This adds a target hook getInstSizeInBytes to TargetInstrInfo that a lot of subclasses already implement. Differential Revision: https://reviews.llvm.org/D22885 llvm-svn: 277126
* Rename AnalyzeBranch* to analyzeBranch*.Jacques Pienaar2016-07-151-1/+1
| | | | | | | | | | | | Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect. Reviewers: tstellarAMD, mcrosier Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai Differential Revision: https://reviews.llvm.org/D22409 llvm-svn: 275564
* AMDGPU/SI: Add support for R_AMDGPU_GOTPCRELTom Stellard2016-07-131-0/+6
| | | | | | | | | | Reviewers: rafael, ruiu, tony-tye, arsenm, kzhuravl Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21484 llvm-svn: 275268
* AMDGPU: WQM cleanupsMatt Arsenault2016-07-131-0/+8
| | | | | | | | - Add new TTI instruction checks - Don't use const for blocks that are mutated. - Checking isBranch and isTerminator should be redundant llvm-svn: 275252
* AMDGPU: Treat texture gather instructions more like other MIMG instructionsNicolai Haehnle2016-07-111-0/+8
| | | | | | | | | | | | | | | | | | | | | Summary: Setting MIMG to 0 has a bunch of unexpected side effects, including that isVMEM returns false which leads to incorrect treatment in the hazard recognizer. The reason I noticed it is that it also leads to incorrect treatment in VGPR-to-SGPR copies, which is one cause of the referenced bug. The only reason why MIMG was set to 0 is to signal the special handling of dmasks, but that can be checked differently. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96877 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D22210 llvm-svn: 275113
* AMDGPU: Move R600 only pieces into R600 classesMatt Arsenault2016-07-091-2/+0
| | | | llvm-svn: 274979
* CodeGen: Use MachineInstr& in TargetInstrInfo, NFCDuncan P. N. Exon Smith2016-06-301-50/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is mostly a mechanical change to make TargetInstrInfo API take MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator) when the argument is expected to be a valid MachineInstr. This is a general API improvement. Although it would be possible to do this one function at a time, that would demand a quadratic amount of churn since many of these functions call each other. Instead I've done everything as a block and just updated what was necessary. This is mostly mechanical fixes: adding and removing `*` and `&` operators. The only non-mechanical change is to split ARMBaseInstrInfo::getOperandLatencyImpl out from ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a `MachineInstr*` which it updated to the instruction bundle leader; now, the latter calls the former either with the same `MachineInstr&` or the bundle leader. As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753. Note: I updated WebAssembly, Lanai, and AVR (despite being off-by-default) since it turned out to be easy. I couldn't run tests for AVR since llc doesn't link with it turned on. llvm-svn: 274189
* AMDGPU: Remove unused functionMatt Arsenault2016-06-281-6/+0
| | | | llvm-svn: 274033
* AMDGPU: Define a schedule class for COPY.Matthias Braun2016-06-241-0/+8
| | | | | | | | | | COPY was lacking a scheduling class, define it to avoid regressions in the upcoming change to the bidirectional MachineScheduler. Approved by tstellar on IRC. Differential Revision: http://reviews.llvm.org/D21540 llvm-svn: 273751
* AMDGPU: Cleanup subtarget handling.Matt Arsenault2016-06-241-2/+3
| | | | | | | | | Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target. llvm-svn: 273652
* AMDGPU/SI: Set INDEX_STRIDE for scratch coalescingMarek Olsak2016-06-131-2/+3
| | | | | | | | | | | | | | | | | Summary: Mesa and other users must set this to enable coalescing: - STRIDE = 0 - SWIZZLE_ENABLE = 1 This makes one particular compute shader 8x faster. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D21136 llvm-svn: 272556
* Pass DebugLoc and SDLoc by const ref.Benjamin Kramer2016-06-121-4/+3
| | | | | | | | This used to be free, copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended. llvm-svn: 272512
* AMDGPU: Add function for getting instruction sizeMatt Arsenault2016-06-061-0/+2
| | | | llvm-svn: 271936
* AMDGPU: Handle cbranch vccz/vccnzMatt Arsenault2016-05-211-1/+5
| | | | llvm-svn: 270297
* AMDGPU: Implement ReverseBranchConditionMatt Arsenault2016-05-211-0/+3
| | | | llvm-svn: 270296
* AMDGPU: Implement AnalyzeBranchMatt Arsenault2016-05-211-1/+21
| | | | | | Original patch by Tom Stellard llvm-svn: 270295
* Add missing override.Rafael Espindola2016-04-301-1/+2
| | | | llvm-svn: 268163
* AMDGPU/SI: Enable the post-ra schedulerTom Stellard2016-04-301-0/+21
| | | | | | | | | | | | | | Summary: This includes a hazard recognizer implementation to replace some of the hazard handling we had during frame index elimination. Reviewers: arsenm Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18602 llvm-svn: 268143
* [MachineScheduler]Add support for store clusteringJun Bum Lim2016-04-151-3/+3
| | | | | | | | | | Perform store clustering just like load clustering. This change adds StoreClusterMutation in machine-scheduler. To control StoreClusterMutation, added enableClusterStores() in TargetInstrInfo.h. This is enabled only on AArch64 for now. This change also adds support for unscaled stores which were not handled in getMemOpBaseRegImmOfs(). llvm-svn: 266437
* AMDGPU/SI: Add MachineBasicBlock parameter to SIInstrInfo::insertWaitStatesTom Stellard2016-04-071-1/+2
| | | | | | | | | | | | Summary: This makes it possible to insert nops at the end of blocks. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18549 llvm-svn: 265678
* AMDGPU: Add SIWholeQuadMode passNicolai Haehnle2016-03-211-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Whole quad mode is already enabled for pixel shaders that compute derivatives, but it must be suspended for instructions that cause a shader to have side effects (i.e. stores and atomics). This pass addresses the issue by storing the real (initial) live mask in a register, masking EXEC before instructions that require exact execution and (re-)enabling WQM where required. This pass is run before register coalescing so that we can use machine SSA for analysis. The changes in this patch expose a problem with the second machine scheduling pass: target independent instructions like COPY implicitly use EXEC when they operate on VGPRs, but this fact is not encoded in the MIR. This can lead to miscompilation because instructions are moved past changes to EXEC. This patch fixes the problem by adding use-implicit operands to target independent instructions. Some general codegen passes are relaxed to work with such implicit use operands. Reviewers: arsenm, tstellarAMD, mareko Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18162 llvm-svn: 263982
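The relationship between the live mask and the whole-quad-mode mask described above can be illustrated with a small host-side model. It is a sketch under assumptions, not the pass itself: a 64-lane wave is assumed, quads are taken to be groups of four adjacent lanes, and `wholeQuadMask` and `execFor` are hypothetical helper names.

```cpp
#include <cstdint>

// Expand a 64-lane live mask to whole quad mode: if any lane in a group of
// four adjacent lanes is live, enable all four lanes of that group.
uint64_t wholeQuadMask(uint64_t LiveMask) {
  uint64_t Result = 0;
  for (unsigned Quad = 0; Quad < 16; ++Quad) {
    const uint64_t QuadBits = uint64_t{0xF} << (Quad * 4);
    if (LiveMask & QuadBits)
      Result |= QuadBits;
  }
  return Result;
}

// Hypothetical helper: the EXEC value to apply before an instruction.
// Instructions with side effects (stores, atomics) need exact execution and
// use the saved live mask; everything else may run with helper lanes enabled.
uint64_t execFor(bool NeedsExact, uint64_t LiveMask) {
  return NeedsExact ? LiveMask : wholeQuadMask(LiveMask);
}
```

For example, with live lanes 0 and 5 (mask `0x21`), the model enables lanes 0 through 7 in whole quad mode and falls back to `0x21` for a store.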
* AMDGPU/SI: Handle wait states required for DPP instructionsTom Stellard2016-03-141-0/+8
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17543 llvm-svn: 263447
* AMDGPU: R600 code splitting cleanupMatt Arsenault2016-03-111-3/+3
| | | | | | | Move a few functions only used by R600 to R600 specific code, fix header macros to stop using R600, mark classes as final. llvm-svn: 263204
* [TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC.Chad Rosier2016-03-091-1/+1
| | | | | | http://reviews.llvm.org/D17967 llvm-svn: 263021
* AMDGPU/SI: Use v_readfirstlane to legalize SMRD with VGPR base pointerTom Stellard2016-02-201-9/+2
| | | | | | | | | | | | | | | | | | | | | | Summary: Instead of trying to replace SMRD instructions with a VGPR base pointer with an equivalent MUBUF instruction, we now copy the base pointer to SGPRs using v_readfirstlane. This is safe to do, because any load selected as an SMRD instruction has been proven to have a uniform base pointer, so each thread in the wave will have the same pointer value in VGPRs. This will fix some errors on VI from trying to replace SMRD instructions with addr64-enabled MUBUF instructions that don't exist. Reviewers: arsenm, cfang, nhaehnle Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17305 llvm-svn: 261385
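The safety argument in this message (a uniform base pointer holds the same value in every lane, so reading any one lane is enough) can be modelled on the host. This is only an illustrative sketch: `readFirstLane` is a hypothetical function standing in for the behaviour of v_readfirstlane, a 64-lane wave is assumed, and falling back to lane 0 for an empty EXEC mask is an assumption of the model.

```cpp
#include <array>
#include <cstdint>

constexpr unsigned WaveSize = 64;

// Model of reading the value held by the first active lane of a VGPR.
// Assumption: when no lane is active (Exec == 0), lane 0 is read.
uint32_t readFirstLane(const std::array<uint32_t, WaveSize> &VGPR,
                       uint64_t Exec) {
  unsigned Lane = 0;
  if (Exec != 0) {
    // Find the lowest set bit in the EXEC mask.
    while (((Exec >> Lane) & 1) == 0)
      ++Lane;
  }
  return VGPR[Lane];
}
```

When every active lane holds the same pointer, the result is independent of which lanes are active, which is the property the commit relies on when copying the base pointer into SGPRs.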
* AMDGPU/SI: Detect uniform branches and emit s_cbranch instructionsTom Stellard2016-02-121-0/+3
| | | | | | | | | | Reviewers: arsenm Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16603 llvm-svn: 260765
* AMDGPU: Set element_size in private resource descriptorMatt Arsenault2016-02-121-1/+1
| | | | | | | | | Introduce a subtarget feature for this, and leave the default with the current behavior which assumes up to 16-byte loads/stores can be used. The field also seems to have the ability to be set to 2 bytes, but I'm not sure what that would be used for. llvm-svn: 260651
* AMDGPU/SI: Make sure MIMG descriptors and samplers stay in SGPRsTom Stellard2016-02-111-0/+7
| | | | | | | | | | | | | | | | Summary: It's possible to have resource descriptors and samplers stored in VGPRs, either by a VMEM instruction or in the case of samplers, floating-point calculations. When this happens, we need to use v_readfirstlane to copy these values back to sgprs. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17102 llvm-svn: 260599
* AMDGPU: Remove some purely R600 functions from AMDGPUInstrInfoTom Stellard2016-02-051-14/+0
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16862 llvm-svn: 259900
* AMDGPU: Move subtarget specific code out of AMDGPUInstrInfo.cppTom Stellard2016-01-281-5/+10
| | | | | | | | | | | | | | Summary: Also delete all the stub functions that are identical to the implementations in TargetInstrInfo.cpp. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16609 llvm-svn: 259054