path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
...
* AMDGPU: Assembler support for expMatt Arsenault2016-12-053-27/+208
| | | | | | | | compr is not currently parsed (or printed) correctly, but that should probably be fixed along with intrinsic changes. llvm-svn: 288698
* AMDGPU: Change how exp is printedMatt Arsenault2016-12-055-7/+148
| | | | | | | This is an improvement over a long list of unreadable numbers. A follow up patch will try to match how sc formats these. llvm-svn: 288697
* AMDGPU: Refactor exp instructionsMatt Arsenault2016-12-0514-73/+156
| | | | | | | | | | | | | | | Structure the definitions a bit more like the other classes. The main change here is to split EXP with the done bit set to a separate opcode, so we can set mayLoad = 1 so that it won't be reordered before the other exp stores, since this has the special constraint that if the done bit is set then this should be the last exp in the shader. Previously all exp instructions were inferred to have unmodeled side effects. llvm-svn: 288695
* [AMDGPU] Disassembler: fix s_buffer_store_dword instructionsSam Kolton2016-12-051-2/+11
| | | | | | | | | | | | Summary: The sdata operand of s_buffer_store_dword instructions was called sdst in the encoding, which caused the disassembler to fail. Reviewers: tstellarAMD, vpykhtin, artem.tamazov Subscribers: arsenm, nhaehnle, rampitec Differential Revision: https://reviews.llvm.org/D27100 llvm-svn: 288657
* AMDGPU: remove a couple of unused variablesSaleem Abdulrasool2016-12-031-14/+2
| | | | | | | | | | | | | | | | | lib/Target/AMDGPU/SIRegisterInfo.cpp: In member function 'void llvm::SIRegisterInfo::spillSGPR(llvm::MachineBasicBlock::iterator, int, llvm::RegScavenger*) const': lib/Target/AMDGPU/SIRegisterInfo.cpp:572:30: warning: variable 'SubRC' set but not used [-Wunused-but-set-variable] const TargetRegisterClass *SubRC = nullptr; ^ lib/Target/AMDGPU/SIRegisterInfo.cpp: In member function 'void llvm::SIRegisterInfo::restoreSGPR(llvm::MachineBasicBlock::iterator, int, llvm::RegScavenger*) const': lib/Target/AMDGPU/SIRegisterInfo.cpp:723:30: warning: variable 'SubRC' set but not used [-Wunused-but-set-variable] const TargetRegisterClass *SubRC = nullptr; ^ The variable was assigned to, but never used. The functions called did not mutate state. Simplify the logic and remove the variable. Identified by gcc 5.4.0. llvm-svn: 288601
* AMDGPU: Clean up struct initializersMatt Arsenault2016-12-031-8/+7
| | | | llvm-svn: 288590
* AMDGPU: Implement isCheapAddrSpaceCastMatt Arsenault2016-12-022-2/+13
| | | | llvm-svn: 288523
* AMDGPU: Use wider scalar spills for SGPR spillingMatt Arsenault2016-12-021-15/+70
| | | | | | | | | | | | | | | | Since the spill is for the whole wave, these don't have the swizzling problems that vector stores do and a single 4-byte allocation is enough to spill a 64 element register. This should reduce the number of spill instructions and put all the spills for a register in the same cacheline. This should save allocated private size, but for now it doesn't. The extra slots are allocated for each component, but never used because the frame layout is essentially finalized before frame indices are replaced. For always using the scalar store path, this should probably be moved into processFunctionBeforeFrameFinalized. llvm-svn: 288445
* AMDGPU: Disallow exec as SMEM instruction operandMatt Arsenault2016-11-294-19/+42
| | | | | | | | | | | | | | | | | | | This is not in the list of valid inputs for the encoding. When spilling, copies from exec can be folded directly into the spill instruction, which results in broken stores. This only fixes the operand constraints; more codegen work is required to avoid emitting the invalid spills. This sort of breaks the dbg.value test. Because the register class of the s_load_dwordx2 changes, there is a copy to SReg_64, and the copy is the operand of dbg_value. The copy is later dead, and removed from the dbg_value. llvm-svn: 288191
* AMDGPU: Use SGPR_64 for argument loweringsMatt Arsenault2016-11-291-7/+7
| | | | llvm-svn: 288190
* AMDGPU: Rename flat operands to match mubufMatt Arsenault2016-11-294-21/+21
| | | | | | | | | | Use vaddr/vdst for the same purposes. This also fixes a bug in SIInsertWaits for the operand check. The stored value operand is currently called data0 in the single offset case, not data. llvm-svn: 288188
* AMDGPU: Use else ifMatt Arsenault2016-11-291-10/+6
| | | | llvm-svn: 288187
* AMDGPU: Materialize frame index before addMatt Arsenault2016-11-291-1/+6
| | | | | | | | | | | It isn't generally safe to fold the frame index directly into the operand since it will possibly not be an inline immediate after it is expanded. This surprisingly seems to produce better code, since the FI doesn't prevent folding other immediate operands. llvm-svn: 288185
* AMDGPU: Refactor immediate folding logicMatt Arsenault2016-11-291-14/+50
| | | | | | | | | | | | | Change the logic for when to fold immediates to consider the destination operand rather than the source of the materializing mov instruction. No change yet, but this will allow for correctly handling i16/f16 operands. Since 32-bit moves are used to materialize constants for these, the same bitvalue will not be in the register. llvm-svn: 288184
* AMDGPU/SI: Avoid moving PHIs to VALU when phi values are defined in scalar branchesTom Stellard2016-11-291-8/+38
| | | | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23417 llvm-svn: 288095
* MachineScheduler: Export function to construct "default" scheduler.Matthias Braun2016-11-283-19/+10
| | | | | | | | | | | | | | | | | | This makes the createGenericSchedLive() function that constructs the default scheduler available for the public API. This should help when you want to get a scheduler and the default list of DAG mutations. This also shrinks the list of default DAG mutations: {Load|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer added by default. Targets can easily add them if they need them. It also makes it easier for targets to add alternative/custom macrofusion or clustering mutations while staying with the default createGenericSchedLive(). It also saves the callback back and forth in TargetInstrInfo::enableClusterLoads()/enableClusterStores(). Differential Revision: https://reviews.llvm.org/D26986 llvm-svn: 288057
* [AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copiesStanislav Mekhanoshin2016-11-283-9/+98
| | | | | | | | | | | | | | | | | | | | | | Codegen prepare sinks comparisons close to a user if we have only one register for conditions. For AMDGPU we have many SGPRs capable of holding vector conditions. Changed the backend to report that we have many condition registers. That way the IR LICM pass hoists an invariant comparison out of a loop and codegen prepare will not sink it. With that done, a condition is calculated in one block and used in another. The current behavior is to store a workitem's condition in a VGPR using v_cndmask_b32 and then restore it with yet another v_cmp instruction from that v_cndmask's result. To mitigate the issue, propagation of the source SGPR pair in place of v_cmp is implemented. An additional side effect of this is that we may consume fewer VGPRs at the cost of more SGPRs when multiple conditions need to be held, and that is a clear win in most cases. Differential Revision: https://reviews.llvm.org/D26114 llvm-svn: 288053
* AMDGPU/SI: Use float as the operand type for amdgcn.interp intrinsicsTom Stellard2016-11-262-2/+4
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26724 llvm-svn: 287962
* AMDGPU/SI: Add back reverted SGPR spilling code, but disable itMarek Olsak2016-11-258-96/+284
| | | | | | suggested as a better solution by Matt llvm-svn: 287942
* Revert "AMDGPU: Implement SGPR spilling with scalar stores"Marek Olsak2016-11-253-153/+10
| | | | | | This reverts commit 4404d0d6e354e80dd7f8f0a0e12d8ad809cf007e. llvm-svn: 287936
* Revert "AMDGPU: Fix MMO when splitting spill"Marek Olsak2016-11-252-79/+47
| | | | | | This reverts commit 79d4f8b8b1ce430c3d5dac4fc72a9eebaed24fe1. llvm-svn: 287935
* Revert "AMDGPU: Fix adding extra implicit def of register"Marek Olsak2016-11-251-25/+14
| | | | | | This reverts commit e834ce5976567575621901fb967b8018b9916d71. llvm-svn: 287934
* Revert "AMDGPU: Fix not setting kill flag on temp reg when spilling"Marek Olsak2016-11-251-1/+1
| | | | | | This reverts commit 057bbbe4ae170247ba37f08f2e70ef185267d1bb. llvm-svn: 287933
* Revert "AMDGPU: Make m0 unallocatable"Marek Olsak2016-11-256-23/+16
| | | | | | This reverts commit 124ad83dae04514f943902446520c859adee0e96. llvm-svn: 287932
* Revert "AMDGPU: Remove m0 spilling code"Marek Olsak2016-11-251-3/+37
| | | | | | This reverts commit f18de36554eb22416f8ba58e094e0272523a4301. llvm-svn: 287931
* Revert "AMDGPU: Preserve m0 value when spilling"Marek Olsak2016-11-251-34/+5
| | | | | | This reverts commit a5a179ffd94fd4136df461ec76fb30f04afa87ce. llvm-svn: 287930
* AMDGPU: Preserve m0 value when spillingMatt Arsenault2016-11-241-5/+34
| | | | llvm-svn: 287844
* TRI: Add hook to pass scavenger during frame eliminationMatt Arsenault2016-11-242-1/+12
| | | | | | | | | | | | The scavenger was not passed if requiresFrameIndexScavenging was enabled. I need to be able to test for the availability of an unallocatable register here, so I can't create a virtual register for it. It might be better to just always use the scavenger and stop creating virtual registers. llvm-svn: 287843
* AMDGPU: Remove m0 spilling codeMatt Arsenault2016-11-241-37/+3
| | | | | | Since m0 isn't allocatable it should never be spilled anymore. llvm-svn: 287842
* AMDGPU: Make m0 unallocatableMatt Arsenault2016-11-246-16/+23
| | | | | | | | | | | m0 may need to be written for spill code, so we don't want general code uses relying on the value stored in it. This introduces a few code quality regressions where copies from m0 are not coalesced into copies of a copy of m0. llvm-svn: 287841
* AMDGPU: Cleanup immediate folding codeMatt Arsenault2016-11-231-64/+62
| | | | | | | Move code down to use, reorder to avoid hard to follow immediate folding logic. llvm-svn: 287818
* AMDGPU: Fix debug printingMatt Arsenault2016-11-231-1/+1
| | | | | | The uint8_t was printed as a char which didn't really work. llvm-svn: 287817
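The pitfall behind this fix is easy to reproduce outside LLVM. The sketch below is an illustration only: it uses `std::ostringstream` as a stand-in for whatever debug stream the commit touched, and the helper names `formatAsChar` and `formatAsNumber` are hypothetical. It shows that a `uint8_t` streamed directly is treated as a character, while widening it to `unsigned` first prints the number.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: streams the byte as-is. Because uint8_t is a
// character type for stream purposes, the value 65 comes out as "A".
std::string formatAsChar(std::uint8_t Value) {
  std::ostringstream OS;
  OS << Value;
  return OS.str();
}

// Hypothetical helper: widens to unsigned before streaming, so the
// stream selects the integer overload and prints the numeric value.
std::string formatAsNumber(std::uint8_t Value) {
  std::ostringstream OS;
  OS << static_cast<unsigned>(Value);
  return OS.str();
}
```

Whether the actual one-line change used a cast or a different formatting call is not visible in this log; the cast is simply one common way to get the numeric form.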
* AMDGPU: Fix not setting kill flag on temp reg when spillingMatt Arsenault2016-11-231-1/+1
| | | | llvm-svn: 287808
* AMDGPU: Fix adding extra implicit def of registerMatt Arsenault2016-11-231-14/+25
| | | | | | | In the scalar case, there's no reason to add an additional def of the same register. llvm-svn: 287807
* AMDGPU: Fix MMO when splitting spillMatt Arsenault2016-11-232-47/+79
| | | | | | | | | | The size and offset were wrong. The size of the object was being used for the size of the access, when here it is really being split into 4-byte accesses. The underlying object size is set in the MachinePointerInfo, which also didn't have the offset set. llvm-svn: 287806
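The size and offset bookkeeping described above can be sketched in a few lines. This is a simplified model, not the LLVM code: `AccessPiece` and `splitIntoDwordAccesses` are hypothetical names, and the sketch assumes the spilled object's size is a multiple of 4 bytes. Each piece records its own 4-byte size and its byte offset into the underlying object, rather than reusing the whole object's size.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record for one piece of a split spill; not an LLVM type.
struct AccessPiece {
  unsigned Offset; // byte offset from the start of the spilled object
  unsigned Size;   // bytes touched by this piece
};

// Split an object of ObjectSize bytes into 4-byte accesses.
// Assumes ObjectSize is a multiple of 4 (whole 32-bit registers).
std::vector<AccessPiece> splitIntoDwordAccesses(unsigned ObjectSize) {
  const unsigned PieceSize = 4;
  std::vector<AccessPiece> Pieces;
  for (unsigned Offset = 0; Offset < ObjectSize; Offset += PieceSize)
    Pieces.push_back({Offset, PieceSize});
  return Pieces;
}
```

For a 16-byte object this yields four pieces at offsets 0, 4, 8 and 12, each of size 4; the bug described in the commit message corresponds to every piece claiming size 16 with no offset.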
* [AMDGPU] Fix multiple vreg definitions in si-lower-control-flowStanislav Mekhanoshin2016-11-221-7/+15
| | | | | | Differential Revision: https://reviews.llvm.org/D26939 llvm-svn: 287608
* Check that emitted instructions meet their predicates on all targets except ARM, Mips, and X86.Daniel Sanders2016-11-194-4/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: * ARM is omitted from this patch because this check appears to expose bugs in this target. * Mips is omitted from this patch because this check either detects bugs or deliberate emission of instructions that don't satisfy their predicates. One deliberate use is the SYNC instruction where the version with an operand is correctly defined as requiring MIPS32 while the version without an operand is defined as an alias of 'SYNC 0' and requires MIPS2. * X86 is omitted from this patch because it doesn't use the tablegen-erated MCCodeEmitter infrastructure. Patches for ARM and Mips will follow. Depends on D25617 Reviewers: tstellarAMD, jmolloy Subscribers: wdng, jmolloy, aemerson, rengolin, arsenm, jyknight, nemanjai, nhaehnle, tstellarAMD, llvm-commits Differential Revision: https://reviews.llvm.org/D25618 llvm-svn: 287439
* [AMDGPU] Change frexp.exp intrinsic to return i16 for f16 inputKonstantin Zhuravlyov2016-11-182-2/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D26862 llvm-svn: 287389
* AMDGPU: Fix unused variable warningMatt Arsenault2016-11-181-5/+4
| | | | llvm-svn: 287362
* AMDGPU/SI: Remove zero_extend patterns for i16 ops selected to 32-bit instsTom Stellard2016-11-181-3/+14
| | | | | | | | | | | | | | Summary: The 32-bit instructions don't zero the high 16-bits like the 16-bit instructions do. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26828 llvm-svn: 287342
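A small arithmetic model shows why the zero_extend cannot be dropped once an i16 operation is selected to a 32-bit instruction. This is plain C++ integer arithmetic, not a description of the hardware instructions, and the function names `add16ThenZeroExtend` and `add32` are hypothetical. The 16-bit form wraps and leaves the high half zero; the 32-bit form carries into bit 16.

```cpp
#include <cassert>
#include <cstdint>

// Models a 16-bit add whose result is zero-extended to 32 bits:
// the sum wraps at 16 bits, so the high 16 bits are always zero.
std::uint32_t add16ThenZeroExtend(std::uint16_t A, std::uint16_t B) {
  return static_cast<std::uint16_t>(A + B);
}

// Models the same add done at 32-bit width: a carry out of bit 15
// lands in bit 16, so the high half is not guaranteed to be zero.
std::uint32_t add32(std::uint32_t A, std::uint32_t B) {
  return A + B;
}
```

With inputs 0xFFFF and 1, the first function returns 0 and the second returns 0x10000, so treating the 32-bit result as already zero-extended would give a wrong value.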
* AMDGPU: Fix legalization of MUBUF instructions in shadersNicolai Haehnle2016-11-181-5/+13
| | | | | | | | | | | | | | | | | | | | | | Summary: The addr64-based legalization is incorrect for MUBUF instructions with idxen set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions. This affects e.g. shaders that access buffer textures. Since we never actually need the addr64-legalization in shaders, this patch takes the easy route and keys off the calling convention. If this ever affects (non-OpenGL) compute, the type of legalization needs to be chosen based on some TSFlag. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664 Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26747 llvm-svn: 287339
* Fix spelling mistakes in AMDGPU target comments. NFC.Simon Pilgrim2016-11-185-11/+11
| | | | | | Identified by Pedro Giffuni in PR27636. llvm-svn: 287333
* AMDGPU: Move redundant setting of inst propertiesMatt Arsenault2016-11-181-3/+1
| | | | llvm-svn: 287311
* AMDGPU: Fix crash on illegal type for inlineasmMatt Arsenault2016-11-181-0/+2
| | | | | | | There are still crashes on non-MVT types in other places. llvm-svn: 287310
* Revert "AMDGPU: Enable ConstrainCopy DAG mutation"Konstantin Zhuravlyov2016-11-171-3/+0
| | | | | | This reverts commit r287146. This breaks a few conformance tests. llvm-svn: 287233
* [AMDGPU] Custom lower f16 = fp_round f64Konstantin Zhuravlyov2016-11-172-0/+23
| | | | llvm-svn: 287203
* [AMDGPU] Promote f16/i16 conversions to f32/i32Konstantin Zhuravlyov2016-11-172-58/+8
| | | | llvm-svn: 287201
* [AMDGPU] Expand `br_cc` for f16Konstantin Zhuravlyov2016-11-171-0/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D26732 llvm-svn: 287199
* AMDGPU: Enable ConstrainCopy DAG mutationMatt Arsenault2016-11-161-0/+3
| | | | | | | This fixes a probably unintended divergence from the default scheduler behavior. llvm-svn: 287146
* AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies passTom Stellard2016-11-164-26/+78
| | | | | | | | | | | | | | | | | | | | | | Summary: 1. Don't try to copy values to and from the same register class. 2. Replace copies of registers with immediate values with v_mov/s_mov instructions. The main purpose of this change is to make MachineSink do a better job of determining when it is beneficial to split a critical edge, since the pass assumes that copies will become move instructions. This prevents a regression in uniform-cfg.ll if we enable critical edge splitting for AMDGPU. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23408 llvm-svn: 287131