summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Add SIWholeQuadMode passNicolai Haehnle2016-03-211-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Whole quad mode is already enabled for pixel shaders that compute derivatives, but it must be suspended for instructions that cause a shader to have side effects (i.e. stores and atomics). This pass addresses the issue by storing the real (initial) live mask in a register, masking EXEC before instructions that require exact execution and (re-)enabling WQM where required. This pass is run before register coalescing so that we can use machine SSA for analysis. The changes in this patch expose a problem with the second machine scheduling pass: target independent instructions like COPY implicitly use EXEC when they operate on VGPRs, but this fact is not encoded in the MIR. This can lead to miscompilation because instructions are moved past changes to EXEC. This patch fixes the problem by adding use-implicit operands to target independent instructions. Some general codegen passes are relaxed to work with such implicit use operands. Reviewers: arsenm, tstellarAMD, mareko Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18162 llvm-svn: 263982
* AMDGPU/SI: Clean up indentation in SIInstrInfo::getDefaultRsrcDataFormatMichel Danzer2016-03-161-3/+3
| | | | | Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 263626
* [AMDGPU] Assembler: change v_madmk operands to have same order as mad.Nikolay Haustov2016-03-111-15/+2
| | | | | | | | | | The constant is now at source operand 1 (previously at 2). This is also how it is in legacy AMD sp3 assembler. Update tests. Differential Revision: http://reviews.llvm.org/D17984 llvm-svn: 263212
* [TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC.Chad Rosier2016-03-091-3/+3
| | | | | | http://reviews.llvm.org/D17967 llvm-svn: 263021
* AMDGPU/SI: Add support for spiling SGPRs to scratch bufferTom Stellard2016-03-041-0/+2
| | | | | | | | | | | | | | Summary: This is necessary for when we run out of VGPRs and can no longer use v_{read,write}_lane for spilling SGPRs. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17592 llvm-svn: 262732
* AMDGPU: Simplify boolean conditional return statementsMatt Arsenault2016-03-021-13/+6
| | | | | | Patch by Richard Thomson llvm-svn: 262536
* AMDGPU: Cleanup suggested in bug 23960Matt Arsenault2016-03-021-6/+3
| | | | llvm-svn: 262456
* AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and ↵Changpeng Fang2016-03-011-0/+4
| | | | | | | | | | | | | | | | Intrinsics Summary: This patch impleemnts DS_PERMUTE/DS_BPERMUTE instruction definitions and intrinsics, which are new since VI. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D17614 llvm-svn: 262356
* AMDGPU/SI: Use v_readfirstlane to legalize SMRD with VGPR base pointerTom Stellard2016-02-201-229/+20
| | | | | | | | | | | | | | | | | | | | | | Summary: Instead of trying to replace SMRD instructions with a VGPR base pointer with an equivalent MUBUF instruction, we now copy the base pointer to SGPRs using v_readfirstlane. This is safe to do, because any load selected as an SMRD instruction has been proven to have a uniform base pointer, so each thread in the wave will have the same pointer value in VGPRs. This will fix some errors on VI from trying to replace SMRD instructions with addr64-enabled MUBUF instructions that don't exist. Reviewers: arsenm, cfang, nhaehnle Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17305 llvm-svn: 261385
* [AMDGPU] Rename $dst operand to $vdst for VOP instructions.Tom Stellard2016-02-161-1/+1
| | | | | | | | | | | | | | Summary: This change renames output operand for VOP instructions from dst to vdst. This is needed to enable decoding named operands for disassembler. Reviewers: vpykhtin, tstellarAMD, arsenm Subscribers: arsenm, llvm-commits, nhaustov Projects: #llvm-amdgpu-spb Differential Revision: http://reviews.llvm.org/D16920 llvm-svn: 260986
* AMDGPU/SI: Detect uniform branches and emit s_cbranch instructionsTom Stellard2016-02-121-10/+59
| | | | | | | | | | Reviewers: arsenm Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16603 llvm-svn: 260765
* AMDGPU: Set element_size in private resource descriptorMatt Arsenault2016-02-121-0/+4
| | | | | | | | | Introduce a subtarget feature for this, and leave the default with the current behavior which assumes up to 16-byte loads/stores can be used. The field also seems to have the ability to be set to 2 bytes, but I'm not sure what that would be used for. llvm-svn: 260651
* AMDGPU/SI: Make sure MIMG descriptors and samplers stay in SGPRsTom Stellard2016-02-111-0/+42
| | | | | | | | | | | | | | | | Summary: It's possible to have resource descriptors and samplers stored in VGPRs, either by a VMEM instruction or in the case of samplers, floating-point calculations. When this happens, we need to use v_readfirstlane to copy these values back to sgprs. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17102 llvm-svn: 260599
* AMDGPU/SI: When splitting SMRD instructions, add its users to VALU worklistTom Stellard2016-02-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | Summary: When we split SMRD instructions into two MUBUFs we were adding the users of the newly created MUBUFs to the VALU worklist. However, the only users these instructions had was the REG_SEQUENCE that was inserted by splitSMRD when the original SMRD instruction was split. We need to make sure to add the users of the original SMRD to the VALU worklist before it is split. I have a test case, but it requires one other bug fix, so it will be added in a later commt. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17101 llvm-svn: 260588
* AMDGPU: Fix constant bus use check with subregistersMatt Arsenault2016-02-111-4/+8
| | | | | | | | | | | If the two operands to an instruction were both subregisters of the same super register, it would incorrectly think this counted as the same constant bus use. This fixes the verifier error in fmin_legacy.ll which was missing -verify-machineinstrs. llvm-svn: 260495
* AMDGPU: Remove some purely R600 functions from AMDGPUInstrInfoTom Stellard2016-02-051-42/+0
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16862 llvm-svn: 259900
* AMDGPU: Move subtarget specific code out of AMDGPUInstrInfo.cppTom Stellard2016-01-281-19/+11
| | | | | | | | | | | | | | Summary: Also delete all the stub functions that are identical to the implementations in TargetInstrInfo.cpp. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16609 llvm-svn: 259054
* AMDGPU/SI: Add SI Machine SchedulerNicolai Haehnle2016-01-131-0/+12
| | | | | | | | | | | | | | | | Summary: It is off by default, but can be used with --misched=si Patch by: Axel Davy Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D11885 llvm-svn: 257609
* AMDGPU/SI: Fold operands with sub-registersNicolai Haehnle2016-01-071-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs, increasing the code size and VGPR pressure. These moves are now folded away. Note that this lack of operand folding was not a problem for VMEM loads, because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register coalescer. Some tests are updated, note that the fsub.ll test explicitly checks that the move is elided. With the IR generated by current Mesa, the changes are obviously relatively minor: 7063 shaders in 3531 tests Totals: SGPRS: 351872 -> 352560 (0.20 %) VGPRS: 199984 -> 200732 (0.37 %) Code Size: 9876968 -> 9881112 (0.04 %) bytes LDS: 91 -> 91 (0.00 %) blocks Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave Wait states: 295164 -> 295337 (0.06 %) Totals from affected shaders: SGPRS: 65784 -> 66472 (1.05 %) VGPRS: 38064 -> 38812 (1.97 %) Code Size: 1993828 -> 1997972 (0.21 %) bytes LDS: 42 -> 42 (0.00 %) blocks Scratch: 795648 -> 783360 (-1.54 %) bytes per wave Wait states: 54026 -> 54199 (0.32 %) Reviewers: tstellarAMD, arsenm, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15875 llvm-svn: 257074
* AMDGPU/SI: use S_MOV_B64 for larger copies in copyPhysRegNicolai Haehnle2015-12-191-6/+22
| | | | | | | | | | Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15629 llvm-svn: 256073
* AMDGPU: fix overlapping copies in copyPhysRegNicolai Haehnle2015-12-191-9/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When copying aggregate registers within the same register class, there may be an overlap between source and destination that forces us to do the copy backwards. Do the simplest possible thing that guarantees the correct order of moves when there are overlaps, and does whatever when there is no overlap. (The last part forces some trivial adjustments to test cases.) Together with r255906, this fixes a VM fault in Unreal Elemental Demo. While at it, change the generation of kill and def flags to something that looks more reasonable. This method is used very late during compilation, so it probably doesn't matter in practice, and to be honest, I don't know if this change is actually correct because the semantics in connection with aggregate registers vs. sub-registers are not clear to me. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15622 llvm-svn: 256072
* AMDGPU/SI: Test commitChangpeng Fang2015-12-181-1/+1
| | | | | | | | | | | | Summary: This is just my first commit. Test! Reviewers: none Subscribers: none Differential Revision: none llvm-svn: 256022
* Revert "AMDGPU/SI: Test commit"Changpeng Fang2015-12-181-1/+1
| | | | | | This reverts commit a493cb636e0152ad28210934a47c6c44b1437193. llvm-svn: 256021
* AMDGPU/SI: Test commitChangpeng Fang2015-12-181-1/+1
| | | | | | | | | | | | Summary: This is just my first commit. Test! Reviewers: none Subscribers: none Differential Revision: none llvm-svn: 256020
* AMDGPU: Fix off-by-one in SIRegisterInfo::eliminateFrameIndexNicolai Haehnle2015-12-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | Summary: The method insertNOPs expected the number of wait states to be passed as parameter, while eliminateFrameIndex passed the immediate argument for the S_NOP, leading to an off-by-one error. Rename the method to make the meaning of its parameter clearer. The number of 4 / 5 wait states (which is what the method has always _tried_ to do according to the comment) is correct according to the hardware docs. I stumbled upon this while trying to track down the cause of https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed, this patch unfortunately does not fix that bug... Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15542 llvm-svn: 255906
* AMDGPU/SI: Emit constant arrays in the .text sectionTom Stellard2015-12-101-20/+28
| | | | | | | | | | | | | | | Summary: This allows us to remove the END_OF_TEXT_LABEL hack we had been using and simplifies the fixups used to compute the address of constant arrays. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15257 llvm-svn: 255204
* AMDGPU: Optimize VOP2 operand legalizationMatt Arsenault2015-12-011-45/+125
| | | | | | | | | | | | | | | | | | Don't use commuteInstruction, and don't commute if doing so will not improve legality. Skip the more complex checks for literal operands and constant bus restrictions, which are not a concern for VOP2 instructions because src1 does not accept SGPRs or constants and few implicitly read vcc. This gets called quite a few times and the attempts at commuting are a significant fraction of the time spent in SIFixSGPRCopies, so it's somewhat worthwhile to optimize. With this patch and others leading up to it, this reduces the compile time of SIFixSGPRCopies on some of the LuxMark 2 kernels from ~8ms to ~5ms on my system. llvm-svn: 254452
* AMDGPU: Rework how private buffer passed for HSAMatt Arsenault2015-11-301-10/+4
| | | | | | | | | | | | | | | | If we know we have stack objects, we reserve the registers that the private buffer resource and wave offset are passed and use them directly. If not, reserve the last 5 SGPRs just in case we need to spill. After register allocation, try to pick the next available registers instead of the last SGPRs, and then insert copies from the inputs to the reserved registers in the progloue. This also only selectively enables all of the input registers which are really required instead of always enabling them. llvm-svn: 254331
* AMDGPU: Rename enums to be consistent with HSA code object terminologyMatt Arsenault2015-11-301-8/+11
| | | | llvm-svn: 254330
* AMDGPU: Remove SIPrepareScratchRegsMatt Arsenault2015-11-301-8/+10
| | | | | | | | | | | | | | | | | | | | | | It does not work because of emergency stack slots. This pass was supposed to eliminate dummy registers for the spill instructions, but the register scavenger can introduce more during PrologEpilogInserter, so some would end up left behind if they were needed. The potential for spilling the scratch resource descriptor and offset register makes doing something like this overly complicated. Reserve registers to use for the resource descriptor and use them directly in eliminateFrameIndex. Also removes creating another scratch resource descriptor when directly selecting scratch MUBUF instructions. The choice of which registers are reserved is temporary. For now it attempts to pick the next available registers after the user and system SGPRs. llvm-svn: 254329
* AMDGPU/SI: select S_ABS_I32 when possible (v2)Marek Olsak2015-11-251-0/+29
| | | | | | | | | | v2: added more tests, moved the SALU->VALU conversion to a separate function It looks like it's not possible to get subregisters in the S_ABS lowering code, and I don't feel like guessing without testing what the correct code would look like. llvm-svn: 254095
* AMDGPU: Create emergency stack slots during frame loweringMatt Arsenault2015-11-061-0/+1
| | | | | | Test has a bogus verifier error which will be fixed by later commits. llvm-svn: 252327
* AMDGPU: Remove unused scratch resource operandsMatt Arsenault2015-11-061-72/+129
| | | | | | The SGPR spill pseudos don't actually use them. llvm-svn: 252324
* AMDGPU: Fix hardcoded alignment of spill.Matt Arsenault2015-11-061-2/+1
| | | | | | | Instead of forcing 4 alignment when spilled, set register class alignments. llvm-svn: 252322
* AMDGPU: Also track whether SGPRs were spilledMatt Arsenault2015-11-051-0/+2
| | | | llvm-svn: 252145
* AMDGPU: Fix assert when legalizing atomic operandsMatt Arsenault2015-11-051-15/+51
| | | | | | | | | | The operand layout is slightly different for the atomic opcodes from the usual MUBUF loads and stores. This should only fix it on SI/CI. VI is still broken because it still emits the addr64 replacement. llvm-svn: 252140
* AMDGPU: Make findUsedSGPR more readableMatt Arsenault2015-11-031-7/+18
| | | | | | Add more comments etc. llvm-svn: 251996
* AMDGPU: Simplify VOP3 operand legalization.Matt Arsenault2015-10-211-41/+49
| | | | | | | | | | | | | | | | | | | | | | | | | This was checking for a variety of situations that should never happen. This saves a tiny bit of compile time. We should not be selecting instructions with invalid operands in the first place. Most of the time for registers copys are inserted to the correct operand register class. For VOP3, since all operand types are supported and literal constants never are, we just need to verify the constant bus requirements (all immediates should be legal inline ones). The only possibly tricky case to maybe worry about is if when legalizing operands in moveToVALU with s_add_i32 and similar instructions. If the original s_add_i32 had a literal constant and we need to replace it with v_add_i32_e64 we would have an unsupported literal operand. However, I don't think we should worry about that because SIFoldOperands should handle folding literal constant operands into the SALU instructions based on the uses. At SIFoldOperands time, the legality and profitability of operand types is a bit different. llvm-svn: 250951
* AMDGPU: Fix not checking implicit operands in verifyInstructionMatt Arsenault2015-10-211-15/+29
| | | | | | | When verifying constant bus restrictions, this wasn't catching uses in implicit operands. llvm-svn: 250948
* AMDGPU: Add MachineInstr overloads for instruction format testsMatt Arsenault2015-10-201-30/+26
| | | | llvm-svn: 250797
* AMDGPU: Use explicit register size indirect pseudosMatt Arsenault2015-10-071-1/+1
| | | | | | | | | | | | | | | | | This stops using an unknown reg class operand. Currently build_vector selection has a broken looking check where it tries to use a VGPR reg class and an SGPR one if it sees an SGPR use. With the source operand has an explicit VGPR class, illegal copies will be inserted that SIFixSGPRCopies will take care of normally later, which will allow removing the weird check of build_vector users. Without this, when removed v_movrels_b32 would still be emitted even though all of the values were only stored in SGPRs. llvm-svn: 249494
* AMDGPU/SI: Add verifier check for exec readsMatt Arsenault2015-10-021-0/+10
| | | | | | | Make sure we aren't accidentally not setting these in the instruction definitions. llvm-svn: 249170
* AMDGPU/SI: Don't set DATA_FORMAT if ADD_TID_ENABLE is setMarek Olsak2015-09-291-0/+13
| | | | | | | | | | to prevent setting a huge stride, because DATA_FORMAT has a different meaning if ADD_TID_ENABLE is set. This is a candidate for stable llvm 3.7. Tested-and-Reviewed-by: Christian König <christian.koenig@amd.com> llvm-svn: 248858
* AMDGPU: Factor switch into separate functionMatt Arsenault2015-09-281-21/+27
| | | | llvm-svn: 248742
* AMDGPU: Fix splitting x16 SMRD loadsMatt Arsenault2015-09-281-2/+2
| | | | | | | | When used recursively, this would set the kill flag on the intermediate step from first splitting x16 to x8. llvm-svn: 248741
* AMDGPU: Fix moving SMRD loads with literal offsets on CIMatt Arsenault2015-09-281-3/+9
| | | | llvm-svn: 248740
* AMDGPU: Fix splitting SMRD with large offsetMatt Arsenault2015-09-281-1/+1
| | | | | | | | | | | | | The splitting of > 4 dword SMRD instructions if using an offset in an SGPR instead of an immediate was not setting the destination register, resulting an an instruction missing an operand which would assert later. Test will be included in a following commit which fixes a related issue. llvm-svn: 248739
* Improved the interface of methods commuting operands, improved X86-FMA3 ↵Andrew Kaylor2015-09-281-16/+37
| | | | | | | | | | mem-folding&coalescing. Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com) Differential Revision: http://reviews.llvm.org/D11370 llvm-svn: 248735
* AMDGPU: Construct new buffer instruction when moving SMRDMatt Arsenault2015-09-251-30/+37
| | | | | | | | | It's easier to understand creating a full instruction than the current situation where sometimes a new instruction is created and sometimes it is awkwardly mutated in place. llvm-svn: 248627
* AMDGPU: Re-justify workaround and fix worked around problemMatt Arsenault2015-09-251-18/+42
| | | | | | | | | | | | | | | When buffer resource descriptors were built, the upper two components of the descriptor were first composed into a 64-bit register because legalizeOperands assumed all operands had the same register class. Fix that problem, but keep the workaround. I'm not sure anything actually is actually emitting such a REG_SEQUENCE now. If multiple resource descriptors are set up with different base pointers, this is copied with a single s_mov_b64. We probably should fix this better by recognizing a pair of s_mov_b32 later, but for now delete the dead code. llvm-svn: 248585
OpenPOWER on IntegriCloud