summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Relax SGPR asm constraint register classMatt Arsenault2016-08-301-0/+10
| | | | | | | s should be SReg_32 to be as general as possible. This can avoid a copy from m0. llvm-svn: 280154
* AMDGPU/SI: Implement a custom MachineSchedStrategyTom Stellard2016-08-2923-62/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: GCNSchedStrategy re-uses most of GenericScheduler, it's just uses a different method to compute the excess and critical register pressure limits. It's not enabled by default, to enable it you need to pass -misched=gcn to llc. Shader DB stats: 32464 shaders in 17874 tests Totals: SGPRS: 1542846 -> 1643125 (6.50 %) VGPRS: 1005595 -> 904653 (-10.04 %) Spilled SGPRs: 29929 -> 27745 (-7.30 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 36688188 -> 37034900 (0.95 %) bytes LDS: 1913 -> 1913 (0.00 %) blocks Max Waves: 254101 -> 265125 (4.34 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 1338220 -> 1438499 (7.49 %) VGPRS: 886221 -> 785279 (-11.39 %) Spilled SGPRs: 29869 -> 27685 (-7.31 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 34315716 -> 34662428 (1.01 %) bytes LDS: 1551 -> 1551 (0.00 %) blocks Max Waves: 188127 -> 199151 (5.86 %) Wait states: 0 -> 0 (0.00 %) Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23688 llvm-svn: 279995
* AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the schedulerTom Stellard2016-08-2910-39/+61
| | | | | | | | | | | | | | | | | | | | Summary: The SILoadStoreOptimizer can now look ahead more then one instruction when looking for instructions to merge, which greatly improves the number of loads/stores that we are able to merge. Moving the pass before scheduling avoids increasing register pressure after the scheduler, so that the scheduler's register pressure estimates will be more accurate. It also gives more consistent results, since it is no longer affected by minor scheduling changes. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23814 llvm-svn: 279991
* AMDGPU/R600: Fix fixups used for constant arraysMatt Arsenault2016-08-291-0/+28
| | | | | | Fixes bug 29289 llvm-svn: 279986
* AMDGPU/SI: Improve register allocation hints for sopk instructionsTom Stellard2016-08-291-2/+2
| | | | | | | | | | | | | | | | | | | Summary: For shrinking SOPK instructions, we were creating a hint to tell the register allocator to use the register allocated for src0 for the dst operand as well. However, this seems to not work sometimes depending on the order virtual registers are assigned physical registers. To fix this, I've added a second allocation hint which does the reverse, asks that the register allocated for dst is used for src0. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23862 llvm-svn: 279968
* AMDGPU/R600: Enable Load combineJan Vesely2016-08-276-135/+1951
| | | | | | | | Fix and improve tests Differential Revision: https://reviews.llvm.org/D23899 llvm-svn: 279925
* AMDGPU: Select mulhi 24-bit instructionsMatt Arsenault2016-08-272-37/+293
| | | | llvm-svn: 279902
* AMDGPU: Move cndmask pseudo to be isel pseudoMatt Arsenault2016-08-272-18/+18
| | | | | | | | There's only one use of this for the convenience of a pattern. I think v_mov_b64_pseudo should also be moved, but SIFoldOperands does currently make use of it. llvm-svn: 279901
* AMDGPU/SI: Canonicalize offset order for merged DS instructionsTom Stellard2016-08-266-22/+22
| | | | | | | | | | | | | | | | | | | Summary: If the scheduler clusters the loads, then the offsets will be sorted, but it is possible for the scheduler to scheduler loads together without out explicitly clustering them, which would give us non-sorted offsets. Also, we will want to do this if we move the load/store optimizer before the scheduler. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23776 llvm-svn: 279870
* Replace subregister uses when processing tied operandsMatt Arsenault2016-08-262-10/+34
| | | | | | | | | | | | | | | | | | | | | This was for some reason skipping operands that are subregisters instead of keeping the same subregister index. v_movreld_b32 expects src0 to be the subregister of the tied super register use/def. e.g. v_movreld_b32 v0, v9, <imp-def, tied3> v[0:3], <imp-use, tied2> v[0:3] was being replaced with v[4:7] = copy v[0:3] v_movreld_b32 v0, v9, <imp-def, tied3> v[4:7], <imp-use, tied2> v[4:7], which really writes to v[0:3] llvm-svn: 279804
* AMDGCN/SI: Implement readlane/readfirstlane intrinsicsChangpeng Fang2016-08-242-0/+78
| | | | | | | | | | | | | | | Summary: This patch implements readlane/readfirstlane intrinsics. TODO: need to define a new register class to consider the case that the source could be a vector register or M0. Reviewed by: arsenm and tstellarAMD Differential Revision: http://reviews.llvm.org/D22489 llvm-svn: 279660
* AMDGPU : Add V_SAD_U32 instruction pattern.Wei Ding2016-08-241-0/+283
| | | | | | Differential Revision: http://reviews.llvm.org/D23069 llvm-svn: 279629
* Create subranges for new intervals resulting from live interval splittingKrzysztof Parzyszek2016-08-241-0/+51
| | | | | | | | | | | | | | | | | | | The register allocator can split a live interval of a register into a set of smaller intervals. After the allocation of registers is complete, the rewriter will modify the IR to replace virtual registers with the corres- ponding physical registers. At this stage, if a register corresponding to a subregister of a virtual register is used, the rewriter will check if that subregister is undefined, and if so, it will add the <undef> flag to the machine operand. The function verifying liveness of the subregis- ter would assume that it is undefined, unless any of the subranges of the live interval proves otherwise. The problem is that the live intervals created during splitting do not have any subranges, even if the original parent interval did. This could result in the <undef> flag placed on a register that is actually defined. Differential Revision: http://reviews.llvm.org/D21189 llvm-svn: 279625
* MIRParser/MIRPrinter: Compute isSSA instead of printing/parsing it.Matthias Braun2016-08-241-9/+0
| | | | | | | | | Specifying isSSA is an extra line at best and results in invalid MI at worst. Compute the value instead. Differential Revision: http://reviews.llvm.org/D22722 llvm-svn: 279600
* AMDGPU: Split SILowerControlFlow into two piecesMatt Arsenault2016-08-222-12/+37
| | | | | | | | | | | | | | Do most of the lowering in a pre-RA pass. Keep the skip jump insertion late, plus a few other things that require more work to move out. One concern I have is now there may be COPY instructions which do not have the necessary implicit exec uses if they will be lowered to v_mov_b32. This has a positive effect on SGPR usage in shader-db. llvm-svn: 279464
* Revert "RegScavenging: Add scavengeRegisterBackwards()"Matthias Braun2016-08-191-2/+2
| | | | | | | | | | | The ppc64 multistage bot fails on this. This reverts commit r279124. Also Revert "CodeGen: Add/Factor out LiveRegUnits class; NFCI" because it depends on the previous change This reverts commit r279171. llvm-svn: 279199
* AMDGPU/SI: Fix a test in wqm.ll to always use s_cbranch_vcc*Tom Stellard2016-08-181-7/+7
| | | | | | | | | | | | | | | Summary: We need to use floating-point compares to ensure that s_cbranch_vcc* instructions are always generated. With integer compares, future optimizations could cause s_cbranch_scc* to be generated instead. Reviewers: arsenm, nhaehnle Subscribers: llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23401 llvm-svn: 279148
* AMDGPU : Fix QSAD and MQSAD instructions' incorrect data type.Wei Ding2016-08-182-18/+18
| | | | | | Differential Revision: http://reviews.llvm.org/D23689 llvm-svn: 279126
* RegScavenging: Add scavengeRegisterBackwards()Matthias Braun2016-08-181-2/+2
| | | | | | | | | | | | | | | | Re-apply r276044 with off-by-1 instruction fix for the reload placement. This is a variant of scavengeRegister() that works for enterBasicBlockEnd()/backward(). The benefit of the backward mode is that it is not affected by incomplete kill flags. This patch also changes PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register scavenger in backwards mode. Differential Revision: http://reviews.llvm.org/D21885 llvm-svn: 279124
* [AMDGPU] add s_incperflevel/s_decperflevel intrinsics.Valery Pykhtin2016-08-182-0/+86
| | | | | | Differential revision: https://reviews.llvm.org/D23666 llvm-svn: 279106
* AMDGPU/R600: Convert buffer id to VTX_READ inputJan Vesely2016-08-151-15/+36
| | | | | | | | | Use patterns instead of multiple instructions Add buffer id to asm string https://reviews.llvm.org/D22650 llvm-svn: 278749
* AMDGPU: Don't fold subregister extracts into tied operandsMatt Arsenault2016-08-151-0/+15
| | | | llvm-svn: 278676
* [LSR] Don't try and create post-inc expressions on non-rotated loopsJames Molloy2016-08-151-2/+3
| | | | | | | | | | | | | | | If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we not only increase register pressure and add a COPY for that IV expression but for all IVs! Motivating testcase: void f(float *a, float *b, float *c, int n) { while (n-- > 0) *c++ = *a++ + *b++; } It's imperative that the pointer increments be located in the latch block and not the header block; if not, we cannot use post-increment loads and stores and we have to keep both the post-inc and pre-inc values around until the end of the latch which bloats register usage. llvm-svn: 278658
* Revert "Revert "Invariant start/end intrinsics overloaded for address space""Mehdi Amini2016-08-131-4/+4
| | | | | | This reverts commit 32fc6488e48eafc0ca1bac1bd9cbf0008224d530. llvm-svn: 278609
* Revert "Invariant start/end intrinsics overloaded for address space"Mehdi Amini2016-08-131-4/+4
| | | | | | This reverts commit r276447. llvm-svn: 278608
* AMDGPU: Fix missing test for addressing mode with odd offsetsMatt Arsenault2016-08-131-5/+31
| | | | | | Add test if the constant offset looks unaligned. llvm-svn: 278589
* AMDGPU : Add intrinsic for instruction v_cvt_pk_u8_f32Wei Ding2016-08-111-0/+60
| | | | | | Differential Revision: http://reviews.llvm.org/D23336 llvm-svn: 278403
* AMDGPU: Fix crashes on memory functionsMatt Arsenault2016-08-111-0/+54
| | | | llvm-svn: 278369
* AMDGPU : Fix SAD related instruction LIT tests function atttibute issues.Wei Ding2016-08-117-21/+14
| | | | | | Differential Revision: http://reviews.llvm.org/D23133 llvm-svn: 278360
* AMDGPU : Add LLVM intrinsics for SAD related instructions.Wei Ding2016-08-117-0/+161
| | | | | | Differential Revision: http://reviews.llvm.org/D23133 llvm-svn: 278354
* AMDGPU/SI: Implement amdgcn image intrinsics with samplerChangpeng Fang2016-08-104-0/+817
| | | | | | | | | | | | | | | | | | | | | | Summary: This patch define and implement amdgcn image intrinsics with sampler. 1. define vdata type to be llvm_anyfloat_ty, address type to be llvm_anyfloat_ty, and rsrc type to be llvm_anyint_ty. As a result, we expect the intrinsics name to have three suffixes to overload each of these three types; 2. D128 as well as two other flags are implied in the three types, for example, if you use v8i32 as resource type, then r128 is 0! 3. don't expose TFE flag, and other flags are exposed in the instruction order: unrm, glc, slc, lwe and da. Differential Revision: http://reviews.llvm.org/D22838 Reviewed by: arsenm and tstellarAMD llvm-svn: 278291
* AMDGPU: Change insertion point of si_mask_branchMatt Arsenault2016-08-105-17/+30
| | | | | | | | | | | | | Insert before the skip branch if one is created. This is a somewhat more natural placement relative to the skip branches, and makes it possible to implement analyzeBranch for skip blocks. The test changes are mostly due to a quirk where the block label is not emitted if there is a terminator that is not also a branch. llvm-svn: 278273
* LiveIntervalAnalysis: fix a crash in repairOldRegInRangeNicolai Haehnle2016-08-101-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: See the new test case for one that was (non-deterministically) crashing on trunk and deterministically hit the assertion that I added in D23302. Basically, the machine function contains a sequence DS_WRITE_B32 %vreg4, %vreg14:sub0, ... DS_WRITE_B32 %vreg4, %vreg14:sub0, ... %vreg14:sub1<def> = COPY %vreg14:sub0 and SILoadStoreOptimizer::mergeWrite2Pair merges the two DS_WRITE_B32 instructions into one before calling repairIntervalsInRange. Now repairIntervalsInRange wants to repair %vreg14, in particular, and ends up trying to repair %vreg14:sub1 as well, but that only becomes active _after_ the range that is to be repaired, hence the crash due to LR.find(...) == LR.begin() at the start of repairOldRegInRange. I believe that just skipping those subrange is fine, but again, not too familiar with that code. Reviewers: MatzeB, kparzysz, tstellarAMD Subscribers: llvm-commits, MatzeB Differential Revision: https://reviews.llvm.org/D23303 llvm-svn: 278268
* AMDGPU/SI: Increase SGPR limit to 96 on Tonga/IcelandMarek Olsak2016-08-052-34/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is the setting of the Vulkan closed source driver. It decreases the max wave count from 10 to 8. 26010 shaders in 14650 tests Totals: VGPRS: 829593 -> 808440 (-2.55 %) Spilled SGPRs: 81878 -> 42226 (-48.43 %) Spilled VGPRs: 367 -> 358 (-2.45 %) Scratch VGPRs: 1764 -> 1748 (-0.91 %) dwords per thread Code Size: 36677864 -> 35923932 (-2.06 %) bytes There is a massive decrease in SGPR spilling in general and -7.4% spilled VGPRs for DiRT Showdown (= SGPRs spilled to scratch?) Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23034 llvm-svn: 277867
* [OpenCL] Add missing tests for getOCLTypeNameYaxun Liu2016-08-041-0/+217
| | | | | | | | | | Adding missing tests for OCL type names for half, float, double, char, short, long, and unknown. Patch by Aaron En Ye Shi. Differential Revision: https://reviews.llvm.org/D22964 llvm-svn: 277759
* AMDGPU: Fix a slow test by using basic regallocMatt Arsenault2016-08-041-1/+1
| | | | | | | | | | This just tests that the register limit isn't exceeded, so the regisetr allocation doesn't need to be great.' The critically slow part is all in greedy RA, so switch to basic. llvm-svn: 277700
* RenameIndependentSubregs: Fix liveness query in rewriteOperands()Matthias Braun2016-08-031-2/+42
| | | | | | | | rewriteOperands() always performed liveness queries at the base index rather than the RegSlot/Base as apropriate for the machine operand. This could lead to illegal rewriting in some cases. llvm-svn: 277661
* AMDGPU: fdiv -1, x -> rcp -xMatt Arsenault2016-08-022-1/+98
| | | | llvm-svn: 277535
* AMDGPU: Stay in WQM for non-intrinsic storesNicolai Haehnle2016-08-022-44/+74
| | | | | | | | | | | | | | | | | | | | | | | Summary: Two types of stores are possible in pixel shaders: stores to memory that are explicitly requested at the API level, and stores that are an implementation detail of register spilling or lowering of arrays. For the first kind of store, we must ensure that helper pixels have no effect and hence WQM must be disabled. The second kind of store must always be executed, because the written value may be loaded again in a way that is relevant for helper pixels as well -- and there are no externally visible effects anyway. This is a candidate for the 3.9 release branch. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D22675 llvm-svn: 277504
* AMDGPU: Track physical registers in SIWholeQuadModeNicolai Haehnle2016-08-022-1/+38
| | | | | | | | | | | | | | | | | | | | Summary: There are cases where uniform branch conditions are computed in VGPRs, and we didn't correctly mark those as WQM. The stray change in basic-branch.ll is because invoking the LiveIntervals analysis leads to the detection of a dead register that would otherwise not be seen at -O0. This is a candidate for the 3.9 branch, as it fixes a possible hang. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22673 llvm-svn: 277500
* AMDGPU: Fix shouldConvertConstantLoadToIntImm behaviorMatt Arsenault2016-07-301-1/+46
| | | | | | | This should really be true for any immediate, not just inline ones. llvm-svn: 277260
* AMDGPU/SI: Don't handle a loop if there is no loop at all for a terminator BB.Changpeng Fang2016-07-281-0/+71
| | | | | | | | Differential Revision: http://reviews.llvm.org/D22021 Reviewed by: arsenm llvm-svn: 277073
* AMDGPU : Add intrinsics for compare with the full wavefront resultWei Ding2016-07-282-0/+400
| | | | | | Differential Revision: http://reviews.llvm.org/D22482 llvm-svn: 276998
* AMDGPU: add execfix flag to SI_ELSENicolai Haehnle2016-07-281-0/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: SI_ELSE is lowered into two parts: s_or_saveexec_b64 dst, src (at the start of the basic block) s_xor_b64 exec, exec, dst (at the end of the basic block) The idea is that dst contains the exec mask of the preceding IF block. It can happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside the basic block that contains SI_ELSE, in which case it introduces an instruction s_and_b64 exec, exec, s[...] which masks out bits that can correspond to both the IF and the ELSE paths. So the resulting sequence must be: s_or_savexec_b64 dst, src s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode s_and_b64 dst, dst, exec <-- added by SILowerControlFlow s_xor_b64 exec, exec, dst Whether to add the additional s_and_b64 dst, dst, exec is currently determined via the ExecModified tracking. With this change, it is instead determined by an additional flag on SI_ELSE which is set by SIWholeQuadMode. Finally: It also occured to me that an alternative approach for the long run is for SILowerControlFlow to unconditionally emit s_or_saveexec_b64 dst, src ... s_and_b64 dst, dst, exec s_xor_b64 exec, exec, dst and have a pass that detects and cleans up the "redundant AND with exec" pattern where possible. This could be useful anyway, because we also add instructions s_and_b64 vcc, exec, vcc before s_cbranch_scc (in moveToALU), and those are often redundant. I have some pending changes to how KILL is lowered that could also benefit from such a cleanup pass. In any case, this current patch could help in the short term with the whole ExecModified business. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22846 llvm-svn: 276972
* AMDGPU: Use rcp for fdiv 1, x with fpmath metadataMatt Arsenault2016-07-263-18/+94
| | | | | | | Using rcp should be OK for safe math usually, so this should not be replacing the original fdiv. llvm-svn: 276823
* AMDGPU: Add more tests for LDS size with occupancyMatt Arsenault2016-07-261-2/+149
| | | | llvm-svn: 276821
* MIRParser: Use dot instead of colon to mark subregistersMatthias Braun2016-07-263-85/+85
| | | | | | | | | | | | | | | | | Change the syntax to use `%0.sub8` to denote a subregister. This seems like a more natural fit to denote subregisters; I also plan to introduce a new ":classname" syntax in upcoming patches to denote the register class of a vreg. Note that this commit disallows plain identifiers to start with a '.' character. This shouldn't affect anything as external names/IR references are all prefixed with '$'/'%', plain identifiers are only used for instruction names, register mask names and subreg indexes. Differential Revision: https://reviews.llvm.org/D22390 llvm-svn: 276815
* GlobalISel: omit braces on MachineInstr types when there's only one.Tim Northover2016-07-261-1/+1
| | | | | | Tidies up the representation a bit in the common case. llvm-svn: 276772
* AMDGPU: Add missing tests for xnack option for HSAMatt Arsenault2016-07-261-5/+21
| | | | llvm-svn: 276765
* AMDGPU: Add fp legacy instruction intrinsicsMatt Arsenault2016-07-262-0/+96
| | | | | | | This could use some additional optimization work to use mad/mac legacy. llvm-svn: 276764
OpenPOWER on IntegriCloud