path: root/llvm/test/CodeGen/AMDGPU/skip-if-dead.ll
* Revert "[AMDGPU] Invert the handling of skip insertion."Nicolai Hähnle2020-02-031-5/+8
| | | | | | | | | This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd. The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Mesa. (cherry picked from commit a80291ce10ba9667352adcc895f9668144f5f616)
* [AMDGPU] Invert the handling of skip insertion.
  cdevadas, 2020-01-15, 1 file changed, -8/+5

  The current implementation of skip insertion (SIInsertSkip) makes it a
  mandatory pass required for correctness. Initially, the idea was to have an
  optional pass. This patch inserts the s_cbranch_execz upfront during
  SILowerControlFlow to skip over sections of code when no lanes are active.
  Later, SIRemoveShortExecBranches removes the skips for short branches,
  unless there is a side effect and the skip branch is really necessary. This
  new pass replaces the handling of skip insertion in the existing
  SIInsertSkip pass.

  Differential Revision: https://reviews.llvm.org/D68092

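  A sketch of the resulting shape, in the FileCheck register of
  skip-if-dead.ll itself (function, checks, and registers hypothetical): the
  divergent section is branched over outright when no lanes remain active.

  ; GCN-LABEL: {{^}}skip_divergent_store:
  ; GCN: s_and_saveexec_b64 s{{\[[0-9]+:[0-9]+\]}}, vcc
  ; GCN: s_cbranch_execz [[SKIP:BB[0-9_]+]]
  ; GCN: buffer_store_dword
  ; GCN: [[SKIP]]:
  define amdgpu_ps void @skip_divergent_store(float %arg, float addrspace(1)* %ptr) {
    %cmp = fcmp olt float %arg, 0.0
    br i1 %cmp, label %then, label %end

  then:
    ; side-effecting store, so the skip branch survives
    ; SIRemoveShortExecBranches
    store volatile float %arg, float addrspace(1)* %ptr
    br label %end

  end:
    ret void
  }
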
* AMDGPU: Force skip over SMRD, VMEM and s_waitcnt instructions
  Rhys Perry, 2019-04-17, 1 file changed, -0/+1

  Summary: This fixes a large Dawn of War 3 performance regression with RADV
  from Mesa 19.0 to master, which was caused by creating less code in some
  branches.

  Reviewers: arsen, nhaehnle
  Reviewed By: nhaehnle
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard,
  tpr, t-tye, llvm-commits
  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D60824

  llvm-svn: 358592

* AMDGPU: Remove llvm.AMDGPU.kill
  Matt Arsenault, 2018-12-07, 1 file changed, -30/+42

  This is the last of the old AMDGPU intrinsics.

  llvm-svn: 348615

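  A sketch of what this migration means for a test like this one (function
  name hypothetical; semantics per the two intrinsic definitions): the
  removed llvm.AMDGPU.kill(float) killed lanes whose operand was negative,
  while llvm.amdgcn.kill(i1) takes the keep-condition explicitly.

  declare void @llvm.amdgcn.kill(i1)

  ; Old form (removed): call void @llvm.AMDGPU.kill(float %x)
  ; New equivalent: keep the lane only when %x >= 0 (NaN is killed in
  ; both forms).
  define amdgpu_ps void @kill_if_negative(float %x) {
    %keep = fcmp oge float %x, 0.0
    call void @llvm.amdgcn.kill(i1 %keep)
    ret void
  }
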
* AMDGPU: Force skip over s_sendmsg and exp instructions
  Nicolai Haehnle, 2018-07-30, 1 file changed, -1/+9

  Summary: These instructions interact with hardware blocks outside the
  shader core, and they can have "scalar" side effects even when EXEC = 0. We
  don't want these scalar side effects to occur when all lanes want to skip
  these instructions, so always add the execz skip branch instruction for
  basic blocks that contain them.

  Also ensure that we skip scalar stores / atomics, though we don't code-gen
  those yet.

  Reviewers: arsenm, rampitec
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

  Differential Revision: https://reviews.llvm.org/D48431
  Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44

  llvm-svn: 338235

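  A sketch of the code shape this enforces, as FileCheck lines (registers
  and label hypothetical): even with EXEC = 0, exp and s_sendmsg still reach
  hardware outside the shader core, so the block gets an unconditional execz
  skip.

  ; GCN: s_cbranch_execz [[SKIP:BB[0-9_]+]]        ; always inserted here
  ; GCN: exp mrt0 v0, v0, v0, v0 done vm           ; export: scalar side effect
  ; GCN: s_sendmsg sendmsg(MSG_GS_DONE, GS_OP_NOP) ; message: scalar side effect
  ; GCN: [[SKIP]]:
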
* AMDGPU: Convert test cases to the dimension-aware intrinsics
  Nicolai Haehnle, 2018-06-21, 1 file changed, -3/+3

  Summary: Also explicitly port over some tests in llvm.amdgcn.image.* that
  were missing. Some tests are removed because they no longer apply (i.e.
  explicitly testing building an address vector via insertelement). This is
  in preparation for the eventual removal of the old-style intrinsics.

  Some additional notes:
  - constant-address-space-32bit.ll: change some GCN-NEXT to GCN because the
    instruction schedule was subtly altered
  - insert_vector_elt.ll: the old test didn't actually test anything, because
    %tmp1 was not used; remove the load, because it doesn't work (because of
    the amdgpu_ps calling convention? In any case, it's orthogonal to what
    the test claims to be testing.)

  Change-Id: Idfa99b6512ad139e755e82b8b89548ab08f0afcf

  Reviewers: arsenm, rampitec
  Subscribers: MatzeB, qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr,
  t-tye, javed.absar, llvm-commits

  Differential Revision: https://reviews.llvm.org/D48018

  llvm-svn: 335229

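  For reference, a sketch of the new-style form (function name and operand
  values hypothetical; the removed old-style intrinsic's exact type suffixes
  are elided): the dimension is encoded in the intrinsic name and the
  coordinates become scalar operands.

  declare <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32)

  define amdgpu_ps <4 x float> @sample_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {
    ; dmask=15 requests all four channels; the trailing operands are
    ; unorm, texfailctrl, and cachepolicy
    %v = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32 15, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
    ret <4 x float> %v
  }
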
* [AMDGPU] fix test to be independent of FP undef
  Sanjay Patel, 2018-03-09, 1 file changed, -2/+2

  llvm-svn: 327147

* [AMDGPU] Fixed incorrect uniform branch condition
  Tim Renouf, 2018-01-09, 1 file changed, -3/+3

  Summary: I had a case where multiple nested uniform ifs resulted in code
  that did v_cmp comparisons, combining the results with s_and_b64, s_or_b64
  and s_xor_b64 and using the resulting mask in s_cbranch_vccnz, without
  first ensuring that bits for inactive lanes were clear.

  There was already code for inserting an "s_and_b64 vcc, exec, vcc" to clear
  bits for inactive lanes in the case that the branch is instruction selected
  as s_cbranch_scc1 and is then changed to s_cbranch_vccnz in
  SIFixSGPRCopies. I have added the same code into SILowerControlFlow for the
  case that the branch is instruction selected as s_cbranch_vccnz.

  This de-optimizes the code in some cases where the s_and is not needed,
  because vcc is the result of a v_cmp, or multiple v_cmp instructions
  combined by s_and/s_or. We should add a pass to re-optimize those cases.

  Reviewers: arsenm, kzhuravl
  Subscribers: wdng, yaxunl, t-tye, llvm-commits, dstuttard, timcorringham,
  nhaehnle

  Differential Revision: https://reviews.llvm.org/D41292

  llvm-svn: 322119

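  A sketch of the fixed pattern (registers hypothetical): the combined mask
  can carry stale bits for inactive lanes, so an s_and_b64 against exec is
  inserted before the branch reads vcc.

  ; GCN: v_cmp_lt_f32_e32 vcc, 0, v0
  ; GCN: s_and_b64 [[COND:s\[[0-9]+:[0-9]+\]]], vcc, s{{\[[0-9]+:[0-9]+\]}}
  ; GCN: s_and_b64 vcc, exec, [[COND]]            ; clear inactive-lane bits
  ; GCN: s_cbranch_vccnz
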
* [CodeGen] Unify MBB reference format in both MIR and debug output
  Francis Visoiu Mistrih, 2017-12-04, 1 file changed, -24/+24

  As part of the unification of the debug format and the MIR format, print
  MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only
  for block definitions.

  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
  * find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
  * grep -nr 'BB#' and fix

  Differential Revision: https://reviews.llvm.org/D40422

  llvm-svn: 319665

* [AMDGPU] Eliminate no effect instructions before s_endpgm
  Stanislav Mekhanoshin, 2017-08-16, 1 file changed, -1/+0

  Differential Revision: https://reviews.llvm.org/D36585

  llvm-svn: 310987

* [AMDGPU] Preserve inverted bit in SI_IF in presence of SI_KILL
  Stanislav Mekhanoshin, 2017-08-04, 1 file changed, -0/+2

  If SI_KILL sits between SI_IF and SI_END_CF, we need to preserve the bits
  actually flipped by the if rather than restoring the original mask.

  Differential Revision: https://reviews.llvm.org/D36299

  llvm-svn: 310031

* [AMDGPU] Optimize SI_IF lowering for simple if regions
  Stanislav Mekhanoshin, 2017-07-26, 1 file changed, -3/+0

  Currently SI_IF results in a s_and_saveexec_b64 followed by s_xor_b64. The
  xor is used to extract only the changed bits. In case of a simple if region
  where the only use of that value is in the SI_END_CF to restore the old
  exec mask, we can omit the xor and perform an or of the exec mask with the
  original exec value saved by the s_and_saveexec_b64.

  Differential Revision: https://reviews.llvm.org/D35861

  llvm-svn: 309185

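  Sketched as before/after (registers hypothetical):

  ; Before: the xor extracts only the bits the if actually changed,
  ; and SI_END_CF ors them back:
  ;   s_and_saveexec_b64 s[0:1], vcc
  ;   s_xor_b64 s[0:1], exec, s[0:1]
  ;   ...                               ; then-block body
  ;   s_or_b64 exec, exec, s[0:1]       ; SI_END_CF
  ;
  ; After, for a simple if region: the xor is omitted and the exec value
  ; saved by s_and_saveexec_b64 is ored back in directly:
  ;   s_and_saveexec_b64 s[0:1], vcc
  ;   ...                               ; then-block body
  ;   s_or_b64 exec, exec, s[0:1]
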
* AMDGPU: Use correct register names in inline assembly
  Matt Arsenault, 2017-06-08, 1 file changed, -6/+6

  Fixes using physical registers in inline asm from clang.

  llvm-svn: 305004

* AMDGPU: Convert image intrinsic uses in tests
  Matt Arsenault, 2017-03-21, 1 file changed, -5/+4

  llvm-svn: 298386

* Revert "CodeGen: Allow small copyable blocks to "break" the CFG."Kyle Butt2017-01-111-12/+5
| | | | | | | | | This reverts commit ada6595a526d71df04988eb0a4b4fe84df398ded. This needs a simple probability check because there are some cases where it is not profitable. llvm-svn: 291695
* CodeGen: Allow small copyable blocks to "break" the CFG.
  Kyle Butt, 2017-01-10, 1 file changed, -5/+12

  When choosing the best successor for a block, ordinarily we would have
  preferred a block that preserves the CFG unless there is a strong
  probability in favor of the other direction. For small blocks that can be
  duplicated we now skip that requirement as well.

  Differential Revision: https://reviews.llvm.org/D27742

  llvm-svn: 291609

* AMDGPU: Select branch on undef to uniform scc branch
  Matt Arsenault, 2016-12-15, 1 file changed, -1/+1

  llvm-svn: 289877

* AMDGPU: Don't require structured CFG
  Matt Arsenault, 2016-12-06, 1 file changed, -4/+2

  The structured CFG is just an aid to inserting exec mask modification
  instructions; once that is done, we don't really need it anymore. We also
  do not analyze blocks with terminators that modify exec, so this should
  only be impacting true branches.

  llvm-svn: 288744

* AMDGPU: Change how exp is printed
  Matt Arsenault, 2016-12-05, 1 file changed, -2/+2

  This is an improvement over a long list of unreadable numbers. A follow-up
  patch will try to match how sc formats these.

  llvm-svn: 288697

* ScheduleDAGInstrs: Add condjump deps to addSchedBarrierDeps()
  Matthias Braun, 2016-11-11, 1 file changed, -0/+1

  addSchedBarrierDeps() is supposed to add use operands to the ExitSU node.
  The current implementation adds uses for call/barrier instructions and the
  MBB live-outs in all other cases. The use operands of conditional jump
  instructions were missed.

  Also added code to macrofusion to set the latencies between nodes to zero,
  to avoid problems with the fusing nodes lingering around in the pending
  list now.

  Differential Revision: https://reviews.llvm.org/D25140

  llvm-svn: 286544

* AMDGPU: Remove unnecessary and on conditional branch
  Matt Arsenault, 2016-11-07, 1 file changed, -7/+3

  The comment explaining why this was necessary is incorrect in its
  description of v_cmp's behavior for inactive workitems.

  llvm-svn: 286134

* BranchRelaxation: Support expanding unconditional branches
  Matt Arsenault, 2016-10-06, 1 file changed, -4/+4

  AMDGPU needs to expand unconditional branches in a new block with an
  indirect branch.

  llvm-svn: 283464

* AMDGPU: Use unsigned compare for eq/ne
  Matt Arsenault, 2016-09-30, 1 file changed, -4/+4

  For some reason there are both of these available, except for scalar
  64-bit compares, which only have u64. I'm not sure why there are both (I'm
  guessing it's for the one-bit inputs we don't use), but for consistency
  always use the unsigned one.

  llvm-svn: 282832

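  A sketch of the effect on selection (operands hypothetical): for eq/ne the
  signedness of the comparison is irrelevant, so the unsigned mnemonics are
  used everywhere.

  ; GCN: v_cmp_eq_u32_e32 vcc, {{v[0-9]+}}, {{v[0-9]+}}  ; was v_cmp_eq_i32
  ; GCN: s_cmp_lg_u64 s{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}  ; scalar 64-bit ne only has a u64 form
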
* AMDGPU: Support commuting with immediate in src0
  Matt Arsenault, 2016-09-08, 1 file changed, -1/+1

  llvm-svn: 280970

* AMDGPU: Change insertion point of si_mask_branch
  Matt Arsenault, 2016-08-10, 1 file changed, -4/+7

  Insert before the skip branch if one is created. This is a somewhat more
  natural placement relative to the skip branches, and makes it possible to
  implement analyzeBranch for skip blocks.

  The test changes are mostly due to a quirk where the block label is not
  emitted if there is a terminator that is not also a branch.

  llvm-svn: 278273

* AMDGPU: Stay in WQM for non-intrinsic stores
  Nicolai Haehnle, 2016-08-02, 1 file changed, -1/+1

  Summary: Two types of stores are possible in pixel shaders: stores to
  memory that are explicitly requested at the API level, and stores that are
  an implementation detail of register spilling or lowering of arrays.

  For the first kind of store, we must ensure that helper pixels have no
  effect and hence WQM must be disabled. The second kind of store must always
  be executed, because the written value may be loaded again in a way that is
  relevant for helper pixels as well -- and there are no externally visible
  effects anyway.

  This is a candidate for the 3.9 release branch.

  Reviewers: arsenm, tstellarAMD, mareko
  Subscribers: arsenm, kzhuravl, llvm-commits

  Differential Revision: https://reviews.llvm.org/D22675

  llvm-svn: 277504

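  A sketch of the resulting mask handling (registers hypothetical): the
  exact mask is saved and restored around the API-level store, while
  spill/lowering stores may remain in whole-quad mode.

  ; GCN: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec
  ; GCN: s_wqm_b64 exec, exec                    ; enter whole-quad mode
  ; GCN: s_and_b64 exec, exec, [[ORIG]]          ; back to exact mode ...
  ; GCN: buffer_store_dword                      ; ... before the API store
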
* AMDGPU: Fix not expanding control flow after some kill blocks
  Matt Arsenault, 2016-07-15, 1 file changed, -0/+49

  Also stop trying to insert skip blocks at end_cf. This was inserting them
  at the end of the block, which doesn't make sense. The skip should be
  inserted at the beginning of the block right after the end cf. Just remove
  this for now, since no tests seem to stress this and I think this can be
  handled more generally later.

  Fixes bug 28550

  llvm-svn: 275510

* AMDGPU: Fix trying to skip from a block with no successors
  Matt Arsenault, 2016-07-15, 1 file changed, -0/+38

  Found while reducing bug 28550.

  llvm-svn: 275509

* AMDGPU: Fix splitting kill blocks with defs before kill
  Matt Arsenault, 2016-07-15, 1 file changed, -0/+48

  llvm-svn: 275508

* AMDGPU: Fold out no-op kill intrinsics
  Matt Arsenault, 2016-07-13, 1 file changed, -1/+0

  llvm-svn: 275253

* AMDGPU: Follow up to r275203
  Matt Arsenault, 2016-07-12, 1 file changed, -6/+117

  I meant to squash this into it.

  llvm-svn: 275220

* AMDGPU: Fix verifier error with kill intrinsic
  Matt Arsenault, 2016-07-12, 1 file changed, -0/+145

  Don't create a terminator in the middle of the block. We should probably
  get rid of this intrinsic.

  llvm-svn: 275203