<feed xmlns='http://www.w3.org/2005/Atom'>
<title>bcm5719-llvm/llvm/test/CodeGen/AMDGPU, branch meklort-10.0.1</title>
<subtitle>Project Ortega BCM5719 LLVM</subtitle>
<id>https://git.raptorcs.com/git/bcm5719-llvm/atom?h=meklort-10.0.1</id>
<link rel='self' href='https://git.raptorcs.com/git/bcm5719-llvm/atom?h=meklort-10.0.1'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/'/>
<updated>2020-02-10T13:23:15+00:00</updated>
<entry>
<title>AMDGPU/EG,CM: Implement fsqrt using recip(rsqrt(x)) instead of x * rsqrt(x)</title>
<updated>2020-02-10T13:23:15+00:00</updated>
<author>
<name>Jan Vesely</name>
<email>jan.vesely@rutgers.edu</email>
</author>
<published>2020-02-05T00:27:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=b73942dbc144c11dc94fd32a7d8025a22e7e1d6b'/>
<id>urn:sha1:b73942dbc144c11dc94fd32a7d8025a22e7e1d6b</id>
<content type='text'>
The old version might be faster on EG (RECIP_IEEE is Trans only),
but it would need extra corner-case checks.
This version gives correct corner-case behaviour and saves a register.
Fixes OCL CTS sqrt test (1-thread, scalar) on Turks.
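
A minimal numeric sketch (Python, modeling IEEE-style special values; the
`rsqrt` helper is a hypothetical stand-in for the hardware reciprocal
square root, not the actual backend code) of why recip(rsqrt(x)) handles
the corner cases that x * rsqrt(x) does not:

```python
import math

def rsqrt(x):
    # IEEE-style reciprocal square root: rsqrt(0) = inf, rsqrt(inf) = 0.
    if x == 0.0:
        return math.inf
    return 1.0 / math.sqrt(x)

def sqrt_old(x):
    # Old lowering: x * rsqrt(x).
    # At x = 0: 0 * inf = NaN; at x = inf: inf * 0 = NaN.
    return x * rsqrt(x)

def sqrt_new(x):
    # New lowering: recip(rsqrt(x)).
    # At x = 0: 1/inf = 0; at x = inf: 1/0 = inf (modeled explicitly).
    r = rsqrt(x)
    return math.inf if r == 0.0 else 1.0 / r
```

The old form needed fixups precisely because both corner cases collapse
to 0 * inf; the new form lands on the correct value directly.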

Reviewer: arsenm
Differential Revision: https://reviews.llvm.org/D74017

(cherry picked from commit e6686adf8a743564f0c455c34f04752ab08cf642)
</content>
</entry>
<entry>
<title>AMDGPU: Fix handling of infinite loops in fragment shaders</title>
<updated>2020-02-04T10:38:00+00:00</updated>
<author>
<name>Connor Abbott</name>
<email>cwabbott0@gmail.com</email>
</author>
<published>2019-11-27T13:09:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=5f6fec2404c5135247ae9e4e515e8d9d3242f790'/>
<id>urn:sha1:5f6fec2404c5135247ae9e4e515e8d9d3242f790</id>
<content type='text'>
Summary:
Because kill is modeled as a normal intrinsic, even though it is
supposed to terminate the thread, we can end up with provably infinite
loops that are actually supposed to end successfully. The
AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because
there is no obvious place for the loop to branch to, it just makes it
return immediately. That skips the exports that are supposed to happen
at the end and hangs the GPU if all the threads end up being killed.

While it would be nice if the fact that kill terminates the thread were
modeled in the IR, I think that the structurizer as-is would make a mess if we
did that when the kill is inside control flow. For now, we just add a null
export at the end to make sure that it always exports something, which fixes
the immediate problem without penalizing the more common case. This means that
we sometimes do two "done" exports when only some of the threads enter the
discard loop, but from tests the hardware seems ok with that.

This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv.
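
A toy Python model of the problem and the fix (hypothetical names; this
is a sketch of the behaviour described above, not the actual codegen):

```python
def run_wave(lanes, shade, add_null_export):
    # Toy model of a fragment-shader wave: `shade` returns a color for a
    # lane, or None if the lane is killed.
    exports = []
    live = [shade(lane) for lane in lanes if shade(lane) is not None]
    if live:
        exports.append(("done", live))  # normal path: final "done" export
    elif add_null_export:
        # The fix: an unconditional null "done" export, so the hardware
        # is released even when every lane was killed.
        exports.append(("done", None))
    return exports
```

Without the null export, an all-killed wave produces no "done" export at
all, which is the hang; with it, something is always exported.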

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70781

(cherry picked from commit 87d98c149504f9b0751189744472d7cc94883960)
</content>
</entry>
<entry>
<title>R600: Fix failing testcase</title>
<updated>2020-02-03T15:05:43+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2020-01-22T21:01:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=fa51929f03f541de48e7eaf4a06a27166db3580c'/>
<id>urn:sha1:fa51929f03f541de48e7eaf4a06a27166db3580c</id>
<content type='text'>
(cherry picked from commit 7dc49f77ee508b4152f9291c8e804e4eda3653d3)
</content>
</entry>
<entry>
<title>AMDGPU/R600: Emit rodata in text segment</title>
<updated>2020-02-03T15:05:42+00:00</updated>
<author>
<name>Jan Vesely</name>
<email>jan.vesely@rutgers.edu</email>
</author>
<published>2020-01-19T05:29:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=5cca13d43b7e972d0de6301cfed30781251489a1'/>
<id>urn:sha1:5cca13d43b7e972d0de6301cfed30781251489a1</id>
<content type='text'>
R600 relies on this behaviour.
Fixes: 6e18266aa4dd78953557b8614cb9ff260bad7c65 ('Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0"')
Fixes ~100 piglit regressions since 6e18266

Differential Revision: https://reviews.llvm.org/D72991

(cherry picked from commit 1b8eab179db46f25a267bb73c657009c0bb542cc)
</content>
</entry>
<entry>
<title>Revert "[AMDGPU] Invert the handling of skip insertion."</title>
<updated>2020-02-03T15:00:00+00:00</updated>
<author>
<name>Nicolai Hähnle</name>
<email>nicolai.haehnle@amd.com</email>
</author>
<published>2020-01-21T08:17:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=94c79ce5740f69aa9a9f5145c9911a61b7d20662'/>
<id>urn:sha1:94c79ce5740f69aa9a9f5145c9911a61b7d20662</id>
<content type='text'>
This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd.

The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for
Mesa.

(cherry picked from commit a80291ce10ba9667352adcc895f9668144f5f616)
</content>
</entry>
<entry>
<title>[AMDGPU] Invert the handling of skip insertion.</title>
<updated>2020-01-15T09:48:16+00:00</updated>
<author>
<name>cdevadas</name>
<email>cdevadas@amd.com</email>
</author>
<published>2020-01-10T16:53:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=0dc6c249bffac9f23a605ce4e42a84341da3ddbd'/>
<id>urn:sha1:0dc6c249bffac9f23a605ce4e42a84341da3ddbd</id>
<content type='text'>
The current implementation of skip insertion (SIInsertSkip) makes it a
mandatory pass required for correctness, although the initial idea was
for it to be an optional pass. This patch inserts the s_cbranch_execz
upfront during SILowerControlFlow to skip over sections of code when no
lanes are active. Later, SIRemoveShortExecBranches removes the skips
for short branches, unless there is a side effect and the skip branch
is really necessary.

This new pass will replace the handling of skip insertion in the
existing SIInsertSkip Pass.
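
A small Python sketch of what the s_cbranch_execz skip achieves (a toy
exec-mask model under assumed names, not the backend implementation):

```python
def maybe_skip(exec_mask, block, state):
    # Model of s_cbranch_execz: if no lanes are active, branch over the
    # block instead of executing it with an all-zero mask.
    if exec_mask == 0:
        return state                # skipped: state unchanged
    return block(exec_mask, state)

def add_one(exec_mask, state):
    # Per-lane masked write: inactive lanes are unchanged either way, so
    # absent side effects the skip is purely an optimization.
    return [v + 1 if (exec_mask >> i) & 1 else v
            for i, v in enumerate(state)]
```

Skipping with an all-zero mask yields the same state as executing the
block with no active lanes, which is why short skips can later be
removed when no side effects are involved.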

Differential revision: https://reviews.llvm.org/D68092
</content>
</entry>
<entry>
<title>[amdgpu] Fix typos in a test case.</title>
<updated>2020-01-15T01:08:39+00:00</updated>
<author>
<name>Michael Liao</name>
<email>michael.hliao@gmail.com</email>
</author>
<published>2020-01-15T01:08:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=65c8abb14e77b28d8357c52dddb8e0a6b12b4ba2'/>
<id>urn:sha1:65c8abb14e77b28d8357c52dddb8e0a6b12b4ba2</id>
<content type='text'>
- There were typos introduced during the merge.
</content>
</entry>
<entry>
<title>[codegen,amdgpu] Enhance MIR DIE and re-arrange it for AMDGPU.</title>
<updated>2020-01-15T00:26:15+00:00</updated>
<author>
<name>Michael Liao</name>
<email>michael.hliao@gmail.com</email>
</author>
<published>2020-01-08T15:50:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=01a4b83154760ea286117ac4de9576b8a215cb8d'/>
<id>urn:sha1:01a4b83154760ea286117ac4de9576b8a215cb8d</id>
<content type='text'>
Summary:
- `dead-mi-elimination` assumes MIR is in SSA form and hence cannot be
  arranged after phi elimination or DeSSA. It is enhanced to handle
  dead register definitions by skipping the use check on them. Once a
  register def is `dead`, all its uses, if any, should be `undef`.
- Re-arrange DIE in the RA phase for AMDGPU by placing it directly
  after `detect-dead-lanes`.
- Many relevant tests are updated due to the different register
  assignments.
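
A toy Python model of the distinction (hypothetical instruction triples,
not real MIR): in SSA form, liveness can be computed from uses; after
DeSSA, the enhanced pass instead trusts the `dead` flag on a def and
skips the use check entirely.

```python
def dead_mi_elimination(instrs, in_ssa):
    # instrs: list of (name, def_reg, use_regs, flags) tuples.
    used = set()
    for _, _, uses, _ in instrs:
        used.update(uses)
    kept = []
    for name, d, uses, flags in instrs:
        if "dead" in flags:
            continue        # removable in any form: uses must be undef
        if in_ssa and d is not None and d not in used:
            continue        # classic SSA-based dead-def elimination
        kept.append(name)
    return kept
```

Post-SSA, only explicitly flagged defs are removed, since use counts are
no longer reliable.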

Reviewers: rampitec, qcolombet, sunfish

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72709
</content>
</entry>
<entry>
<title>[DAGCombine] Replace `getIntPtrConstant()` with `getVectorIdxTy()`.</title>
<updated>2020-01-14T22:03:05+00:00</updated>
<author>
<name>Michael Liao</name>
<email>michael.hliao@gmail.com</email>
</author>
<published>2020-01-14T21:30:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=8d07f8d98c48ee0a9dca450aaf4e1cabc621ff68'/>
<id>urn:sha1:8d07f8d98c48ee0a9dca450aaf4e1cabc621ff68</id>
<content type='text'>
- Prefer `getVectorIdxTy()` as the index operand type for
  `EXTRACT_SUBVECTOR`, since targets may expect different index types
  and express that by overloading `getVectorIdxTy()`.
</content>
</entry>
<entry>
<title>[MachineScheduler] Reduce reordering due to mem op clustering</title>
<updated>2020-01-14T19:19:02+00:00</updated>
<author>
<name>Jay Foad</name>
<email>jay.foad@amd.com</email>
</author>
<published>2020-01-14T15:40:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=b777e551f044bbc7245a0b535e46000469479ff6'/>
<id>urn:sha1:b777e551f044bbc7245a0b535e46000469479ff6</id>
<content type='text'>
Summary:
Mem op clustering adds a weak edge in the DAG between two loads or
stores that should be clustered, but the direction of this edge is
pretty arbitrary (it depends on the sort order of MemOpInfo, which
represents the operands of a load or store). This often means that two
loads or stores will get reordered even if they would naturally have
been scheduled together anyway, which leads to test case churn and goes
against the scheduler's "do no harm" philosophy.

The fix makes sure that the direction of the edge always matches the
original code order of the instructions.
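
A small Python sketch of the edge-direction rule (toy `(orig_index,
sort_key)` pairs under assumed names, not the actual MachineScheduler
code): sorting by key still decides which ops get clustered, but the
weak edge is always directed from the op that is earlier in the
original code to the later one.

```python
def cluster_edges(mem_ops):
    # mem_ops: list of (orig_index, sort_key) pairs.
    ordered = sorted(mem_ops, key=lambda op: op[1])
    edges = []
    for a, b in zip(ordered, ordered[1:]):
        # The fix: direct the edge by original code order, so clustering
        # alone never forces a reorder.
        pred, succ = (a, b) if a[0] < b[0] else (b, a)
        edges.append((pred[0], succ[0]))
    return edges
```

Whichever way the sort key happens to order the pair, the emitted edge
respects the original instruction order.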

Reviewers: atrick, MatzeB, arsenm, rampitec, t.p.northover

Subscribers: jvesely, wdng, nhaehnle, kristof.beyls, hiraditya, javed.absar, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72706
</content>
</entry>
</feed>
