<feed xmlns='http://www.w3.org/2005/Atom'>
<title>bcm5719-llvm/llvm/lib/Target/AMDGPU, branch meklort-10.0.1</title>
<subtitle>Project Ortega BCM5719 LLVM</subtitle>
<id>https://git.raptorcs.com/git/bcm5719-llvm/atom?h=meklort-10.0.1</id>
<link rel='self' href='https://git.raptorcs.com/git/bcm5719-llvm/atom?h=meklort-10.0.1'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/'/>
<updated>2020-02-19T13:24:33+00:00</updated>
<entry>
<title>Fix unused function warning (PR44808)</title>
<updated>2020-02-19T13:24:33+00:00</updated>
<author>
<name>Hans Wennborg</name>
<email>hans@chromium.org</email>
</author>
<published>2020-02-12T14:12:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=5f76fcc9796e1a68f44a79b7910a199c0db9fe82'/>
<id>urn:sha1:5f76fcc9796e1a68f44a79b7910a199c0db9fe82</id>
<content type='text'>
(cherry picked from commit a19de32095e4cdb18957e66609574ce2021a8d1c)
</content>
</entry>
<entry>
<title>AMDGPU/EG,CM: Implement fsqrt using recip(rsqrt(x)) instead of x * rsqrt(x)</title>
<updated>2020-02-10T13:23:15+00:00</updated>
<author>
<name>Jan Vesely</name>
<email>jan.vesely@rutgers.edu</email>
</author>
<published>2020-02-05T00:27:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=b73942dbc144c11dc94fd32a7d8025a22e7e1d6b'/>
<id>urn:sha1:b73942dbc144c11dc94fd32a7d8025a22e7e1d6b</id>
<content type='text'>
The old version might be faster on EG (RECIP_IEEE is Trans only),
but it'd need extra corner case checks.
This gives correct corner case behaviour and saves a register.
Fixes OCL CTS sqrt test (1-thread, scalar) on Turks.
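
The corner cases mentioned above can be illustrated with a minimal sketch.
It assumes IEEE-style results for the reciprocal and reciprocal square root
operations at zero and infinity. The helper names (rsqrt, recip, sqrt_mul,
sqrt_recip) are hypothetical Python stand-ins for illustration only; they are
not the backend code or any hardware API, and only non-negative inputs are
handled.

```python
import math

def rsqrt(x):
    # Assumed IEEE-style reciprocal square root for non-negative x:
    # rsqrt(0) = +inf and rsqrt(+inf) = 0.
    if x == 0.0:
        return math.inf
    if math.isinf(x):
        return 0.0
    return 1.0 / math.sqrt(x)

def recip(x):
    # Assumed IEEE-style reciprocal: recip(0) = +inf, recip(+inf) = 0.
    if x == 0.0:
        return math.inf
    return 1.0 / x

def sqrt_mul(x):
    # Old lowering: x * rsqrt(x). Produces NaN for 0 and +inf
    # because 0 * inf and inf * 0 are both NaN.
    return x * rsqrt(x)

def sqrt_recip(x):
    # New lowering: recip(rsqrt(x)). The corner cases fall out naturally.
    return recip(rsqrt(x))

for value in (0.0, 4.0, math.inf):
    print(value, sqrt_mul(value), sqrt_recip(value))
```

Under these assumptions the multiply form would need explicit checks for 0
and +inf, while the reciprocal form returns 0 and +inf without them.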

Reviewer: arsenm
Differential Revision: https://reviews.llvm.org/D74017

(cherry picked from commit e6686adf8a743564f0c455c34f04752ab08cf642)
</content>
</entry>
<entry>
<title>AMDGPU: Fix handling of infinite loops in fragment shaders</title>
<updated>2020-02-04T10:38:00+00:00</updated>
<author>
<name>Connor Abbott</name>
<email>cwabbott0@gmail.com</email>
</author>
<published>2019-11-27T13:09:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=5f6fec2404c5135247ae9e4e515e8d9d3242f790'/>
<id>urn:sha1:5f6fec2404c5135247ae9e4e515e8d9d3242f790</id>
<content type='text'>
Summary:
Due to the fact that kill is just a normal intrinsic, even though it's
supposed to terminate the thread, we can end up with provably infinite
loops that are actually supposed to end successfully. The
AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because
there's no obvious place to make the loop branch to, it just makes it
return immediately, which skips the exports that are supposed to happen
at the end and hangs the GPU if all the threads end up being killed.

While it would be nice if the fact that kill terminates the thread were
modeled in the IR, I think that the structurizer as-is would make a mess if we
did that when the kill is inside control flow. For now, we just add a null
export at the end to make sure that it always exports something, which fixes
the immediate problem without penalizing the more common case. This means that
we sometimes do two "done" exports when only some of the threads enter the
discard loop, but from tests the hardware seems ok with that.

This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70781

(cherry picked from commit 87d98c149504f9b0751189744472d7cc94883960)
</content>
</entry>
<entry>
<title>AMDGPU/R600: Emit rodata in text segment</title>
<updated>2020-02-03T15:05:42+00:00</updated>
<author>
<name>Jan Vesely</name>
<email>jan.vesely@rutgers.edu</email>
</author>
<published>2020-01-19T05:29:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=5cca13d43b7e972d0de6301cfed30781251489a1'/>
<id>urn:sha1:5cca13d43b7e972d0de6301cfed30781251489a1</id>
<content type='text'>
R600 relies on this behaviour.
Fixes: 6e18266aa4dd78953557b8614cb9ff260bad7c65 ('Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0"')
Fixes ~100 piglit regressions since 6e18266

Differential Revision: https://reviews.llvm.org/D72991

(cherry picked from commit 1b8eab179db46f25a267bb73c657009c0bb542cc)
</content>
</entry>
<entry>
<title>Revert "[AMDGPU] Invert the handling of skip insertion."</title>
<updated>2020-02-03T15:00:00+00:00</updated>
<author>
<name>Nicolai Hähnle</name>
<email>nicolai.haehnle@amd.com</email>
</author>
<published>2020-01-21T08:17:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=94c79ce5740f69aa9a9f5145c9911a61b7d20662'/>
<id>urn:sha1:94c79ce5740f69aa9a9f5145c9911a61b7d20662</id>
<content type='text'>
This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd.

The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for
Mesa.

(cherry picked from commit a80291ce10ba9667352adcc895f9668144f5f616)
</content>
</entry>
<entry>
<title>[AMDGPU] Invert the handling of skip insertion.</title>
<updated>2020-01-15T09:48:16+00:00</updated>
<author>
<name>cdevadas</name>
<email>cdevadas@amd.com</email>
</author>
<published>2020-01-10T16:53:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=0dc6c249bffac9f23a605ce4e42a84341da3ddbd'/>
<id>urn:sha1:0dc6c249bffac9f23a605ce4e42a84341da3ddbd</id>
<content type='text'>
The current implementation of skip insertion (SIInsertSkip) makes it a
mandatory pass required for correctness. Initially, the idea was to
have an optional pass. This patch inserts the s_cbranch_execz upfront
during SILowerControlFlow to skip over the sections of code when no
lanes are active. Later, SIRemoveShortExecBranches removes the skips
for short branches, unless there is a side effect and the skip branch is
really necessary.

This new pass will replace the handling of skip insertion in the
existing SIInsertSkip Pass.

Differential revision: https://reviews.llvm.org/D68092
</content>
</entry>
<entry>
<title>CMake: Make most target symbols hidden by default</title>
<updated>2020-01-15T03:46:52+00:00</updated>
<author>
<name>Tom Stellard</name>
<email>tstellar@redhat.com</email>
</author>
<published>2020-01-15T03:15:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=0dbcb3639451a7c20e2d5133b459552281e64455'/>
<id>urn:sha1:0dbcb3639451a7c20e2d5133b459552281e64455</id>
<content type='text'>
Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.

A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.

This patch reduces the number of public symbols in libLLVM.so by about
25%. This should improve load times for the dynamic library and also
make ABI checker tools, like abidiff, require less memory when analyzing
libLLVM.so.

One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.

Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
36221
nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
26278
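
As a quick check of the arithmetic, the two counts above can be turned into
a relative reduction. This is only a sketch over the numbers quoted in this
message; the variable names are illustrative.

```python
# Public symbol counts quoted above (gcc 8.2.1, ld.bfd 2.31.1).
before = 36221
after = 26278

removed = before - after           # symbols no longer public
reduction = removed / before       # fraction of the original count

print(removed)
print(round(reduction * 100, 1))   # percentage reduction
```

For this particular build the reduction works out to roughly 27%, in line
with the approximate figure given in the summary.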

Reviewers: chandlerc, beanz, mgorny, rnk, hans

Reviewed By: rnk, hans

Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D54439
</content>
</entry>
<entry>
<title>[codegen,amdgpu] Enhance MIR DIE and re-arrange it for AMDGPU.</title>
<updated>2020-01-15T00:26:15+00:00</updated>
<author>
<name>Michael Liao</name>
<email>michael.hliao@gmail.com</email>
</author>
<published>2020-01-08T15:50:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=01a4b83154760ea286117ac4de9576b8a215cb8d'/>
<id>urn:sha1:01a4b83154760ea286117ac4de9576b8a215cb8d</id>
<content type='text'>
Summary:
- `dead-mi-elimination` assumes MIR in SSA form and cannot be
  scheduled after phi elimination or DeSSA. It is enhanced to handle
  dead register definitions by skipping the use check on them. Once a
  register def is `dead`, all its uses, if any, should be `undef`.
- Re-arrange the DIE in RA phase for AMDGPU by placing it directly after
  `detect-dead-lanes`.
- Many related tests are updated because of the different register assignment.

Reviewers: rampitec, qcolombet, sunfish

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72709
</content>
</entry>
<entry>
<title>[AMDGPU] Model distance to instruction in bundle</title>
<updated>2020-01-14T09:18:59+00:00</updated>
<author>
<name>Stanislav Mekhanoshin</name>
<email>Stanislav.Mekhanoshin@amd.com</email>
</author>
<published>2020-01-14T01:01:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=ad741853c38880dff99cd5b5035b8965c5a73011'/>
<id>urn:sha1:ad741853c38880dff99cd5b5035b8965c5a73011</id>
<content type='text'>
This change allows modeling the height of an instruction
within a bundle for latency adjustment purposes.

Differential Revision: https://reviews.llvm.org/D72669
</content>
</entry>
<entry>
<title>[AMDGPU] Fix getInstrLatency() always returning 1</title>
<updated>2020-01-14T09:08:30+00:00</updated>
<author>
<name>Stanislav Mekhanoshin</name>
<email>Stanislav.Mekhanoshin@amd.com</email>
</author>
<published>2020-01-13T22:30:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=eca44745871bc46728903aaa262abc6344d4f959'/>
<id>urn:sha1:eca44745871bc46728903aaa262abc6344d4f959</id>
<content type='text'>
We do not have InstrItinerary, so the generic getInstrLatency() always
defaulted to returning 1 cycle. We need to use TargetSchedModel instead
to compute an instruction's latency.

Differential Revision: https://reviews.llvm.org/D72655
</content>
</entry>
</feed>
