<feed xmlns='http://www.w3.org/2005/Atom'>
<title>bcm5719-llvm/llvm/test/Transforms/LoopVectorize/X86, branch meklort-10.0.1</title>
<subtitle>Project Ortega BCM5719 LLVM</subtitle>
<id>https://git.raptorcs.com/git/bcm5719-llvm/atom?h=meklort-10.0.1</id>
<link rel='self' href='https://git.raptorcs.com/git/bcm5719-llvm/atom?h=meklort-10.0.1'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/'/>
<updated>2020-01-07T15:10:25+00:00</updated>
<entry>
<title>llc: Change behavior of -mcpu with existing attribute</title>
<updated>2020-01-07T15:10:25+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2019-12-09T11:37:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=f26ed6e47cb8b080c236d11c4942a12265180084'/>
<id>urn:sha1:f26ed6e47cb8b080c236d11c4942a12265180084</id>
<content type='text'>
Don't overwrite existing target-cpu attributes.

I've often found the replacement behavior annoying, and it is
inconsistent with how the fast-math command-line flags interact with
the function attributes.

Does not yet change target-features, since I think that should behave
as a concatenation.
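
The precedence change can be sketched as follows; a minimal Python model,
where the attribute map and flag names are simplified stand-ins for llc's
actual machinery:

```python
# Sketch of the new precedence: an existing "target-cpu" function
# attribute is kept rather than overwritten by -mcpu. This is an
# illustrative model, not the llc implementation.
def resolve_target_cpu(attrs, mcpu_flag):
    if "target-cpu" in attrs:
        return attrs["target-cpu"]  # keep the existing attribute
    return mcpu_flag                # otherwise apply -mcpu

print(resolve_target_cpu({"target-cpu": "znver2"}, "skylake"))  # znver2
print(resolve_target_cpu({}, "skylake"))                        # skylake
```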
</content>
</entry>
<entry>
<title>Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351</title>
<updated>2019-12-25T00:27:51+00:00</updated>
<author>
<name>Fangrui Song</name>
<email>maskray@google.com</email>
</author>
<published>2019-12-25T00:11:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=a36ddf0aa9db5c1086e04f56b5f077b761712eb5'/>
<id>urn:sha1:a36ddf0aa9db5c1086e04f56b5f077b761712eb5</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351</title>
<updated>2019-12-24T23:57:33+00:00</updated>
<author>
<name>Fangrui Song</name>
<email>maskray@google.com</email>
</author>
<published>2019-12-24T23:52:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=502a77f125f43ffde57af34d3fd1b900248a91cd'/>
<id>urn:sha1:502a77f125f43ffde57af34d3fd1b900248a91cd</id>
<content type='text'>
</content>
</entry>
<entry>
<title>[LV] Strip wrap flags from vectorized reductions</title>
<updated>2019-12-20T12:48:53+00:00</updated>
<author>
<name>Ayal Zaks</name>
<email>ayal.zaks@intel.com</email>
</author>
<published>2019-12-19T22:04:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=e498be573871c94119033dd151773a55ceb0beb7'/>
<id>urn:sha1:e498be573871c94119033dd151773a55ceb0beb7</id>
<content type='text'>
A sequence of additions or multiplications that is known not to wrap may wrap
if its order is changed (i.e., reassociated). Therefore, when vectorizing
integer sum or product reductions, their no-wrap flags need to be removed.
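
A minimal Python sketch of why reassociation breaks a no-signed-wrap
guarantee, using i8 arithmetic as a stand-in for the vectorized reduction:

```python
# Model an i8 add carrying the 'nsw' (no signed wrap) flag: raise if
# the signed 8-bit result would overflow, as the flag promises it won't.
def add_nsw_i8(a, b):
    s = a + b
    if not -128 <= s <= 127:
        raise OverflowError(f"{a} + {b} wraps in i8")
    return s

a, b, c = 100, -100, 100
print(add_nsw_i8(add_nsw_i8(a, b), c))  # original order: fine, prints 100
try:
    add_nsw_i8(add_nsw_i8(a, c), b)     # reassociated: 100 + 100 overflows
except OverflowError as e:
    print("reassociated order wraps:", e)
```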

Fixes PR43828

Patch by Denis Antrushin

Differential Revision: https://reviews.llvm.org/D69563
</content>
</entry>
<entry>
<title>[LV] Scalar with predication must not be uniform</title>
<updated>2019-12-03T17:50:24+00:00</updated>
<author>
<name>Ayal Zaks</name>
<email>ayal.zaks@intel.com</email>
</author>
<published>2019-11-26T22:08:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=6ed9cef25f915d4533f261c401cee29d8d8012d5'/>
<id>urn:sha1:6ed9cef25f915d4533f261c401cee29d8d8012d5</id>
<content type='text'>
Fix PR40816: avoid considering scalar-with-predication instructions as also
uniform-after-vectorization.

Instructions identified as "scalar with predication" will be "vectorized" using
a replicating region. If such instructions are also optimized as "uniform after
vectorization", namely when only the first of the VF lanes is used, such a
replicating region becomes erroneous: only the first instance of the region can
and should be formed. Fix such cases by not considering such instructions as
"uniform after vectorization".
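
A small Python sketch of the distinction, with made-up lanes and predicates;
it models why a predicated operation must be replicated per lane rather than
computed once and broadcast:

```python
# Assume VF = 4 lanes; 'pred' differs per lane, so executing only
# lane 0 (a "uniform" shortcut) drops work the other lanes need.
VF = 4
divisors = [7, 0, 3, 0]            # hypothetical per-lane values
pred = [d != 0 for d in divisors]  # guard against division by zero

# Correct: replicate the guarded division once per lane.
replicated = [100 // d if p else None for d, p in zip(divisors, pred)]

# Wrong "uniform" treatment: compute lane 0 only and broadcast it.
uniform = [100 // divisors[0] if pred[0] else None] * VF

print(replicated)  # [14, None, 33, None]
print(uniform)     # [14, 14, 14, 14] -- per-lane predication is lost
```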

Differential Revision: https://reviews.llvm.org/D70298
</content>
</entry>
<entry>
<title>[x86] make SLM extract vector element more expensive than default</title>
<updated>2019-11-27T19:08:56+00:00</updated>
<author>
<name>Sanjay Patel</name>
<email>spatel@rotateright.com</email>
</author>
<published>2019-11-27T18:33:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=5c166f1d1969e9c1e5b72aa672add429b9c22b53'/>
<id>urn:sha1:5c166f1d1969e9c1e5b72aa672add429b9c22b53</id>
<content type='text'>
I'm not sure what the effect of this change will be on all of the affected
tests or a larger benchmark, but it fixes the horizontal add/sub problems
noted here:
https://reviews.llvm.org/D59710?vs=227972&amp;id=228095&amp;whitespace=ignore-most#toc

The costs are based on reciprocal throughput numbers in Agner's tables for
PEXTR*; these appear to be very slow ops on Silvermont.
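
The table-based idea can be sketched like so; the numbers below are made-up
stand-ins, not the actual reciprocal-throughput costs from Agner's tables:

```python
# Per-subtarget cost table for extracting a vector element; unknown
# subtargets fall back to the default. The costs are placeholders.
DEFAULT_EXTRACT_COST = 1
EXTRACT_COST = {"silvermont": 4}

def extract_element_cost(subtarget):
    return EXTRACT_COST.get(subtarget, DEFAULT_EXTRACT_COST)

print(extract_element_cost("silvermont"))  # 4: PEXTR* is slow on SLM
print(extract_element_cost("skylake"))     # 1: default cost
```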

This is a small step towards the larger motivation discussed in PR43605:
https://bugs.llvm.org/show_bug.cgi?id=43605

Also, it seems likely that insert/extract is the source of perf regressions on
other CPUs (up to 30%) that were cited as part of the reason to revert D59710,
so maybe we'll extend the table-based approach to other subtargets.

Differential Revision: https://reviews.llvm.org/D70607
</content>
</entry>
<entry>
<title>[LV] Move interleave_short_tc.ll into the X86 directory to hopefully fix non-X86 bots.</title>
<updated>2019-11-01T17:41:18+00:00</updated>
<author>
<name>Craig Topper</name>
<email>craig.topper@intel.com</email>
</author>
<published>2019-11-01T17:18:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=4592f70758531d6efe4e43d8122a8944f469d933'/>
<id>urn:sha1:4592f70758531d6efe4e43d8122a8944f469d933</id>
<content type='text'>
</content>
</entry>
<entry>
<title>[ConstantFold] Fold extractelement of getelementptr</title>
<updated>2019-10-28T18:32:39+00:00</updated>
<author>
<name>Jay Foad</name>
<email>jay.foad@amd.com</email>
</author>
<published>2019-10-24T12:15:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=843c0adf0f7449a4167d20b399f70f6943d21d5e'/>
<id>urn:sha1:843c0adf0f7449a4167d20b399f70f6943d21d5e</id>
<content type='text'>
Summary:
Getelementptr has a vector type if any of its operands are vectors
(the scalar operands being implicitly broadcast to all vector elements).
Extractelement applied to a vector getelementptr can be folded by
applying the extractelement in turn to each of the vector operands.
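
A sketch of the fold in Python, modeling a GEP over 4-byte elements as plain
address arithmetic (the addresses and lane are illustrative):

```python
# Model a GEP over i32 (4-byte) elements as address arithmetic.
# Scalar operands of a vector GEP are implicitly broadcast, so
# extracting lane i commutes with the GEP itself.
ELEM_SIZE = 4

def gep(base, index):
    return base + index * ELEM_SIZE

bases = [0x1000, 0x2000, 0x3000, 0x4000]  # vector of base addresses
indices = [1, 2, 3, 4]                    # vector of indices
lane = 2

# extractelement(getelementptr(bases, indices), lane) ...
extracted = [gep(b, i) for b, i in zip(bases, indices)][lane]
# ... folds to getelementptr applied to the extracted scalar operands.
folded = gep(bases[lane], indices[lane])
print(hex(extracted), extracted == folded)  # 0x300c True
```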

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69379
</content>
</entry>
<entry>
<title>[LV] Interleaving should not exceed estimated loop trip count.</title>
<updated>2019-10-28T17:58:22+00:00</updated>
<author>
<name>Craig Topper</name>
<email>craig.topper@intel.com</email>
</author>
<published>2019-10-28T17:11:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=18824d25d8aa8727d9f64f8002f2533d57627bd5'/>
<id>urn:sha1:18824d25d8aa8727d9f64f8002f2533d57627bd5</id>
<content type='text'>
Currently we may interleave by more than the estimated trip count coming
from the profile or the computed maximum trip count. The solution is to
use the "best known" trip count instead of the exact one in the
interleaving analysis.
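
A rough Python model of the clamping, with illustrative names (this is not
the actual LoopVectorize interface):

```python
# Clamp the interleave count (IC) by the best-known trip count,
# preferring the exact count and falling back to the profile
# estimate when no exact count is known.
def clamp_interleave(ic, vf, exact_tc=None, estimated_tc=None):
    tc = exact_tc if exact_tc is not None else estimated_tc
    if tc is None:
        return ic
    # Interleaving beyond tc // vf vector iterations is wasted work.
    return max(1, min(ic, tc // vf))

print(clamp_interleave(ic=4, vf=4, estimated_tc=20))  # 4 (20 // 4 = 5)
print(clamp_interleave(ic=4, vf=4, estimated_tc=8))   # 2 (8 // 4 = 2)
```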

Patch by Evgeniy Brevnov.

Differential Revision: https://reviews.llvm.org/D67948
</content>
</entry>
<entry>
<title>recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize</title>
<updated>2019-10-12T02:53:04+00:00</updated>
<author>
<name>Zi Xuan Wu</name>
<email>wuzish@cn.ibm.com</email>
</author>
<published>2019-10-12T02:53:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=9802268ad3123b0ac71413fd5493606573b3544d'/>
<id>urn:sha1:9802268ad3123b0ac71413fd5493606573b3544d</id>
<content type='text'>
In loop-vectorize, the interleave count and vectorization factor depend on the
number of target registers. Currently, register pressure is not estimated
separately for each register class (in particular, for scalar types, float should
not be counted in the same class as int), so the estimate is inaccurate.
Specifically, this causes excessive interleaving/unrolling, resulting in too many
register spills in the loop body and hurting performance.

So we need to classify the register classes at the IR level. Importantly, these
are abstract register classes, not the target register classes the backend
provides in its td files. They are used to establish a mapping between the types
of IR values and the number of simultaneous live ranges to which we'd like to
limit some set of those types.

For example, on the POWER target, the register count is special when VSX is
enabled: there are 32 int scalar registers (GPR) and 64 float scalar registers
(VSR), while int and float vector registers both number 64 (VSR). So there are
2 kinds of register class when VSX is enabled, and 3 kinds when VSX is NOT enabled.
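
A sketch of that mapping in Python; the class names and non-VSX register counts
are illustrative stand-ins, not the actual TargetTransformInfo interface:

```python
# Abstract register classes for POWER as described above: with VSX,
# float scalars and all vectors share the 64-entry VSR file while
# int scalars use 32 GPRs (2 classes); without VSX there are 3
# classes. Class names and non-VSX counts are illustrative.
def register_class(ty, is_vector, vsx_enabled):
    if vsx_enabled:
        return "GPR" if ty == "int" and not is_vector else "VSR"
    if is_vector:
        return "VR"
    return "GPR" if ty == "int" else "FPR"

NUM_REGISTERS = {"GPR": 32, "FPR": 32, "VSR": 64, "VR": 32}

print(register_class("float", False, vsx_enabled=True))    # VSR
print(NUM_REGISTERS[register_class("int", False, True)])   # 32
```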

On the POWER target, this makes a big (+~30%) performance improvement in one
specific benchmark (503.bwaves_r) of SPEC 2017, with no other obvious regressions.

Differential revision: https://reviews.llvm.org/D67148

llvm-svn: 374634
</content>
</entry>
</feed>
