<feed xmlns='http://www.w3.org/2005/Atom'>
<title>bcm5719-llvm/llvm/test/Transforms/LoopVectorize/X86, branch meklort-10.0.1</title>
<subtitle>Project Ortega BCM5719 LLVM</subtitle>
<id>https://git.raptorcs.com/git/bcm5719-llvm/atom?h=meklort-10.0.1</id>
<link rel='self' href='https://git.raptorcs.com/git/bcm5719-llvm/atom?h=meklort-10.0.1'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/'/>
<updated>2020-01-07T15:10:25+00:00</updated>
<entry>
<title>llc: Change behavior of -mcpu with existing attribute</title>
<updated>2020-01-07T15:10:25+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2019-12-09T11:37:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=f26ed6e47cb8b080c236d11c4942a12265180084'/>
<id>urn:sha1:f26ed6e47cb8b080c236d11c4942a12265180084</id>
<content type='text'>
Don't overwrite existing target-cpu attributes.

I've often found the replacement behavior annoying, and it is
inconsistent with how the fast-math command-line flags interact with
the function attributes.

Does not yet change target-features, since I think that should behave
as a concatenation.
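
The precedence change can be sketched as follows; a minimal Python model,
where the attribute map and flag names are simplified stand-ins for llc's
actual machinery:

```python
# Sketch of the new precedence: an existing "target-cpu" function
# attribute is kept rather than overwritten by -mcpu. This is an
# illustrative model, not the llc implementation.
def resolve_target_cpu(attrs, mcpu_flag):
    if "target-cpu" in attrs:
        return attrs["target-cpu"]  # keep the existing attribute
    return mcpu_flag                # otherwise apply -mcpu

print(resolve_target_cpu({"target-cpu": "znver2"}, "skylake"))  # znver2
print(resolve_target_cpu({}, "skylake"))                        # skylake
```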
</content>
</entry>
<entry>
<title>Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351</title>
<updated>2019-12-25T00:27:51+00:00</updated>
<author>
<name>Fangrui Song</name>
<email>maskray@google.com</email>
</author>
<published>2019-12-25T00:11:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=a36ddf0aa9db5c1086e04f56b5f077b761712eb5'/>
<id>urn:sha1:a36ddf0aa9db5c1086e04f56b5f077b761712eb5</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351</title>
<updated>2019-12-24T23:57:33+00:00</updated>
<author>
<name>Fangrui Song</name>
<email>maskray@google.com</email>
</author>
<published>2019-12-24T23:52:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=502a77f125f43ffde57af34d3fd1b900248a91cd'/>
<id>urn:sha1:502a77f125f43ffde57af34d3fd1b900248a91cd</id>
<content type='text'>
</content>
</entry>
<entry>
<title>[LV] Strip wrap flags from vectorized reductions</title>
<updated>2019-12-20T12:48:53+00:00</updated>
<author>
<name>Ayal Zaks</name>
<email>ayal.zaks@intel.com</email>
</author>
<published>2019-12-19T22:04:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=e498be573871c94119033dd151773a55ceb0beb7'/>
<id>urn:sha1:e498be573871c94119033dd151773a55ceb0beb7</id>
<content type='text'>
A sequence of additions or multiplications that is known not to wrap may wrap
if its order is changed (i.e., reassociated). Therefore, when vectorizing
integer sum or product reductions, their no-wrap flags need to be removed.
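
A minimal Python sketch of why reassociation breaks a no-signed-wrap
guarantee, using i8 arithmetic as a stand-in for the vectorized reduction:

```python
# Model an i8 add carrying the 'nsw' (no signed wrap) flag: raise if
# the signed 8-bit result would overflow, as the flag promises it won't.
def add_nsw_i8(a, b):
    s = a + b
    if not -128 <= s <= 127:
        raise OverflowError(f"{a} + {b} wraps in i8")
    return s

a, b, c = 100, -100, 100
print(add_nsw_i8(add_nsw_i8(a, b), c))  # original order: fine, prints 100
try:
    add_nsw_i8(add_nsw_i8(a, c), b)     # reassociated: 100 + 100 overflows
except OverflowError as e:
    print("reassociated order wraps:", e)
```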

Fixes PR43828

Patch by Denis Antrushin

Differential Revision: https://reviews.llvm.org/D69563
</content>
</entry>
<entry>
<title>[LV] Scalar with predication must not be uniform</title>
<updated>2019-12-03T17:50:24+00:00</updated>
<author>
<name>Ayal Zaks</name>
<email>ayal.zaks@intel.com</email>
</author>
<published>2019-11-26T22:08:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=6ed9cef25f915d4533f261c401cee29d8d8012d5'/>
<id>urn:sha1:6ed9cef25f915d4533f261c401cee29d8d8012d5</id>
<content type='text'>
Fix PR40816: avoid considering scalar-with-predication instructions as also
uniform-after-vectorization.

Instructions identified as "scalar with predication" will be "vectorized" using
a replicating region. If such instructions are also optimized as "uniform after
vectorization", namely when only the first of the VF lanes is used, such a
replicating region becomes erroneous: only the first instance of the region can
and should be formed. Fix such cases by not considering such instructions as
"uniform after vectorization".
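
A small Python sketch of the distinction, with made-up lanes and predicates;
it models why a predicated operation must be replicated per lane rather than
computed once and broadcast:

```python
# Assume VF = 4 lanes; 'pred' differs per lane, so executing only
# lane 0 (a "uniform" shortcut) drops work the other lanes need.
VF = 4
divisors = [7, 0, 3, 0]            # hypothetical per-lane values
pred = [d != 0 for d in divisors]  # guard against division by zero

# Correct: replicate the guarded division once per lane.
replicated = [100 // d if p else None for d, p in zip(divisors, pred)]

# Wrong "uniform" treatment: compute lane 0 only and broadcast it.
uniform = [100 // divisors[0] if pred[0] else None] * VF

print(replicated)  # [14, None, 33, None]
print(uniform)     # [14, 14, 14, 14] -- per-lane predication is lost
```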

Differential Revision: https://reviews.llvm.org/D70298
</content>
</entry>
<entry>
<title>[x86] make SLM extract vector element more expensive than default</title>
<updated>2019-11-27T19:08:56+00:00</updated>
<author>
<name>Sanjay Patel</name>
<email>spatel@rotateright.com</email>
</author>
<published>2019-11-27T18:33:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=5c166f1d1969e9c1e5b72aa672add429b9c22b53'/>
<id>urn:sha1:5c166f1d1969e9c1e5b72aa672add429b9c22b53</id>
<content type='text'>
I'm not sure what the effect of this change will be on all of the affected
tests or a larger benchmark, but it fixes the horizontal add/sub problems
noted here:
https://reviews.llvm.org/D59710?vs=227972&amp;id=228095&amp;whitespace=ignore-most#toc

The costs are based on reciprocal throughput numbers in Agner's tables for
PEXTR*; these appear to be very slow ops on Silvermont.
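
The table-based idea can be sketched like so; the numbers below are made-up
stand-ins, not the actual reciprocal-throughput costs from Agner's tables:

```python
# Per-subtarget cost table for extracting a vector element; unknown
# subtargets fall back to the default. The costs are placeholders.
DEFAULT_EXTRACT_COST = 1
EXTRACT_COST = {"silvermont": 4}

def extract_element_cost(subtarget):
    return EXTRACT_COST.get(subtarget, DEFAULT_EXTRACT_COST)

print(extract_element_cost("silvermont"))  # 4: PEXTR* is slow on SLM
print(extract_element_cost("skylake"))     # 1: default cost
```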

This is a small step towards the larger motivation discussed in PR43605:
https://bugs.llvm.org/show_bug.cgi?id=43605

Also, it seems likely that insert/extract is the source of perf regressions on
other CPUs (up to 30%) that were cited as part of the reason to revert D59710,
so maybe we'll extend the table-based approach to other subtargets.

Differential Revision: https://reviews.llvm.org/D70607
</content>
</entry>
<entry>
<title>[LV] Move interleave_short_tc.ll into the X86 directory to hopefully fix non-X86 bots.</title>
<updated>2019-11-01T17:41:18+00:00</updated>
<author>
<name>Craig Topper</name>
<email>craig.topper@intel.com</email>
</author>
<published>2019-11-01T17:18:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=4592f70758531d6efe4e43d8122a8944f469d933'/>
<id>urn:sha1:4592f70758531d6efe4e43d8122a8944f469d933</id>
<content type='text'>
</content>
</entry>
<entry>
<title>[ConstantFold] Fold extractelement of getelementptr</title>
<updated>2019-10-28T18:32:39+00:00</updated>
<author>
<name>Jay Foad</name>
<email>jay.foad@amd.com</email>
</author>
<published>2019-10-24T12:15:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=843c0adf0f7449a4167d20b399f70f6943d21d5e'/>
<id>urn:sha1:843c0adf0f7449a4167d20b399f70f6943d21d5e</id>
<content type='text'>
Summary:
Getelementptr has a vector type if any of its operands are vectors
(the scalar operands being implicitly broadcast to all vector elements).
Extractelement applied to a vector getelementptr can be folded by
applying the extractelement in turn to each of the vector operands.
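
A sketch of the fold in Python, modeling a GEP over 4-byte elements as plain
address arithmetic (the addresses and lane are illustrative):

```python
# Model a GEP over i32 (4-byte) elements as address arithmetic.
# Scalar operands of a vector GEP are implicitly broadcast, so
# extracting lane i commutes with the GEP itself.
ELEM_SIZE = 4

def gep(base, index):
    return base + index * ELEM_SIZE

bases = [0x1000, 0x2000, 0x3000, 0x4000]  # vector of base addresses
indices = [1, 2, 3, 4]                    # vector of indices
lane = 2

# extractelement(getelementptr(bases, indices), lane) ...
extracted = [gep(b, i) for b, i in zip(bases, indices)][lane]
# ... folds to getelementptr applied to the extracted scalar operands.
folded = gep(bases[lane], indices[lane])
print(hex(extracted), extracted == folded)  # 0x300c True
```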

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69379
</content>
</entry>
<entry>
<title>[LV] Interleaving should not exceed estimated loop trip count.</title>
<updated>2019-10-28T17:58:22+00:00</updated>
<author>
<name>Craig Topper</name>
<email>craig.topper@intel.com</email>
</author>
<published>2019-10-28T17:11:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=18824d25d8aa8727d9f64f8002f2533d57627bd5'/>
<id>urn:sha1:18824d25d8aa8727d9f64f8002f2533d57627bd5</id>
<content type='text'>
Currently we may interleave by more than the estimated trip count coming
from the profile or the computed maximum trip count. The solution is to
use the "best known" trip count instead of the exact one in the
interleaving analysis.
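
A rough Python model of the clamping, with illustrative names (this is not
the actual LoopVectorize interface):

```python
# Clamp the interleave count (IC) by the best-known trip count,
# preferring the exact count and falling back to the profile
# estimate when no exact count is known.
def clamp_interleave(ic, vf, exact_tc=None, estimated_tc=None):
    tc = exact_tc if exact_tc is not None else estimated_tc
    if tc is None:
        return ic
    # Interleaving beyond tc // vf vector iterations is wasted work.
    return max(1, min(ic, tc // vf))

print(clamp_interleave(ic=4, vf=4, estimated_tc=20))  # 4 (20 // 4 = 5)
print(clamp_interleave(ic=4, vf=4, estimated_tc=8))   # 2 (8 // 4 = 2)
```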

Patch by Evgeniy Brevnov.

Differential Revision: https://reviews.llvm.org/D67948
</content>
</entry>
<entry>
<title>recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize</title>
<updated>2019-10-12T02:53:04+00:00</updated>
<author>
<name>Zi Xuan Wu</name>
<email>wuzish@cn.ibm.com</email>
</author>
<published>2019-10-12T02:53:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=9802268ad3123b0ac71413fd5493606573b3544d'/>
<id>urn:sha1:9802268ad3123b0ac71413fd5493606573b3544d</id>
<content type='text'>
In loop-vectorize, the interleave count and vectorization factor depend on the
number of target registers. Currently, register pressure is not estimated
separately for each register class (in particular, for scalar types, float should
not be counted in the same class as int), so the estimate is inaccurate.
Specifically, this causes excessive interleaving/unrolling, resulting in too many
register spills in the loop body and hurting performance.

So we need to classify the register classes at the IR level. Importantly, these
are abstract register classes, not the target register classes the backend
provides in its td files. They are used to establish a mapping between the types
of IR values and the number of simultaneous live ranges to which we'd like to
limit some set of those types.

For example, on the POWER target, the register count is special when VSX is
enabled: there are 32 int scalar registers (GPR) and 64 float scalar registers
(VSR), while int and float vector registers both number 64 (VSR). So there are
2 kinds of register class when VSX is enabled, and 3 kinds when VSX is NOT enabled.
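
A sketch of that mapping in Python; the class names and non-VSX register counts
are illustrative stand-ins, not the actual TargetTransformInfo interface:

```python
# Abstract register classes for POWER as described above: with VSX,
# float scalars and all vectors share the 64-entry VSR file while
# int scalars use 32 GPRs (2 classes); without VSX there are 3
# classes. Class names and non-VSX counts are illustrative.
def register_class(ty, is_vector, vsx_enabled):
    if vsx_enabled:
        return "GPR" if ty == "int" and not is_vector else "VSR"
    if is_vector:
        return "VR"
    return "GPR" if ty == "int" else "FPR"

NUM_REGISTERS = {"GPR": 32, "FPR": 32, "VSR": 64, "VR": 32}

print(register_class("float", False, vsx_enabled=True))    # VSR
print(NUM_REGISTERS[register_class("int", False, True)])   # 32
```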

On the POWER target, this makes a big (+~30%) performance improvement in one
specific benchmark (503.bwaves_r) of SPEC 2017, with no other obvious regressions.

Differential revision: https://reviews.llvm.org/D67148

llvm-svn: 374634
</content>
</entry>
</feed>
