bcm5719-llvm/llvm/test/Transforms/InterleavedAccess, branch meklort-10.0.1

Project Ortega BCM5719 LLVM
https://git.raptorcs.com/git/bcm5719-llvm/atom?h=meklort-10.0.1
Updated: 2019-12-08T10:37:29+00:00

[ARM] Disable VLD4 under MVE
Updated: 2019-12-08T10:37:29+00:00
Author: David Green <david.green@arm.com>
Published: 2019-12-08T09:58:03+00:00
ID: urn:sha1:3a6eb5f16054e8c0f41a37542a5fc806016502a0

Alas, using half the available vector registers in a single instruction is just too much for the register allocator to handle. The mve-vldst4.ll test here fails when these instructions are enabled at present. This patch disables the generation of VLD4 and VST4 by adding a mve-max-interleave-factor option, which we currently default to 2.

Differential Revision: https://reviews.llvm.org/D71109

[ARM] MVE interleaving load and stores.
Updated: 2019-11-19T18:37:30+00:00
Author: David Green <david.green@arm.com>
Published: 2019-11-19T18:37:21+00:00
ID: urn:sha1:882f23caeae5ad3ec1806eb6ec387e3611649d54

Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering for MVE. This works the same way as Neon, recognising the load/shuffles combination and converting them into intrinsics in a pre-isel pass, which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad and lowerInterleavedStore.

The main difference to Neon is that we do not have a VLD3 instruction. Otherwise most of the code works very similarly, with just some minor differences in the form of the intrinsics to work around. VLD3 is disabled by making isLegalInterleavedAccessType return false for those cases.

We may need some other future adjustments; for example, VLD4 takes up half the available registers, so it should maybe cost more. This patch should get the basics in though.

Differential Revision: https://reviews.llvm.org/D69392

[ARM] Add and update a lot of VLDn tests. NFC
Updated: 2019-11-19T18:37:30+00:00
Author: David Green <david.green@arm.com>
Published: 2019-11-19T18:17:46+00:00
ID: urn:sha1:411bfe476b758c09a0c9d4b3176e46f0a70de3bb

Revert "Temporarily Revert "Add basic loop fusion pass.""
Updated: 2019-04-17T04:52:47+00:00
Author: Eric Christopher <echristo@gmail.com>
Published: 2019-04-17T04:52:47+00:00
ID: urn:sha1:cee313d288a4faf0355d76fb6e0e927e211d08a5

The reversion apparently deleted the test/Transforms directory. Will be re-reverting again.

llvm-svn: 358552

Temporarily Revert "Add basic loop fusion pass."
Updated: 2019-04-17T02:12:23+00:00
Author: Eric Christopher <echristo@gmail.com>
Published: 2019-04-17T02:12:23+00:00
ID: urn:sha1:a86343512845c9c1fdbac865fea88aa5fce7142a

As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546

[InterleavedAccessPass] Don't increase the number of bytes loaded.
Updated: 2019-03-28T20:44:50+00:00
Author: Eli Friedman <efriedma@quicinc.com>
Published: 2019-03-28T20:44:50+00:00
ID: urn:sha1:96f295e23bed5b717313f41fb71d81e8f1d49090

Even if the interleaving transform would otherwise be legal, we shouldn't introduce an interleaved load that is wider than the original load: it might have undefined behavior.

It might be possible to perform some sort of mask-narrowing transform in some cases (using a narrower interleaved load, then extending the results using shufflevectors). But I haven't tried to implement that, at least for now.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41245

Differential Revision: https://reviews.llvm.org/D59954
llvm-svn: 357212

[X86][LLVM]Expanding Supports lowerInterleaved{store|load}() in X86InterleavedAccess (VF64 stride 3-4)
Updated: 2017-10-02T07:35:25+00:00
Author: Michael Zuckerman <Michael.zuckerman@intel.com>
Published: 2017-10-02T07:35:25+00:00
ID: urn:sha1:e4084f6bdbd338fd00c7c888f966f0b762f678af

Continuing the work on supporting interleaved accesses at different VFs, this patch adds vf64 stride 3 support for both load and store. It also adds support for the stride 4 store.

Reviewers:
1. zvi
2. dorit
3. igorb
4. guyblank

Differential Revision: https://reviews.llvm.org/D37687
Change-Id: I3d238efedf217d1768b348d710de1efa2f19d27b
llvm-svn: 314651

Adding test for interleved, case stride 4 vf64 store<NFC>.
Updated: 2017-10-01T09:37:38+00:00
Author: Michael Zuckerman <Michael.zuckerman@intel.com>
Published: 2017-10-01T09:37:38+00:00
ID: urn:sha1:17468954902a117a935b03e8dd2147fd6d2a8962

Change-Id: I9ea62aac81b763c83d26613dca6fcd846997a017
llvm-svn: 314621

Code refactoring for the interleaved code <NFC>
Updated: 2017-09-30T14:55:03+00:00
Author: Michael Zuckerman <Michael.zuckerman@intel.com>
Published: 2017-09-30T14:55:03+00:00
ID: urn:sha1:b92b6d424fe143d3985e87708817a66fb2927795

Change-Id: I7831c9febad8e14278a5bc87584a0053dc837be1
llvm-svn: 314596

[X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF{8|16|32} stride 3)
Updated: 2017-09-26T18:49:11+00:00
Author: Michael Zuckerman <Michael.zuckerman@intel.com>
Published: 2017-09-26T18:49:11+00:00
ID: urn:sha1:645f777e40c367e5a73acfc400677250a4661b32

This patch expands the support of lowerInterleavedStore to {8|16|32}x8i stride 3.

LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=3 VF={8|16|32}). This patch is part two of two patches and it covers the store (interleaved) side.

The goal of the patch is to optimize the following sequence:

    a0 a1 a2 a3 a4 a5 a6 a7
    b0 b1 b2 b3 b4 b5 b6 b7
    c0 c1 c2 c3 c4 c5 c6 c7

into

    a0 b0 c0 a1 b1 c1 a2 b2
    c2 a3 b3 c3 a4 b4 c4 a5
    b5 c5 a6 b6 c6 a7 b7 c7

Reviewers: zvi guyblank dorit Ayal

Differential Revision: https://reviews.llvm.org/D37117
Change-Id: I56ced8bcbea809a37654060771911ade20246ccc
llvm-svn: 314234
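
The stride 3 store pattern described in the D37117 entry (a0..a7, b0..b7, c0..c7 rearranged into a0 b0 c0 a1 b1 c1 ...) can be illustrated with a small Python sketch. This is only a model of the element ordering, not code from X86InterleavedAccess or the InterleavedAccess pass; the helper names interleave, interleave_mask and deinterleave_mask are hypothetical and chosen for this example. The masks are plain index lists into the concatenated input, in the spirit of a shuffle mask.

```python
def interleave(vectors):
    """Interleave equal-length vectors element by element (hypothetical helper)."""
    if not vectors:
        return []
    lanes = len(vectors[0])
    if any(len(v) != lanes for v in vectors):
        raise ValueError("all vectors must have the same length")
    # For each lane, emit that lane from every member vector in order.
    return [v[lane] for lane in range(lanes) for v in vectors]


def interleave_mask(factor, lanes):
    """Indices into the concatenation of `factor` vectors of `lanes` elements
    that produce the interleaved (store-side) order."""
    return [member * lanes + lane
            for lane in range(lanes)
            for member in range(factor)]


def deinterleave_mask(factor, lanes, index):
    """Indices into an interleaved wide vector that recover member `index`
    (the load-side, strided pattern)."""
    return [index + factor * lane for lane in range(lanes)]


if __name__ == "__main__":
    a = [f"a{i}" for i in range(8)]
    b = [f"b{i}" for i in range(8)]
    c = [f"c{i}" for i in range(8)]
    print(" ".join(interleave([a, b, c])))
```

Under these assumptions, interleave_mask(3, 8) starts with 0, 8, 16, 1, 9, 17, which picks a0 b0 c0 a1 b1 c1 out of the concatenation a + b + c, and deinterleave_mask(3, 8, 1) selects every third element starting at index 1, recovering b0..b7 from the interleaved vector.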