bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[CVP] Handle instructions with no user. No need to create CVPLattice state. ↵	Xin Tong	2018-09-17	1	-0/+3
	This handles terminator instructions and more.
	Summary: I tested this patch by compiling sqlite3.ll (clang -O3 -mllvm -disable-llvm-optzns sqlite3.c) and running:
	opt -called-value-propagation sqlite3.ll -time-passes -f -o out.ll
	I get a 10+% speedup for the pass. I expect some of the gain comes from skipping terminator instructions.
	=== BEFORE THE PATCH ===
	===-------------------------------------------------------------------------===
	... Pass execution timing report ...
	===-------------------------------------------------------------------------===
	Total Execution Time: 0.5562 seconds (0.5582 wall clock)
	---User Time---   --System Time--   --User+System--   ---Wall Time---   --- Name ---
	0.2485 ( 46.4%)   0.0120 ( 57.7%)   0.2605 ( 46.8%)   0.2615 ( 46.8%)   Bitcode Writer
	0.1607 ( 30.0%)   0.0079 ( 37.7%)   0.1685 ( 30.3%)   0.1693 ( 30.3%)   Called Value Propagation
	0.1262 ( 23.6%)   0.0009 (  4.5%)   0.1271 ( 22.9%)   0.1275 ( 22.8%)   Module Verifier
	0.5353 (100.0%)   0.0209 (100.0%)   0.5562 (100.0%)   0.5582 (100.0%)   Total
	=== AFTER THE PATCH ===
	===-------------------------------------------------------------------------===
	... Pass execution timing report ...
	===-------------------------------------------------------------------------===
	Total Execution Time: 0.5338 seconds (0.5355 wall clock)
	---User Time---   --System Time--   --User+System--   ---Wall Time---   --- Name ---
	0.2498 ( 48.6%)   0.0118 ( 59.3%)   0.2615 ( 49.0%)   0.2629 ( 49.1%)   Bitcode Writer
	0.1377 ( 26.8%)   0.0075 ( 37.8%)   0.1452 ( 27.2%)   0.1455 ( 27.2%)   Called Value Propagation
	0.1264 ( 24.6%)   0.0006 (  3.0%)   0.1270 ( 23.8%)   0.1271 ( 23.7%)   Module Verifier
	0.5139 (100.0%)   0.0199 (100.0%)   0.5338 (100.0%)   0.5355 (100.0%)   Total
	Reviewers: davide, mssimpso
	Reviewed By: davide
	Subscribers: llvm-commits
	Differential Revision: https://reviews.llvm.org/D49108
	llvm-svn: 342398
*	Revert "Revert r342183 "[DAGCombine] Fix crash when store merging created an extract_subvector with invalid index.""	Amara Emerson	2018-09-17	1	-1/+10
	Fixed the assertion failure. llvm-svn: 342397
*	[DebugInfo] Remove redundant argument. [NFC]	Jonas Devlieghere	2018-09-17	1	-18/+17
	Removes the redundant UnitType parameter from verifyUnitContents. I also fixed some formatting issues as I was touching the file. llvm-svn: 342396
*	[ARM] Cleanup ARM CGP isSupportedValue	Sam Parker	2018-09-17	1	-42/+19
	isSupportedValue explicitly checked and accepted many types of value, primarily for debugging reasons. Remove most of these checks and do a bit of refactoring now that the pass is more stable. This also enables ZExts to be sources, but this has very little practical benefit at the moment, as extend instructions will still be introduced. Differential Revision: https://reviews.llvm.org/D52080 llvm-svn: 342395
*	[ARM] Disallow icmp with negative imm and overflow	Sam Parker	2018-09-17	1	-0/+11
	We allow overflowing instructions if they're decreasing and only used by an unsigned compare. Add the extra condition that the icmp cannot be using a negative immediate. Differential Revision: https://reviews.llvm.org/D52102 llvm-svn: 342392
*	Fix vectorization of canonicalize	Matt Arsenault	2018-09-17	1	-0/+1
	llvm-svn: 342390
*	[GVNHoist] Re-enable GVNHoist by default	Alexandros Lamprineas	2018-09-17	2	-4/+4
	Rebase rL341954 since https://bugs.llvm.org/show_bug.cgi?id=38912 has been fixed by rL342055.
	Precommit testing performed:
	* Overnight runs of csmith comparing the output between programs compiled with gvn-hoist enabled/disabled.
	* Bootstrap builds of clang with UbSan/ASan configurations.
	llvm-svn: 342387
*	[PowerPC] Fix label address calculation for ppc64	Strahinja Petrovic	2018-09-17	1	-1/+2
	This patch fixes the calculation of a label's address for non-PIC ppc64. Differential Revision: https://reviews.llvm.org/D50965 llvm-svn: 342368
*	[NFC] Turn unsigned counters into boolean flags	Max Kazantsev	2018-09-17	1	-8/+13
	llvm-svn: 342360
*	[DebugInfo] Fix build when std::vector::iterator is a pointer	Kristina Brooks	2018-09-16	1	-1/+1
	The std::vector::iterator type may be a pointer, in which case iterator::value_type fails to compile since iterator is not a class, namespace, or enumeration. Patch by orivej (Orivej Desh). Differential Revision: https://reviews.llvm.org/D52142 llvm-svn: 342354
*	[X86][SSE] Always enable ISD::SRL -> ISD::MULHU for v8i16	Simon Pilgrim	2018-09-16	1	-1/+0
	For constant non-uniform cases we'll never introduce more and/andn/or selects than already occur in generic pre-SSE41 ISD::SRL lowering. llvm-svn: 342352
*	[X86][AVX] Enable ISD::SRL -> ISD::MULHU for v16i16	Simon Pilgrim	2018-09-16	1	-3/+1
	Now that rL340913 has landed with improved v16i16 selects as shuffles. llvm-svn: 342349
*	[DAGCombiner] try to convert pow(x, 1/3) to cbrt(x)	Sanjay Patel	2018-09-16	4	-1/+35
	This is a follow-up suggested in D51630 and originally proposed as an IR transform in D49040. Copying the motivational statement by @evandro from that patch: "This transformation helps some benchmarks in SPEC CPU2000 and CPU2006, such as 188.ammp, 447.dealII, 453.povray, and especially 300.twolf, as well as some proprietary benchmarks. Otherwise, no regressions on x86-64 or A64."
	I'm proposing to add only the minimum support for a DAG node here. Since we don't have an LLVM IR intrinsic for cbrt, and there are no other DAG ways to create a FCBRT node yet, I don't think we need to worry about DAG builder, legalization, a strict variant, etc. We should be able to expand as needed when adding more functionality/transforms.
	For reference, these are transform suggestions currently listed in SimplifyLibCalls.cpp:
	// * cbrt(expN(X)) -> expN(x/3)
	// * cbrt(sqrt(x)) -> pow(x,1/6)
	// * cbrt(cbrt(x)) -> pow(x,1/9)
	Also, given that we bail out on long double for now, there should not be any logical differences between platforms (unless there's some platform out there that has pow() but not cbrt()).
	Differential Revision: https://reviews.llvm.org/D51753
	llvm-svn: 342348
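To make the relationship behind this fold concrete, here is a minimal sketch using only the C++ standard library, not LLVM code. The helper names `cbrtPowRelDiff` and `powIsNaNForNegative` are hypothetical. It compares `std::pow(x, 1.0/3.0)` with `std::cbrt(x)` for a positive input and shows that the two library functions differ for negative inputs. The exact conditions under which the DAG combine fires are not stated in the commit message and are not modelled here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of LLVM): relative difference between
// pow(x, 1/3) and cbrt(x) for a positive finite x.
double cbrtPowRelDiff(double x) {
  assert(x > 0.0);
  double viaPow = std::pow(x, 1.0 / 3.0);
  double viaCbrt = std::cbrt(x);
  return std::fabs(viaPow - viaCbrt) / viaCbrt;
}

// For a negative base and a non-integer exponent, pow reports a domain
// error and returns NaN, whereas cbrt returns the real cube root.
bool powIsNaNForNegative(double x) {
  assert(x < 0.0);
  return std::isnan(std::pow(x, 1.0 / 3.0));
}
```

For positive finite values the two results agree up to rounding, which is the case the rewrite relies on in this sketch.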
*	[x86] fix uses check in broadcast transform (PR38949)	Sanjay Patel	2018-09-16	1	-4/+3
	https://bugs.llvm.org/show_bug.cgi?id=38949
	It's not clear to me that we even need a one-use check in this fold. I.e., 2 independent loads might be better than a load+dependent shuffle.
	Note that the existing re-use tests are not affected. We actually do form a broadcast node in those tests now because there's no extra use of the insert_subvector node in those cases. But something later in isel pattern matching decides that it is not worth using a broadcast for the full load in those tests:
	Legalized selection DAG: %bb.0 'test_broadcast_2f64_4f64_reuse:'
	  t7: v2f64,ch = load<(load 16 from %ir.p0)> t0, t2, undef:i64
	  t4: i64,ch = CopyFromReg t0, Register:i64 %1
	  t10: ch = store<(store 16 into %ir.p1)> t7:1, t7, t4, undef:i64
	  t18: v4f64 = insert_subvector undef:v4f64, t7, Constant:i64<0>
	  t20: v4f64 = insert_subvector t18, t7, Constant:i64<2>
	Becomes:
	  t7: v2f64,ch = load<(load 16 from %ir.p0)> t0, t2, undef:i64
	  t4: i64,ch = CopyFromReg t0, Register:i64 %1
	  t10: ch = store<(store 16 into %ir.p1)> t7:1, t7, t4, undef:i64
	  t21: v4f64 = X86ISD::SUBV_BROADCAST t7
	ISEL: Starting selection on root node: t21: v4f64 = X86ISD::SUBV_BROADCAST t7
	...
	Created node: t27: v4f64 = INSERT_SUBREG IMPLICIT_DEF:v4f64, t7, TargetConstant:i32<7>
	Morphed node: t21: v4f64 = VINSERTF128rr t27, t7, TargetConstant:i8<1>
	llvm-svn: 342347
*	[InstCombine] Support (sub (sext x), (sext y)) --> (sext (sub x, y)) and (sub (zext x), (zext y)) --> (zext (sub x, y))	Craig Topper	2018-09-15	2	-7/+21
	Summary: If the sub doesn't overflow in the original type we can move it above the sext/zext. This is similar to what we do for add. The overflow checking for sub is currently weaker than add, so the test cases are constructed for what is supported. Reviewers: spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52075 llvm-svn: 342335
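The no-overflow precondition for the sext case can be illustrated with a small scalar model. This is a sketch with hypothetical function names, using 8-bit and 32-bit integers as stand-ins for the narrow and wide types; it is not InstCombine code. It checks every pair of `int8_t` values and confirms that the narrow-then-extend form matches the extend-then-subtract form whenever the narrow subtraction does not overflow.

```cpp
#include <cassert>
#include <cstdint>

// True when x - y is representable in int8_t, i.e. the narrow sub
// has no signed overflow.
bool subFitsInI8(int8_t x, int8_t y) {
  int wide = int(x) - int(y);
  return wide >= INT8_MIN && wide <= INT8_MAX;
}

// (sub (sext x), (sext y)): extend first, subtract in 32 bits.
int32_t wideSub(int8_t x, int8_t y) {
  return int32_t(x) - int32_t(y);
}

// (sext (sub x, y)): subtract with 8-bit wraparound, then sign-extend.
// The uint8_t -> int8_t conversion is modular on mainstream compilers
// (and guaranteed to be so from C++20 onward).
int32_t narrowSubThenSext(int8_t x, int8_t y) {
  uint8_t wrapped = uint8_t(uint8_t(x) - uint8_t(y));
  return int32_t(int8_t(wrapped));
}

// Exhaustive check over all 65536 input pairs.
void checkAllPairs() {
  for (int x = INT8_MIN; x <= INT8_MAX; ++x)
    for (int y = INT8_MIN; y <= INT8_MAX; ++y)
      if (subFitsInI8(int8_t(x), int8_t(y)))
        assert(wideSub(int8_t(x), int8_t(y)) ==
               narrowSubThenSext(int8_t(x), int8_t(y)));
}
```

When the narrow subtraction does overflow (for example 100 - (-100)), the two forms disagree, which is why the fold needs the overflow check.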
*	Give InfoStreamBuilder an opt-in method to write a hash of the PDB as GUID.	Nico Weber	2018-09-15	2	-10/+34
	Naively computing the hash after the PDB data has been generated is in practice as fast as other approaches I tried. I also tried online-computing the hash as parts of the PDB were written out (https://reviews.llvm.org/D51887; that's also where all the measuring data is) and computing the hash in parallel (https://reviews.llvm.org/D51957). This approach here is simplest, without being slower. Differential Revision: https://reviews.llvm.org/D51956 llvm-svn: 342333
*	Update microsoftDemangle() to work more like itaniumDemangle().	Nico Weber	2018-09-15	2	-29/+32
	* Use same method of initializing the output stream and its buffer
	* Allow a nullptr Status pointer
	* Don't print the mangled name on demangling error
	* Write to N (if it is non-nullptr)
	Differential Revision: https://reviews.llvm.org/D52104
	llvm-svn: 342330
*	[X86] Remove an fp->int->fp domain crossing in LowerUINT_TO_FP_i64.	Craig Topper	2018-09-15	1	-5/+3
	Summary: This unfortunately adds a move, but isn't that better than going to the int domain and back? Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52134 llvm-svn: 342327
*	[X86] Fold (movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C))	Craig Topper	2018-09-15	1	-0/+26
	Summary: MOVMSK only cares about the sign bit so we don't need the setcc to fill the whole element with 0s/1s. We can just shift the bit we're looking for into the sign bit. This saves a constant pool load. Inspired by PR38840. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D52121 llvm-svn: 342326
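The reasoning in this summary can be modelled on a single 32-bit lane. The sketch below is a scalar stand-in with hypothetical function names, not the X86 lowering code. Note one assumption: the model shifts left by `31 - c`, the distance from bit `c` to the sign bit; how the `C` in the commit title maps onto the actual shift amount is not spelled out in the message.

```cpp
#include <cassert>
#include <cstdint>

// Model of one 32-bit lane. MOVMSK reads only the lane's sign bit (bit 31).

// Original pattern: setne (and X, (1 << C)), 0 fills the lane with all
// ones or all zeros, and MOVMSK then reads the sign bit of that result.
bool signViaSetcc(uint32_t x, unsigned c) {
  assert(c < 32);
  uint32_t lane = (x & (uint32_t{1} << c)) != 0 ? 0xFFFFFFFFu : 0u;
  return (lane >> 31) != 0;
}

// Folded pattern: shift the tested bit straight into the sign position.
bool signViaShift(uint32_t x, unsigned c) {
  assert(c < 32);
  return ((x << (31 - c)) >> 31) != 0;
}

// Compare both patterns for every bit position of one lane value.
bool patternsAgree(uint32_t x) {
  for (unsigned c = 0; c < 32; ++c)
    if (signViaSetcc(x, c) != signViaShift(x, c))
      return false;
  return true;
}
```

Because only the sign bit is observed, the bits left below it after the shift do not matter, so no mask constant has to be materialised.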
*	[InstCombine][x86] try harder to convert blendv intrinsic to generic IR (PR38814)	Sanjay Patel	2018-09-15	1	-7/+15
	Missing optimizations with blendv are shown in: https://bugs.llvm.org/show_bug.cgi?id=38814 If this works, it's an easier and more powerful solution than adding pattern matching for a few special cases in the backend. The potential danger with this transform in IR is that the condition value can get separated from the select, and the backend might not be able to make a blendv out of it again. I don't think that's too likely, but I've kept this patch minimal with a 'TODO', so we can test that theory in the wild before expanding the transform. Differential Revision: https://reviews.llvm.org/D52059 llvm-svn: 342324
*	[InstCombine] Inefficient pattern for high-bits checking 3 (PR38708)	Roman Lebedev	2018-09-15	1	-6/+11
	Summary: It is sometimes important to check that some newly-computed value is non-negative and only n bits wide (where n is a variable). There are many ways to check that: https://godbolt.org/z/o4RB8D The last variant seems best? (I'm sure there are some other variations I haven't thought of.) This is the last (as far as I know?) pattern, non-canonical due to the extra use. https://godbolt.org/z/aCMsPk https://rise4fun.com/Alive/I6f https://bugs.llvm.org/show_bug.cgi?id=38708 Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52062 llvm-svn: 342321
*	[CodeGenPrepare] Preserve debug locs in OptimizeExtractBits	Vedant Kumar	2018-09-15	1	-1/+6
	CodeGenPrepare has a transform that sinks {lshr, trunc} pairs to make it easier for the backend to emit fancy extract-bits instructions (e.g. UBFX). Teach it to preserve debug locations and salvage debug values. llvm-svn: 342319
*	[WebAssembly] SIMD shifts	Thomas Lively	2018-09-15	1	-0/+26
	Summary: Implement shifts of vectors by i32. Since LLVM defines shifts as binary operations between two vectors, this involves pattern matching on splatted shift operands. For v2i64 shifts any i32 shift operands have to be zero extended in the input and any i64 shift operands have to be wrapped in the output. Depends on D52007. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51906 llvm-svn: 342302
*	[WebAssembly] SIMD neg	Thomas Lively	2018-09-14	1	-0/+30
	Summary: Depends on D52007. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52009 llvm-svn: 342296
*	[BreakFalseDeps] Fix bad formatting. NFC	Craig Topper	2018-09-14	1	-1/+1
	llvm-svn: 342293
*	[InstCombine] refactor mul narrowing folds; NFCI	Sanjay Patel	2018-09-14	4	-112/+60
	Similar to rL342278: The test diffs are all cosmetic due to the change in value naming, but I'm including that to show that the new code does perform these folds rather than something else in instcombine. D52075 should be able to use this code too rather than duplicating all of the logic. llvm-svn: 342292
*	[InstCombine] add/use overflowing math helper functions; NFC	Sanjay Patel	2018-09-14	2	-3/+19
	The mul case can already be refactored to use this similar to rL342278. The sub case is proposed in D52075. llvm-svn: 342289
*	[PowerPC] Fix the calling convention for i1 arguments on PPC32	Lion Yang	2018-09-14	1	-5/+15
	Summary: Integer types smaller than i32 must be extended to i32 by default. The feature "crbits" introduced at r202451 handles i1 as a special case, but it did not extend properly. The caller was, therefore, passing i1 stack arguments by writing 0/1 to the first byte of the 4-byte stack object and the callee was reading the first byte for the value. "crbits" is enabled if the optimization level is greater than 1, which is very common in "release builds". Such discrepancies with the ABI specification also introduce potential incompatibility with programs or libraries built with other compilers, e.g. GCC. Fixes PR38661. Reviewers: hfinkel, cuviper Subscribers: sylvestre.ledru, glaubitz, nagisa, nemanjai, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D51108 llvm-svn: 342288
*	[codeview] Remove dead code	Reid Kleckner	2018-09-14	2	-17/+0
	llvm-svn: 342285
*	[PDB] Refactor a little of the Symbol creation code.	Zachary Turner	2018-09-14	3	-28/+16
	Eventually we need to be able to support nested types, which don't have an associated CVType record. To handle this, remove the CVType from all of the record classes, and instead store the deserialized record. Then move the deserialization up to the thing that creates the type. This actually makes error handling better anyway as we can return an invalid symbol instead of asserting false. llvm-svn: 342284
*	[SampleFDO] Add FunctionOffsetTable in compact binary format profile.	Wei Mi	2018-09-14	4	-13/+158
	The patch saves a function offset table which maps each function name index to the offset of its function profile from the start of the binary profile. By using the function offset table, the profile reader doesn't have to read those function profiles which will not be used when compiling a module. For profile sizes around 10~20M, it saves ~10% compile time. Differential Revision: https://reviews.llvm.org/D51863 llvm-svn: 342283
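The idea of an offset table can be sketched with a toy profile layout. Everything here is an assumption made for illustration: the one-byte length prefix, the record contents, and the names `appendRecord`, `readRecord` and `loadNeeded` are hypothetical and do not reflect the actual compact binary profile format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy layout (an assumption, not the real format): each function profile
// is stored as a one-byte length followed by that many payload bytes.
// Appends a record and returns the offset at which it starts.
std::size_t appendRecord(std::string &blob, const std::string &payload) {
  assert(payload.size() < 256);
  std::size_t offset = blob.size();
  blob.push_back(static_cast<char>(payload.size()));
  blob += payload;
  return offset;
}

// Reads the single record that starts at the given offset.
std::string readRecord(const std::string &blob, std::size_t offset) {
  assert(offset < blob.size());
  std::size_t len = static_cast<unsigned char>(blob[offset]);
  return blob.substr(offset + 1, len);
}

// Uses the offset table (name index -> offset) to read only the profiles
// that are needed, skipping every other record in the blob.
std::vector<std::string>
loadNeeded(const std::string &blob,
           const std::map<uint32_t, std::size_t> &offsets,
           const std::vector<uint32_t> &needed) {
  std::vector<std::string> out;
  for (uint32_t idx : needed) {
    auto it = offsets.find(idx);
    if (it != offsets.end())
      out.push_back(readRecord(blob, it->second));
  }
  return out;
}
```

Without the table, a reader of this toy format would have to walk the records sequentially to reach a given function; with it, each needed record is reached directly.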
*	[InstCombine] refactor add narrowing folds; NFCI	Sanjay Patel	2018-09-14	2	-71/+44
	The test diffs are all cosmetic due to the change in value naming, but I'm including that to show that the new code does perform these folds rather than something else in instcombine. llvm-svn: 342278
*	HotColdSplit: fix invalid SSA due to outlining	Sebastian Pop	2018-09-14	1	-15/+16
	The test used to fail with an invalid phi node: the two predecessors were outlined and the SSA representation was left invalid. The patch adds the exit block to the cold region. llvm-svn: 342277
*	HotColdSplit: fix isSingleEntrySingleExit	Sebastian Pop	2018-09-14	1	-10/+6
	Remove duplicate entries from isSingleEntrySingleExit: the Entry block is already added by the loop over the dominance frontier. Remove the heuristic from isOutlineCandidate that a region is too small when it only contains a basic block. With this change we now grow regions starting from a block and we continue adding to the ValidColdRegion. Check the heuristic just before code generation. llvm-svn: 342276
*	HotColdSplit: add back propagation to extend cold regions	Sebastian Pop	2018-09-14	1	-18/+64
	Also fix a problem in forward propagation: const TerminatorInst *TI = It->getTerminator(); was set outside the while loop that iterates over It. llvm-svn: 342275
*	Remove unused DIASession field	Reid Kleckner	2018-09-14	2	-3/+2
	llvm-svn: 342272
*	AMDGPU: Clear the bits before they are being set in program resource registers	Konstantin Zhuravlyov	2018-09-14	1	-0/+1
	Change by Tony Tye. llvm-svn: 342270
*	Revert r342183 "[DAGCombine] Fix crash when store merging created an extract_subvector with invalid index."	Reid Kleckner	2018-09-14	1	-8/+1
	Causes 'isVector() && "Invalid vector type!"' assertion when building Skia in Chrome. llvm-svn: 342265
*	Fix debug info for SelectionDAG legalization of DAG nodes with two results.	Adrian Prantl	2018-09-14	1	-1/+1
	This patch fixes the debug info handling for SelectionDAG legalization of DAG nodes with two results. When a replaced SDNode has more than one result, transferDbgValues was always copying the SDDbgValue from the first result and attaching it to all members. In reality SelectionDAG::ReplaceAllUsesWith() is given an array of SDNodes (though the type signature doesn't make this obvious; cf. the call site code in ReplaceNode()). rdar://problem/44162227 Differential Revision: https://reviews.llvm.org/D52112 llvm-svn: 342264
*	[ThinLTOCodeGenerator] Avoid Rehash StringMap in ThreadPool	Steven Wu	2018-09-14	1	-5/+9
	Summary: During threaded thinLTO, it is possible that the entry for current module doesn't exist in StringMaps (like ExportLists, ResolvedODR, etc.). Using operator[] might trigger a rehash for the StringMap, which might happen on multiple threads at the same time. rdar://problem/43846199 Reviewers: tejohnson, mehdi_amini, kromanova, pcc Reviewed By: tejohnson Subscribers: dang, inglorion, eraman, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D52049 llvm-svn: 342263
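The hazard described here is general to hash maps whose `operator[]` inserts on a miss. The sketch below uses `std::unordered_map` as a stand-in for `llvm::StringMap` and shows two common ways to keep lookups read-only once worker threads are running. The type aliases and the helpers `lookupOrEmpty` and `ensureEntries` are hypothetical; the commit message does not say which approach the actual patch takes.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

using ExportList = std::vector<std::string>;
using ExportMap = std::unordered_map<std::string, ExportList>;

// Option 1: a read-only lookup. find() never inserts, so it can never
// trigger a rehash, unlike operator[] on a missing key.
const ExportList &lookupOrEmpty(const ExportMap &map, const std::string &key) {
  static const ExportList empty;
  auto it = map.find(key);
  return it == map.end() ? empty : it->second;
}

// Option 2: create every entry up front, on a single thread, before any
// worker starts, so that later operator[] calls always hit existing keys.
void ensureEntries(ExportMap &map, const std::vector<std::string> &modules) {
  for (const std::string &m : modules) {
    map.emplace(m, ExportList{});  // no-op when the key already exists
    assert(map.count(m) == 1);
  }
}
```

Either way, the map's bucket array is no longer modified while several threads are reading from it.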
*	Revert r342210 "[ARM] bottom-top mul support in ARMParallelDSP"	Reid Kleckner	2018-09-14	1	-152/+27
	It causes assertion failures while building Skia for Android in Chromium: https://ci.chromium.org/buildbot/chromium.clang/ToTAndroid/4550 Reduction forthcoming. llvm-svn: 342260
*	[X86][SSE] Lower shuffles to permute(unpack(x,y)) (PR31151)	Simon Pilgrim	2018-09-14	1	-5/+75
	Attempt to lower a shuffle as an unpack of elements from two inputs followed by a single-input (wider) permutation. As long as the permutation is wider this is a win - there may be some circumstances where same size permutations would also be useful but I've left that for future work. Differential Revision: https://reviews.llvm.org/D52043 llvm-svn: 342257
*	fix noasserts build	Adrian Prantl	2018-09-14	1	-0/+2
	llvm-svn: 342247
*	SelectionDAG: Add compact SDDbgValue representation to -dag-dump-verbose output	Adrian Prantl	2018-09-14	2	-0/+19
	llvm-svn: 342245
*	fix typos	Adrian Prantl	2018-09-14	2	-2/+2
	llvm-svn: 342241
*	[X86][BMI1] Fix BLSI/BLSMSK/BLSR BMI1 scheduling on btver2	Simon Pilgrim	2018-09-14	1	-1/+1
	These have the same behaviour as tzcnt on btver2 - confirmed with AMD 16h SOG, Agner and instlatx64. llvm-svn: 342235
*	[X86][BMI1] Add scheduler class for BLSI/BLSMSK/BLSR BMI1 instructions	Simon Pilgrim	2018-09-14	11	-48/+35
	llvm-svn: 342234
*	[AMDGPU] Ensure trig range reduction only used for subtargets that require it	David Stuttard	2018-09-14	4	-9/+28
	Summary: GFX9 and above support sin/cos instructions with a greater range and thus don't require a fract instruction prior to invocation. Added a subtarget feature to reflect this and added code to take advantage of the expanded range on GFX9+. Also updated the tests to check correct behaviour. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51933 Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0 llvm-svn: 342222
*	[DWARF] reposting r342048, which was reverted in r342056 due to buildbot	Wolfgang Pieb	2018-09-14	7	-221/+201
	errors. Adjusted 2 test cases for ARM and darwin and fixed a bug with the original change in dsymutil. llvm-svn: 342218
*	[ARM] bottom-top mul support in ARMParallelDSP	Sam Parker	2018-09-14	1	-27/+152
	On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load. Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 342210