bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	ARMExpandPseudoInsts: Fix CMP_SWAP expansion adding a kill flag to a def	Matthias Braun	2018-11-02	1	-4/+5
	llvm-svn: 346026
*	[SystemZ::TTI] Improve cost handling of uint/sint to fp conversions.	Jonas Paulsson	2018-11-02	1	-4/+6
	Let i8/i16 uint/sint to fp conversions cost 1 if operand is a load. Since the load already does the extension, there is no extra cost (previously returned 2).
	Review: Ulrich Weigand https://reviews.llvm.org/D54028
	llvm-svn: 346009
*	Fixed inclusion of M_PI for MinGW-w64	Sylvestre Ledru	2018-11-02	1	-1/+1
	Patch by KOLANICH
	llvm-svn: 346000
*	[SystemZ] Rework getInterleavedMemoryOpCost()	Jonas Paulsson	2018-11-02	1	-16/+48
	Model this function more closely after the BasicTTIImpl version, with separate handling of loads and stores. For loads, the set of actually loaded vectors is checked. This makes it more readable and just slightly more accurate generally.
	Review: Ulrich Weigand https://reviews.llvm.org/D53071
	llvm-svn: 345998
*	[Hexagon] Do not reduce load size for globals in small-data	Krzysztof Parzyszek	2018-11-02	2	-0/+18
	Small-data (i.e. GP-relative) loads and stores allow a 16-bit scaled offset. For a load of a value of type T, the small-data area is equivalent to an array "T sdata[65536]". This implies that objects of smaller sizes need to be closer to the beginning of sdata, while larger objects may be farther away; otherwise the offset may be insufficient to reach them. Similarly, an object of a larger size should not be accessed via a load of a smaller size.
	llvm-svn: 345975
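	The reach argument above can be sketched as a small check. This is only an illustration of the "T sdata[65536]" analogy, assuming an unsigned 16-bit offset scaled by the access size; the function name and signature are hypothetical and are not taken from the Hexagon backend.

```cpp
#include <cstdint>

// Hypothetical helper: can a GP-relative access of `accessSize` bytes reach
// `byteOffset` from the start of the small-data area?
// Assumption: the offset field is an unsigned 16-bit index scaled by accessSize,
// i.e. the area behaves like `T sdata[65536]` with sizeof(T) == accessSize.
constexpr bool reachableViaGP(std::uint64_t byteOffset, unsigned accessSize) {
  return accessSize != 0 &&
         byteOffset % accessSize == 0 &&   // must be expressible as a scaled index
         byteOffset / accessSize < 65536;  // index must fit in 16 bits
}
```

	Under these assumptions a byte access reaches only the first 64 KiB, while a word access reaches 256 KiB, which is why narrowing a load of a far-away object can put it out of range.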
*	[DEBUGINFO, NVPTX] Do not emit ',debug' option if no debug info or only debug directives are requested.	Alexey Bataev	2018-11-02	3	-4/+30
	Summary: If the output of debug directives only is requested, we should drop emission of ',debug' option from the target directive. Required for supporting of nvprof profiler.
	Reviewers: probinson, echristo, dblaikie
	Subscribers: Hahnfeld, jholewinski, llvm-commits, JDevlieghere, aprantl
	Differential Revision: https://reviews.llvm.org/D46061
	llvm-svn: 345972
*	[AMDGPU] UBSan bug fix for r345710	Neil Henning	2018-11-02	1	-1/+1
	UBSan detected an error in our ISelLowering that is exposed only when dmask == 0x1. Fix this by adding an explicit check to ensure we don't perform the shift left by 32 that UBSan flagged.
	llvm-svn: 345962
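	As a generic illustration of this class of bug (not the actual AMDGPU code, which is not shown in this log): shifting a 32-bit value by 32 or more is undefined behaviour in C++, so the full-width case needs an explicit guard. The helper name below is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: mask with the low `bits` bits set.
// `1u << 32` would be undefined behaviour on a 32-bit operand, so the
// full-width case is handled before shifting.
constexpr std::uint32_t lowBitsMask(unsigned bits) {
  return bits >= 32 ? 0xFFFFFFFFu : ((std::uint32_t{1} << bits) - 1u);
}
```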
*	AMDGPU: Fix assertion with bitcast from i64 constant to v4i16	Matt Arsenault	2018-11-02	1	-3/+4
	llvm-svn: 345922
*	[WebAssembly] Added a .globaltype directive to .s output.	Wouter van Oortmerssen	2018-11-02	3	-8/+20
	Summary: Assembly output can use globals like __stack_pointer implicitly, but has no way of indicating the type of such a global, which makes it hard for tools processing it (such as the MC Assembler) to reconstruct this information. The improved assembler directives parsing (in progress in https://reviews.llvm.org/D53842) will make use of this information. Also deleted code for the .import_global directive which was unused. New test case in userstack.ll
	Reviewers: dschuff, sbc100
	Subscribers: jgravelle-google, aheejin, sunfish, llvm-commits
	Differential Revision: https://reviews.llvm.org/D54012
	llvm-svn: 345917
*	[WebAssembly] General vector shift lowering	Thomas Lively	2018-11-02	1	-12/+27
	Summary: Adds support for lowering non-splat shifts.
	Reviewers: aheejin, dschuff
	Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
	Differential Revision: https://reviews.llvm.org/D53625
	llvm-svn: 345916
*	[WebAssembly] Expand inserts and extracts with variable indices	Thomas Lively	2018-11-02	2	-0/+30
	Reviewers: aheejin, dschuff
	Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
	Differential Revision: https://reviews.llvm.org/D53964
	llvm-svn: 345913
*	[COFF, ARM64] Implement Intrinsic.sponentry for AArch64	Mandeep Singh Grang	2018-11-01	3	-0/+28
	Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform.
	Patch by: Yin Ma (yinma@codeaurora.org)
	Reviewers: mgrang, ssijaric, eli.friedman, TomTan, mstorsjo, rnk, compnerd, efriedma
	Reviewed By: efriedma
	Subscribers: efriedma, javed.absar, kristof.beyls, chrib, llvm-commits
	Differential Revision: https://reviews.llvm.org/D53996
	llvm-svn: 345909
*	[AMDGPU] Handle the idot8 pattern generated by FE.	Farhana Aleen	2018-11-01	1	-0/+9
	Summary: Different variants of idot8 codegen dag patterns are not generated by llvm-tablegen due to a huge increase in the compile time. Support the pattern that clang FE generates after reordering the additions in integer-dot8 source language pattern.
	Author: FarhanaAleen
	Reviewed By: arsenm
	Differential Revision: https://reviews.llvm.org/D53937
	llvm-svn: 345902
*	[COFF, ARM64] Implement llvm.addressofreturnaddress intrinsic	Mandeep Singh Grang	2018-11-01	1	-0/+16
	Reviewers: rnk, mstorsjo, efriedma, TomTan
	Reviewed By: efriedma
	Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits
	Differential Revision: https://reviews.llvm.org/D53962
	llvm-svn: 345892
*	[WebAssembly] Fix signature parsing for 'try' in AsmParser	Heejin Ahn	2018-11-01	1	-1/+1
	Summary: Like `block` or `loop`, `try` can take an optional signature which can be omitted. This patch allows `try`'s signature to be omitted. Also added some tests for EH instructions.
	Reviewers: aardappel
	Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
	Differential Revision: https://reviews.llvm.org/D53873
	llvm-svn: 345888
*	[Hexagon] Remove unintended fallthrough from MC duplex code	Reid Kleckner	2018-11-01	1	-5/+5
	I added these annotations in r345878 because I wasn't sure if the fallthrough was intended. Krzysztof Parzyszek confirmed that they should be breaks, so that's what this patch does.
	Reviewers: kparzysz
	Differential Revision: https://reviews.llvm.org/D53991
	llvm-svn: 345883
*	Fix clang -Wimplicit-fallthrough warnings across llvm, NFC	Reid Kleckner	2018-11-01	17	-13/+20
	This patch should not introduce any behavior changes. It consists of mostly one of two changes:
	1. Replacing fall through comments with the LLVM_FALLTHROUGH macro
	2. Inserting 'break' before falling through into a case block consisting of only 'break'.
	We were already using this warning with GCC, but its warning behaves slightly differently. In this patch, the following differences are relevant:
	1. GCC recognizes comments that say "fall through" as annotations, clang doesn't
	2. GCC doesn't warn on "case N: foo(); default: break;", clang does
	3. GCC doesn't warn when the case contains a switch, but falls through the outer case.
	I will enable the warning separately in a follow-up patch so that it can be cleanly reverted if necessary.
	Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu
	Differential Revision: https://reviews.llvm.org/D53950
	llvm-svn: 345882
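	The two kinds of change described above can be sketched as follows. This is a minimal illustration, not code from the patch: the function is hypothetical, and the standard C++17 `[[fallthrough]]` attribute stands in for the LLVM_FALLTHROUGH macro so that the example compiles without LLVM headers.

```cpp
// Hypothetical function, used only to show the two rewrite patterns.
int classify(int kind) {
  int score = 0;
  switch (kind) {
  case 0:
    score += 1;
    [[fallthrough]]; // pattern 1: annotate an intended fall through
  case 1:
    score += 10;
    break;
  case 2:
    score += 100;
    break; // pattern 2: explicit break instead of falling into `default: break;`
  default:
    break;
  }
  return score;
}
```

	With the annotation in place, clang's -Wimplicit-fallthrough stays quiet for case 0, and the explicit break in case 2 removes the "case N: foo(); default: break;" shape that clang warns about.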
*	[WebAssembly] Fixup `main` signature by default	Sam Clegg	2018-11-01	1	-6/+1
	Differential Revision: https://reviews.llvm.org/D53396
	llvm-svn: 345880
*	Annotate possibly unintended fallthroughs in Hexagon MC code, NFC	Reid Kleckner	2018-11-01	1	-0/+6
	Clang's -Wimplicit-fallthrough check fires on these switch cases. GCC does not warn when a case body that ends in a switch falls through to a case label of an outer switch. It's not clear if these fall throughs are truly intended. The Hexagon tests pass regardless of whether these case blocks fall through or break.
	For now, I have applied the intended fallthrough annotation macro with a FIXME comment to unblock enabling the warning. I will send a follow-up patch that converts them to breaks to the Hexagon maintainers.
	llvm-svn: 345878
*	[GlobalISel] Fix a bug in LegalizeRuleSet::clampMaxNumElements	Volkan Keles	2018-11-01	1	-2/+4
	Summary: This function was causing a crash when `MaxElements == 1` because it was trying to create a single element vector type.
	Reviewers: dsanders, aemerson, aditya_nandakumar
	Reviewed By: dsanders
	Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits
	Differential Revision: https://reviews.llvm.org/D53734
	llvm-svn: 345875
*	[LegalizeDAG] Add generic vector CTPOP expansion (PR32655)	Simon Pilgrim	2018-11-01	1	-53/+2
	This patch adds support for expanding vector CTPOP instructions and removes the x86 'bitmath' lowering which replicates the same expansion.
	Differential Revision: https://reviews.llvm.org/D53258
	llvm-svn: 345869
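	For readers unfamiliar with the term, a bit-math population count is usually the classic parallel bit-counting sequence. The scalar sketch below shows that well-known technique for a 32-bit value; it is an assumption that the LegalizeDAG expansion follows the same steps per vector lane, and the function name is illustrative.

```cpp
#include <cstdint>

// Classic parallel bit count on a 32-bit value (scalar sketch of the technique).
constexpr std::uint32_t popcount32(std::uint32_t v) {
  v = v - ((v >> 1) & 0x55555555u);                 // 2-bit partial sums
  v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); // 4-bit partial sums
  v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 // 8-bit partial sums
  return (v * 0x01010101u) >> 24;                   // add the four bytes
}
```

	The sequence uses only shifts, masks, adds and one multiply, which is why it can serve as a generic fallback on targets without a native population-count instruction.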
*	[Hexagon] Fix MO_JumpTable const extender conversion	Reid Kleckner	2018-11-01	1	-0/+1
	Previously this case fell through to unreachable, so it is clearly not covered by any test case in LLVM. It may be dynamically unreachable, in fact. However, if it were to run, this is what it would logically do. The assert suggests that the intended behavior was not to allow folding offsets from jump table indices, which makes sense.
	llvm-svn: 345868
*	[AArch64] Fix unintended fallthrough and strengthen cast	Reid Kleckner	2018-11-01	1	-3/+4
	This was added in r330630. GCC's -Wimplicit-fallthrough seems to not fire when the previous case contains a switch itself. This fallthrough was benign because the helper function implementing the case used dyn_cast to re-check the type of the node in question. After fixing the fallthrough, we can strengthen the cast.
	llvm-svn: 345864
*	Revert "[COFF, ARM64] Implement Intrinsic.sponentry for AArch64"	Mandeep Singh Grang	2018-11-01	3	-29/+0
	This reverts commit 585b6667b4712e3c7f32401e929855b3313b4ff2.
	llvm-svn: 345863
*	[ARM] Attempt to fix ppc64be buildbot	Sam Parker	2018-11-01	1	-2/+3
	llvm-svn: 345850
*	[ARM][CGP] Negative constant operand handling	Sam Parker	2018-11-01	1	-69/+186
	While mutating instructions, we sign extended negative constant operands for binary operators that can safely overflow. This was to allow instructions, such as add nuw i8 %a, -2, to still be able to perform a subtraction. However, the code to handle constants doesn't take into consideration that instructions, such as sub nuw i8 -2, %a, require the i8 -2 to be converted into i32 254.
	This is a relatively simple fix, but I've taken the time to reorganise the code a bit: instructions that can be promoted are now cached, and the Mutate function has been split up.
	Differential Revision: https://reviews.llvm.org/D53972
	llvm-svn: 345840
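	The difference between the two conversions mentioned above comes down to sign extension versus zero extension of the 8-bit constant. A minimal sketch with hypothetical helper names (plain C++ integer conversions, not the CodeGenPrepare code itself):

```cpp
#include <cstdint>

// Sign extension keeps the numeric value: i8 -2 becomes i32 -2.
constexpr std::int32_t signExtend8(std::int8_t v) { return v; }

// Zero extension keeps the bit pattern 0xFE: i8 -2 becomes i32 254.
constexpr std::uint32_t zeroExtend8(std::int8_t v) {
  return static_cast<std::uint8_t>(v);
}
```

	Per the message, the add nuw i8 %a, -2 form relies on the sign-extended constant, while the sub nuw i8 -2, %a form needs the zero-extended value 254.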
*	[X86][X86FixupLEA] Rename processInstructionForSLM to processInstructionForSlowLEA (NFCI)	Simon Pilgrim	2018-11-01	1	-7/+6
	The function isn't SLM specific (it's driven by the FeatureSlowLEA flag). Minor tidy-up prior to PR38225.
	llvm-svn: 345836
*	[mips][micromips] Fix JmpLink to TargetExternalSymbol	Aleksandar Beserminji	2018-11-01	1	-0/+2
	When matching MipsISD::JmpLink t9, TargetExternalSymbol:i32'...', the wrong instruction (JALR16_MM) is selected. This patch adds the missing pattern for JmpLink, so that the JAL instruction is selected.
	Differential Revision: https://reviews.llvm.org/D53366
	llvm-svn: 345830
*	[AArch64] Add support for ARMv8.4 in Saphira.	Chad Rosier	2018-11-01	1	-1/+1
	llvm-svn: 345827
*	[X86][SSE] Move 2-input limit up from getFauxShuffleMask to resolveTargetShuffleInputs (reapplied)	Simon Pilgrim	2018-11-01	1	-6/+3
	Reapplying an updated version of rL345395 (reverted in rL345451), now that the issues noticed in PR39483 have been fixed.
	This patch allows resolveTargetShuffleInputs to remove UNDEF inputs from cases where we have more than 2 inputs.
	llvm-svn: 345824
*	[Mips] Conditionally remove successor block	Stefan Maksimovic	2018-11-01	1	-1/+2
	In MipsBranchExpansion::splitMBB, upon splitting a block with two direct branches, remove the successor of the newly created block (which inherits successors from the original block) that is pointed to by the last branch in the original block, but only if the targets of the two branches differ.
	This fixes the test that failed when run with -verify-machineinstrs enabled.
	Differential Revision: https://reviews.llvm.org/D53756
	llvm-svn: 345821
*	[SystemZ::TTI] Recognize the higher cost of scalar i1 -> fp conversion	Jonas Paulsson	2018-11-01	1	-1/+3
	Scalar i1 to fp conversions are done with a branch sequence, so they should have a higher cost.
	Review: Ulrich Weigand https://reviews.llvm.org/D53924
	llvm-svn: 345818
*	[SystemZ::TTI] Accurate costs for i1->double vector conversions	Jonas Paulsson	2018-11-01	2	-15/+30
	This factors out a new method getBoolVecToIntConversionCost() containing the code for vector sext/zext of i1, in order to reuse it for i1 to double vector conversions.
	Review: Ulrich Weigand https://reviews.llvm.org/D53923
	llvm-svn: 345817
*	[PowerPC] Support constraint 'wi' in asm	Li Jia He	2018-11-01	1	-2/+6
	From the gcc manual, we can see that the specific limit of wi inline asm is “FP or VSX register to hold 64-bit integers for VSX insns or NO_REGS”. The link is https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Machine-Constraints.html#Machine-Constraints. We should accept this constraint.
	Reviewed By: jsji
	Differential Revision: https://reviews.llvm.org/D53265
	llvm-svn: 345810
*	X86: Consistently declare pass initializers in X86.h; NFC	Matthias Braun	2018-11-01	10	-55/+13
	This avoids declaring them twice: in X86TargetMachine.cpp and the file implementing the pass.
	llvm-svn: 345801
*	[WebAssembly] Lower vselect	Thomas Lively	2018-11-01	1	-0/+9
	Reviewers: aheejin, dschuff
	Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
	Differential Revision: https://reviews.llvm.org/D53630
	llvm-svn: 345797
*	[WebAssembly] Process p2align operands for SIMD loads and stores	Thomas Lively	2018-10-31	1	-0/+12
	Reviewers: aheejin, dschuff
	Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
	Differential Revision: https://reviews.llvm.org/D53886
	llvm-svn: 345795
*	[WebAssembly] Handle vector IMPLICIT_DEFs.	Thomas Lively	2018-10-31	1	-0/+5
	Summary: Also reduce the test case for implicit defs and test it with all register classes.
	Reviewers: aheejin, dschuff
	Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
	Differential Revision: https://reviews.llvm.org/D53855
	llvm-svn: 345794
*	[COFF, ARM64] Implement Intrinsic.sponentry for AArch64	Mandeep Singh Grang	2018-10-31	3	-0/+29
	Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform.
	Reviewers: mgrang, TomTan, rnk, compnerd, mstorsjo, efriedma
	Reviewed By: efriedma
	Subscribers: majnemer, chrib, javed.absar, kristof.beyls, llvm-commits
	Differential Revision: https://reviews.llvm.org/D53673
	llvm-svn: 345791
*	[AArch64] Sort switch cases (NFC)	Evandro Menezes	2018-10-31	1	-20/+23
	llvm-svn: 345786
*	Revert r345165 "[X86] Bring back the MOV64r0 pseudo instruction"	Craig Topper	2018-10-31	5	-41/+43
	Google is reporting regressions on some benchmarks.
	llvm-svn: 345785
*	[ARM] Add missing pseudo-instruction for Thumb1 RSBS.	Eli Friedman	2018-10-31	2	-0/+7
	Shows up rarely for 64-bit arithmetic, more frequently for the compare patterns added in r325323.
	Differential Revision: https://reviews.llvm.org/D53848
	llvm-svn: 345782
*	Check shouldReduceLoadWidth from SimplifySetCC	Stanislav Mekhanoshin	2018-10-31	1	-0/+12
	SimplifySetCC could shrink a load without checking with the target whether such a shrink is profitable or legal. Added checks to prevent shrinking of aligned scalar loads in AMDGPU below dword, as the scalar engine does not support it.
	Differential Revision: https://reviews.llvm.org/D53846
	llvm-svn: 345778
*	[AMDGPU] Remove FeatureVGPRSpilling	Scott Linder	2018-10-31	6	-48/+8
	This feature is only relevant to shaders, and is no longer used. When disabled, lowering of reserved registers for shaders causes a compiler crash.
	Remove the feature and add a test for compilation of shaders at OptNone.
	Differential Revision: https://reviews.llvm.org/D53829
	llvm-svn: 345763
*	[Hexagon] Make sure not to use GP-relative addressing with PIC	Krzysztof Parzyszek	2018-10-31	3	-4/+10
	Make sure that -relocation-model=pic prevents use of GP-relative addressing modes.
	llvm-svn: 345731
*	AMDGPU: Rewrite SILowerI1Copies to always stay on SALU	Nicolai Haehnle	2018-10-31	6	-189/+749
	Summary: Instead of writing boolean values temporarily into 32-bit VGPRs if they are involved in PHIs or are observed from outside a loop, we use bitwise masking operations to combine lane masks in a way that is consistent with wave control flow.
	Move SIFixSGPRCopies to before this pass, since that pass incorrectly attempts to move SGPR phis to VGPRs.
	This should recover most of the code quality that was lost with the bug fix in "AMDGPU: Remove PHI loop condition optimization".
	There are still some relevant cases where code quality could be improved, in particular:
	- We often introduce redundant masks with EXEC. Ideally, we'd have a generic computeKnownBits-like analysis to determine whether masks are already masked by EXEC, so we can avoid this masking both here and when lowering uniform control flow.
	- The criterion we use to determine whether a def is observed from outside a loop is conservative: it doesn't check whether (loop) branch conditions are uniform.
	Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b
	Reviewers: arsenm, rampitec, tpr
	Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits
	Differential Revision: https://reviews.llvm.org/D53496
	llvm-svn: 345719
*	AMDGPU: Remove PHI loop condition optimization	Nicolai Haehnle	2018-10-31	5	-138/+3
	Summary: The optimization to early break out of loops if all threads are dead was never fully implemented. But the PHI node analyzing is actually causing a number of problems, so remove all the extra code for it.
	(This does actually regress code quality in a few places because it ends up relying more heavily on phi's of i1, which we don't do a great job with. However, since it fixes real bugs in the wild, we should take this change. I have some prototype changes to improve i1 lowering in general -- not just for control flow -- which should help recover the code quality, I just need to make those changes fit for general consumption. -- Nicolai)
	Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1
	Patch-by: Christian König <christian.koenig@amd.com>
	Reviewers: arsenm, rampitec, tpr
	Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
	Differential Revision: https://reviews.llvm.org/D53359
	llvm-svn: 345718
*	[tblgen][PredicateExpander] Add the ability to describe more complex constraints on instruction operands.	Andrea Di Biagio	2018-10-31	2	-0/+6
	Before this patch, class PredicateExpander only knew how to expand simple predicates that performed checks on instruction operands. In particular, the new scheduling predicate syntax was not rich enough to express checks like this one:
	Foo(MI->getOperand(0).getImm()) == ExpectedVal;
	Here, the immediate operand value at index zero is passed as input to function Foo, and ExpectedVal is compared against the value returned by function Foo.
	While this predicate pattern doesn't show up in any X86 model, it shows up in other upstream targets. So, being able to support those predicates is fundamental if we want to be able to modernize all the scheduling models upstream.
	With this patch, we allow users to specify if a register/immediate operand value needs to be passed as input to a function as part of the predicate check. Now, register/immediate operand checks all derive from base class CheckOperandBase.
	This patch also changes where TIIPredicate definitions are expanded by the instruction info emitter. Before, definitions were expanded in class XXXGenInstrInfo (where XXX is a target name). With the introduction of this new syntax, we may want to have TIIPredicates expanded directly in XXXInstrInfo. That is because functions used by the new operand predicates may only exist in the derived class (i.e. XXXInstrInfo).
	This patch is a non functional change for the existing scheduling models. In future, we will be able to use this richer syntax to better describe complex scheduling predicates, and expose them to llvm-mca.
	Differential Revision: https://reviews.llvm.org/D53880
	llvm-svn: 345714
*	[AMDGPU] support image load/store a16	Neil Henning	2018-10-31	1	-2/+4
	Our a16 support was only enabled for sample/gather and buffer load/store, but not for image load/store operations (which take an i16 as the pixel index rather than a half). Fix our isel lowering and add test cases to prove it out.
	Differential Revision: https://reviews.llvm.org/D53750
	llvm-svn: 345710
*	[LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads	Dorit Nuzman	2018-10-31	12	-33/+63
	Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects).
	Reviewers: Ayal, hsaito, dcaballe, fhahn
	Reviewed By: Ayal
	Differential Revision: https://reviews.llvm.org/D53668
	llvm-svn: 345705
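	The a[2*i] case can be pictured with a scalar emulation of a masked wide load. This is only a sketch of the idea, not vectorizer output: the vectorization factor, the helper names and the assumption that the trip count is a multiple of VF are all made up for illustration.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4;         // hypothetical vectorization factor
constexpr std::size_t Width = 2 * VF; // interleave factor 2: 8 lanes per wide load

// Scalar stand-in for a masked wide load: lanes whose mask bit is false are
// never read, so the gap lane past the last even element is not touched.
std::array<int, Width> maskedWideLoad(const int *p,
                                      const std::array<bool, Width> &mask) {
  std::array<int, Width> out{};
  for (std::size_t lane = 0; lane < Width; ++lane)
    if (mask[lane])
      out[lane] = p[lane];
  return out;
}

// Collects a[2*i] for i in [0, n). Assumes n is a multiple of VF and that `a`
// holds at least 2*n - 1 elements (the final odd slot need not exist).
std::vector<int> loadEven(const int *a, std::size_t n) {
  std::array<bool, Width> mask{};
  for (std::size_t lane = 0; lane < Width; lane += 2)
    mask[lane] = true; // only the even members of each group are live
  std::vector<int> result;
  result.reserve(n);
  for (std::size_t i = 0; i < n; i += VF) {
    const std::array<int, Width> wide = maskedWideLoad(a + 2 * i, mask);
    for (std::size_t lane = 0; lane < Width; lane += 2)
      result.push_back(wide[lane]); // the "shuffle" that extracts even lanes
  }
  return result;
}
```

	Because the masked-off lanes are never accessed, the last wide load does not read past the end of the array, which is what removes the need for a scalar epilogue in this sketch.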