bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	Fix small grammar-o.	Eric Christopher	2018-05-16	1	-1/+1
\| \| \| \|	llvm-svn: 332522
*	Fix up a misleading format warning.	Eric Christopher	2018-05-16	1	-1/+1
\| \| \| \|	llvm-svn: 332521
*	[WebAssembly] MC: Ensure that FUNCTION_OFFSET relocations are always against ↵	Sam Clegg	2018-05-16	1	-1/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	function symbols. The getAtom() method wasn't doing what we needed in all cases. We want the symbols for the function which defines that section. We can compute this easily enough and we know that we have at most one function in each section. Once this lands I will revert rL331412 which is no longer needed. Fixes PR37409 Differential Revision: https://reviews.llvm.org/D46970 llvm-svn: 332517
*	[MachineOutliner] Don't save/restore LR for tail calls.	Eli Friedman	2018-05-16	1	-3/+4
\| \| \| \| \| \| \| \| \|	The cost computation assumes we do this correctly, but the actual lowering was wrong. Differential Revision: https://reviews.llvm.org/D46923 llvm-svn: 332514
*	[X86] Fix typo in instregex for CVTSI642SDrr	Simon Pilgrim	2018-05-16	1	-1/+1
\| \| \| \|	llvm-svn: 332510
*	Fix llvm::sys::path::remove_dots() to return "." instead of an empty path.	Greg Clayton	2018-05-16	1	-0/+4
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D46887 llvm-svn: 332508
*	[Timers] TimerGroup: add constructor from StringMap<TimeRecord>	Roman Lebedev	2018-05-16	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is needed for the continuation of D46504, to be able to store the timings. Reviewers: george.karpenkov, NoQ, alexfh, sbenza Reviewed By: alexfh Subscribers: llvm-commits, cfe-commits Differential Revision: https://reviews.llvm.org/D46939 llvm-svn: 332506
*	[Timers] TimerGroup: make printJSONValues() method public	Roman Lebedev	2018-05-16	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is needed for the continuation of D46504, to be able to store the timings. Reviewers: george.karpenkov, NoQ, alexfh, sbenza Reviewed By: alexfh Subscribers: llvm-commits, cfe-commits Differential Revision: https://reviews.llvm.org/D46938 llvm-svn: 332505
*	[Timers] TimerGroup::printJSONValue(): print doubles with no precision loss	Roman Lebedev	2018-05-16	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Although this is not strictly required, I would very much prefer not to have known random precision losses along the way. Reviewers: george.karpenkov, NoQ, alexfh, sbenza Reviewed By: george.karpenkov Subscribers: llvm-commits, cfe-commits Differential Revision: https://reviews.llvm.org/D46937 llvm-svn: 332504
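As a rough illustration of the precision concern in the commit above: a double survives a round trip through text when it is printed with `std::numeric_limits<double>::max_digits10` significant digits. The sketch below is not the code from D46937; `formatExact` and `roundTrips` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

// Hypothetical helper (not LLVM API): print a double in scientific notation
// with enough digits that parsing the text yields the same value again.
inline std::string formatExact(double Value) {
  // max_digits10 is the number of significant decimal digits needed to
  // round-trip a double. "%e" takes the digit count after the decimal
  // point, hence the "- 1".
  constexpr int Digits = std::numeric_limits<double>::max_digits10 - 1;
  char Buffer[64];
  int Written = std::snprintf(Buffer, sizeof(Buffer), "%.*e", Digits, Value);
  assert(Written > 0 && Written < static_cast<int>(sizeof(Buffer)));
  return std::string(Buffer, static_cast<std::size_t>(Written));
}

// Hypothetical check: does the printed form parse back to the same double?
inline bool roundTrips(double Value) {
  return std::strtod(formatExact(Value).c_str(), nullptr) == Value;
}
```

With a shorter precision (for example the default six digits of `%e`), values such as `0.1 + 0.2` would generally not parse back to the same bit pattern.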
*	[Timers] TimerGroup::printJSONValues(): print mem timer with .mem suffix	Roman Lebedev	2018-05-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: We had just used the `.sys` suffix for the previous timer; this is clearly a typo. Reviewers: george.karpenkov, NoQ, alexfh, sbenza Reviewed By: alexfh Subscribers: llvm-commits, cfe-commits Differential Revision: https://reviews.llvm.org/D46936 llvm-svn: 332503
*	[X86][AVX512DQ] Use packed instructions for scalar FP<->i64 conversions on ↵	Craig Topper	2018-05-16	1	-8/+62
\| \| \| \| \| \| \| \| \| \| \| \|	32-bit targets As i64 types are not legal on 32-bit targets, insert these into a suitable zero vector and use the packed vXi64<->FP conversion instructions instead. Fixes PR3163. Differential Revision: https://reviews.llvm.org/D43441 llvm-svn: 332498
*	Signal handling should be signal-safe	JF Bastien	2018-05-16	3	-84/+209
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Before this patch, signal handling wasn't signal safe. This leads to real-world crashes. It used ManagedStatic inside of signals, this can allocate and can lead to unexpected state when a signal occurs during llvm_shutdown (because llvm_shutdown destroys the ManagedStatic). It also used cl::opt without custom backing storage. Some de-allocation was performed as well. Acquiring a lock in a signal handler is also a great way to deadlock. We can't just disable signals on llvm_shutdown because the signals might do useful work during that shutdown. We also can't just disable llvm_shutdown for programs (instead of library uses of clang) because we'd have to then mark the pointers as not leaked and make sure all the ManagedStatic uses are OK to leak and remain so. Move all of the code to lock-free datastructures instead, and avoid having any of them in an inconsistent state. I'm not trying to be fancy, I'm not using any explicit memory order because this code isn't hot. The only purpose of the atomics is to guarantee that a signal firing on the same or a different thread doesn't see an inconsistent state and crash. In some cases we might miss some state (for example, we might fail to delete a temporary file), but that's fine. Note that I haven't touched any of the backtrace support despite it not technically being totally signal-safe. When that code is called we know something bad is up and we don't expect to continue execution, so calling something that e.g. sets errno is the least of our problems. A similar patch should be applied to lib/Support/Windows/Signals.inc, but that can be done separately. Fix r332428 which I reverted in r332429. I originally used double-wide CAS because I was lazy, but some platforms use a runtime function for that which thankfully failed to link (it would have been bad for signal handlers otherwise). 
I use a separate flag to guard the data instead. <rdar://problem/28010281> Reviewers: dexonsmith Subscribers: steven_wu, llvm-commits llvm-svn: 332496
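The core constraint described in the commit above (a handler must not allocate or take locks, so it should only touch lock-free atomics) can be sketched as follows. This is a minimal illustration, not the code from the patch; `LastSignal`, `recordSignal` and `raiseAndObserve` are hypothetical names. Like the commit, it uses the default memory order rather than an explicit one.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

// The handler may only touch atomics that are always lock-free; taking a
// lock inside a signal handler can deadlock.
static_assert(ATOMIC_INT_LOCK_FREE == 2,
              "std::atomic<int> must always be lock-free");

// Hypothetical state shared between the handler and normal code.
static std::atomic<int> LastSignal{0};

// Signal handler: a single lock-free atomic store, no allocation, no locks.
extern "C" void recordSignal(int Sig) { LastSignal.store(Sig); }

// Installs the handler, raises Sig on the calling thread, restores the
// previous handler, and returns what the handler recorded (-1 on failure).
inline int raiseAndObserve(int Sig) {
  LastSignal.store(0);
  auto Previous = std::signal(Sig, recordSignal);
  if (Previous == SIG_ERR)
    return -1;
  int Raised = std::raise(Sig);
  assert(Raised == 0 && "raise failed");
  (void)Raised;
  std::signal(Sig, Previous);
  return LastSignal.load();
}
```

Calling `raiseAndObserve(SIGINT)` runs the handler synchronously on the calling thread, so the recorded value can be read back immediately afterwards.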
*	[DAG] Prune cycle check in store merge.	Nirav Dave	2018-05-16	1	-18/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As part of merging stores we check that fusing the nodes does not cause a cycle due to one candidate store being indirectly dependent on another store (this may happen via chained memory copies). This is done by searching if a store is a predecessor to another store's value. Prune the search at the candidate search's root node which is a predecessor to all candidate stores. This reduces the size of the subgraph searched in large basic blocks. Reviewers: jyknight Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D46955 llvm-svn: 332490
*	[DAG] Defer merge store cycle checking to just before merge. NFCI.	Nirav Dave	2018-05-16	1	-8/+20
\| \| \| \|	llvm-svn: 332489
*	[AMDGPU] Change llvm.debugtrap to be a debug breakpoint that can resume ↵	Tony Tye	2018-05-16	2	-34/+34
\| \| \| \| \| \| \| \| \| \|	execution. No longer require the queue pointer to be passed in in fixed SGPRs. Differential Revision: https://reviews.llvm.org/D46769 llvm-svn: 332485
*	[AArch64][SVE] Improve diagnostics for vectors with incorrect element-size.	Sander de Smalen	2018-05-16	2	-11/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For regular SVE vector operands, this patch introduces a more sensible diagnostic when the vector has a wrong suffix (e.g. z0.s vs z0.b). For example: add z0.s, z1.s, z2.b -> invalid element width ^_____^ mismatch For the vector-with-shift/extend (e.g. z0.s, uxtw #2) this patch takes a slightly different approach and instead returns an 'invalid operand' if the element size is not as expected. This is because the diagnostics are specialized to suggest using the right shift/extend suffix. This is a trade-off not to introduce more operand classes and still provide useful diagnostics for LD1 and PRF instructions. For example: ld1w z1.s, p0/z, [x0, z0.s] -> invalid shift/extend specified, expected 'z[0..31].s, (uxtw\|sxtw)' ld1w z1.d, p0/z, [x0, z0.s] -> invalid operand ^________________^ mismatch For gather prefetches, both 'z0.s' and 'z0.d' would be allowed: prfw #0, p0, [x0, z0.s] -> invalid shift/extend specified, expected 'z[0..31].s, (uxtw\|sxtw) #2' prfw #0, p0, [x0, z0.d] -> invalid shift/extend specified, expected 'z[0..31].d, (lsl\|uxtw\|sxtw) #2' Without this change, the diagnostic would unnecessarily suggest a different element size: prfw #0, p0, [x0, z0.s] -> invalid shift/extend specified, expected 'z[0..31].d, (lsl\|uxtw\|sxtw) #2' Reviewers: SjoerdMeijer, aemerson, fhahn, samparker, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46688 llvm-svn: 332483
*	[AArch64] Gangup loads and stores for pairing.	Sirish Pande	2018-05-16	3	-4/+88
\| \| \| \| \| \| \| \| \| \|	Keep loads and stores together (target defines how many loads and stores to gang up), such that it will help in pairing and vectorization. Differential Revision: https://reviews.llvm.org/D46477 llvm-svn: 332482
*	[InstCombine] allow more binop (shuffle X), C transforms	Sanjay Patel	2018-05-16	1	-2/+4
\| \| \| \| \| \| \| \| \|	The canonicalization was restricted to shuffle masks with a 1-to-1 mapping to the constant vector, but that disqualifies the common splat pattern. This is part of solving PR37463: https://bugs.llvm.org/show_bug.cgi?id=37463 llvm-svn: 332479
*	[AArch64][SVE] Asm: Support for gather PRF prefetch instructions	Sander de Smalen	2018-05-16	2	-0/+159
\| \| \| \| \| \| \| \| \| \|	Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46686 llvm-svn: 332472
*	[BasicAA] Fix handling of invariant group launders	Krzysztof Pszeniczny	2018-05-16	2	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: A recent patch ([[ https://reviews.llvm.org/rL331587 \| rL331587 ]]) to Capture Tracking taught it that the `launder_invariant_group` intrinsic captures its argument only by returning it. Unfortunately, BasicAA still considered every call instruction as a possible escape source and hence concluded that the result of a `launder_invariant_group` call cannot alias any local non-escaping value. This led to [[ https://bugs.llvm.org/show_bug.cgi?id=37458 \| bug 37458 ]]. This patch updates the relevant check for escape sources in BasicAA. Reviewers: Prazek, kuhar, rsmith, hfinkel, sanjoy, xbolva00 Reviewed By: hfinkel, xbolva00 Subscribers: JDevlieghere, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D46900 llvm-svn: 332466
*	[mips] Simplify some of the predicate scopes for (negative) multiply add/sub ↵	Simon Dardis	2018-05-16	1	-23/+20
\| \| \| \| \| \|	instructions (NFCI) llvm-svn: 332464
*	[mips] Join existing scopes for DecoderNamespace (NFCI)	Simon Dardis	2018-05-16	1	-6/+3
\| \| \| \|	llvm-svn: 332462
*	AMDGPU: Custom lower v4i16/v4f16 vector operations	Matt Arsenault	2018-05-16	4	-19/+124
\| \| \| \| \| \| \| \| \|	Avoids stack access. Also handle extract hi elt pattern from truncate + shift to avoid a couple test regressions. llvm-svn: 332453
*	[SimplifyLibcalls] Replace locked IO with unlocked IO	David Bolvansky	2018-05-16	3	-20/+258
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: If the file stream argument is not captured and its source is fopen, IO calls can be replaced with their unlocked variants (the "_unlocked" functions) to gain better speed. Reviewers: efriedma, RKSimon, spatel, sanjoy, hfinkel, majnemer, lebedev.ri, rja Reviewed By: rja Subscribers: rja, srhines, efriedma, lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D45736 llvm-svn: 332452
*	[X86] Split WriteCvtI2F/WriteCvtF2I into I<->F32 and I<->F64 scheduler classes	Simon Pilgrim	2018-05-16	14	-359/+338
\| \| \| \| \| \|	A lot of the models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first llvm-svn: 332451
*	[LoopUnroll] Split out simplify code after Unroll into a new function. NFC	David Green	2018-05-16	1	-34/+46
\| \| \| \| \| \| \| \| \|	So that it can be shared with other passes that may end up doing the same thing. Differential Revision: https://reviews.llvm.org/D45874 llvm-svn: 332450
*	[GlobalISel][IRTranslator] Split aggregates during IR translation.	Amara Emerson	2018-05-16	3	-136/+289
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently handle all aggregates by creating one large LLT, and letting the legalizer deal with splitting them up. However using this approach means that we can't support big endian code correctly. This patch changes the way that the IRTranslator deals with aggregate values, by splitting them up into their constituent element values. To do this, parts of the translator need to be modified to deal with multiple VRegs for a single Value. A new Value to VReg mapper is introduced to help keep compile time under control, currently there is no measurable impact on CTMark despite the extra code being generated in some cases. Patch is based on the original work of Tim Northover. Differential Revision: https://reviews.llvm.org/D46018 llvm-svn: 332449
*	[mips] Add support for isBranchOffsetInRange and use it for MipsLongBranch	Simon Dardis	2018-05-16	4	-14/+205
\| \| \| \| \| \| \| \| \| \| \| \|	Add support for this target hook, covering MIPS, microMIPS and MIPSR6, along with some tests. Also add missing getOppositeBranchOpc() cases exposed by the tests. Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D46794 llvm-svn: 332446
*	[AArch64] Support "S" inline assembler constraint	Peter Smith	2018-05-16	2	-1/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch re-introduces the "S" inline assembler constraint. This matches an absolute symbolic address or a label reference. The primary use case is asm("adrp %0, %1\n\t" "add %0, %0, :lo12:%1" : "=r"(addr) : "S"(&var)); I say re-introduces as it seems like "S" was implemented in the original AArch64 backend, but it looks like it wasn't carried forward to the merged backend. The original implementation had A and L modifiers that could be used to print ":lo12:" to the string. It looks like gcc doesn't use these and :lo12: is expected to be written in the inline assembly string so I've not implemented A and L. Clang already supports the S modifier. Fixes PR37180 Differential Revision: https://reviews.llvm.org/D46745 llvm-svn: 332444
*	[AArch64][SVE] Asm: Support for structured LD2, LD3 and LD4 (scalar+scalar) ↵	Sander de Smalen	2018-05-16	2	-0/+36
\| \| \| \| \| \| \| \| \| \| \| \|	load instructions. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D46679 llvm-svn: 332442
*	Emit a left-shift instead of a power-of-two multiply for jump-tables	Alexander Richardson	2018-05-16	1	-2/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: SelectionDAGLegalize::ExpandNode() inserts an ISD::MUL when lowering a BR_JT opcode. While many backends optimize this multiply into a shift, some do not: e.g. the MIPS backend currently always lowers it into a sequence of load-immediate+multiply+mflo in MipsSETargetLowering::lowerMulDiv(). I initially changed the multiply to a shift in the MIPS backend but it turns out that would not have handled the MIPSR6 case and was a lot more code than doing it in LegalizeDAG. I believe performing this simple optimization in LegalizeDAG instead of each individual backend is the better solution since this also fixes other backends such as MSP430 which calls the multiply runtime function __mspabi_mpyi without this patch. Reviewers: sdardis, atanasyan, pftbest, asl Reviewed By: sdardis Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45760 llvm-svn: 332439
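The arithmetic behind the commit above is simple: when a jump-table entry size is a power of two, scaling the index by the entry size equals shifting the index left by log2 of that size. The sketch below shows this on plain integers only; it is not the SelectionDAG code from D45760, and `isPowerOfTwo`, `log2Exact` and `jumpTableOffset` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if V is a nonzero power of two.
inline bool isPowerOfTwo(std::uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Hypothetical helper: log2 of a value that is known to be a power of two.
inline unsigned log2Exact(std::uint64_t V) {
  assert(isPowerOfTwo(V) && "log2Exact requires a power of two");
  unsigned Shift = 0;
  while (V > 1) {
    V >>= 1;
    ++Shift;
  }
  return Shift;
}

// Byte offset of entry Index in a jump table with EntrySize-byte entries.
// Uses a shift when the entry size is a power of two and falls back to a
// multiply otherwise; both paths compute Index * EntrySize.
inline std::uint64_t jumpTableOffset(std::uint64_t Index, std::uint64_t EntrySize) {
  if (isPowerOfTwo(EntrySize))
    return Index << log2Exact(EntrySize);
  return Index * EntrySize;
}
```

On targets without a cheap multiply, the shift path avoids a multi-instruction sequence or a runtime library call, which is the motivation the commit message gives for MIPS and MSP430.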
*	[AArch64][SVE] Asm: Support for contiguous PRF prefetch instructions.	Sander de Smalen	2018-05-16	4	-2/+77
\| \| \| \| \| \| \| \| \| \|	Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46682 llvm-svn: 332433
*	[Unix] Indent ChangeStd{in,out}ToBinary.	Fangrui Song	2018-05-16	1	-4/+4
\| \| \| \|	llvm-svn: 332432
*	Remove unused variable introduced in r332336	Mikael Holmen	2018-05-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	The unused variable caused a compilation warning: ../lib/Target/X86/X86ISelLowering.cpp:34614:17: error: unused variable 'SMax' [-Werror,-Wunused-variable] if (SDValue SMax = MatchMinMax(SMin, ISD::SMAX, C1)) ^ 1 error generated. llvm-svn: 332431
*	[ObjCARC] Prevent code motion into a catchswitch	Shoaib Meenai	2018-05-16	2	-0/+6
\| \| \| \| \| \| \| \| \| \| \|	A catchswitch must be the only non-phi instruction in its basic block; attempting to move a retain or release into a catchswitch basic block will result in invalid IR. Explicitly mark a CFG hazard in this case to prevent the code motion. Differential Revision: https://reviews.llvm.org/D46482 llvm-svn: 332430
*	Revert "Signal handling should be signal-safe"	JF Bastien	2018-05-16	3	-199/+87
\| \| \| \| \| \|	Some bots don't have double-pointer width compare-and-exchange. Revert for now. llvm-svn: 332429
*	Signal handling should be signal-safe	JF Bastien	2018-05-16	3	-87/+199
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Before this patch, signal handling wasn't signal safe. This leads to real-world crashes. It used ManagedStatic inside of signals, this can allocate and can lead to unexpected state when a signal occurs during llvm_shutdown (because llvm_shutdown destroys the ManagedStatic). It also used cl::opt without custom backing storage. Some de-allocation was performed as well. Acquiring a lock in a signal handler is also a great way to deadlock. We can't just disable signals on llvm_shutdown because the signals might do useful work during that shutdown. We also can't just disable llvm_shutdown for programs (instead of library uses of clang) because we'd have to then mark the pointers as not leaked and make sure all the ManagedStatic uses are OK to leak and remain so. Move all of the code to lock-free datastructures instead, and avoid having any of them in an inconsistent state. I'm not trying to be fancy, I'm not using any explicit memory order because this code isn't hot. The only purpose of the atomics is to guarantee that a signal firing on the same or a different thread doesn't see an inconsistent state and crash. In some cases we might miss some state (for example, we might fail to delete a temporary file), but that's fine. Note that I haven't touched any of the backtrace support despite it not technically being totally signal-safe. When that code is called we know something bad is up and we don't expect to continue execution, so calling something that e.g. sets errno is the least of our problems. A similar patch should be applied to lib/Support/Windows/Signals.inc, but that can be done separately. <rdar://problem/28010281> Reviewers: dexonsmith Subscribers: aheejin, llvm-commits Differential Revision: https://reviews.llvm.org/D46858 llvm-svn: 332428
*	[DebugInfo] Only handle DBG_VALUE in InlineSpiller.	Shiva Chen	2018-05-16	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	The instructions using registers should be DBG_VALUE and normal instructions. Use isDebugValue() to filter out DBG_VALUE and add an assert to ensure there is no other kind of debug instructions using the registers. Differential Revision: https://reviews.llvm.org/D46739 Patch by Hsiangkai Wang. llvm-svn: 332427
*	Fix LSR compile time hang.	Evgeny Stupachenko	2018-05-16	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Limit number of reassociations in GenerateReassociationsImpl. Reviewers: qcolombet, mkazantsev Differential Revision: https://reviews.llvm.org/D46039 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 332426
*	ARM: Remove unnecessary argument. NFCI.	Peter Collingbourne	2018-05-16	2	-6/+3
\| \| \| \| \| \|	IsLittleEndian is already a field of ARMAsmBackend. llvm-svn: 332420
*	ARM: Deduplicate code and remove unnecessary declaration. NFCI.	Peter Collingbourne	2018-05-16	3	-47/+11
\| \| \| \|	llvm-svn: 332419
*	[MachineOutliner] Add optsize markings to outlined functions.	Eli Friedman	2018-05-15	1	-0/+8
\| \| \| \| \| \| \| \| \|	It doesn't matter much this late in the pipeline, but one place that does check for it is the function alignment code. Differential Revision: https://reviews.llvm.org/D46373 llvm-svn: 332415
*	[AMDGPU] Fix handling of void types in isLegalAddressingMode	Stanislav Mekhanoshin	2018-05-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	It is legal for the type passed to isLegalAddressingMode to be unsized or, more specifically, VoidTy. In this case, we must check the legality of load / stores for all legal types. Directly trying to call getTypeStoreSize is incorrect, and leads to breakage in e.g. Loop Strength Reduction. This change guards against that behaviour. Differential Revision: https://reviews.llvm.org/D40405 llvm-svn: 332409
*	[InstCombine] fix binop (shuffle X), C --> shuffle (binop X, C') to check uses	Sanjay Patel	2018-05-15	1	-1/+1
\| \| \| \|	llvm-svn: 332407
*	[WebAssembly] Provide WasmFunction content offset information.	Sam Clegg	2018-05-15	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	WasmObjectWriter mostly operates with function segments offsets that do not include their size fields. WasmObjectFile needs to have and provide this information to the lld to maintain proper R_WEBASSEMBLY_FUNCTION_OFFSET_I32 relocations entries. Patch by Yury Delendik Differential Revision: https://reviews.llvm.org/D46763 llvm-svn: 332406
*	StructurizeCFG: fix inverting conditions	Marek Olsak	2018-05-15	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Author: Samuel Pitoiset Without this patch, it appears to me that we are selecting the wrong operand when inverting conditions. In the attached test, it will select %tmp3 instead of %tmp4. To fix it, just use 'A' as everywhere. This fixes a regression introduced by "[PatternMatch] define m_Not using m_Xor and cst_pred_ty" https://reviews.llvm.org/D46351 llvm-svn: 332403
*	[msan] Instrument masked.store, masked.load intrinsics.	Evgeniy Stepanov	2018-05-15	1	-0/+87
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Instrument masked store/load intrinsics. Reviewers: kcc Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D46785 llvm-svn: 332402
*	Move helper classes into anonymous namespaces. NFCI.	Benjamin Kramer	2018-05-15	2	-3/+5
\| \| \| \|	llvm-svn: 332400
*	[InstCombine] clean up code for binop-shuffle transforms; NFCI	Sanjay Patel	2018-05-15	1	-39/+34
\| \| \| \|	llvm-svn: 332399
*	[AArch64] Improve single vector lane unscaled stores	Evandro Menezes	2018-05-15	1	-0/+16
\| \| \| \| \| \| \| \| \| \|	When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead. In this case, also using the unscaled offset. Differential revision: https://reviews.llvm.org/D46762 llvm-svn: 332394