bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[ARM] Add support for ORR and ORN instruction substitutions	John Brawn	2017-05-05	1	-0/+13
	Recently support was added for substituting one instruction for another by negating or inverting the immediate, but ORR and ORN were missed so this patch adds them. This one is slightly different to the others in that ORN only exists in thumb, so we only do the substitution in thumb. Differential Revision: https://reviews.llvm.org/D32534 llvm-svn: 302224
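The substitution described above rests on a simple identity: ORN ORs its first operand with the inverted immediate, so an ORR with `#imm` can be rewritten as an ORN with `#~imm`, and vice versa. A minimal sketch of that identity, assuming 32-bit registers; the function names are illustrative and not taken from the patch, and the immediate-encodability check that would trigger the substitution is not modelled:

```cpp
#include <cstdint>

// Models "ORR Rd, Rn, #imm": bitwise OR with the immediate.
constexpr std::uint32_t orr(std::uint32_t rn, std::uint32_t imm) {
  return rn | imm;
}

// Models "ORN Rd, Rn, #imm": bitwise OR with the inverted immediate.
constexpr std::uint32_t orn(std::uint32_t rn, std::uint32_t imm) {
  return rn | ~imm;
}

// The identity behind the substitution, checked at compile time:
// ORR with #imm produces the same value as ORN with #~imm.
static_assert(orr(0x12345678u, 0xFFFFFF00u) == orn(0x12345678u, 0x000000FFu),
              "ORR #imm must equal ORN #~imm");
```
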
*	[ARM] ACLE Chapter 9 intrinsics	Sam Parker	2017-05-04	4	-139/+338
	Added the integer data processing intrinsics from ACLE v2.1 Chapter 9 but I have missed out the saturation_occurred intrinsics for now. For the instructions that read and write the GE bits, a chain is included and the only instruction that reads these flags (sel) is only selectable via the implemented intrinsic. Differential Revision: https://reviews.llvm.org/D32281 llvm-svn: 302126
*	[IR] Abstract away ArgNo+1 attribute indexing as much as possible	Reid Kleckner	2017-05-03	1	-1/+1
	Summary: Do three things to help with that:
	- Add AttributeList::FirstArgIndex, which is an enumerator currently set to 1. It allows us to change the indexing scheme with fewer changes.
	- Add addParamAttr/removeParamAttr. This just shortens addAttribute call sites that would otherwise need to spell out FirstArgIndex.
	- Remove some attribute-specific getters and setters from Function that take attribute list indices. Most of these were only used from BuildLibCalls, and doesNotAlias was only used to test or set if the return value is malloc-like.
	I'm happy to split the patch, but I think they are probably easier to review when taken together.
	This patch should be NFC, but it sets the stage to change the indexing scheme to this, which is more convenient when indexing into an array:
		0: func attrs
		1: retattrs
		2...: arg attrs
	Reviewers: chandlerc, pete, javed.absar
	Subscribers: david2050, llvm-commits
	Differential Revision: https://reviews.llvm.org/D32811
	llvm-svn: 302060
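The two indexing schemes mentioned in this message can be compared with a small sketch. The enumerator values follow the message (FirstArgIndex currently 1; the proposed layout puts function attributes at 0, return attributes at 1, and arguments from 2). The namespaces and the `attrIndexForArg` helper are hypothetical and exist only for illustration:

```cpp
// Scheme in place at the time of the commit: the return value sits at
// index 0 and argument N at index N + 1.
namespace current_scheme {
enum AttrIndex : unsigned { ReturnIndex = 0, FirstArgIndex = 1 };
constexpr unsigned attrIndexForArg(unsigned ArgNo) {
  return ArgNo + FirstArgIndex;
}
} // namespace current_scheme

// Layout the commit message proposes for a later change:
// 0 = function attrs, 1 = return attrs, 2... = argument attrs.
namespace proposed_scheme {
enum AttrIndex : unsigned { FunctionIndex = 0, ReturnIndex = 1, FirstArgIndex = 2 };
constexpr unsigned attrIndexForArg(unsigned ArgNo) {
  return ArgNo + FirstArgIndex;
}
} // namespace proposed_scheme
```

Because call sites go through a named enumerator rather than a literal `+ 1`, switching from one scheme to the other only requires changing the enumerator value.
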
*	ARM: avoid handing a deleted node back to TableGen during ISel.	Tim Northover	2017-05-02	1	-0/+4
	When we replaced the multiplicand the destination node might already exist. When that happens the original gets CSEd and deleted. However, it's actually used as the offset so nonsense is produced. Should fix PR32726. llvm-svn: 301983
*	ARM: add arm1176j-f processor	Tim Northover	2017-05-02	1	-0/+1
	I doubt anyone actually uses it, and I'm not even entirely convinced it exists myself; but it is our default for "clang -arch armv6". Functionally, if it does exist it's identical to the arm1176jz-f from LLVM's point of view (the difference is apparently in the "Security Extensions"). llvm-svn: 301962
*	[ARM] GlobalISel: Use TableGen instruction selector	Diana Picus	2017-05-02	5	-22/+57
	Emit and use the TableGen instruction selector for ARM. At the moment, this allows us to remove the hand-written code for selecting G_SDIV and G_UDIV. Future commits will focus on increasing the code coverage for it and removing more dead code from the current instruction selector. llvm-svn: 301905
*	Use Argument::hasAttribute and AttributeList::ReturnIndex more	Reid Kleckner	2017-04-28	1	-13/+11
	This eliminates many extra 'Idx' induction variables in loops over arguments in CodeGen/ and Target/. It also reduces the number of places where we assume that ReturnIndex is 0 and that we should add one to argument numbers to get the corresponding attribute list index. NFC llvm-svn: 301666
*	[ARM] GlobalISel: fixup r301632	Diana Picus	2017-04-28	1	-42/+0
	Actually remove ARMInstructionSelector.h... Forgot to stage the removal in the previous commit. llvm-svn: 301633
*	[ARM] GlobalISel: Get rid of ARMInstructionSelector.h. NFC.	Diana Picus	2017-04-28	3	-3/+31
	Declare the ARMInstructionSelector in an anonymous namespace, to make it more in line with the other targets which were migrated to this in r299637 in order to avoid TableGen'erated headers being included in non-GlobalISel builds. llvm-svn: 301632
*	[SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits	Craig Topper	2017-04-28	2	-25/+24
	This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently. This is largely a mechanical transformation from KnownZero to Known.Zero. Differential Revision: https://reviews.llvm.org/D32569 llvm-svn: 301620
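To illustrate the shape of this refactoring, here is a simplified sketch of a KnownBits-style struct. It is not LLVM's definition: LLVM stores APInt values of arbitrary width, whereas this sketch assumes fixed 32-bit masks, and the `knownBitsOfAnd` helper is a hypothetical example of a transfer function written against the combined struct:

```cpp
#include <cstdint>

// Simplified stand-in for the KnownBits struct: one object carries both
// masks instead of two separate KnownZero/KnownOne values.
struct KnownBits {
  std::uint32_t Zero; // bits known to be 0
  std::uint32_t One;  // bits known to be 1
};

// Known bits of (LHS & RHS): a result bit is known zero if it is known
// zero in either operand, and known one only if known one in both.
inline KnownBits knownBitsOfAnd(const KnownBits &LHS, const KnownBits &RHS) {
  KnownBits Known;
  Known.Zero = LHS.Zero | RHS.Zero;
  Known.One = LHS.One & RHS.One;
  return Known;
}
```
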
*	[ARM] GlobalISel: Fix extended stack operands	Diana Picus	2017-04-27	1	-3/+12
	Fix a crash when trying to extend a value passed as a sign- or zero-extended stack parameter. The cause of the crash was that we were setting the size of the loaded value to 32 bits, and then trying to extend again to 32 bits. This patch addresses the issue by also introducing a G_TRUNC after the load. This will leave the unused bits to their original values set by the caller, while being consistent about the types. For values that are not extended, we just use a smaller load. llvm-svn: 301531
*	Move size and alignment information of regclass to TargetRegisterInfo	Krzysztof Parzyszek	2017-04-24	3	-7/+8
	1. RegisterClass::getSize() is split into two functions:
		- TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const;
		- TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const;
	2. RegisterClass::getAlignment() is replaced by:
		- TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const;
	This will allow making those values depend on subtarget features in the future.
	Differential Revision: https://reviews.llvm.org/D31783
	llvm-svn: 301221
*	[ARM] GlobalISel: Legalize s8 and s16 G_(S|U)DIV	Diana Picus	2017-04-24	2	-0/+55
	We have to widen the operands to 32 bits and then we can either use hardware division if it is available or lower to a libcall otherwise. At the moment it is not enough to set the Legalizer action to WidenScalar, since for libcalls it won't know what to do (it won't be able to find what size to widen to, because it will find Libcall and not Legal for 32 bits). To hack around this limitation, we request Custom lowering, and as part of that we widen first and then we run another legalizeInstrStep on the widened DIV. llvm-svn: 301166
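The widening step described above can be pictured in scalar terms: extend both narrow operands to 32 bits, perform the 32-bit division (in hardware or via a libcall), and narrow the result. A minimal sketch for the s8 case, with hypothetical function names; it models only the arithmetic, not the legalizer machinery:

```cpp
#include <cassert>
#include <cstdint>

// Signed case: sign-extend both operands to 32 bits, divide, then narrow.
inline std::int8_t sdiv8(std::int8_t a, std::int8_t b) {
  assert(b != 0 && "division by zero is not modelled");
  std::int32_t wide = static_cast<std::int32_t>(a) / static_cast<std::int32_t>(b);
  return static_cast<std::int8_t>(wide);
}

// Unsigned case: zero-extend both operands to 32 bits, divide, then narrow.
inline std::uint8_t udiv8(std::uint8_t a, std::uint8_t b) {
  assert(b != 0 && "division by zero is not modelled");
  std::uint32_t wide = static_cast<std::uint32_t>(a) / static_cast<std::uint32_t>(b);
  return static_cast<std::uint8_t>(wide);
}
```

The choice of extension matters: the same bit pattern 0xC8 is 200 when zero-extended and -56 when sign-extended, so the signed and unsigned divisions need different widening.
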
*	[ARM] GlobalISel: Support G_(S|U)DIV for s32	Diana Picus	2017-04-24	3	-0/+19
	Add support for both targets with hardware division and without. For hardware division we have to add support throughout the pipeline (legalizer, reg bank select, instruction select). For targets without hardware division, we only need to mark it as a libcall. llvm-svn: 301164
*	[ARM] GlobalISel: Select G_CONSTANT with CImm operands	Diana Picus	2017-04-24	1	-0/+12
	When selecting a G_CONSTANT to a MOVi, we need the value to be an Imm operand. We used to just leave the G_CONSTANT operand unchanged, which works in some cases (such as the GEP offsets that we create when referring to stack slots). However, in many other places the G_CONSTANTs are created with CImm operands. This patch makes sure to handle those as well, and to error out gracefully if in the end we don't end up with an Imm operand. Thanks to Oliver Stannard for reporting this issue. llvm-svn: 301162
*	[globalisel][tablegen] Revise API for ComplexPattern operands to improve flexibility.	Daniel Sanders	2017-04-22	1	-1/+0
	Summary: Some targets need to be able to do more complex rendering than just adding an operand or two to an instruction. For example, it may need to insert an instruction to extract a subreg first, or it may need to perform an operation on the operand.
	In SelectionDAG, targets would create SDNode's to achieve the desired effect during the complex pattern predicate. This worked because SelectionDAG had a form of garbage collection that would take care of SDNode's that were created but not used due to a later predicate rejecting a match. This doesn't translate well to GlobalISel and the churn was wasteful.
	The API changes in this patch enable GlobalISel to accomplish the same thing without the waste. The API is now:
		InstructionSelector::OptionalComplexRendererFn selectArithImmed(MachineOperand &Root) const;
	where Root is the root of the match. The return value can be omitted to indicate that the predicate failed to match, or a function with the signature ComplexRendererFn can be returned. For example:
		return OptionalComplexRendererFn(
		    [=](MachineInstrBuilder &MIB) { MIB.addImm(Immed).addImm(ShVal); });
	adds two immediate operands to the rendered instruction. Immed and ShVal are captured from the predicate function.
	As an added bonus, this also reduces the amount of information we need to provide to GIComplexOperandMatcher.
	Depends on D31418
	Reviewers: aditya_nandakumar, t.p.northover, qcolombet, rovka, ab, javed.absar
	Reviewed By: ab
	Subscribers: dberris, kristof.beyls, igorb, llvm-commits
	Differential Revision: https://reviews.llvm.org/D31761
	llvm-svn: 301079
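The "optional renderer" pattern this message describes can be sketched outside LLVM. The code below is an approximation under stated assumptions: `Builder` is a hypothetical stand-in for MachineInstrBuilder that only records immediates, `std::optional` (C++17) stands in for whatever optional type the patch uses, and the matching rule (a 12-bit immediate, optionally shifted left by 12) is chosen for illustration rather than taken from the patch:

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-in for MachineInstrBuilder: it only records immediates.
struct Builder {
  std::vector<std::int64_t> Operands;
  Builder &addImm(std::int64_t Imm) {
    Operands.push_back(Imm);
    return *this;
  }
};

using ComplexRendererFn = std::function<void(Builder &)>;
using OptionalComplexRendererFn = std::optional<ComplexRendererFn>;

// Illustrative predicate: match a 12-bit immediate, optionally shifted left
// by 12. On success, return a renderer that adds the operands later; on
// failure, return an empty optional so nothing is created and discarded.
inline OptionalComplexRendererFn selectArithImmed(std::uint64_t Value) {
  std::uint64_t Immed = 0;
  unsigned ShVal = 0;
  if ((Value & ~0xFFFull) == 0) {
    Immed = Value;
  } else if ((Value & ~0xFFF000ull) == 0) {
    Immed = Value >> 12;
    ShVal = 12;
  } else {
    return std::nullopt;
  }
  return OptionalComplexRendererFn(
      [=](Builder &MIB) { MIB.addImm(Immed).addImm(ShVal); });
}
```

The point of the design is that the predicate does no work on the instruction itself; the captured values are applied only if the overall match succeeds and the returned function is invoked.
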
*	Re-commit r301040 "X86: Don't emit zero-byte functions on Windows"	Hans Wennborg	2017-04-21	8	-19/+13
	In addition to the original commit, tighten the condition for when to pad empty functions to COFF Windows. This avoids running into problems when targeting e.g. Win32 AMDGPU, which caused test failures when this was committed initially. llvm-svn: 301047
*	Revert r301040 "X86: Don't emit zero-byte functions on Windows"	Hans Wennborg	2017-04-21	8	-13/+19
	This broke almost all bots. Reverting while fixing. llvm-svn: 301041
*	X86: Don't emit zero-byte functions on Windows	Hans Wennborg	2017-04-21	8	-19/+13
	Empty functions can lead to duplicate entries in the Guard CF Function Table of a binary due to multiple functions sharing the same RVA, causing the kernel to refuse to load that binary. We had a terrific bug due to this in Chromium. It turns out we were already doing this for Mach-O in certain situations. This patch expands the code for that in AsmPrinter::EmitFunctionBody() and renames TargetInstrInfo::getNoopForMachoTarget() to simply getNoop() since it seems it was used for not just Mach-O anyway. Differential Revision: https://reviews.llvm.org/D32330 llvm-svn: 301040
*	ARM: make sure we use all entries in a vector before forming a vpaddl.	Tim Northover	2017-04-21	1	-5/+5
	Otherwise there's some mismatch, and we'll either form an illegal type or an illegal node. Thanks to Eli Friedman for pointing out the problem with my original solution. llvm-svn: 301036
*	ARM: don't try to create an i8 -> i32 vpaddl.	Tim Northover	2017-04-21	1	-2/+5
	DAG combine was mistakenly assuming that the step-up it was looking at was always a doubling, but it can sometimes be a larger extension in which case we'd crash. llvm-svn: 301002
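For context on the two vpaddl fixes above: a pairwise add long sums adjacent lanes and produces lanes of exactly twice the width, using every input lane. A scalar sketch of the signed i8 to i16 form follows; the function name is illustrative and the model covers only lane arithmetic, not the DAG combine itself:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of a signed pairwise add long from eight i8 lanes to four
// i16 lanes: each output lane is the sum of two adjacent, sign-extended
// input lanes, so all eight inputs are consumed.
inline std::array<std::int16_t, 4>
vpaddlS8(const std::array<std::int8_t, 8> &In) {
  std::array<std::int16_t, 4> Out{};
  for (std::size_t I = 0; I < Out.size(); ++I)
    Out[I] = static_cast<std::int16_t>(In[2 * I] + In[2 * I + 1]);
  return Out;
}
```

Since each step only doubles the lane width, an i8 to i32 extension does not correspond to a single operation of this form, which is consistent with the combine having to reject that case.
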
*	[ARM] GlobalISel: Add support for G_TRUNC	Diana Picus	2017-04-21	2	-6/+26
	Select them as copies. We only select if both the source and the destination are on the same register bank, so this shouldn't cause any trouble. llvm-svn: 300971
*	[ARM] GlobalISel: Make struct arguments fail elegantly	Diana Picus	2017-04-21	1	-1/+2
	The condition in isSupportedType didn't handle struct/array arguments properly. Fix the check and add a test to make sure we use the fallback path in this kind of situation. The test deals with some common cases where the call lowering should error out. There are still some issues here that need to be addressed (tail calls come to mind), but they can be addressed in other patches. llvm-svn: 300967
*	[Thumb1] The recently added tADCS and tSBCS pseudo-instructions were missing `Uses = [CPSR]`	Artyom Skrobov	2017-04-21	1	-1/+2
	Summary: Thanks to Oliver Stannard for helping catch this. Reviewers: olista01, efriedma Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D31815 llvm-svn: 300951
*	ARM: lower "fence singlethread" to a pure compiler barrier.	Tim Northover	2017-04-20	2	-1/+12
	Single-threaded fences aren't required to provide any synchronization with other processing elements so there's no need for a DMB. They should still be a barrier for compiler optimizations though. llvm-svn: 300904
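At the source level, the closest standard C++ counterpart to a single-thread fence is `std::atomic_signal_fence`, which orders accesses only with respect to a signal handler running on the same thread. The sketch below assumes, without confirmation from this commit, that a compiler such as Clang lowers it to a single-thread fence in IR; the variable and function names are hypothetical:

```cpp
#include <atomic>

int Payload = 0;                 // data handed to a same-thread signal handler
std::atomic<bool> Ready{false};  // flag the handler would poll

// Writes the payload, then raises the flag. The signal fence keeps the
// compiler from reordering the payload write after the flag store, but it
// needs no hardware barrier because no other processor is involved.
inline void publishForSignalHandler(int Value) {
  Payload = Value;
  std::atomic_signal_fence(std::memory_order_release);
  Ready.store(true, std::memory_order_relaxed);
}
```

By contrast, `std::atomic_thread_fence` must synchronize with other threads and so generally does need a barrier instruction on ARM.
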
*	ARM: handle post-indexed NEON ops where the offset isn't the access width.	Tim Northover	2017-04-20	2	-14/+24
	Before, we assumed that any ConstantInt offset was precisely the access width, so we could use the "[rN]!" form. ISelLowering only ever created that kind, but further simplification during combining could lead to unexpected constants and incorrect codegen. Should fix PR32658. llvm-svn: 300878
*	[Thumb-1] Fix corner cases for compressed jump tables	Weiming Zhao	2017-04-20	1	-0/+9
	Summary: When synthesized TBB/TBH is expanded, we need to avoid the case of: BaseReg is redefined after the load of branching target. E.g.:
		%R2 = tLEApcrelJT <jt#1>
		%R1 = tLDRr %R1, %R2      ==>  %R2 = tLEApcrelJT <jt#1>
		%R2 = tLDRspi %SP, 12          %R2 = tLDRspi %SP, 12
		tBR_JTr %R1                    tTBB_JT %R2, %R1
	Reviewers: jmolloy
	Reviewed By: jmolloy
	Subscribers: llvm-commits, rengolin
	Differential Revision: https://reviews.llvm.org/D32250
	llvm-svn: 300870
*	Fix use-after-frees on memory allocated in a Recycler.	Benjamin Kramer	2017-04-20	2	-7/+6
	This will become asan errors once the patch lands that poisons the memory after free. The x86 change is a hack, but I don't see how to solve this properly at the moment. llvm-svn: 300867
*	[ARM] Fix handling of mapping symbols when changing sections	John Brawn	2017-04-20	1	-1/+1
	ChangeSection incorrectly registers LastEMSInfo as belonging to the previous section, not the current section. This happens to work when changing sections using .section, as the previous section is set to the current section before the call to ChangeSection, but not when using .popsection. Differential Revision: https://reviews.llvm.org/D32225 llvm-svn: 300831
*	[ARM] Rename HW div feature to HW div Thumb. NFCI.	Diana Picus	2017-04-20	8	-36/+38
	The hardware div feature refers only to Thumb, but because of its name it is tempting to use it to check for hardware division in general, which may cause problems in ARM mode. See https://reviews.llvm.org/D32005. This patch adds "Thumb" to its name, to make its scope clear. One notable place where I haven't made the change is in the feature flag (used with -mattr), which is still hwdiv. Changing it would also require changes in a lot of tests, including clang tests, and it doesn't seem like it's worth the effort. Differential Revision: https://reviews.llvm.org/D32160 llvm-svn: 300827
*	ARMFrameLowering: Reserve emergency spill slot for large arguments	Matthias Braun	2017-04-19	1	-8/+35
	Re-commit after revert in r300668. Changed getMaxFPOffset() to a more conservative heuristic instead of trying to be clever and missing for some exotic calling conventions. We need to reserve an emergency spill slot in cases with large argument types that could overflow immediate offsets for FP relative address calculations. rdar://31317893 Differential Revision: https://reviews.llvm.org/D31643 llvm-svn: 300761
*	[ARM] Remove redundant computeKnownBits helper.	Eli Friedman	2017-04-19	1	-29/+14
	Move the BFI logic to computeKnownBitsForTargetNode, and delete the redundant CMOV logic. This is intended as a cleanup, but it's probably possible to construct a case where moving the BFI logic allows more combines. Differential Revision: https://reviews.llvm.org/D31795 llvm-svn: 300752
*	[ARM] Use TableGen patterns to select vtbl. NFC.	Eli Friedman	2017-04-19	3	-92/+59
	Differential Revision: https://reviews.llvm.org/D32103 llvm-svn: 300749
*	ARM: TLS calling convention doesn't preserve r9 or r12 on Darwin.	Tim Northover	2017-04-19	1	-3/+3
	llvm-svn: 300726
*	Revert "ARMFrameLowering: Reserve emergency spill slot for large arguments"	Renato Golin	2017-04-19	1	-41/+8
	This reverts commit r300639, as it broke self-hosting on ARM. PR32709. llvm-svn: 300668
*	[ARM] GlobalISel: Add support for G_MUL	Diana Picus	2017-04-19	3	-1/+12
	Support G_MUL, very similar to G_ADD and G_SUB. The only difference is in the instruction selector, where we have to select either MUL or MULv5 depending on the target. llvm-svn: 300665
*	ARM: Use methods to access data stored with frame instructions	Serge Pavlov	2017-04-19	3	-8/+27
	In r300196 several methods were added to TargetInstrInfo to access data stored with call frame setup/destroy instructions. This change replaces calls to getOperand with calls to such special methods in ARM target. Differential Revision: https://reviews.llvm.org/D32127 llvm-svn: 300655
*	ARMFrameLowering: Reserve emergency spill slot for large arguments	Matthias Braun	2017-04-19	1	-8/+41
	We need to reserve an emergency spill slot in cases with large argument types that could overflow immediate offsets for FP relative address calculations. rdar://31317893 Differential Revision: https://reviews.llvm.org/D31643 llvm-svn: 300639
*	DAG: Make mayBeEmittedAsTailCall parameter const	Matt Arsenault	2017-04-18	2	-2/+2
	llvm-svn: 300603
*	[ARM] Add hardware build attributes in assembler	Oliver Stannard	2017-04-18	3	-164/+189
	In the assembler, we should emit build attributes based on the target selected with command-line options. This matches the GNU assembler's behaviour. We only do this for build attributes which describe the hardware that is expected to be available, not the ones that describe ABI compatibility.
	This is done by moving some of the attribute emission code to ARMTargetStreamer, so that it can be shared between the assembly and code-generation code paths. Since the assembler only creates a MCSubtargetInfo, not an ARMSubtarget, the code had to be changed to check raw features, and not use the convenience functions in ARMSubtarget.
	If different attributes are later specified using the .eabi_attribute directive, then they will take precedence, as happens when the same .eabi_attribute is specified twice.
	This must be enabled by an option, because we don't want to do this when parsing inline assembly. The attributes would match the ones emitted at the start of the file, so wouldn't actually change the emitted object file, but the extra directives would be added to every inline assembly block when emitting assembly, which we'd like to avoid.
	The majority of the changes in the build-attributes.ll test are just re-ordering the directives, because the hardware attributes are now emitted before the ABI ones. However, I did fix one bug which I spotted: Tag_CPU_arch_profile was not being emitted for v6M.
	Differential revision: https://reviews.llvm.org/D31812
	llvm-svn: 300547
*	[ARM] GlobalISel: Add support for G_SUB	Diana Picus	2017-04-18	3	-2/+8
	Support G_SUB throughout the GlobalISel pipeline. It is exactly the same as G_ADD, nothing fancy. llvm-svn: 300546
*	[ARM] Check for correct HW div when lowering divmod	Diana Picus	2017-04-18	1	-1/+3
	For subtargets that use the custom lowering for divmod, e.g. gnueabi, we used to check if the subtarget has hardware divide and then lower to a div-mul-sub sequence if true, or to a libcall if false. However, judging by the usage of hasDivide vs hasDivideInARMMode, it seems that hasDivide only refers to Thumb. For instance, in the ARMTargetLowering constructor, the code that specifies whether to use libcalls for (S|U)DIV looks like this:
		bool hasDivide = Subtarget->isThumb() ? Subtarget->hasDivide()
		                                      : Subtarget->hasDivideInARMMode();
	In the case of divmod for arm-gnueabi, using only hasDivide() to determine what to do means that instead of lowering to __aeabi_idivmod to get the remainder, we lower to div-mul-sub and then further lower the div to __aeabi_idiv. Even worse, if we have hardware divide in ARM but not in Thumb, we generate a libcall instead of using it (this is not an issue in practice since AFAICT none of the cores that we support have hardware divide in ARM but not Thumb).
	This patch fixes the code dealing with custom lowering to take into account the mode (Thumb or ARM) when deciding whether or not hardware division is available.
	Differential Revision: https://reviews.llvm.org/D32005
	llvm-svn: 300536
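The decision this message describes has two parts: a mode-aware check for hardware division, and the div-mul-sub sequence used to obtain a remainder when division is available. A minimal sketch of both, using a hypothetical `SubtargetInfo` struct in place of the real ARMSubtarget queries:

```cpp
// Hypothetical summary of the subtarget properties the message refers to.
struct SubtargetInfo {
  bool IsThumb;
  bool HasDivideInThumbMode; // what hasDivide() reports, per the message
  bool HasDivideInARMMode;
};

// Mode-aware check: consult the feature that matches the current mode.
constexpr bool hasHardwareDivide(const SubtargetInfo &ST) {
  return ST.IsThumb ? ST.HasDivideInThumbMode : ST.HasDivideInARMMode;
}

// div-mul-sub: the remainder computed from a single division,
// rem = a - (a / b) * b. The divisor must be non-zero.
constexpr int remViaDivMulSub(int A, int B) {
  return A - (A / B) * B;
}
```

With a check that looks only at the Thumb feature, an ARM-mode target lacking ARM-mode division would take the div-mul-sub path and then still need a libcall for the division, which is the double lowering the message points out.
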
*	[IR] Make paramHasAttr to use arg indices instead of attr indices	Reid Kleckner	2017-04-14	1	-9/+9
	This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern. Previously we were testing return value attributes with index 0, so I introduced hasReturnAttr() for that use case. llvm-svn: 300367
*	This patch closes PR#32216: Better testing of schedule model instruction latencies/throughputs.	Andrew V. Tischenko	2017-04-14	1	-2/+2
	The details are here: https://reviews.llvm.org/D30941 llvm-svn: 300311
*	[SystemZ] TargetTransformInfo cost functions implemented.	Jonas Paulsson	2017-04-12	2	-7/+11
	getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(), getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(), getInterleavedMemoryOpCost() implemented. Interleaved access vectorization enabled. BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads, in which case the cost of the z/sext instruction becomes 0. Review: Ulrich Weigand, Renato Golin. https://reviews.llvm.org/D29631 llvm-svn: 300052
*	[ARM] Refactor Thumb2 sat instructions	Sam Parker	2017-04-11	1	-48/+30
	Refactor the USAT, SSAT, USAT16 and SSAT16 instruction descriptions for Thumb2. Differential Revision: https://reviews.llvm.org/D31933 llvm-svn: 299945
*	GlobalISel: Allow legalizing G_FADD to a libcall	Diana Picus	2017-04-11	1	-0/+3
	Use the same handling in the generic legalizer code as for the other libcalls (G_FREM, G_FPOW). Enable it on ARM for float and double so we can test it. llvm-svn: 299931
*	[ARM/AArch64] Ensure valid vector element types for interleaved accesses	Matthew Simpson	2017-04-10	3	-29/+47
	This patch refactors and strengthens the type checks performed for interleaved accesses. The primary functional change is to ensure that the interleaved accesses have valid element types. The added test cases previously failed because the element type is f128. Differential Revision: https://reviews.llvm.org/D31817 llvm-svn: 299864
*	[ARM] GlobalISel: Support G_FPOW for float and double	Diana Picus	2017-04-10	1	-2/+3
	Legalize to a libcall. llvm-svn: 299841
*	[ARM] Prefer BIC over BFC in ARM mode.	Eli Friedman	2017-04-07	1	-0/+1
	BIC is generally faster, and it can put the output in a different register from the input. We already do this in Thumb2 mode; not sure why the equivalent fix never got applied to ARM mode. Differential Revision: https://reviews.llvm.org/D31797 llvm-svn: 299803