bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86] Demote EmitTest to a helper function of EmitCmp. Route all callers except EmitCmp through EmitCmp.	Craig Topper	2018-12-13	2	-14/+9
	This requires the two callers to manifest a 0 to make EmitCmp call EmitTest. I'm looking into changing how we combine TEST and flag-setting instructions so that this is no longer part of lowering and is instead part of DAG combine or isel. That will mean EmitTest will probably be gutted and may disappear entirely. llvm-svn: 349094
*	Revert "[hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)"	Evgeniy Stepanov	2018-12-13	1	-3/+1
	Breaks the sanitizer-android buildbot. This reverts commit af8443a984c3b491c9ca2996b8d126ea31e5ecbe. llvm-svn: 349092
*	[AArch64] Fix Exynos predicates (NFC)	Evandro Menezes	2018-12-13	1	-14/+23
	Fix the logic in the definition of `ExynosShiftExPred` so that it is a more specific version of `ExynosShiftPred`. Since `ExynosShiftExPred` is not used yet, this change is NFC. llvm-svn: 349091
*	[SampleFDO] handle ProfileSampleAccurate when initializing function entry count	Wei Mi	2018-12-13	2	-22/+20
	ProfileSampleAccurate indicates that the profile exactly matches the code being optimized. Previously, ProfileSampleAccurate was handled in ProfileSummaryInfo::isColdCallSite and ProfileSummaryInfo::isColdBlock. A better solution is to initialize the function entry count to 0 when ProfileSampleAccurate is true, so that ProfileSampleAccurate does not have to be handled in multiple places. Differential Revision: https://reviews.llvm.org/D55660 llvm-svn: 349088
*	Revert r348971: [AMDGPU] Support for "uniform-work-group-size" attribute	Aakanksha Patil	2018-12-13	2	-65/+7
	This patch breaks RADV (and probably RadeonSI as well). llvm-svn: 349084
*	AMDGPU/GlobalISel: Legalize/regbankselect block_addr	Matt Arsenault	2018-12-13	2	-1/+6
	llvm-svn: 349081
*	Reapply "[MemCpyOpt] memset->memcpy forwarding with undef tail"	Nikita Popov	2018-12-13	1	-16/+34
	Currently memcpyopt optimizes cases like memset(a, byte, N); memcpy(b, a, M); to memset(a, byte, N); memset(b, byte, M); if M <= N. Often this allows further simplifications down the line, which drop the first memset entirely. This patch extends the optimization to the case where M > N but the bytes a[N..M] are known to be undef due to alloca/lifetime.start. This situation arises relatively often in Rust code, because Rust does not initialize trailing structure padding and loves to insert redundant memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844. The previous version of this patch did not perform dependency checking properly: while the dependency is checked at the position of the memset, the size used must be that of the memcpy. Previously the size of the memset was used, which missed modifications in the region MemSetSize..CopySize and resulted in miscompiles. The added tests cover variations of this issue. Differential Revision: https://reviews.llvm.org/D55120 llvm-svn: 349078
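A byte-level model can show why the undef tail makes the M > N case legal, and why the dependency check has to cover the copy size. This is a sketch under stated assumptions, not MemCpyOpt's code. Memory is a Python list, `None` stands for an undef byte, and the helpers (`memset`, `memcpy`, `refines`, `run`) are hypothetical names. A transformed result counts as acceptable if every byte either matches the original result or was undef there.

```python
# Byte-level model of memset -> memcpy forwarding (illustrative only).
# Memory is a list of ints; None marks an undef byte (e.g. a fresh alloca).

def memset(mem, dst, byte, n):
    mem[dst:dst + n] = [byte] * n

def memcpy(mem, dst, src, m):
    # The right-hand slice is a copy, so reading and writing mem is safe.
    mem[dst:dst + m] = mem[src:src + m]

def refines(new, old):
    """True if `new` is a legal result for `old`: bytes must match unless undef in `old`."""
    return all(o is None or n == o for n, o in zip(new, old))

def run(n, m, byte, forward, clobber=None):
    """a = mem[0:8], b = mem[8:16]. `clobber` is an optional (offset, value)
    store into `a` between the memset and the copy."""
    mem = [None] * 16
    memset(mem, 0, byte, n)
    if clobber is not None:
        offset, value = clobber
        mem[offset] = value
    if forward:
        memset(mem, 8, byte, m)  # memcpy(b, a, M) rewritten as memset(b, byte, M)
    else:
        memcpy(mem, 8, 0, m)
    return mem
```

With M <= N the two results are identical. With M > N the forwarded version only fills bytes that were undef. The last case models the earlier bug: a store into a[N..M) after the memset makes forwarding wrong, which a check limited to the memset size would miss.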
*	[ThinLTO] Compute synthetic function entry count	Easwaran Raman	2018-12-13	11	-23/+145
	Summary: This patch computes the synthetic function entry count on the whole-program call graph (based on the module summary) and writes the entry counts to the summary. After function importing, this count is attached to the IR as metadata. Since it adds a new field to the summary, this bumps the summary version. Reviewers: tejohnson Subscribers: mehdi_amini, inglorion, llvm-commits Differential Revision: https://reviews.llvm.org/D43521 llvm-svn: 349076
*	[llvm] Address base discriminator overflow in X86DiscriminateMemOps	Mircea Trofin	2018-12-13	1	-3/+14
	Summary: Macros are expanded on a single line. With large expansions containing sufficiently many instructions with memory operands (and when -fdebug-info-for-profiling is requested), we may be unable to generate new base discriminator values, because new values overflow (base discriminators may not be larger than 2^12). This CL warns instead of asserting in such a case. A subsequent CL will add APIs to check for overflow before creating new debug info. See https://bugs.llvm.org/show_bug.cgi?id=39890 Reviewers: davidxl, wmi, gbedwell Reviewed By: davidxl Subscribers: aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D55643 llvm-svn: 349075
*	[llvm-size][libobject] Add explicit "inTextSegment" methods similar to "isText" section methods to calculate size correctly.	Jordan Rupprecht	2018-12-13	1	-0/+8
	Summary: llvm-size uses "isText()" etc., which seem to indicate whether the section contains code-like things, not whether it will actually go in the text segment of a fully linked executable. The added unit test (elf-sizes.test) shows some types of sections that cause discrepancies versus the GNU size tool. llvm-size does not correctly report the sizes of things mapping to text/data segments, at least for ELF files. This fixes pr38723. Reviewers: echristo, Bigcheese, MaskRay Reviewed By: MaskRay Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54369 llvm-svn: 349074
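The difference between "contains code" and "lands in the text segment" can be sketched with ELF section flags. The rules below reflect our understanding of how GNU size tallies its Berkeley-format totals, and they are an assumption of this sketch. Allocated sections that are executable or read-only count as text. Other allocated sections with file contents count as data. The remaining allocated sections (SHT_NOBITS) count as bss. The functions `segment_of` and `berkeley_totals` are hypothetical and are not libobject's methods. The numeric constants come from the ELF specification.

```python
# ELF section flag and type values (ELF specification).
SHF_WRITE = 0x1
SHF_ALLOC = 0x2
SHF_EXECINSTR = 0x4
SHT_PROGBITS = 1
SHT_NOBITS = 8

def segment_of(sh_type, sh_flags):
    """Return 'text', 'data', 'bss', or None for sections not loaded at run time."""
    if not sh_flags & SHF_ALLOC:
        return None
    # Read-only data such as .rodata is not "code", but it still goes in text.
    if sh_flags & SHF_EXECINSTR or not sh_flags & SHF_WRITE:
        return "text"
    if sh_type != SHT_NOBITS:
        return "data"
    return "bss"

def berkeley_totals(sections):
    """Sum (sh_type, sh_flags, size) tuples into text/data/bss totals."""
    totals = {"text": 0, "data": 0, "bss": 0}
    for sh_type, sh_flags, size in sections:
        seg = segment_of(sh_type, sh_flags)
        if seg is not None:
            totals[seg] += size
    return totals
```

A section-type predicate that only asks "is this code?" would leave `.rodata` out of the text total. That is the kind of discrepancy against GNU size that the commit describes.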
*	[LoopUtils] Use i32 instead of `void`.	Davide Italiano	2018-12-13	1	-1/+1
	The actual type of the first argument of the @dbg intrinsic doesn't really matter, since we're setting it to `undef`, but the bitcode reader is picky about `void` types. llvm-svn: 349069
*	[MachO][TLOF] Add support for local symbols in the indirect symbol table	Francis Visoiu Mistrih	2018-12-13	1	-3/+22
	On 32-bit archs, we previously assumed that an indirect symbol would never have local linkage. This can lead to miscompiles where the symbol's value is 0 and the linker uses that value, because the indirect symbol table contains the value `INDIRECT_SYMBOL_LOCAL` for that specific symbol. Differential Revision: https://reviews.llvm.org/D55573 llvm-svn: 349060
*	[DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract; 2nd try	Sanjay Patel	2018-12-13	1	-1/+6
	This is a retry of rL349051 (reverted at rL349056). I changed the dead-ness check from counting uses to an opcode test for DELETED_NODE, based on existing similar code. Differential Revision: https://reviews.llvm.org/D55655 llvm-svn: 349058
*	[X86][SSE] Add SSE vector imm/var shift support to SimplifyDemandedVectorEltsForTargetNode	Simon Pilgrim	2018-12-13	1	-0/+15
	llvm-svn: 349057
*	revert rL349051: [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract	Sanjay Patel	2018-12-13	1	-6/+1
	This causes an address sanitizer bot failure: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/27187/steps/check-llvm%20asan/logs/stdio llvm-svn: 349056
*	[X86][SSE] Fix all remaining modulo vector rotation amounts (PR38243)	Simon Pilgrim	2018-12-13	1	-9/+6
	There are still a couple of minor SimplifyDemandedElts regressions in some of the shift amount splats, which will be fixed in future patches. llvm-svn: 349052
*	[DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract	Sanjay Patel	2018-12-13	1	-1/+6
	Differential Revision: https://reviews.llvm.org/D55655 llvm-svn: 349051
*	[Sparc] Add membar assembler tags	Daniel Cederman	2018-12-13	4	-1/+91
	Summary: The Sparc V9 membar instruction can enforce different types of memory ordering depending on the value in its immediate field. In the architecture manual, the type is selected by combining different assembler tags into a mask. This patch adds support for these tags. Reviewers: jyknight, venkatra, brad Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D53491 llvm-svn: 349048
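Combining tags into the membar mask is a bitwise OR. The sketch below uses the SPARC V9 bit assignments as we understand them: mmask in bits 3:0 (#LoadLoad, #StoreLoad, #LoadStore, #StoreStore) and cmask in bits 6:4 (#Lookaside, #MemIssue, #Sync). `membar_mask` is a hypothetical helper that illustrates the encoding. It is not LLVM's Sparc assembly parser.

```python
# membar tag bits: mmask (bits 3:0) and cmask (bits 6:4), per the SPARC V9 manual.
MEMBAR_TAGS = {
    "#LoadLoad": 0x01,
    "#StoreLoad": 0x02,
    "#LoadStore": 0x04,
    "#StoreStore": 0x08,
    "#Lookaside": 0x10,
    "#MemIssue": 0x20,
    "#Sync": 0x40,
}

def membar_mask(operand):
    """Fold an operand such as '#StoreLoad | #StoreStore' into the 7-bit immediate."""
    mask = 0
    for part in operand.split("|"):
        tag = part.strip()
        if tag not in MEMBAR_TAGS:
            raise ValueError(f"unknown membar tag: {tag!r}")
        mask |= MEMBAR_TAGS[tag]
    return mask
```

For example, `membar #StoreLoad | #StoreStore` corresponds to an immediate of 0x0a.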
*	[X86][SSE] Fix modulo rotation amounts for v8i16/v16i16/v4i32 (PR38243)	Simon Pilgrim	2018-12-13	1	-2/+5
	llvm-svn: 349047
*	[Sparc] Use float register for integer constrained with "f" in inline asm	Daniel Cederman	2018-12-13	1	-8/+8
	Summary: Constraining an integer value to a floating-point register using "f" causes an llvm_unreachable to trigger. This patch allows i32 integers to be placed in a single-precision float register and i64 integers to be placed in a double-precision float register. This matches the behavior of GCC. For other types, the llvm_unreachable is removed and an error message pointing out the offending line is emitted instead. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D51614 llvm-svn: 349045
*	[PowerPC][NFC] Sorting out Pseudo related classes to avoid confusion	Jinsong Ji	2018-12-13	7	-351/+345
	There are several kinds of Pseudo in the PowerPC backend, e.g.: * ISel pseudo-instructions, which have let usesCustomInserter=1 in td; ExpandISelPseudos -> EmitInstrWithCustomInserter deals with them. * Post-RA pseudo instructions, which have let isPseudo = 1 in td, or standard pseudos (SUBREG_TO_REG, COPY, etc.); ExpandPostRAPseudos -> expandPostRAPseudo expands them. * Multi-instruction pseudo operations, which PPCAsmPrinter::EmitInstruction expands. * Pseudo instructions in CodeEmitter, which have an encoding of 0. Currently, in td files, especially PPCInstrVSX.td, we do not clearly distinguish post-RA pseudo instructions from pseudo instructions in CodeEmitter. This patch: * Renames the Pseudo<> class to PPCEmitTimePseudo, meaning an encoding of 0 in CodeEmitter * Introduces a new class PPCPostRAExpPseudo<> for the previous post-RA pseudos * Introduces a new class PPCCustomInserterPseudo<> for the previous ISel pseudos Differential Revision: https://reviews.llvm.org/D55143 llvm-svn: 349044
*	[mir] Fix uninitialized variable in r349035 noticed by clang-atom-d525-fedora-rel and 3 other bots	Daniel Sanders	2018-12-13	1	-1/+1
	llvm-svn: 349043
*	[X86][SSE] Merge the vXi16/vXi32 vector rotation expansion cases. NFCI.	Simon Pilgrim	2018-12-13	1	-13/+3
	Merged the repeated code into a single if(). llvm-svn: 349040
*	[SystemZ] Pass copy-hinted regs first from getRegAllocationHints().	Jonas Paulsson	2018-12-13	1	-3/+16
	When computing register allocation hints for a GRX32Bit register, make sure that any hinted registers that are also copy hints are returned first in the list. Review: Ulrich Weigand. llvm-svn: 349037
*	[mir] Serialize DILocation inline when not possible to use a metadata reference	Daniel Sanders	2018-12-13	4	-6/+123
	Summary: Sometimes MIR-level passes create DILocations that were not present in the LLVM-IR. For example, a pass may merge two DILocations together to produce a DILocation that points to line 0. Previously, the addresses of these DILocations were printed, which prevented the MIR from being read back into LLVM. With this patch, DILocations use metadata references where possible and fall back on serializing them inline like so: MOV32mr %stack.0.x.addr, 1, _, 0, _, %0, debug-location !DILocation(line: 1, scope: !15) Reviewers: aprantl, vsk, arphaman Reviewed By: aprantl Subscribers: probinson, llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D55243 llvm-svn: 349035
*	[X86][BWI] Don't custom lower vXi8 rotations.	Simon Pilgrim	2018-12-13	1	-18/+14
	We always expand to shifts anyhow; the test changes are only scheduling differences. llvm-svn: 349034
*	[PowerPC] intrinsic llvm.eh.sjlj.setjmp should not have flag isBarrier.	Chen Zheng	2018-12-13	2	-2/+12
	Differential Revision: https://reviews.llvm.org/D55499 llvm-svn: 349029
*	[DAGCombine] Moved X86 rotate_amount % bitwidth == 0 early out to DAGCombiner	Simon Pilgrim	2018-12-13	2	-8/+8
	Remove the common code from custom lowering (the code is still safe if a zero value somehow gets used). llvm-svn: 349028
*	[ARM GlobalISel] Support exts and truncs for Thumb2	Diana Picus	2018-12-13	2	-15/+18
	Mark G_SEXT, G_ZEXT and G_ANYEXT to 32 bits as legal and add support for them in the instruction selector. This uses handwritten code again because the patterns generated by TableGen are tuned for what the DAG combiner would produce, not for simple sext/zext nodes. Luckily, we only need to update the opcodes to use the Thumb2 variants; everything else can be reused from ARM. llvm-svn: 349026
*	[TargetLowering] Add ISD::ROTL/ROTR vector expansion	Simon Pilgrim	2018-12-13	4	-48/+64
	Move the existing rotation expansion code into TargetLowering and set it up for vectors as well. Ideally this would share more of the funnel shift expansion, but we currently handle the shift amount modulo quite differently. Begin removing the x86 vector rotate custom lowering in favor of the expansion. llvm-svn: 349025
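For power-of-two bit widths, a rotate can be expanded into two shifts and an OR. Both shift amounts are reduced modulo the width, so neither shift is out of range, and a rotate amount that is a multiple of the width leaves the value unchanged. The sketch below shows only that arithmetic, assuming power-of-two widths. The function names are hypothetical, and this is not the TargetLowering code. For vectors, the same expansion applies lane by lane.

```python
def rotl(x, c, width):
    """Rotate the width-bit value x left by c (any integer) using only shifts/OR/AND."""
    assert width > 0 and width & (width - 1) == 0, "sketch assumes a power-of-two width"
    mask = (1 << width) - 1
    x &= mask
    left = c & (width - 1)       # c mod width, always in [0, width)
    right = (-c) & (width - 1)   # (width - c) mod width, also in [0, width)
    # If c % width == 0, both amounts are 0 and the result is just x.
    return ((x << left) | (x >> right)) & mask

def rotr(x, c, width):
    """Rotating right by c is the same as rotating left by -c."""
    return rotl(x, -c, width)

def rotl_vector(lanes, amounts, width):
    """Element-wise expansion, as for a vector ROTL with per-lane amounts."""
    return [rotl(x, c, width) for x, c in zip(lanes, amounts)]
```

For example, rotating the 16-bit value 0x1234 left by 36 gives the same result as rotating it by 4, namely 0x2341.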
*	[RISCV] Add support for the various RISC-V FMA instruction variants	Alex Bradbury	2018-12-13	3	-3/+33
	Adds support for the various RISC-V FMA instructions (fmadd, fmsub, fnmsub, fnmadd). Whether a fused add or subtract is used, and whether the product is negated, depends on which arguments to the llvm.fma.* intrinsic are negated. In the tests, extraneous fadd instructions were added to avoid the negation being performed using a xor trick, which prevented the proper FMA forms from being selected and thus tested. The FMA instruction patterns might seem incorrect (e.g., fnmadd: -rs1 * rs2 - rs3), but they should be correct. The misleading names were inherited from MIPS, where the negation happens after computing the sum. The llvm.fmuladd.* intrinsics still do not generate RISC-V FMA instructions, as that depends on TargetLowering::isFMAFasterthanFMulAndFAdd. Some comments in the test files about which types of instructions are tested were updated to better reflect the current content of those files. Differential Revision: https://reviews.llvm.org/D54205 Patch by Luís Marques. llvm-svn: 349023
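The selection rule can be written as a lookup on which fma operands are negated. Each entry below is annotated with the instruction's semantics as given in the message (for example, fnmadd computes -(rs1 * rs2) - rs3). `select_fma`, `SEMANTICS` and `fma_with_negations` are hypothetical names for illustration, not the RISC-V TableGen patterns. Rounding is ignored, and the test operands are exactly representable.

```python
def select_fma(neg_a, neg_c):
    """Pick the instruction for llvm.fma(a, b, c) when a and/or c is an fneg,
    with the un-negated values as rs1=a, rs2=b, rs3=c."""
    table = {
        (False, False): "fmadd",   #  (rs1 * rs2) + rs3
        (False, True):  "fmsub",   #  (rs1 * rs2) - rs3
        (True,  False): "fnmsub",  # -(rs1 * rs2) + rs3
        (True,  True):  "fnmadd",  # -(rs1 * rs2) - rs3
    }
    return table[(neg_a, neg_c)]

SEMANTICS = {
    "fmadd":  lambda a, b, c: a * b + c,
    "fmsub":  lambda a, b, c: a * b - c,
    "fnmsub": lambda a, b, c: -(a * b) + c,
    "fnmadd": lambda a, b, c: -(a * b) - c,
}

def fma_with_negations(a, b, c, neg_a, neg_c):
    """Reference value of fma(neg_a ? -a : a, b, neg_c ? -c : c), ignoring rounding."""
    return (-a if neg_a else a) * b + (-c if neg_c else c)
```

Negating b instead of a yields the same product sign, so it would select the same instruction.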
*	[AArch64] Catch some more CMN opportunities.	Arnaud A. de Grandmaison	2018-12-13	1	-0/+5
	Fixes https://bugs.llvm.org/show_bug.cgi?id=33486 llvm-svn: 349022
*	[CodeGen] Allow memcpy/memset to generate small overlapping stores.	Clement Courbet	2018-12-13	1	-5/+3
	Summary: All targets either just return false here or properly model `Fast`, so I don't think there is any reason to prevent CodeGen from doing the right thing here. Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D55365 llvm-svn: 349016
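Here is an example of an overlapping-store lowering. A 7-byte memcpy can be done with two 4-byte moves at offsets 0 and 3, instead of separate 4-, 2- and 1-byte pieces. Byte 3 is simply written twice with the same value. The sketch relies on memcpy's requirement that source and destination do not overlap, so load and store order does not matter. It only illustrates the idea. `copy_overlapping` is a hypothetical name, and whether a target uses this lowering depends on its misaligned-access hooks, which are not modeled here.

```python
def copy_overlapping(dst, src, n, width):
    """Copy n bytes (width <= n <= 2 * width) using exactly two width-byte moves.
    The second move ends at byte n, so the two stores overlap when n < 2 * width."""
    assert width <= n <= 2 * width
    head = bytes(src[0:width])          # load bytes [0, width)
    tail = bytes(src[n - width:n])      # load bytes [n - width, n)
    dst[0:width] = head                 # store 1
    dst[n - width:n] = tail             # store 2 rewrites overlapping bytes with equal values
    return [(0, width), (n - width, width)]  # (offset, size) of each store
```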
*	[asan] Don't check ODR violations for particular types of globals	Vitaly Buka	2018-12-13	1	-1/+7
	Summary: private and internal: should not trigger ODR at all. unnamed_addr: the current ODR checking approach fails and reports a false violation if a linker merges such globals. linkonce_odr, weak_odr: could cause similar problems, and they are already not instrumented for ELF. Reviewers: eugenis, kcc Subscribers: kubamracek, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D55621 llvm-svn: 349015
*	AMDGPU/GlobalISel: Legalize f64 fadd/fmul	Matt Arsenault	2018-12-13	1	-3/+3
	llvm-svn: 349014
*	AMDGPU/GlobalISel: RegBankSelect some simple operations	Matt Arsenault	2018-12-13	2	-2/+29
	llvm-svn: 349012
*	[X86] Remove assert leftover from when i1 was a legal type. Add more accurate assert. NFC	Craig Topper	2018-12-13	1	-3/+1
	llvm-svn: 349007
*	[AMDGPU] Fix build failure, second attempt	Stanislav Mekhanoshin	2018-12-13	1	-1/+1
	Some compilers complain when the variable is captured and some complain when it is not. Switch to [&]. llvm-svn: 349006
*	[AMDGPU] Fix build failure	Stanislav Mekhanoshin	2018-12-13	1	-1/+1
	Fixed error 'lambda capture 'CondReg' is not required to be captured for this use'. llvm-svn: 349005
*	[AMDGPU] Simplify negated condition	Stanislav Mekhanoshin	2018-12-13	3	-0/+187
	Optimize the sequence: %sel = V_CNDMASK_B32_e64 0, 1, %cc; %cmp = V_CMP_NE_U32 1, %1; $vcc = S_AND_B64 $exec, %cmp; S_CBRANCH_VCC[N]Z => $vcc = S_ANDN2_B64 $exec, %cc; S_CBRANCH_VCC[N]Z. This is the negation pattern inserted by DAGCombiner::visitBRCOND() in rebuildSetCC(). Differential Revision: https://reviews.llvm.org/D55402 llvm-svn: 349003
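The equivalence can be checked on 64-bit lane masks (one bit per lane in a wave64). V_CNDMASK_B32 with operands 0, 1 and %cc produces 1 exactly in the lanes where %cc is set. Comparing that result "!= 1" therefore gives the complement of %cc, and ANDing with exec produces exec & ~cc, which is what S_ANDN2_B64 computes. This sketch assumes the compare reads the select result (written `%1` in the message). It models the instructions as hypothetical Python functions and is not the AMDGPU implementation.

```python
WAVE = 64
FULL = (1 << WAVE) - 1

def v_cndmask(false_val, true_val, cond_mask):
    # Per-lane select: lane i gets true_val when bit i of cond_mask is set.
    return [true_val if (cond_mask >> i) & 1 else false_val for i in range(WAVE)]

def v_cmp_ne(scalar, lanes):
    # Lane mask with bit i set where lanes[i] != scalar.
    mask = 0
    for i, v in enumerate(lanes):
        if v != scalar:
            mask |= 1 << i
    return mask

def original_vcc(exec_mask, cc):
    sel = v_cndmask(0, 1, cc)       # %sel = V_CNDMASK_B32_e64 0, 1, %cc
    cmp = v_cmp_ne(1, sel)          # %cmp = V_CMP_NE_U32 1, %sel
    return exec_mask & cmp          # $vcc = S_AND_B64 $exec, %cmp

def simplified_vcc(exec_mask, cc):
    return exec_mask & ~cc & FULL   # $vcc = S_ANDN2_B64 $exec, %cc
```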
*	Revert r348645 - "[MemCpyOpt] memset->memcpy forwarding with undef tail"	David L. Jones	2018-12-13	2	-36/+16
	This revision caused truncated memsets for structs with padding. See: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610520.html llvm-svn: 349002
*	[LoopUtils] Prefer a set over a map. NFCI.	Davide Italiano	2018-12-13	1	-6/+4
	llvm-svn: 348999
*	[Support] Fix FileNameLength passed to SetFileInformationByHandle	Shoaib Meenai	2018-12-13	1	-1/+1
	The rename_internal function used for Windows has a minor bug where the filename length is passed as a character count instead of a byte count. Windows internally ignores this field, but other tools that hook NT APIs may rely on the documented behavior. MSDN documentation specifying that the size should be in bytes: https://docs.microsoft.com/en-us/windows/desktop/api/winbase/ns-winbase-_file_rename_info Patch by Ben Hillis. Differential Revision: https://reviews.llvm.org/D55624 llvm-svn: 348995
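Because the file name is stored as UTF-16 (WCHAR) text, the byte count is twice the number of UTF-16 code units. A name containing a character outside the BMP takes two code units for that character. The sketch below assumes that the terminating NUL is not included in the length. The function names are hypothetical, and the Win32 call itself is not shown.

```python
def rename_info_name_length(name):
    """Bytes occupied by `name` as a UTF-16LE WCHAR array (terminating NUL not counted)."""
    return len(name.encode("utf-16-le"))

def utf16_code_units(name):
    """The value the buggy code passed: a count of WCHARs rather than bytes."""
    return len(name.encode("utf-16-le")) // 2
```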
*	[globalisel] Add GISelChangeObserver::changingInstr()	Daniel Sanders	2018-12-12	3	-6/+37
	Summary: In addition to knowing that an instruction has changed, it's also useful to know when it's about to change. For example, an observer might print the instruction so you can track the changes in a debug log, it might remove it from some queue while it's being worked on, or it might want to change several instructions as a single transaction and act on all the changes at once. Added changingInstr() to all existing uses of changedInstr(). Reviewers: aditya_nandakumar Reviewed By: aditya_nandakumar Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D55623 llvm-svn: 348992
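The value of a "before" notification is that the observer can still see the old form of the instruction. This is a toy Python analogue, not LLVM's API. `RecordingObserver`, `Instr`, `set_opcode` and `erase` are hypothetical. The method names only mirror changingInstr(), changedInstr() and erasingInstr(). The mutator calls the "before" hook, changes the instruction, and then calls the "after" hook, so the log holds both forms. Erasure is announced before it happens, so printing is still safe.

```python
class Instr:
    """Hypothetical stand-in for a machine instruction."""
    def __init__(self, opcode, *operands):
        self.opcode = opcode
        self.operands = list(operands)

    def __str__(self):
        return f"{self.opcode} {', '.join(self.operands)}"


class RecordingObserver:
    """Logs each notification along with the instruction as printed at that moment."""
    def __init__(self):
        self.log = []

    def changing_instr(self, mi):
        self.log.append(("changing", str(mi)))  # old form still visible

    def changed_instr(self, mi):
        self.log.append(("changed", str(mi)))

    def erasing_instr(self, mi):
        self.log.append(("erasing", str(mi)))   # called before removal, so printing is safe


def set_opcode(mi, opcode, observer):
    observer.changing_instr(mi)
    mi.opcode = opcode
    observer.changed_instr(mi)


def erase(mi, block, observer):
    observer.erasing_instr(mi)
    block.remove(mi)
```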
*	[WebAssembly] Update dylink section parsing	Sam Clegg	2018-12-12	2	-0/+7
	This updates the format of the dylink section in accordance with a recent "spec" change: https://github.com/WebAssembly/tool-conventions/pull/77 Differential Revision: https://reviews.llvm.org/D55609 llvm-svn: 348989
*	[LoopDeletion] Update debug values after loop deletion.	Davide Italiano	2018-12-12	1	-0/+27
	When loops are deleted, we don't keep track of variables modified inside the loops, so the debug info will contain the wrong values for them. For example, given int b() { int i; for (i = 0; i < 2; i++) ; patatino(); return a; } int main() { b(); }, after loop deletion, stopping at the patatino() call (line 6) and running "(lldb) frame var i" prints "(int) i = 0" rather than 2. Instead, we mark these values as unavailable by inserting an @llvm.dbg.value of undef, to make sure we don't end up printing an incorrect value in the debugger. We could consider doing something fancier for, e.g., constants in the future. PR39868. rdar://problem/46418795 Differential Revision: https://reviews.llvm.org/D55299 llvm-svn: 348988
*	[InstCombine] Fix negative GEP offset evaluation for 32-bit pointers	Nikita Popov	2018-12-12	1	-5/+3
	This fixes https://bugs.llvm.org/show_bug.cgi?id=39908. The evaluateGEPOffsetExpression() function simplifies GEP offsets for use in comparisons against zero, basically by converting X*Scale+Offset==0 to X+Offset/Scale==0 if Scale divides Offset. However, before this is done, Offset is masked down to the pointer size. This produces incorrect results for negative Offsets, because we end up dividing the 32-bit offset zero-extended to 64 bits (rather than sign-extended). Fix this by explicitly sign-extending the truncated value. Differential Revision: https://reviews.llvm.org/D55449 llvm-svn: 348987
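The arithmetic behind this bug is easy to reproduce. Take Offset = -8 and Scale = 4 with 32-bit pointers. Masking gives 0xFFFFFFF8. Dividing that value as a zero-extended 64-bit number gives 1073741822 instead of -2, so X + Offset/Scale == 0 no longer matches X*4 - 8 == 0. The helpers below are hypothetical and model only the integer math, not InstCombine.

```python
def mask_to_pointer_size(offset, bits):
    """Truncate offset to `bits` bits, leaving a non-negative (zero-extended) value."""
    return offset & ((1 << bits) - 1)

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def scaled_offset(offset, scale, bits, sign_extend_first=True):
    """Offset / Scale after masking Offset to the pointer width."""
    truncated = mask_to_pointer_size(offset, bits)
    if sign_extend_first:            # the fix: restore the sign before dividing
        truncated = sign_extend(truncated, bits)
    assert truncated % scale == 0, "the transform requires Scale to divide Offset"
    return truncated // scale
```

With the fix, X*4 - 8 == 0 becomes X - 2 == 0, which has the same solution X = 2.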
*	[hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)	Ryan Prichard	2018-12-12	1	-1/+3
	Summary: The change is needed to support ELF TLS in Android. See D55581 for the same change in compiler-rt. Reviewers: srhines, eugenis Reviewed By: eugenis Subscribers: srhines, llvm-commits Differential Revision: https://reviews.llvm.org/D55592 llvm-svn: 348983
*	[globalisel] Rename GISelChangeObserver's erasedInstr() to erasingInstr() and related nits. NFC	Daniel Sanders	2018-12-12	3	-9/+7
	Summary: There's little of interest that can be done to an already-erased instruction. You can't inspect it, write it to a debug log, etc. It ought to be a notification that we're about to erase it. Rename the function to clarify the timing of the event and reflect current usage. Also fixed one case where we were trying to print an erased instruction. Reviewers: aditya_nandakumar Reviewed By: aditya_nandakumar Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D55611 llvm-svn: 348976
*	[X86] Don't emit MULX by default with BMI2	Craig Topper	2018-12-12	1	-49/+17
	MULX has somewhat improved register allocation constraints compared to the legacy MUL instruction. Both output registers are encoded instead of being fixed to EAX/EDX, but EDX is used as an input. It also doesn't touch flags. Unfortunately, the encoding is longer. Preferring it whenever BMI2 is enabled is probably not optimal. Choosing it should somehow be a function of register allocation constraints, like converting adds to three-address form. gcc and icc definitely don't pick MULX by default. Not sure what rules, if any, they have for using it. Differential Revision: https://reviews.llvm.org/D55565 llvm-svn: 348975