| Commit message | Author | Age | Files | Lines |
| |
@nikic raised an issue on D75936 that the added complexity to the O0 pipeline was causing noticeable slowdowns for `-O0` builds. This patch addresses the issue by adding a pass with equal security properties, but without any optimizations (and more importantly, without the need for expensive analysis dependencies).
Reviewers: nikic, craig.topper, mattdr
Reviewed By: craig.topper, mattdr
Differential Revision: https://reviews.llvm.org/D80964
| |
After finding all such gadgets in a given function, the pass minimally inserts
LFENCE instructions in such a manner that the following property is satisfied:
for all SOURCE+SINK pairs, all paths in the CFG from SOURCE to SINK contain at
least one LFENCE instruction. The algorithm that implements this minimal
insertion is influenced by an academic paper that minimally inserts memory
fences for high-performance concurrent programs:
http://www.cs.ucr.edu/~lesani/companion/oopsla15/OOPSLA15.pdf
The algorithm implemented in this pass is as follows:
1. Build a condensed CFG (i.e., a GadgetGraph) consisting only of the following components:
- SOURCE instructions (also includes function arguments)
- SINK instructions
- Basic block entry points
- Basic block terminators
- LFENCE instructions
2. Analyze the GadgetGraph to determine which SOURCE+SINK pairs (i.e., gadgets) are already mitigated by existing LFENCEs. If all gadgets have been mitigated, go to step 6.
3. Use a heuristic or plugin to approximate minimal LFENCE insertion.
4. Insert one LFENCE along each CFG edge that was cut in step 3.
5. Go to step 2.
6. If any LFENCEs were inserted, return true from runOnFunction() to tell LLVM that the function was modified.
By default, the heuristic used in Step 3 is a greedy heuristic that avoids
inserting LFENCEs into loops unless absolutely necessary. There is also a
CLI option to load a plugin that can provide even better optimization,
inserting fewer fences, while still mitigating all of the LVI gadgets.
The plugin can be found here: https://github.com/intel/lvi-llvm-optimization-plugin,
and a description of the pass's behavior with the plugin can be found here:
https://software.intel.com/security-software-guidance/insights/optimized-mitigation-approach-load-value-injection.
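To make the shape of steps 2-5 concrete, here is a minimal, self-contained sketch of the fixpoint loop on a toy graph. The types and helpers are stand-ins invented for illustration, not the pass's actual MachineGadgetGraph interface:
'''
#include <cstdio>
#include <queue>
#include <set>
#include <utility>
#include <vector>

enum class Kind { Source, Sink, Entry, Terminator, Fence };

struct Graph {
  std::vector<Kind> nodes;
  std::set<std::pair<int, int>> edges; // directed: from -> to
};

// Step 2: can some SOURCE still reach a SINK without crossing a FENCE
// node? (BFS that refuses to continue through fences.)
static bool hasUnmitigatedGadget(const Graph &g, std::pair<int, int> *cut) {
  for (int s = 0; s < (int)g.nodes.size(); ++s) {
    if (g.nodes[s] != Kind::Source)
      continue;
    std::queue<int> q;
    std::set<int> seen{s};
    q.push(s);
    while (!q.empty()) {
      int u = q.front();
      q.pop();
      for (const auto &e : g.edges) {
        if (e.first != u || seen.count(e.second))
          continue;
        if (g.nodes[e.second] == Kind::Sink) { *cut = e; return true; }
        if (g.nodes[e.second] != Kind::Fence) {
          seen.insert(e.second);
          q.push(e.second);
        }
      }
    }
  }
  return false;
}

int main() {
  // SOURCE(0) -> terminator(1) -> SINK(2); no fence yet.
  Graph g{{Kind::Source, Kind::Terminator, Kind::Sink}, {{0, 1}, {1, 2}}};
  bool modified = false;
  std::pair<int, int> cut;
  while (hasUnmitigatedGadget(g, &cut)) {   // steps 2 and 5
    // Steps 3-4, naively: split the offending edge with a FENCE node.
    int f = (int)g.nodes.size();
    g.nodes.push_back(Kind::Fence);
    g.edges.erase(cut);
    g.edges.insert({cut.first, f});
    g.edges.insert({f, cut.second});
    modified = true;
  }
  std::printf("modified: %s\n", modified ? "true" : "false"); // step 6
}
'''
The real pass works on the condensed graph and cuts sets of edges chosen by the greedy heuristic or plugin, rather than fencing one edge at a time as this toy does.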
Differential Revision: https://reviews.llvm.org/D75937
| |
Gadgets
Adds a new data structure, ImmutableGraph, and uses RDF to find LVI gadgets and add them to a MachineGadgetGraph.
More specifically, a new X86 machine pass finds Load Value Injection (LVI) gadgets consisting of a load from memory (i.e., SOURCE), and any operation that may transmit the value loaded from memory over a covert channel, or use the value loaded from memory to determine a branch/call target (i.e., SINK).
Also adds a new target feature to X86: +lvi-load-hardening
The feature can be added via the clang CLI using -mlvi-hardening.
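For illustration only (this example is not from the patch), a SOURCE+SINK pair as described above can arise from ordinary code:
'''
#include <cstdio>

using Callback = void (*)();
static void hello() { std::puts("hi"); }

// The load of `fp` is a SOURCE; the call through it is a SINK, because a
// transiently faulted load could steer the indirect branch target.
static void dispatch(Callback *table, int i) {
  Callback fp = table[i]; // SOURCE: load from memory
  fp();                   // SINK: branch target depends on the loaded value
}

int main() { Callback t[] = {hello}; dispatch(t, 0); }
'''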
Differential Revision: https://reviews.llvm.org/D75936
| |
`MBB.back()` could segfault if `MBB.empty()`. Fixed by checking for `MBB.empty()` in the loop.
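A sketch of the fixed loop shape, assuming the usual iteration over a function's blocks (illustrative, not the exact diff):
'''
// Calling back() on an empty MachineBasicBlock is undefined behavior,
// so skip empty blocks before inspecting the last instruction.
for (MachineBasicBlock &MBB : MF) {
  if (MBB.empty())
    continue;
  MachineInstr &MI = MBB.back();
  // ... examine/transform MI ...
}
'''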
Differential Revision: https://reviews.llvm.org/D77584
| |
Injection (LVI) Gadgets"
This reverts commit c74dd640fd740c6928f66a39c7c15a014af3f66f.
Reverting to address coding standard issues raised in post-commit
review.
| |
Injection (LVI)"
This reverts commit 62c42e29ba43c9d79cd5bd2084b641fbff6a96d5.
Reverting to address coding standard issues raised in post-commit
review.
| |
After finding all such gadgets in a given function, the pass minimally inserts
LFENCE instructions in such a manner that the following property is satisfied:
for all SOURCE+SINK pairs, all paths in the CFG from SOURCE to SINK contain at
least one LFENCE instruction. The algorithm that implements this minimal
insertion is influenced by an academic paper that minimally inserts memory
fences for high-performance concurrent programs:
http://www.cs.ucr.edu/~lesani/companion/oopsla15/OOPSLA15.pdf
The algorithm implemented in this pass is as follows:
1. Build a condensed CFG (i.e., a GadgetGraph) consisting only of the following components:
- SOURCE instructions (also includes function arguments)
- SINK instructions
- Basic block entry points
- Basic block terminators
- LFENCE instructions
2. Analyze the GadgetGraph to determine which SOURCE+SINK pairs (i.e., gadgets) are already mitigated by existing LFENCEs. If all gadgets have been mitigated, go to step 6.
3. Use a heuristic or plugin to approximate minimal LFENCE insertion.
4. Insert one LFENCE along each CFG edge that was cut in step 3.
5. Go to step 2.
6. If any LFENCEs were inserted, return true from runOnFunction() to tell LLVM that the function was modified.
By default, the heuristic used in Step 3 is a greedy heuristic that avoids
inserting LFENCEs into loops unless absolutely necessary. There is also a
CLI option to load a plugin that can provide even better optimization,
inserting fewer fences, while still mitigating all of the LVI gadgets.
The plugin can be found here: https://github.com/intel/lvi-llvm-optimization-plugin,
and a description of the pass's behavior with the plugin can be found here:
https://software.intel.com/security-software-guidance/insights/optimized-mitigation-approach-load-value-injection.
Differential Revision: https://reviews.llvm.org/D75937
| |
Gadgets
Adds a new data structure, ImmutableGraph, and uses RDF to find LVI gadgets and add them to a MachineGadgetGraph.
More specifically, a new X86 machine pass finds Load Value Injection (LVI) gadgets consisting of a load from memory (i.e., SOURCE), and any operation that may transmit the value loaded from memory over a covert channel, or use the value loaded from memory to determine a branch/call target (i.e., SINK).
Also adds a new target feature to X86: +lvi-load-hardening
The feature can be added via the clang CLI using -mlvi-hardening.
Differential Revision: https://reviews.llvm.org/D75936
| |
Adding a pass that replaces every ret instruction with the sequence:
pop <scratch-reg>
lfence
jmp *<scratch-reg>
where <scratch-reg> is some available scratch register, according to the
calling convention of the function being mitigated.
Differential Revision: https://reviews.llvm.org/D75935
| |
This pass replaces each indirect call/jump with a direct call to a thunk that looks like:
lfence
jmpq *%r11
This ensures that if the value in register %r11 was loaded from memory, then
the value in %r11 is (architecturally) correct prior to the jump.
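For context, the call-site side of the rewrite looks roughly like the following; the thunk symbol name follows LLVM's __llvm_lvi_thunk_* scheme, but the exact generated code here is illustrative:
'''
// Source: an ordinary indirect call through a function pointer.
void call_it(void (*fp)()) {
  fp(); // unmitigated codegen: an indirect call such as callq *%rax
}
// With -mlvi-cfi, the indirect call is routed through the thunk, roughly:
//   movq %rax, %r11
//   callq __llvm_lvi_thunk_r11   # thunk body: lfence; jmpq *%r11
'''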
Also adds a new target feature to X86: +lvi-cfi
("cfi" meaning control-flow integrity)
The feature can be added via the clang CLI using -mlvi-cfi.
This is an alternate implementation to https://reviews.llvm.org/D75934, merging the thunk insertion functionality with the existing X86 retpoline code.
Differential Revision: https://reviews.llvm.org/D76812
| |
Retpoline
Introduce a ThunkInserter CRTP base class from which new thunk types can inherit, e.g., thunks to mitigate https://software.intel.com/security-software-guidance/software-guidance/load-value-injection.
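A minimal, self-contained sketch of the CRTP shape described above; the names and interface are simplified stand-ins, not the exact LLVM classes:
'''
#include <cstdio>
#include <string>

// Stand-in for LLVM's MachineFunction; illustrative only.
struct MachineFunction { std::string Name; };

// CRTP base: shared driver logic lives in the base, while each concrete
// thunk type supplies its own insertThunks() through the Derived class,
// with no virtual dispatch.
template <typename Derived> class ThunkInserter {
public:
  bool run(MachineFunction &MF) {
    // Shared bookkeeping (e.g., creating the thunk function once) goes here.
    return static_cast<Derived *>(this)->insertThunks(MF);
  }
};

struct RetpolineThunkInserter : ThunkInserter<RetpolineThunkInserter> {
  bool insertThunks(MachineFunction &MF) {
    std::printf("retpoline thunks for %s\n", MF.Name.c_str());
    return true;
  }
};

struct LVIThunkInserter : ThunkInserter<LVIThunkInserter> {
  bool insertThunks(MachineFunction &MF) {
    std::printf("lfence+jmp thunks for %s\n", MF.Name.c_str());
    return true;
  }
};

int main() {
  MachineFunction MF{"foo"};
  RetpolineThunkInserter{}.run(MF);
  LVIThunkInserter{}.run(MF);
}
'''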
Differential Revision: https://reviews.llvm.org/D76811
| |
"Indirect Thunks"
There are applications for indirect call/branch thunks other than retpoline for Spectre v2, e.g.,
https://software.intel.com/security-software-guidance/software-guidance/load-value-injection
Therefore it makes sense to refactor X86RetpolineThunks as a more general capability.
Differential Revision: https://reviews.llvm.org/D76810
| |
8-bit immediates for minsize
Summary:
Otherwise the PostRA list scheduler may reorder instructions, e.g.,
schedule this
'''
pushq $0x8
pop %rbx
lea 0x2a0(%rsp),%r15
'''
to
'''
pushq $0x8
lea 0x2a0(%rsp),%r15
pop %rbx
'''
by mistake. The patch prevents this from happening by making sure POP has
an implicit use of SP.
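A hedged sketch of the fix's shape (assumed form with the usual BuildMI surroundings; not the exact diff):
'''
// Giving the POP an implicit use of SP creates a dependence on the
// preceding PUSH's stack-pointer update, so the post-RA scheduler can no
// longer move an SP-reading instruction (like the LEA above) in between.
BuildMI(MBB, I, DL, TII.get(X86::POP64r), Reg)
    .addReg(X86::RSP, RegState::Implicit);
'''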
Reviewers: craig.topper
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77031
(cherry picked from commit ece79f47083babcabde3700c67b90ef19967a5b3)
| |
Similar to D81212.
Differential Revision: https://reviews.llvm.org/D81292
(cherry picked from commit 3408dcbdf054ac3cc32a97a6a82a3cf5844be609)
| |
undef.
Shifts are supposed to always shift in zeros or sign bits regardless of their inputs. It's possible the input value may have been replaced with undef by SimplifyDemandedBits, but the shifted-in zeros are still demanded.
This issue was reported to me by the ispc project against 10.0. Unfortunately, their failing test does not fail on trunk; this seems to be because the shl is now optimized out earlier and doesn't become VSHLI.
ispc bug https://github.com/ispc/ispc/issues/1771
Differential Revision: https://reviews.llvm.org/D81212
(cherry picked from commit 7c9a89fed8f5d53d61fe3a61a2581a7c28b1b6d2)
| |
This patch adds the x, t and g modifiers for inline asm from GCC. These will print a vector register as xmm*, ymm* or zmm* respectively.
I also fixed register names with modifiers when using inteldialect, so they are no longer printed with a leading %.
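A small usage illustration (a hypothetical example, assuming an AVX-512 target so all three widths exist):
'''
// The 'v' constraint picks a vector register; the new modifiers choose
// how it is printed: %x0 -> xmm*, %t0 -> ymm*, %g0 -> zmm*.
typedef float v16f __attribute__((vector_size(64)));
void show(v16f v) {
  asm volatile("# operand printed as %x0, %t0, %g0" : : "v"(v));
}
'''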
Patch by Amanieu d'Antras
Differential Revision: https://reviews.llvm.org/D78977
(cherry picked from commit c5f7c039efe7ff09a44cfd252f6cb001ceed6269)
| |
Enforcement Technology)
Do not commit llvm/test/ExecutionEngine/MCJIT/cet-code-model-lager.ll because it will
cause build bot failures (it is not suitable for 32-bit Windows targets).
Summary:
This patch comes from H.J.'s https://github.com/hjl-tools/llvm-project/commit/2bd54ce7fa9e94fcd1118b948e14d1b6fc54dfd2
This patch fixes the LLVM unit tests that fail when run on a CET machine (e.g., ExecutionEngine/MCJIT/MCJITTests).
The main reason to enable IBT for JIT-compiled code is that the JIT does not know whether its caller program is CET-enabled.
If the JIT's caller program is non-CET, it does not matter whether the JIT generates CET code.
But if the JIT's caller program is CET-enabled, the JIT must generate CET code, or control-protection exceptions will occur.
I have tested the patch with the LLVM unit tests and llvm-test-suite on a CET machine; both passed.
H.J. also tested it by building and running VNCserver (Virtual Network Console), and it works too.
(Without this patch, VNCserver crashes on a CET machine.)
Reviewers: hjl.tools, craig.topper, LuoYuanke, annita.zhang, pengfei
Reviewed By: LuoYuanke
Subscribers: tstellar, efriedma, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76900
(cherry picked from commit 01a32f2bd3fad4331cebe9f4faa270d7c082d281)
| |
Summary:
Bug fix for https://bugs.llvm.org/show_bug.cgi?id=45182
An exception handler may indirectly jump to a catch pad, so we should add an ENDBR instruction before the catch pad instructions.
Reviewers: craig.topper, hjl.tools, LuoYuanke, annita.zhang, pengfei
Reviewed By: LuoYuanke
Subscribers: hiraditya, llvm-commits
Patch By: Xiang Zhang (xiangzhangllvm)
Differential Revision: https://reviews.llvm.org/D76190
(cherry picked from commit 974d649f8eaf3026ccb9d1b77bdec55da25366e5)
| |
(PR45443)
Shuffle combining can insert byte-sized zero elements into the shuffle mask, which combineX86ShufflesConstants will attempt to fold without taking into account whether the byte-sized type is legal (e.g., AVX512F-only targets).
If we have a fully zeroable vector then we should just return a zero version of the root type; otherwise, if the type isn't valid we should bail.
Fixes PR45443
(cherry picked from commit e3b60597769f79a8abc19fb8ef1f321d9adc1358)
| |
conservative (PR44539)
Fix/workaround for https://bugs.llvm.org/show_bug.cgi?id=44539.
As discussed there, this pass makes some overly optimistic
assumptions, as it does not have access to actual branch weights.
This patch makes the computation of the depth of the optimized cmov
more conservative, by assuming a distribution of 75/25 rather than
50/50 and placing the weights to get the more conservative result
(larger depth). The fully conservative choice would be
std::max(TrueOpDepth, FalseOpDepth), but that would break at least
one existing test (which may or may not be an issue in practice).
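A sketch of the weighting; the exact expression in the patch may differ, but the 75/25 idea looks like this:
'''
#include <algorithm>
#include <cstdio>

// Weight the deeper operand 75/25 so the estimate errs toward the larger
// (more conservative) depth; a 50/50 split would be (Hi + Lo) / 2.
static unsigned cmovDepth(unsigned TrueOpDepth, unsigned FalseOpDepth) {
  unsigned Hi = std::max(TrueOpDepth, FalseOpDepth);
  unsigned Lo = std::min(TrueOpDepth, FalseOpDepth);
  return (3 * Hi + Lo) / 4;
}

int main() {
  std::printf("%u\n", cmovDepth(8, 2)); // 6, versus 5 with a 50/50 split
}
'''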
Differential Revision: https://reviews.llvm.org/D74155
(cherry picked from commit 5eb19bf4a2b0c29a8d4d48dfb0276f096eff9bec)
| |
X86 uses i8 for shift amounts. This code can fail on a 32-bit target
if it runs after type legalization.
This code was copied from AArch64 and modified for X86, but the
shift amount wasn't changed to the correct type for X86.
Fixes PR44812
(cherry picked from commit ec9a94af4d5fb3270f2451fcbec5a3a99f4ac03a)
| |
Similar to D73680 (AArch64 BTI).
A local linkage function whose address is not taken does not need ENDBR32/ENDBR64. Placing the patch label after ENDBR32/ENDBR64 has the advantage that code does not need to differentiate whether the function has an initial ENDBR.
Also, add 32-bit tests and test that .cfi_startproc is at the function
entry. The line information has a general implementation and is tested
by AArch64/patchable-function-entry-empty.mir
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D73760
(cherry picked from commit 8ff86fcf4c038c7156ed4f01e7ed35cae49489e2)
| |
A long `cl::value_desc()` is printed right after the flag name,
before the `cl::desc()` column. The `cl::desc()` column for all
flags is thus padded to the right,
which makes the output unreadable.
(cherry picked from commit 70cbf8c71c510077baadcad305fea6f62e830b06)
| |
We were performing an emulated i32->f64 in the SSE registers, then
storing that value to memory and doing an extload into the X87
domain.
After this patch we'll now just store the i32 to memory along
with an i32 0. Then do a 64-bit FILD to f80 completely in the X87
unit. This matches what we do without SSE.
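A self-contained illustration of why the stored pair is the right bit pattern (little-endian layout assumed, as on x86):
'''
#include <cstdint>
#include <cstdio>
#include <cstring>

// Storing the i32 alongside an i32 zero produces, in memory, exactly the
// zero-extended i64 that the 64-bit FILD then converts; no SSE round trip.
int main() {
  uint32_t x = 0xDEADBEEF;
  uint32_t slot[2] = {x, 0};          // movl %eax,(mem); movl $0,4(mem)
  int64_t as_i64;
  std::memcpy(&as_i64, slot, 8);      // what the 64-bit FILD reads
  std::printf("%lld\n", (long long)as_i64); // 3735928559 == 0xDEADBEEF
}
'''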
| |
Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.
A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.
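For context, a simplified sketch of how such a visibility macro works; the real definition in llvm/Support/Compiler.h carries more configuration guards:
'''
#if defined(__GNUC__)
#define LLVM_EXTERNAL_VISIBILITY __attribute__((visibility("default")))
#else
#define LLVM_EXTERNAL_VISIBILITY
#endif

// When the target libraries are built with hidden default visibility,
// entry points like the LLVMInitialize* functions are re-exported:
extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeX86Target();
'''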
This patch reduces the number of public symbols in libLLVM.so by about
25%. This should improve load times for the dynamic library and also
make ABI checker tools like abidiff require less memory when analyzing
libLLVM.so.
One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.
Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
  nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
  36221
  nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
  26278
Reviewers: chandlerc, beanz, mgorny, rnk, hans
Reviewed By: rnk, hans
Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D54439
| |
This flag was originally part of D70157, but was removed as we carved away pieces of the review. Since we have the nop support checked in, and it appears mature(*), I think it's time to add the master flag. For now, it will default to nop padding, but once the prefix padding support lands, we'll update the defaults.
(*) I can now confirm that downstream testing of the changes which have landed to date - nop padding and compiler support for suppressions - is passing all of the functional testing we've thrown at it. There might still be something lurking, but we've gotten enough coverage to be confident of the basic approach.
Note that the new flag can be used either when assembling an .s file, or when using the integrated assembler directly from the compiler. The latter will use all of the suppression mechanisms and should always generate correct code. We don't yet have assembly syntax for the suppressions, so passing this directly to the assembler with a raw .s file may result in broken code. Use at your own risk.
Also note that this isn't the wiring for the clang option. I think the most recent review for that is D72227, but I've lost track, so that might be off.
Differential Revision: https://reviews.llvm.org/D72738
| |
Pass small FP values in GPRs or stack memory according to the normal
convention. This is what gcc -mno-sse does on Win64.
I adjusted the conditions under which we emit an error to check if the
argument or return value would be passed in an XMM register when SSE is
disabled. This has a side effect of no longer emitting an error for FP
arguments marked 'inreg' when targeting x86 with SSE disabled. Our
calling convention logic was already assigning it to FP0/FP1, and then
we emitted this error. That seems unnecessary, we can ignore 'inreg' and
compile it without SSE.
Reviewers: jyknight, aemerson
Differential Revision: https://reviews.llvm.org/D70465
| |
The extload on X87 is free.
| |
mode i64->f32/f64/f80 uint_to_fp algorithm.
This allows us to generate better code for selecting the fixup
to load.
Previously, when the sign was set we had to load offset 0, and
when it was clear we had to load offset 4. This required a testl,
setns, a zero extend, and finally a multiply by 4. By switching the
offsets, we can just shift the sign bit into the lsb and multiply it by 4.
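A self-contained sketch of the new selection logic (illustrative, not the DAG code):
'''
#include <cstdint>
#include <cstdio>

// With the table entries swapped, the fixup offset is just the sign bit
// shifted into the lsb and scaled by 4; no testl/setns/zero-extend chain.
static uint32_t fixupOffset(int64_t v) {
  return static_cast<uint32_t>((static_cast<uint64_t>(v) >> 63) << 2);
}

int main() {
  std::printf("%u %u\n", fixupOffset(-1), fixupOffset(1)); // prints: 4 0
}
'''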
| |
lowerUINT_TO_FP_vXi32 to avoid double loads seen in D71971
By directly emitting the constants as a constant pool load we seem to avoid the build_vector/extract_subvector combines that resulted in the duplicate loads we had before.
Differential Revision: https://reviews.llvm.org/D72307
| |
the load folding tables.
| |
readPrefixes() assumes insn->bytes is non-empty. The code path is not
exercised in llvm-mc because llvm-mc does not feed empty input to
MCDisassembler::getInstruction().
This bug was uncovered by a5994c789a2982a770254ae1607b5b4cb641f73c.
An empty string did not crash before because the deleted regionReader()
allowed UINT64_C(-1) as insn->readerCursor.
Bytes.size() <= Address -> R->Base
0 <= UINT64_C(-1) - UINT32_C(-1)
| |
At least one of these is used without a Glue. This doesn't seem
to change the X86GenDAGISel.inc output so maybe it doesn't matter?
| |
Only perform this if we are shuffling lower and upper lane elements across the lanes (otherwise splitting to lower xmm shuffles would be better).
This is a regression if we shuffle build_vectors, due to getVectorShuffle canonicalizing 'blend of splat' build vectors. For now, I've set this not to shuffle build_vector nodes at all to avoid this.
| |
elements of the lane mask.
Fixes a cyclic dependency issue with an upcoming patch where getVectorShuffle canonicalizes masks with splat build vector sources.
| |
X86Disassembler.cpp and refactor
| |
STRICT_FSETCC/STRICT_FSETCCS nodes.
This causes the STRICT_FSETCC/STRICT_FSETCCS nodes to be lowered
early while lowering SELECT, but the output chain doesn't get
connected. Then we visit the node again when it is its turn
because we haven't replaced the use of the chain result. In the
case of the fp128 libcall lowering, after D72341 this will cause
the libcall to be emitted twice.
| |
TargetLowering::expandFP_TO_UINT and X86TargetLowering::FP_TO_INTHelper.
| |
The argument is llvm::null() everywhere except llvm::errs() in
llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no
target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds.
If we ever need to add verbose logging to disassemblers, we can record
logs with a member function instead of passing a logger around as an
argument.
| |
The time for llvm-objdump -d on clang decreases from 7.8s to 7.4s.
The improvement is likely due to the elimination of logger setup and
dbgprintf(), which has a large overhead.
| |
The time for llvm-objdump -d on clang decreases from 8.2s to 7.8s.
| |
during PreprocessISelDAG to remove some duplicate isel patterns.
| |
The primary motivation of this change is to bring the code more closely in sync behavior wise with the assembler's version of nop emission. I'd like to eventually factor them into one, but that's hard to do when one has features the other doesn't.
The longest encodeable nop on x86 is 15 bytes, but many processors - for instance all Intel chips - can't decode the 15 byte form efficiently. On those processors, it's better to use a 10 byte or 11 byte sequence instead.
| |
The generic saturated math opcodes are no longer widened inside X86TargetLowering
| |
Add initial support for lowering v4f64 shuffles to SHUFPD(VPERM2F128(V1, V2), VPERM2F128(V1, V2)); eventually this could be used for v8f32 (and maybe v8f64/v16f32), but I'm being conservative with the initial implementation as only v4f64 can always succeed.
This is currently only called from lowerShuffleAsLanePermuteAndShuffle, so it only gets used for unary shuffles, and we limit it to cases where we use the upper elements, as otherwise concatenating two xmm shuffles is probably the better option.
Helps with poor shuffles mentioned in D66004.